DNA data storage is the process through which binary data is encoded into, and decoded from, synthetic DNA strands. DNA has huge potential as a storage medium because of its extremely high storage density. Molecular data storage is considered an excellent option for dense and durable archival storage, which is in strong demand because of the growing gap between the amount of data generated and the available storage capacity. In fact, DNA itself is a prime example of effective archival data storage in molecular form. In vivo storage systems, which store data in the DNA of living organisms, lie together with in vitro data storage technology at the growing intersection of computer systems and biotechnology 1.
Historical Review of DNA Data Storage
The idea of using DNA for data storage dates back to the 1960s, when the concept of genetic memory was discussed among researchers, at a time when DNA sequencing and synthesis technologies were not yet mature enough to realize it. Later on, the concept was demonstrated experimentally by encoding an image of the ancient Germanic rune for ‘female Earth’ in DNA. In 1999, the concept was shown to be practical when classified information was stored and hidden in DNA microdots on paper. This was not only the first practical storage of data in DNA; it also remained, until 2012, the only demonstration of DNA data storage that excluded an in vivo step. Including an in vivo step appears to have been a sensible strategy at the time, since synthetic DNA was cloned into replicating vectors in order to be sequenced and to select the correctly synthesized sequences 1.
However, the revolutionary change in DNA data storage technology occurred in the early 2010s, when two groups independently demonstrated the storage of hundreds of kilobytes of data in DNA, showing that both writing and reading were viable. Since then, the amount of data stored has grown exponentially, by roughly three orders of magnitude over a period of six years. Most studies employ phosphoramidite-based DNA synthesis, a method that has evolved over several decades. Enzymatic DNA synthesis is considered a competing technique and has already been used successfully for data storage. For reading, commercially available sequencing by synthesis is commonly used. In recent years, several research groups have also successfully decoded data using nanopore sequencing on the Oxford Nanopore Technologies MinION platform 1.
Current Trends and Methods in DNA Data Storage
A novel DNA data storage technique makes it possible to overcome the data redundancy issue of previous implementations. It has been shown that introducing more letters into the DNA “alphabet” could make the process more efficient and cost-effective. In DNA, four nucleobases, adenine (A), cytosine (C), guanine (G) and thymine (T), combine into polynucleotide strands; the nucleobases act as the letters of the DNA alphabet, and their sequence determines the stored data. A density of as much as 215 petabytes (a petabyte is a million gigabytes) per gram of DNA has been reported recently, which equates to six times as much data stored per unit volume as in common storage devices. The encoding process associates every two bits of binary data with a different DNA letter (nucleobase), so that a full sequence is built up. Nevertheless, common DNA synthesis methods produce a huge number of molecules with the same sequence, which makes much of the stored material redundant. An approach called “composite letters” has been suggested to overcome this problem 1.
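The two-bits-per-nucleobase mapping described above can be illustrated with a minimal Python sketch. The particular assignment of bit pairs to bases (00 to A, 01 to C, 10 to G, 11 to T) and the function names are assumptions made for illustration; the text does not specify them, and practical encoders add constraints and error correction that are omitted here.

```python
# Assumed mapping for illustration only; the article does not fix
# which bit pair corresponds to which nucleobase.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode_bytes(data: bytes) -> str:
    """Map every two bits of the input to one nucleobase letter."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode_bases(strand: str) -> bytes:
    """Invert encode_bytes: four nucleobases give back one byte."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4 bases")
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode_bytes(b"Hi")
    print(strand)                # CAGACGGC
    print(decode_bases(strand))  # b'Hi'
```

With four letters, each nucleobase carries two bits, so one byte takes four bases. Enlarging the alphabet, as the composite-letter idea proposes, would raise the number of bits carried per synthesized position.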
A DNA data storage solution based on the properties of bacterial nanonetworks has also been proposed. In this approach, digitally encoded DNA is stored in motility-restricted bacteria, which form an archival architecture of clusters; the data are retrieved by engineered motile bacteria whenever a read operation is required. Simulations in which the storage clusters are placed at spatially distinct regions were used to determine how reliably data can be retrieved from the motility-restricted clusters. Wet-lab experiments were carried out to evaluate the feasibility of the idea, showing that motile bacteria can retrieve a given piece of encoded DNA by conjugating with the motility-restricted bacteria and then move towards a target to deliver the data 2.
Major Steps of DNA Digital Data Storage
The procedure for storing data in DNA is as follows. To begin with, a computer algorithm maps strings of bits into DNA sequences, which are then synthesized (written) by a machine, generating many physical copies of each DNA sequence. Synthesis is carried out by solid-phase phosphoramidite chemistry, either on a low-throughput column or on a high-throughput array solid support. After synthesis, the resulting DNA material can be stored in vivo by cloning it into a biological cell. More commonly, the DNA is stored in vitro, for example frozen in solution or otherwise protected from the environment. Once the recorded data is needed, it can be retrieved selectively from the DNA pool through a process called random access. Random access within DNA pools extracts data by PCR enrichment with primer pairs that are mapped to given data items during the encoding process.
Overview of DNA Data Processing
In general, data storage on DNA encompasses four major steps: write, store, retrieve and read.
Writing data to DNA begins with a step called encoding, in which a computer algorithm maps strings of data bits into DNA strands built from the four nucleobases. The DNA sequences can be arbitrary but are limited in length, so bit strings are cut into smaller chunks that are later reassembled into the original digital data. To enable this reassembly, it is essential either to include an index in each chunk or to store overlapping chunks in distinct DNA sequences. It has been proved that a simple index-based coding scheme is optimal, and that storing a large amount of information requires a huge number of distinct DNA sequences to be synthesized.
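The index-based chunking described above can be sketched as follows. This is a minimal illustration, not the scheme of any particular study: the function names, the fixed-width binary index placed at the start of each chunk, and the field sizes are all assumptions.

```python
def split_with_index(bits: str, payload_len: int, index_len: int) -> list[str]:
    """Cut a bit string into chunks and prefix each with a fixed-width index.

    payload_len and index_len are hypothetical parameters chosen by the caller.
    """
    chunks = [bits[i:i + payload_len] for i in range(0, len(bits), payload_len)]
    if len(chunks) > 2 ** index_len:
        raise ValueError("index field is too small for this many chunks")
    return [f"{i:0{index_len}b}" + chunk for i, chunk in enumerate(chunks)]


def reassemble(records: list[str], index_len: int) -> str:
    """Order the records by their index field and join the payloads."""
    ordered = sorted(records, key=lambda record: int(record[:index_len], 2))
    return "".join(record[index_len:] for record in ordered)


if __name__ == "__main__":
    message = "1100101011110000"
    records = split_with_index(message, payload_len=6, index_len=4)
    # A DNA pool has no inherent order, so reverse the list to mimic that.
    print(reassemble(list(reversed(records)), index_len=4) == message)
```

The index is what allows reassembly from an unordered pool. Its cost is that every sequence spends `index_len` bits on addressing instead of payload, which is one reason why a large amount of information needs many distinct sequences.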
After the DNA sequences are synthesized, they need to be stored. It is estimated that a single physically isolated DNA pool can store about 10^12 bytes of data. A library of such pools is therefore required to provide enough capacity for large amounts of data.
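As a rough worked example of why many pools are needed, the sketch below computes how many pools a data set would occupy and which pool holds a given byte. It takes the per-pool capacity to be 10^12 bytes, which is an assumption about the intended figure, and it assumes data are laid out sequentially across pools; the function names are hypothetical.

```python
POOL_CAPACITY_BYTES = 10 ** 12  # assumed per-pool capacity (about one terabyte)


def pools_needed(total_bytes: int, pool_capacity: int = POOL_CAPACITY_BYTES) -> int:
    """Number of physically isolated pools required (integer ceiling division)."""
    return -(-total_bytes // pool_capacity)


def pool_for_offset(byte_offset: int, pool_capacity: int = POOL_CAPACITY_BYTES) -> int:
    """Index of the pool holding a given byte, assuming sequential layout."""
    return byte_offset // pool_capacity


if __name__ == "__main__":
    one_petabyte = 10 ** 15
    print(pools_needed(one_petabyte))           # 1000
    print(pool_for_offset(2_500_000_000_000))   # 2
```

Under these assumptions, one petabyte would occupy a thousand separate pools, so some form of directory is needed to locate the right pool before any DNA is sampled.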
In the retrieval step, once a set of data is requested, the relevant DNA pool has to be physically retrieved and sampled. A random access strategy is then applied so that only the requested data are picked out and reading the entire pool is avoided. Random access in DNA data storage can be supported by selective processes such as magnetic bead extraction with probes, or PCR with primers, where the probes or primers are mapped to the data items during encoding.
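The selection logic of primer-based random access can be imitated in software. The sketch below is a deliberate simplification: it treats primers as plain prefix and suffix tags on each strand and ignores reverse complements, amplification and mispriming. The function names and the primer sequences in the demo are made up for illustration.

```python
def tag_strand(payload: str, fwd_primer: str, rev_primer: str) -> str:
    """Flank a payload with the primer pair assigned to its data item."""
    return fwd_primer + payload + rev_primer


def random_access(pool: list[str], fwd_primer: str, rev_primer: str) -> list[str]:
    """Return the payloads of all strands flanked by the given primer pair."""
    flank = len(fwd_primer) + len(rev_primer)
    selected = []
    for strand in pool:
        if (len(strand) >= flank
                and strand.startswith(fwd_primer)
                and strand.endswith(rev_primer)):
            selected.append(strand[len(fwd_primer):len(strand) - len(rev_primer)])
    return selected


if __name__ == "__main__":
    # Hypothetical primer pairs, one per stored data item.
    file_a = ("ACGTAC", "TTGCAA")
    file_b = ("GGATCC", "CCTAGG")
    pool = [
        tag_strand("AAAA", *file_a),
        tag_strand("CCCC", *file_b),
        tag_strand("GGGG", *file_a),
    ]
    print(random_access(pool, *file_a))  # ['AAAA', 'GGGG']
```

In the laboratory the selection is performed chemically on the whole pool at once, whereas this sketch scans strands one by one; only the addressing idea carries over.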
After a particular DNA sample has been selected, the next stage is to sequence it, which generates a series of reads corresponding to the molecules detected by the sequencer. These reads are then decoded back into the original digital data with high probability. The success of the readout depends on the sequencing coverage and on the error rate of the whole process.
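The role of sequencing coverage can be illustrated with a simple majority vote over several noisy reads of the same strand. This is a minimal sketch under strong assumptions: the reads are already grouped by source strand, they have equal length, and errors are substitutions only. Real pipelines must also handle insertions and deletions and usually rely on error-correcting codes. The function name is hypothetical.

```python
from collections import Counter


def consensus(reads: list[str]) -> str:
    """Per-position majority vote over equal-length reads of one strand."""
    if not reads:
        raise ValueError("at least one read is required")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("all reads must have the same length")
    # zip(*reads) yields one column of bases per strand position.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


if __name__ == "__main__":
    # Three reads (coverage 3), each with at most one substitution error.
    reads = ["ACGT", "ACGA", "TCGT"]
    print(consensus(reads))  # ACGT
```

With higher coverage, more reads vote at each position, so an isolated sequencing error is more likely to be outvoted; with a higher error rate, more coverage is needed for the same confidence.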
In vitro DNA storage is generally regarded as the most successful and practical form of DNA data storage in terms of stability, cost and scalability. Systems based on in vivo DNA storage, by contrast, can be employed as biological recording platforms and are better suited to collecting new data than to preserving digital data of the kind held on physical devices.