There is a continuing demand to store ever more data on or in physical media, with storage devices getting ever smaller as their capacity gets bigger. The amount of data stored is reportedly doubling every two years, and according to one study, by 2020 the amount of data we create and copy annually will reach 44 zettabytes, or 44 trillion gigabytes. Moreover, existing data storage media, such as hard drives, optical media, and magnetic tapes, are relatively unstable and become corrupted after prolonged storage.
There is an urgent need for alternative approaches to storing large volumes of data for extended periods, e.g. decades or centuries.
Some have proposed using DNA to store data. DNA is extremely stable and could in theory encode vast amounts of data and store the data for very long periods. See, for example, Bancroft, C., et al., Long-Term Storage of Information in DNA, Science (2001) 293: 1763-1765. Additionally, DNA as a storage medium is not susceptible to the security risks of traditional digital storage media. But there has been no practical approach to implementing this idea.
WO 2014/014991, for example, describes a method of storing data on DNA oligonucleotides, wherein information is encoded in binary format, one bit per nucleotide, with a 96-bit (96-nucleotide) data block, a 19-nucleotide address sequence, and flanking sequences for amplification and sequencing. The code is then read by amplifying the sequences using PCR and sequencing them using a high-speed sequencer such as the Illumina HiSeq machine. The data block sequences are then arranged in the correct order using the address tags, the address and flanking sequences are filtered out, and the sequence data is translated into binary code. Such an approach has significant limitations. For example, the 96-bit data block could encode only 12 letters (using the conventional one byte, or 8 bits, per letter or space). The ratio of useful information stored relative to “housekeeping” information is low: approximately 40% of the sequence information is taken up with the address and the flanking DNA. The specification describes encoding a book using 54,898 oligonucleotides. The ink-jet printed, high-fidelity DNA microchips used to synthesize the oligonucleotides limited the size of the oligos (the 159-mers described were at the upper limit). Furthermore, reading the oligonucleotides requires amplification and isolation, which introduces additional potential for error. See also, WO 2004/088585A2; WO 03/025123 A2; C. BANCROFT: “Long-Term Storage of Information in DNA”, Science (2001) 293 (5536): 1763c-1765; COX J P L: “Long-term data storage in DNA”, Trends in Biotechnology (2001) 19(7): 247-250.
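The capacity and overhead figures quoted above can be checked with a short calculation. The following Python sketch is illustrative only: it uses the numbers stated in this section (96 data bits, 19 address nucleotides, 159-mer oligos, 8 bits per character, 54,898 oligos) and assumes that every nucleotide that is neither data nor address is flanking sequence, which gives 44 flanking nucleotides. The function names are hypothetical and are not taken from WO 2014/014991.

```python
import math

# Figures stated in the description of the prior-art scheme.
DATA_BITS = 96        # one bit per nucleotide, so 96 data nucleotides
ADDRESS_NT = 19       # address sequence length in nucleotides
OLIGO_NT = 159        # total oligo length (cited as the upper limit)
BITS_PER_CHAR = 8     # conventional one byte per letter or space
BOOK_OLIGOS = 54_898  # oligos used to encode the book

# Assumption: all remaining nucleotides are flanking sequence.
FLANKING_NT = OLIGO_NT - DATA_BITS - ADDRESS_NT


def chars_per_oligo() -> int:
    """Number of 8-bit characters that fit in one data block."""
    return DATA_BITS // BITS_PER_CHAR


def overhead_fraction() -> float:
    """Fraction of each oligo spent on address and flanking sequence."""
    return (ADDRESS_NT + FLANKING_NT) / OLIGO_NT


def oligos_needed(n_chars: int) -> int:
    """Oligos required to hold n_chars characters of payload."""
    return math.ceil(n_chars * BITS_PER_CHAR / DATA_BITS)


def payload_bytes(n_oligos: int) -> int:
    """Total payload, in bytes, carried by n_oligos data blocks."""
    return n_oligos * DATA_BITS // BITS_PER_CHAR


if __name__ == "__main__":
    print(f"characters per oligo: {chars_per_oligo()}")
    print(f"overhead per oligo:   {overhead_fraction():.1%}")
    print(f"book payload (bytes): {payload_bytes(BOOK_OLIGOS)}")
```

Under these assumptions each oligo carries 12 characters, roughly 40% of each oligo is housekeeping sequence, and the 54,898 oligos together hold about 0.66 megabytes of payload.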
DNA sequencing devices include nanopore-based devices from Oxford Nanopore, Genia, and others. In many of those devices, a nanopore is used in a fluid-filled cell to read the DNA data by measuring a change in current as the DNA passes through the nanopore; the change is typically in the range of nanoamps. Measurements based on changes in capacitance have been proposed but are not commercial; those changes are in the range of picofarads, femtofarads, or attofarads. Such small changes are very difficult to detect reliably and repeatably, as they are hard to distinguish from typical background noise. The difficulty is compounded by the fact that DNA can move through a nanopore at a rate of approximately one million bases per second, which is too fast to read accurately using existing means. This requires the use of protein nanopores, which slow the passage of DNA through the nanopore but are impractical for reading large amounts of data.
Existing nanopore-based DNA data readers do not overcome these problems and thus do not provide highly precise, repeatable, reliable, automated, and robust DNA data reading results. Thus, it would be desirable to have a device that provides high-quality, reliable DNA data reading results and also provides a scalable approach to reliably reading data stored on multiple DNA molecules simultaneously.
While the potential information density and stability of DNA make it an attractive vehicle for data storage, as has been recognized for over twenty-five years, there is still no practical approach to writing and reading large amounts of data in this form.