Much of the data being produced by computing devices is stored on conventional data storage systems that include various kinds of magnetic storage media, optical storage media, and/or solid state storage media. The capacity of conventional data storage systems is not keeping pace with the rate at which computing devices produce data. Polynucleotides, such as deoxyribonucleic acid (DNA) or ribonucleic acid (RNA), can be used to store very large amounts of data on a scale that exceeds the capacity of conventional storage systems. An arrangement of nucleotides included in a polynucleotide (e.g., CTGAAGT . . . ) can correspond to an arrangement of bits that encodes digital data (e.g., 11010001 . . . ). The digital data can include audio data, video data, image data, text data, software, combinations thereof, and the like.
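To make the correspondence between bits and nucleotides concrete, the following is a minimal sketch that assumes a simple two-bits-per-nucleotide mapping. The mapping table and the function names `encode_bits` and `decode_bases` are illustrative assumptions; the text does not specify a particular encoding, and practical schemes typically add constraints and error correction.

```python
# Hypothetical mapping of two bits to one nucleotide (an assumption for
# illustration only; the source text does not define an encoding scheme).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode_bits(bits: str) -> str:
    """Translate a string of '0'/'1' characters into a nucleotide string."""
    if set(bits) - {"0", "1"}:
        raise ValueError("bit string may contain only '0' and '1'")
    if len(bits) % 2 != 0:
        raise ValueError("bit string length must be even")
    # Consume the bit string two characters at a time.
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode_bases(bases: str) -> str:
    """Translate a nucleotide string back into the original bit string."""
    try:
        return "".join(BASE_TO_BITS[base] for base in bases)
    except KeyError as exc:
        raise ValueError(f"unexpected nucleotide: {exc.args[0]!r}") from None


if __name__ == "__main__":
    strand = encode_bits("11010001")
    print(strand, decode_bases(strand))
```

Under this assumed mapping, every nucleotide carries two bits, so a strand of n nucleotides represents 2n bits of payload.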
The retrieval of digital data stored by polynucleotide sequences can be achieved using processes that amplify the polynucleotides encoding the requested digital data. For example, polymerase chain reaction (PCR) can be used to perform this amplification. Amplification can produce an amplification product in which the quantity of the target polynucleotides is several orders of magnitude greater than their original quantity.
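The scale of the increase can be estimated with the commonly used idealized model of PCR, in which each cycle multiplies the number of target copies by (1 + efficiency). The sketch below assumes that model and ignores reagent depletion and plateau effects; the function names and the example values are hypothetical.

```python
import math


def amplified_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized copy count after `cycles` PCR cycles.

    `efficiency` is the per-cycle fraction of templates that are copied
    (1.0 means perfect doubling). This is a simplified model, not a
    description of any particular protocol.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


def orders_of_magnitude_gain(cycles: int, efficiency: float = 1.0) -> float:
    """Base-10 logarithm of the fold increase under the same idealized model."""
    return cycles * math.log10(1.0 + efficiency)


if __name__ == "__main__":
    # Example values are assumptions chosen only to illustrate the scale.
    print(amplified_copies(10, 20))
    print(round(orders_of_magnitude_gain(20), 2))
```

With perfect doubling, 20 cycles correspond to a fold increase of 2^20, or roughly six orders of magnitude, which is consistent with the scale described above.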
The amplification of polynucleotides that encode digital data may be performed selectively such that the polynucleotides encoding the desired digital data are amplified much more than other polynucleotides. To illustrate, polynucleotides of two different data files can be stored in a container of a polynucleotide data storage system and one of the data files can be the subject of a request for digital data. After selective amplification, the number of polynucleotides associated with the requested data file will be orders of magnitude greater than the number of polynucleotides of the other data file. A sample of the amplification product can be sequenced by a sequencing machine and the sequencing data that includes reads from the sequencing machine can be analyzed to reproduce the original bits of the requested digital data. Although the polynucleotides associated with the data file that was not requested are still present, the probability of sequencing these polynucleotides is very small because there are so many more copies of the polynucleotides from the requested data file. Thus, the polynucleotide sequences included in the sequencing data that correspond to the requested digital data can be identified because they are found in greater quantities than the polynucleotide sequences that are not associated with the digital data request.
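The effect of selective amplification on sequencing can be illustrated with a small calculation. The sketch below assumes that reads are drawn uniformly at random from the amplified pool and that the non-requested file is not amplified at all; both are simplifying assumptions, and the function names and numbers are hypothetical.

```python
def off_target_fraction(requested_copies: float, other_copies: float,
                        fold_requested: float, fold_other: float = 1.0) -> float:
    """Fraction of the amplified pool that belongs to the non-requested file."""
    requested = requested_copies * fold_requested
    other = other_copies * fold_other
    return other / (requested + other)


def prob_no_off_target_read(num_reads: int, fraction: float) -> float:
    """Probability that none of `num_reads` uniformly sampled reads is off-target.

    Treats each read as an independent draw, which is a reasonable
    approximation when the pool is far larger than the number of reads.
    """
    return (1.0 - fraction) ** num_reads


if __name__ == "__main__":
    # Two files start with equal copy numbers; only one is amplified,
    # here by an assumed factor of one million.
    fraction = off_target_fraction(1_000, 1_000, fold_requested=1e6)
    print(fraction)
    print(prob_no_off_target_read(10_000, fraction))
```

In this example the non-requested file makes up about one part in a million of the pool, so its polynucleotides rarely appear in the sequencing data even though they remain physically present in the sample.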
In some cases, amplification can take place unevenly, with some polynucleotides being amplified at a faster rate than others. For example, digital data of some data files can be encoded by a greater quantity of polynucleotides than digital data of other data files because the data files include different numbers of bits. When polynucleotides of different data files are amplified together, the polynucleotides of the data files encoded by greater numbers of polynucleotides may be amplified at a faster rate than the polynucleotides of the other data files. Even when both data files are targets of a request, the difference in amplification rates may result in the sequencing machine failing to sequence both sets of polynucleotides equally. Therefore, differences in the amplification rates of polynucleotides that encode data files of varying sizes can lead to inefficiencies in the random access and sequencing processes.
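A short calculation shows how a modest difference in per-cycle amplification rate can skew the share of sequencing reads each data file receives. The sketch below assumes the idealized exponential model of amplification and uniform sampling of reads; the file names, copy counts, and efficiencies are hypothetical values chosen only for illustration.

```python
def read_shares(files: dict[str, tuple[float, float]], cycles: int) -> dict[str, float]:
    """Expected share of sequencing reads per file after co-amplification.

    `files` maps a file name to (initial_copies, per_cycle_efficiency).
    Each cycle multiplies a file's copies by (1 + efficiency); this is an
    idealized model that ignores plateau and competition effects.
    """
    amplified = {
        name: copies * (1.0 + efficiency) ** cycles
        for name, (copies, efficiency) in files.items()
    }
    total = sum(amplified.values())
    return {name: amount / total for name, amount in amplified.items()}


if __name__ == "__main__":
    # Assumed scenario: the large file starts with ten times as many
    # polynucleotides and also amplifies slightly faster per cycle.
    pool = {"large_file": (10_000, 0.95), "small_file": (1_000, 0.85)}
    for cycles in (0, 10, 20):
        shares = read_shares(pool, cycles)
        print(cycles, {name: round(share, 4) for name, share in shares.items()})
```

Under these assumptions the smaller file's share of reads shrinks with every cycle, so more total sequencing is needed to recover it at the same coverage, which is one way the inefficiency described above can arise.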