A variety of DNA sequencing methodologies have been developed and commercialized over the past two decades (for recent reviews see, for example, E. Mardis (2008), “Next-Generation DNA Sequencing Methods”, Annu. Rev. Genomics Hum. Genet. 9:387-402; and J. Heather and B. Chain (2016), “The Sequence of Sequencers: The History of Sequencing DNA”, Genomics 107:1-8). Many “second generation” and “third generation” sequencing technologies utilize a massively parallel, cyclic array approach to sequencing-by-synthesis (SBS), in which accurate decoding of a single-stranded template oligonucleotide sequence tethered to a solid support relies on successfully classifying signals that arise from the stepwise addition of A, G, C, and T nucleotides by a polymerase to a complementary oligonucleotide strand. These methods typically require the oligonucleotide template to be modified with a known adapter sequence of fixed length, affixed to a solid support in a random or patterned array by hybridization to surface-tethered probes of known sequence that are complementary to the adapter sequence, and then probed using, for example, a single molecule (non-amplified), synchronous sequencing-by-synthesis (smSBS) approach (e.g., the Helicos technology), or a single molecule, asynchronous sequencing-by-synthesis (smASBS) approach (e.g., the Pacific Biosciences technology).
In the smSBS approach, terminator nucleotides encoded with fluorescent tags are used, such that a replication enzyme can only incorporate a single base per cycle. The Helicos technology, for example, used a single fluorescent tag, and A, G, C, and T were introduced sequentially, one base per cycle. During each cycle, an imaging step was performed to classify the correct “base” for each single molecule template on the array. Following each imaging step, the reversibly-linked tags are removed, such that the replicating enzyme (polymerase) can incorporate the next base. These cycles are repeated many times to eventually decode the template oligonucleotide strands on the random array and determine their respective sequences.
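By way of illustration only, the single-tag, one-base-per-cycle decoding described above can be sketched in Python as follows. The function names, the fixed A, G, C, T introduction order, the intensity values, and the detection threshold are all hypothetical and chosen for clarity; the sketch does not represent the base-calling software of any commercial instrument.

```python
from typing import List, Optional, Sequence

# Complement table used to recover the template from the synthesized strand.
_COMPLEMENT = str.maketrans("AGCT", "TCGA")


def call_bases_single_tag(
    intensities: Sequence[float],
    flow_order: str = "AGCT",   # assumed order of nucleotide introduction
    threshold: float = 0.5,     # hypothetical detection threshold
) -> List[Optional[str]]:
    """Return, per cycle, the base that was incorporated, or None if no signal."""
    calls: List[Optional[str]] = []
    for cycle, intensity in enumerate(intensities):
        # Only one nucleotide type is present in a given cycle.
        base = flow_order[cycle % len(flow_order)]
        calls.append(base if intensity > threshold else None)
    return calls


def assemble_read(calls: Sequence[Optional[str]]) -> str:
    """Concatenate the cycles in which an incorporation was detected."""
    return "".join(base for base in calls if base is not None)


def template_from_read(read: str) -> str:
    """The template strand is the reverse complement of the synthesized strand."""
    return read.translate(_COMPLEMENT)[::-1]


# One spot on the array, imaged once per cycle (made-up intensities).
spot_intensities = [0.9, 0.1, 0.0, 0.2, 0.1, 0.8, 0.7, 0.1]
spot_calls = call_bases_single_tag(spot_intensities)
spot_read = assemble_read(spot_calls)
spot_template = template_from_read(spot_read)
```

In this sketch a weak signal that falls below the threshold is simply missed, which illustrates why low-contrast signals translate directly into base-calling errors.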
While successful, the cyclic array approach has generally suffered from two fundamental inadequacies: (i) the cycle times for addition of each successive nucleotide to the complementary strand are long, and (ii) the signals arising from the stepwise addition of single nucleotides (typically detected through the use of fluorescent labels and fluorescence imaging techniques) are weak and exhibit low contrast-to-noise ratios (CNRs), as will be discussed in more detail below, and therefore require long imaging times and costly instrumentation comprising high precision optics to achieve accurate base-calling.
Attempts to address the cycle time issue for cyclic array sequencing approaches have been made, for example, through the advent of single molecule asynchronous sequencing-by-synthesis (smASBS) approaches, e.g., the Pacific Biosciences technology, in which four spectrally-distinct fluorescent tags are linked to the respective A, G, C, and T nucleotides, the addition of which can then be classified in “real-time”. In this approach, all four labeled nucleotides are introduced simultaneously and images are acquired during the entire strand replication process. Each position in the sequence is classified as ‘A’, ‘G’, ‘C’, or ‘T’ based on the spectrum of the detected light. Here, base addition can theoretically proceed as fast as the polymerase-catalyzed replication rate, but the trade-off is decreased CNR, which introduces classification errors that ultimately lead to diminished accuracy and places greater reliance on high precision optics and costly instrumentation.
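By way of illustration only, classification of each incorporation event from the spectrum of the detected light can be sketched in Python as follows. The four-channel representation, the channel-to-base assignment, the minimum margin, and the use of ‘N’ for an unresolved call are assumptions made for this sketch and are not taken from any particular instrument.

```python
from typing import Sequence


def classify_by_spectrum(
    channels: Sequence[float],
    labels: str = "AGCT",       # assumed channel-to-base assignment
    min_margin: float = 0.1,    # hypothetical minimum gap between top two channels
) -> str:
    """Call the base whose channel is brightest, or 'N' if the call is ambiguous."""
    if len(channels) != len(labels):
        raise ValueError("expected one intensity per spectral channel")
    # Rank channel indices from brightest to dimmest.
    ranked = sorted(range(len(labels)), key=lambda i: channels[i], reverse=True)
    best, runner_up = ranked[0], ranked[1]
    # When contrast between the two brightest channels is poor, decline to call.
    if channels[best] - channels[runner_up] < min_margin:
        return "N"
    return labels[best]


def call_read(events: Sequence[Sequence[float]]) -> str:
    """Classify each detected incorporation event in order of occurrence."""
    return "".join(classify_by_spectrum(event) for event in events)


# Made-up four-channel intensities for four successive incorporation events.
example_events = [
    (0.90, 0.10, 0.20, 0.10),   # channel 0 dominates
    (0.10, 0.20, 0.10, 0.70),   # channel 3 dominates
    (0.20, 0.80, 0.10, 0.10),   # channel 1 dominates
    (0.50, 0.45, 0.10, 0.10),   # top two channels too close to separate
]
example_read = call_read(example_events)
```

The last event in the example shows the trade-off noted above: when contrast between channels is reduced, the classifier must either guess or leave the position uncalled.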
Attempts to address the signal limitations in some cyclic array sequencing approaches (i.e., non-single molecule approaches) have been made by incorporating an amplification step in the process. Solid-phase amplification of template DNA molecules tethered to a solid support in a random or patterned array increases the number of copies of the target to be sequenced, such that the signal arising from a “colony” of replicate template molecules upon stepwise addition of detectable bases to their respective complementary strands can be classified as ‘A’, ‘G’, ‘C’, or ‘T’. The probability of successful classification (and thus the accuracy of base-calling) depends on the CNR during each detection event, which is often the limiting factor.
Thus, there is a need for improved solid supports and solid phase amplification methods for nucleic acid sequencing that will increase the magnitude of base addition signals, decrease non-specific background signals, and thus improve CNR, thereby improving the accuracy of base-calling, potentially shortening cycle times, and reducing the dependence of the sequencing process on high precision optics and costly instrumentation.