The determination of nucleic acid sequence information is now an important part of biological and medical research. For example, nucleic acid sequence information is helpful for identifying genes associated with certain diseases and phenotypes, identifying potential drug targets, and understanding the mechanisms of disease development and progression. Sequence information is also an important part of personalized medicine, where it can be used to optimize the diagnosis, treatment, or prevention of disease in a specific subject.
High-throughput, cost-effective nucleic acid sequencing has the potential to usher in a new era of research and personalized medicine. Several commercial sequencing platforms are available, but they remain prohibitively expensive for genetic analysis in the mass market.
Currently, a variety of sequencing technologies utilize a method known alternatively as “sequencing-by-synthesis” (SBS) or “sequencing by incorporation.” This method commonly employs a polymerase to synthesize a DNA strand complementary to a template strand that is to be sequenced. This may involve providing nucleotides or short oligonucleotides, which are modified with identifying tags, so that the base type of the incorporated nucleotide or oligonucleotide is detected as synthesis proceeds. Detection may be in real time, where the nucleotides are detected as they are incorporated. Unfortunately, real-time procedures can sometimes suffer from inaccurate reads of regions containing highly repetitive sequences and homopolymeric stretches. Detection may also proceed in iterations of stop-and-proceed steps, wherein controlled reaction conditions and/or reagents reversibly stop and start the reaction at a given time during synthesis.
Because many sequencing-by-synthesis technologies rely on fluorescent detection, they require fluorescently labeled nucleotides. The necessary illumination and optical systems can increase the complexity and expense of the system. By way of example, SBS methods often require fluorescently labeled dNTPs for detecting incorporated nucleotides and identifying a template nucleic acid sequence. However, the use of labeled nucleotides limits accuracy, since current SBS reactions using labeled nucleotides become error-prone after a few hundred bases. Even a 1% error rate could compromise the significance of the sequencing results when an entire genome is to be analyzed. Accuracy may be decreased when a failure to detect a single label results in a deletion error or when the detection of a stray molecule results in an insertion error. Bleached fluorophores cause false negatives. In addition, contamination of labeled dNTPs by unlabeled dNTPs (e.g., impurities or hydrolysis products) can also cause false negatives. Still further, stray signals from labeled dNTPs non-specifically bound to a structured surface contribute to insertion errors or low signal-to-noise ratios. The use of modified nucleotides also significantly slows enzyme kinetics, thereby making the sequencing reaction very slow. Another challenge with labeled nucleotides in SBS procedures is that the label needs to be removed or deactivated after it is incorporated and detected, so that the next addition can be observed without background signal. Thus, to obtain long read lengths, each addition must be followed by chemical, enzymatic, or photolytic steps that are virtually 100% efficient at unblocking the substrate or removing the dye for the next addition.
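The arithmetic behind these two accuracy concerns can be illustrated with a minimal sketch. It is not part of the disclosed method. It assumes that each unblocking or dye-removal step succeeds independently with a fixed efficiency, that a read is 300 bases long (an illustrative stand-in for "a few hundred bases"), and that a genome contains roughly 3 billion bases (a round figure for a human genome). The function names are hypothetical.

```python
def in_phase_fraction(step_efficiency: float, cycles: int) -> float:
    """Fraction of template copies that completed every one of `cycles`
    additions, assuming each unblocking/dye-removal step succeeds
    independently with probability `step_efficiency` (an assumption)."""
    return step_efficiency ** cycles


def expected_errors(error_rate: float, bases: int) -> float:
    """Expected number of miscalled bases at a uniform per-base error rate."""
    return error_rate * bases


READ_LENGTH = 300             # assumed: "a few hundred bases"
GENOME_BASES = 3_000_000_000  # assumed: rough human genome size

# Small per-step losses compound over the length of a read.
for efficiency in (0.99, 0.999, 0.9999):
    fraction = in_phase_fraction(efficiency, READ_LENGTH)
    print(f"step efficiency {efficiency:.2%}: "
          f"{fraction:.1%} of copies still in phase after {READ_LENGTH} cycles")

# A 1% per-base error rate applied across a whole genome.
print(f"expected errors at 1%: {expected_errors(0.01, GENOME_BASES):,.0f}")
```

Under these assumptions, a step efficiency of 99% leaves only about 5% of the copies in phase after 300 cycles, which is why each step must approach 100% for long reads, and a 1% error rate corresponds to about 30 million miscalled bases across the genome.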
Disclosed below is a technical approach that overcomes many of the problems typically associated with prior sequencing technologies.