Nucleic acid sequencing is ubiquitous in molecular biology and molecular medicine. Goals for sequencing technologies include expanding throughput, lowering reagent and labor costs, and improving accuracy. For a relatively recent review of current sequencing technologies, see, e.g., Chan (2005) “Advances in Sequencing Technology” (Review) Mutation Research 573: 13-40. A commonly stated goal of current sequencing technology development efforts is to bring the cost of sequencing (or at least resequencing) a genome down to about $1,000. If sequencing costs can be brought down to this level, it will be possible to analyze genetic variation in detail for species and individuals, providing a rational basis for personalized medicine, as well as for identifying relatively subtle causal links between genotypes and phenotypes.
Sequencing methods in use include classical polymerase-mediated enzymatic methods such as Sanger dideoxy sequencing (Sanger et al. (1977) “DNA sequencing with Chain terminating inhibitors,” Proc. Natl. Acad. Sci. USA 74:5463-5467), capillary-based implementations of Sanger sequencing (Swerdlow et al. (1990) “Capillary Gel Electrophoresis for DNA Sequencing: laser-induced fluorescence detection with the Sheath Flow Cuvette,” J. Chromatogr. 516:61-67; Cohen et al. (1990) “Separation and Analysis of DNA Sequence Reaction Products by Capillary gel Electrophoresis,” J. Chromatogr. 49-60; and Dovichi (1997) “DNA Sequencing by Capillary Electrophoresis” Electrophoresis 18:2393-2399) and automated implementations of Sanger sequencing (Smith et al. (1986) “Fluorescence detection in automated Sequence Analysis” Nature 321:674-679; Hood et al. (1987) “Automated DNA Sequencing and Analysis of the Human Genome” Genomics 1:201-212; Hunkapiller et al. (1991) “Large Scale and Automated DNA Sequence Determination” Science 254:59-67). Automated systems are in routine use, such as those from Applied Biosystems (Foster City, Calif.). These commercially available systems include, e.g., 1-Capillary Sequencers, 4-Capillary Sequencers, 16-Capillary Sequencers, 48-Capillary Sequencers and 96-Capillary Sequencers. While this technology is robust, highly developed and accurate, throughput and sequencing costs are still not ideal. State-of-the-art Sanger systems, such as the ABI Prism® 3700 series DNA analyzers, permit sequencing of about 900,000 bp/day at most, with costs still running about $0.001 per base (Chan (2005), supra). This is still far from the goal of sequencing a genome for $1,000. Sequencing reagent costs per reaction in an automated Sanger system are also likely too high to meet the goal of a $1,000 genome.
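As a rough illustration of how far these figures are from the $1,000 goal, the quoted throughput and per-base cost can be combined in a back-of-envelope estimate. The sketch below assumes a genome of about 3 × 10^9 bases (roughly the haploid human genome, a figure not stated above) read once with no redundant coverage; the constant and function names are hypothetical and chosen only for this illustration.

```python
# Back-of-envelope estimate only. The genome size and single-pass coverage
# are assumptions for illustration; the other figures are quoted in the text.
GENOME_SIZE_BP = 3_000_000_000    # assumed: approximate haploid human genome
COST_PER_BASE_USD = 0.001         # quoted per-base cost for Sanger systems
THROUGHPUT_BP_PER_DAY = 900_000   # quoted upper bound per instrument
TARGET_COST_USD = 1_000           # stated cost goal per genome


def single_pass_cost(genome_bp, cost_per_base):
    """Cost in USD of reading every base of the genome exactly once."""
    return genome_bp * cost_per_base


def single_pass_days(genome_bp, bp_per_day):
    """Instrument-days needed to read every base exactly once."""
    return genome_bp / bp_per_day


cost = single_pass_cost(GENOME_SIZE_BP, COST_PER_BASE_USD)
days = single_pass_days(GENOME_SIZE_BP, THROUGHPUT_BP_PER_DAY)
fold_gap = cost / TARGET_COST_USD

print(f"single-pass cost: ${cost:,.0f}")
print(f"instrument-days:  {days:,.0f}")
print(f"cost reduction needed: about {fold_gap:,.0f}-fold")
```

Under these assumptions a single pass costs on the order of $3 million and occupies one instrument for roughly 3,300 days, so the per-base cost would need to fall about 3,000-fold. Practical projects read each base several times, which would widen the gap further.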
Current methods that do not use a polymerase for sequencing, developed at least partly in an effort to address the cost issues of classical Sanger methods, include sequencing by hybridization (Drmanac R et al. (2002) “Sequencing by hybridization (SBH): advantages, achievements, and opportunities,” Adv Biochem Eng Biotechnol. 77:75-101; Church (2006) “Genomes for all” Scientific American 294(1):52); direct linear analysis (Chan et al. (2004) “DNA Mapping Using Microfluidic Stretching and Single Molecule Detection of Fluorescent Site-Specific Tags” Genome Research 14: 1137-1146); and nanopore sequencing (Deamer and Branton (2002) “Characterization of Nucleic Acids by Nanopore Analysis,” Acc. Chem. Res. 35:817-825; Meller et al. (2002) “Single Molecule Measurements of DNA Transported through a Nanopore,” Electrophoresis 23:2583-2591). Sequencing by hybridization is primarily useful for interrogating whether specific residues occur in a sequence (rather than for completely sequencing a nucleic acid de novo, or even completely resequencing a nucleic acid). Direct linear analysis and nanopore sequencing methods are still largely conceptual.
Accordingly, polymerase-based methods are still the most widely applicable sequencing methods. Sequencing approaches that substantially improve throughput over classical Sanger sequencing methods have been developed, including massively parallel pyrosequencing (Leamon et al. (2003) “A massively parallel PicoTiterPlate based platform for discrete picoliter-scale polymerase chain reactions,” Electrophoresis 24: 682-686); chip-based DNA sequencing by synthesis (DSS) (Seo et al. (2004) “Photocleavable fluorescent nucleotides for DNA on a Chip Constructed by Site-Specific Coupling Chemistry,” Proc. Natl. Acad. Sci. U.S.A. 101:5488-5493); sequencing using polymerase colonies (Mitra et al. (2003) “Fluorescent in situ Sequencing on Polymerase Colonies,” Anal. Biochem. 320: 55-65); and zero mode waveguides (ZMWs) for real-time single molecule sequencing (Levene et al. (2003) “Zero Mode Waveguides for single Molecule Analysis at High Concentrations,” Science 299:682-686).
Like the classical Sanger approaches, these sequencing methods rely on the action of a polymerase to copy a template during sequencing. For example, ZMWs are powerful new sequencing tools that facilitate detection, in real time, of the incorporation of single labeled nucleotides into single nucleic acids as the nucleic acids are copied by a polymerase. Polymerase-based “sequencing by incorporation” methods offer advantages inherent in the polymerases being used, e.g., extremely high processivity, extremely low error rates from enzymatic misincorporation and well-characterized reaction enzymology.
One enzymatic sequencing method that is not typically mediated by polymerase activity is “exonuclease sequencing” (reviewed in Chan, 2005, supra; see also Jett et al. (1989) “High speed DNA Sequencing: An approach based on fluorescent detection of single molecules,” J. Biomol. Struct. Dyn. 301-309). In these methods, a processive exonuclease cleaves labeled nucleotides from a DNA molecule, and each released labeled nucleotide is detected and analyzed to provide sequence information (Werner et al. (2003) “Progress Towards Single Molecule DNA Sequencing: a one color Demonstration,” J. Biotechnol. 102:1-14). Exonuclease-based methods are theoretically promising because read lengths are potentially very long, as the method does not depend on size separation of the cleavage products. “Two base” labeling approaches have been proposed for these methods to overcome problems with multi-labeled nucleic acids (Jett et al. (1995) METHODS FOR RAPID BASE SEQUENCING IN DNA AND RNA WITH TWO BASE LABELING U.S. Pat. No. 5,405,747). Detection of the inherent fluorescence of cleaved nucleotides could, potentially, eliminate the need for nucleotide labeling altogether (Ulmer (1997) METHODS AND COMPOSITIONS FOR DNA SEQUENCING U.S. Pat. No. 5,674,743).
Challenges with previous exonuclease-based sequencing methods include poor exonuclease processivity on the highly labeled nucleic acids used in the protocols and incomplete fluorescent label incorporation (Chan, 2005, supra). The present invention overcomes these and other problems.