The goal of elucidating the entire human genome has created an interest in technologies for rapid DNA sequencing, for both small- and large-scale applications. Important parameters are sequencing speed, the length of sequence that can be read during a single sequencing run, and the amount of nucleic acid template required. These research challenges suggest aiming to sequence the genetic information of single cells without prior amplification and without first cloning the genetic material into sequencing vectors. Large-scale genome projects are currently too expensive to be carried out realistically for a large number of organisms or patients. Furthermore, as knowledge of the genetic basis for human diseases increases, there will be an increasing need for accurate, high-throughput DNA sequencing that is affordable for clinical applications.
Two traditional techniques for sequencing DNA are the dideoxy termination method of Sanger (Sanger et al., Proc. Natl. Acad. Sci. U.S.A. 74: 5463-5467 (1977)) and the Maxam-Gilbert chemical degradation method (Maxam and Gilbert, Proc. Natl. Acad. Sci. U.S.A. 74: 560-564 (1977)). Both methods deliver four samples, with each sample containing a family of DNA strands in which all strands terminate at the same type of nucleotide. Ultrathin slab gel electrophoresis, or more recently capillary array electrophoresis, is used to resolve the strands of different lengths and to determine the nucleotide sequence, either by differentially tagging the strands of each sample before electrophoresis to indicate the terminal nucleotide, or by running the samples in different lanes of the gel or in different capillaries. Both the Sanger and the Maxam-Gilbert methods are labor- and time-intensive, and require extensive pretreatment of the DNA source. Attempts have been made to use mass spectroscopy to replace the time-intensive electrophoresis step. For a review of existing sequencing technologies, see Cheng, “High-Speed DNA-Sequence Analysis,” Prog. Biochem. Biophys. 22: 223-227 (1995).
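The read-out principle shared by both methods can be illustrated with a short sketch. It is an idealized model, not a description of any instrument: it assumes every fragment length is resolved exactly and observed in exactly one of the four samples, and the names `make_lanes` and `read_ladder` are hypothetical helpers introduced only for this illustration.

```python
def make_lanes(sequence):
    """Idealized model: for each base type, list the lengths of the
    fragments that terminate at that base (lengths counted from 1)."""
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(sequence, start=1):
        lanes[base].append(length)
    return lanes


def read_ladder(lanes):
    """Recover the sequence by ordering all fragments by length and
    reading off which sample (terminal base) each length came from."""
    call_at_length = {}
    for base, lengths in lanes.items():
        for length in lengths:
            if length in call_at_length:
                # Two samples claim the same length: ambiguous band.
                raise ValueError(f"ambiguous call at length {length}")
            call_at_length[length] = base
    return "".join(call_at_length[n] for n in sorted(call_at_length))


lanes = make_lanes("GATTACA")
print(lanes["A"])          # fragment lengths ending in A: [2, 5, 7]
print(read_ladder(lanes))  # GATTACA
```

Sorting by length is the computational counterpart of electrophoretic separation; in practice, resolution degrades as fragments grow longer, which is one reason read lengths are limited.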
Related methods using dyes or fluorescent labels associated with the terminal nucleotide have been developed, where sequence determination is also made by gel electrophoresis and automated fluorescent detectors. For example, the Sanger-extension method has recently been modified for use in an automated microsequencing system which requires only sub-microliter volumes of reagents and dye-labeled dideoxyribonucleotide triphosphates. In U.S. Pat. No. 5,846,727 to Soper et al. (“Soper”), fluorescence detection is performed on-chip with one single-mode optical fiber carrying the excitation light to the capillary channel, and a second single-mode optical fiber collecting the fluorescent photons. Sequence reads are estimated in the range of 400-500 bases, which is not a significant improvement over the amount of sequence information obtained with traditional Sanger or Maxam-Gilbert methods. Furthermore, the Soper method requires PCR amplification of template DNA, and purification and gel electrophoresis of the oligonucleotide sequencing ‘ladders,’ prior to initiation of the separation reaction. These systems all require significant quantities of target DNA. Other conventional methods suffer from the same drawback. See U.S. Pat. No. 5,302,509 to Cheeseman.
In a recent improvement of a sequencing-by-synthesis methodology originally devised ten years ago, DNA sequences are being deduced by measuring pyrophosphate release upon testing DNA/polymerase complexes with each deoxyribonucleotide triphosphate (dNTP) separately and sequentially. See Ronaghi et al. (“Ronaghi”), “A Sequencing Method Based on Real-Time Pyrophosphate,” Science 281: 363-365 (1998); and Hyman, “A New Method of Sequencing DNA,” Anal. Biochem. 174: 423-436 (1988). Although it uses native nucleotides, the method requires synchronization of polymerases on the DNA strands, which greatly restricts sequence read lengths. Reads of only about 40 nucleotides were achieved, and the detection method is not expected to approach single molecule sensitivity because of the limited quantum efficiency of light production by luciferase in the procedure presented by Ronaghi. Further, overall sequencing speed is limited by washing steps, by the subsequent chemical steps needed to identify the presence of pyrophosphate, and by the time required to test each base position to be sequenced with all four bases sequentially. Additionally, difficulties in accurately determining homonucleotide stretches in the sequences have been recognized.
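The sequential testing described above can be sketched as a simple simulation. This is a minimal, noise-free model under stated assumptions, not the procedure of Ronaghi or Hyman: the cyclic flow order `ACGT`, the function names `flow_signals` and `decode`, and the assumption that the signal is exactly proportional to the number of bases incorporated are all introduced here for illustration only.

```python
from itertools import cycle


def flow_signals(new_strand, flow_order="ACGT"):
    """Simulate presenting one dNTP at a time in a fixed cyclic order.

    new_strand is the sequence of the strand being synthesized. Each
    flow yields (base, n), where n is the number of bases incorporated
    and stands in for the amount of pyrophosphate released (assumed
    exactly proportional, with no noise)."""
    unknown = set(new_strand) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not in flow order: {sorted(unknown)}")
    signals = []
    pos = 0
    for base in cycle(flow_order):
        if pos >= len(new_strand):
            break
        run = 0
        # A homonucleotide stretch is incorporated within a single flow.
        while pos < len(new_strand) and new_strand[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
    return signals


def decode(signals):
    """Rebuild the sequence from per-flow incorporation counts."""
    return "".join(base * run for base, run in signals)


signals = flow_signals("AAG")
print(signals)          # [('A', 2), ('C', 0), ('G', 1)]
print(decode(signals))  # AAG
```

Two of the limitations noted above are visible even in this idealized form. Flows that incorporate nothing still cost a full test-and-wash cycle, and a homonucleotide stretch appears only as a larger signal in one flow, so the run length must be inferred from signal magnitude rather than counted directly.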
Previous, generally unsuccessful (albeit seminal) attempts at single molecule sequencing have utilized exonucleases to sequentially release individual fluorescently-labeled bases as a second step after DNA polymerase has formed a complete complementary strand. See Goodwin et al., “Application of Single Molecule Detection to DNA Sequencing,” Nucleos. Nucleot. 16: 543-550 (1997). This approach consists of synthesizing a DNA strand labeled with four different fluorescent dNTP analogs, subsequently degrading the labeled strand by the action of an exonuclease, and detecting the individual released bases in a hydrodynamic flow detector. However, both polymerase and exonuclease have to show activity on a highly modified DNA strand, and the generation of a DNA strand substituted with four different fluorescent dNTP analogs has not yet been achieved. See Dapprich et al., “DNA Attachment to Optically Trapped Beads in Microstructures Monitored by Bead Displacement,” Bioimaging 6: 25-32 (1998). Furthermore, little is known about the relationship between the degree of labeling of DNA and inhibition of exonuclease activity. See Dorre et al., “Techniques for Single Molecule Sequencing,” Bioimaging 5: 139-152 (1997).
In a second approach utilizing exonucleases, native DNA is digested while it is being pulled through a thin liquid film in order to spatially separate cleaved nucleotides. See Dapprich et al., “DNA Attachment to Optically Trapped Beads in Microstructures Monitored by Bead Displacement,” Bioimaging 6: 25-32 (1998). The cleaved nucleotides then diffuse a short distance before becoming immobilized on a surface for detection. However, most exonucleases exhibit sequence- and structure-dependent cleavage rates, resulting in difficulties in data analysis and in matching sets from partial sequences.
Regardless of the detection system, methods which utilize exonucleases have not been developed into methods that meet today's demand for rapid, high-throughput sequencing. In addition, most exonucleases have relatively slow turnover rates, and the proposed methods require extensive pretreatment, labeling, and subsequent immobilization of the template DNA on a bead in a flowing stream of fluid, all of which complicate their implementation as a simple high-throughput system.
Other, more direct approaches to DNA sequencing have been attempted, such as determining the spatial sequence of fixed and stretched DNA molecules by scanned atomic probe microscopy. Problems encountered with using these methods include the narrow spacing of the bases in the DNA molecule (only about 0.34 nm) and the small physicochemical differences to be recognized by these methods. See Hansma et al., “Reproducible Imaging and Dissection of Plasmid DNA Under Liquid with the Atomic Force Microscope,” Science 256: 1180-1184 (1992).
In a recent approach for microsequencing using polymerase, but not exonuclease, a set of identical single-stranded DNA (ssDNA) molecules was linked to a substrate and the sequence was determined by repeating a series of reactions using fluorescently labeled dNTPs. See U.S. Pat. No. 5,302,509 to Cheeseman. However, this method requires that each base be added with a fluorescent label and 3′-dNTP blocking groups. After the base is added and detected, the fluorescent label and the blocking group are removed and the next base is added to the polymer.
Optical methods and devices for sequencing biological polymers have several limitations. One limitation is that the lifetime of a polymerization enzyme is inversely related to the time for which the enzyme is illuminated. That is, once a polymerization enzyme is illuminated to begin sequencing a biological polymer, the enzyme loses its activity (and functionality) after a certain time period, which is typically less than the time period required to sequence the entire biological polymer with a single polymerization enzyme. One solution is to divide a biological polymer into smaller subunits, with the number of subunits selected such that each subunit can be sequenced before the polymerization enzyme associated with that subunit loses its functionality. However, this process is time-consuming, costly, and prone to error. Another limitation of conventional sequencing methods is that they are typically slow. Obtaining accurate sequencing information can take long periods of time, up to several days.
The human genome project has intensified the need for rapid, small- and large-scale DNA sequencing methods that will allow high throughput with minimal starting material. Accordingly, there is a need for sequencing methods with improved throughput and accuracy.