The goal of elucidating the entire human genome has created interest in technologies for rapid DNA sequencing, for both small- and large-scale applications. Important parameters are sequencing speed, the length of sequence that can be read during a single sequencing run, and the amount of nucleic acid template required. These research challenges suggest the aim of sequencing the genetic information of single cells without prior amplification and without first cloning the genetic material into sequencing vectors. Large-scale genome projects are currently too expensive to be carried out realistically for a large number of organisms or patients. Furthermore, as knowledge of the genetic basis of human diseases increases, there will be an ever-increasing need for accurate, high-throughput DNA sequencing that is affordable for clinical applications. Practical methods for determining the base pair sequences of single molecules of nucleic acids, preferably with high speed and long read lengths, would provide the necessary measurement capability.
Two traditional techniques for sequencing DNA are the dideoxy termination method of Sanger (Sanger et al., Proc. Natl. Acad. Sci. U.S.A. 74: 5463–5467 (1977)) and the Maxam-Gilbert chemical degradation method (Maxam and Gilbert, Proc. Natl. Acad. Sci. U.S.A. 74: 560–564 (1977)). Both methods deliver four samples, each containing a family of DNA strands in which all strands terminate in the same nucleotide. Ultrathin slab gel electrophoresis or, more recently, capillary array electrophoresis is used to resolve the strands of different lengths and to determine the nucleotide sequence, either by differentially tagging the strands of each sample before electrophoresis to indicate the terminal nucleotide, or by running the samples in different lanes of the gel or in different capillaries. Both the Sanger and the Maxam-Gilbert methods are labor- and time-intensive, and require extensive pretreatment of the DNA source. Attempts have been made to use mass spectroscopy to replace the time-intensive electrophoresis step. For a review of existing sequencing technologies, see Cheng, “High-Speed DNA-Sequence Analysis,” Prog. Biochem. Biophys. 22: 223–227 (1995).
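The readout logic shared by these two techniques can be illustrated with a minimal Python sketch. It is only a toy model under stated assumptions: fragments are represented by their lengths alone, electrophoresis is idealized as a perfect sort by length, and the function names `make_ladders` and `read_ladders` and the example sequence are hypothetical, not taken from the cited references.

```python
def make_ladders(sequence):
    """Return four 'samples', one per terminal base.

    Each sample lists the lengths of the fragments that end in that base,
    mimicking the four terminator-specific families of strands.
    (Idealized: every possible fragment is present exactly once.)
    """
    ladders = {base: [] for base in "ACGT"}
    for length, base in enumerate(sequence, start=1):
        ladders[base].append(length)
    return ladders


def read_ladders(ladders):
    """Recover the sequence by ordering all fragments by length.

    Sorting stands in for electrophoretic separation; the sample a fragment
    came from identifies its terminal nucleotide.
    """
    fragments = [
        (length, base)
        for base, lengths in ladders.items()
        for length in lengths
    ]
    return "".join(base for _, base in sorted(fragments))


if __name__ == "__main__":
    ladders = make_ladders("GATTACA")
    print(ladders)
    print(read_ladders(ladders))
```

In this idealization the read length is unlimited; in practice it is bounded by how well fragments differing by a single nucleotide can be resolved.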
Related methods using dyes or fluorescent labels associated with the terminal nucleotide have been developed, in which sequence determination is likewise made by gel electrophoresis and automated fluorescent detectors. For example, the Sanger-extension method has recently been modified for use in an automated micro-sequencing system which requires only sub-microliter volumes of reagents and dye-labelled dideoxyribonucleotide triphosphates. In U.S. Pat. No. 5,846,727 to Soper et al., fluorescence detection is performed on-chip with one single-mode optical fiber carrying the excitation light to the capillary channel, and a second single-mode optical fiber collecting the fluorescent photons. Sequence reads are estimated to be in the range of 400–500 bases, which is not a significant improvement over the amount of sequence information obtained with traditional Sanger or Maxam-Gilbert methods. Furthermore, the Soper method requires PCR amplification of template DNA, and purification and gel electrophoresis of the oligonucleotide sequencing ‘ladders,’ prior to initiation of the separation reaction. These systems all require significant quantities of target DNA. Even the method described in U.S. Pat. No. 5,302,509 to Cheeseman, which does not use gel electrophoresis for sequence determination, requires at least a million DNA molecules.
In a recent improvement of a sequencing-by-synthesis methodology originally devised ten years ago, DNA sequences are deduced by measuring pyrophosphate release upon testing DNA/polymerase complexes with each deoxyribonucleotide triphosphate (dNTP) separately and sequentially. See Ronaghi et al., “A Sequencing Method Based on Real-Time Pyrophosphate,” Science 281: 363–365 (1998) and Hyman, “A New Method of Sequencing DNA,” Anal. Biochem. 174: 423–436 (1988). Although it uses native nucleotides, the method requires synchronization of the polymerases on the DNA strands, which greatly restricts sequence read lengths. Reads of only about 40 nucleotides were achieved, and the detection method is not expected to approach single molecule sensitivity because of the limited quantum efficiency of light production by luciferase in the procedure presented by Ronaghi et al. Furthermore, the overall sequencing speed is limited by the necessary washing steps, by the subsequent chemical steps needed to identify the presence of pyrophosphate, and by the inherent time required to test each base pair to be sequenced with all four bases sequentially. Difficulties in accurately determining homonucleotide stretches in the sequences were also recognized.
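The sequential testing described above can be illustrated with a minimal Python sketch. It is an idealization, not the procedure of Ronaghi et al.: the input is taken to be the strand being synthesized (the complement of the template), the pyrophosphate signal is assumed to be exactly proportional to the number of nucleotides incorporated, all polymerases are assumed to stay perfectly synchronized, and the names `pyrosequence`, `decode`, and `dispensation_order` are hypothetical.

```python
def pyrosequence(new_strand, dispensation_order="ACGT", max_cycles=1000):
    """Simulate testing each dNTP separately and sequentially.

    Returns a list of (dNTP, signal) pairs, where signal is the number of
    nucleotides incorporated in that flow (0 if the dNTP did not match).
    `new_strand` is the strand being synthesized, an assumption made to
    keep the sketch free of complement bookkeeping.
    """
    flows = []
    position = 0
    for _ in range(max_cycles):
        for base in dispensation_order:
            if position >= len(new_strand):
                return flows
            incorporated = 0
            # A homonucleotide stretch is incorporated within a single flow.
            while position < len(new_strand) and new_strand[position] == base:
                incorporated += 1
                position += 1
            flows.append((base, incorporated))
    return flows


def decode(flows):
    """Rebuild the sequence from idealized, noise-free flow signals."""
    return "".join(base * signal for base, signal in flows)


if __name__ == "__main__":
    flows = pyrosequence("GATTACA")
    print(flows)
    print(len(flows), "flows for", len(decode(flows)), "bases")
```

Two of the limitations noted above are visible even in this toy model: several flows, each followed by a wash in practice, are spent for every base read, and a homonucleotide stretch such as `TT` appears only as a larger signal in a single flow, so its length must be inferred from signal intensity.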
Previous attempts at single molecule sequencing (generally unsuccessful but seminal) have utilized exonucleases to sequentially release individual fluorescently labelled bases as a second step after DNA polymerase has formed a complete complementary strand. See Goodwin et al., “Application of Single Molecule Detection to DNA Sequencing,” Nucleos. Nucleot. 16: 543–550 (1997). This approach consists of synthesizing a DNA strand labelled with four different fluorescent dNTP analogs, subsequently degrading the labelled strand by the action of an exonuclease, and detecting the individual released bases in a hydrodynamic flow detector. However, both the polymerase and the exonuclease have to show activity on a highly modified DNA strand, and the generation of a DNA strand substituted with four different fluorescent dNTP analogs has not yet been achieved. See Dapprich et al., “DNA Attachment to Optically Trapped Beads in Microstructures Monitored by Bead Displacement,” Bioimaging 6: 25–32 (1998). Furthermore, little precise information is available about the relation between the degree of labeling of DNA and the inhibition of exonuclease activity. See Dorre et al., “Techniques for Single Molecule Sequencing,” Bioimaging 5: 139–152 (1997).
In a second approach utilizing exonucleases, native DNA is digested while it is being pulled through a thin liquid film in order to spatially separate the cleaved nucleotides. See Dapprich et al., “DNA Attachment to Optically Trapped Beads in Microstructures Monitored by Bead Displacement,” Bioimaging 6: 25–32 (1998). The cleaved nucleotides then diffuse a short distance before becoming immobilized on a surface for detection. However, most exonucleases exhibit sequence- and structure-dependent cleavage rates, resulting in difficulties in data analysis and in matching sets from partial sequences. In addition, ways to identify the bases on the detection surface still have to be developed or improved.
Regardless of the detection system, methods which utilize exonucleases have not been developed into methods that meet today's demand for rapid, high-throughput sequencing. In addition, most exonucleases have relatively slow turnover rates, and the proposed methods require extensive pretreatment, labeling, and subsequent immobilization of the template DNA on the bead in the flowing stream of fluid, all of which complicate their realization as a simple high-throughput system.
Other, more direct approaches to DNA sequencing have been attempted, such as determining the spatial sequence of fixed and stretched DNA molecules by scanned atomic probe microscopy. The problems encountered with these methods are the narrow spacing of the bases in the DNA molecule (only 0.34 nm) and the small physicochemical differences between the bases that these methods would have to recognize. See Hansma et al., “Reproducible Imaging and Dissection of Plasmid DNA Under Liquid with the Atomic Force Microscope,” Science 256: 1180–1184 (1992).
In a recent approach for microsequencing using polymerase, but not exonuclease, a set of identical single stranded DNA (ssDNA) molecules is linked to a substrate and the sequence is determined by repeating a series of reactions using fluorescently labelled dNTPs. See U.S. Pat. No. 5,302,509 to Cheeseman. However, this method requires that each base be added with a fluorescent label and a 3′ blocking group on the dNTP. After the base is added and detected, the fluorescent label and the blocking group are removed, and then the next base is added to the polymer.
Thus, the current sequencing methods either require both polymerase and exonuclease activity to deduce the sequence or rely on polymerase alone with additional steps of adding and removing 3′-blocked dNTPs. The human genome project has intensified the demand for rapid, small- and large-scale DNA sequencing that will allow high throughput with minimal starting material. There also remains a need for a method of sequencing nucleic acid molecules that requires only polymerase activity, without the use of blocking substituents, resulting in greater simplicity, easier miniaturization, and the compatibility of a single-step technique with parallel processing.
The present invention is directed to meeting these needs and overcoming these deficiencies in the art.