An aggressive research effort to sequence the entire human genome is proceeding in the laboratories of genetic researchers throughout the country. The project is called the Human Genome Project (HGP). It is a daunting task, given that it involves the complete characterization of the archetypal human genome sequence, which comprises 3 × 10⁹ DNA nucleotide base pairs. Early estimates for completing the task within fifteen years hinged on the expectation that new technology would be developed in response to the pressing need for faster methods of DNA sequencing.
Current approaches generally incorporate the fundamentals of either the Sanger sequencing method or the Maxam and Gilbert sequencing method, two techniques that were first introduced in the 1970s [Sanger et al. (1977) "DNA sequencing with chain-terminating inhibitors," Proc. Natl. Acad. Sci. USA 74:5463-5467; Maxam and Gilbert (1977) "A new method for sequencing DNA," Proc. Natl. Acad. Sci. USA 74:560-564]. In the Sanger method, a short oligonucleotide, or primer, is annealed to a single-stranded template containing the DNA to be sequenced. The primer provides a 3'-hydroxyl group from which a new DNA chain can be polymerized when a polymerase enzyme and dNTPs are provided. The Sanger method is therefore an enzymatic reaction, and it relies on chain-terminating dideoxynucleotides (ddNTPs). ddNTPs terminate the chain because they lack the 3'-hydroxyl group needed to form a phosphodiester bond with the next deoxyribonucleotide (dNTP). A small amount of one ddNTP is included with the four conventional dNTPs in a polymerization reaction catalyzed by a DNA polymerase. Extension of the chain by incorporation of a conventional dNTP thus competes with termination of the chain by incorporation of the ddNTP, so the reaction yields a population of fragments that end at the positions where that base is incorporated.
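The competition between dNTP extension and ddNTP termination can be illustrated with a small simulation. This is a minimal sketch, not a model of any particular polymerase: the function names, the per-site ddNTP incorporation probability `dd_fraction`, and the simplification that the template is supplied 3'→5' (so the new strand is its plain complement) are all assumptions made for illustration.

```python
import random

# Watson-Crick pairing used to build the new strand on the template.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesized_strand(template_3to5: str) -> str:
    """Strand made by the polymerase, 5'->3', starting just past the primer.

    Assumption: the template is supplied 3'->5', beginning at the base opposite
    the first nucleotide added, so the new strand is its base-by-base complement.
    """
    return "".join(COMPLEMENT[b] for b in template_3to5)


def sanger_reaction(template_3to5, dd_base, dd_fraction=0.05,
                    n_molecules=10_000, seed=0):
    """Simulate one reaction containing the four dNTPs plus one ddNTP.

    Wherever `dd_base` is to be added, the chain is terminated with probability
    `dd_fraction` (ddNTP incorporated) and otherwise extended (dNTP incorporated).
    Returns {fragment length beyond the primer: number of molecules}.
    Chains that never terminate run off the end and are not counted.
    """
    rng = random.Random(seed)
    strand = synthesized_strand(template_3to5)
    counts = {}
    for _ in range(n_molecules):
        for length, base in enumerate(strand, start=1):
            if base == dd_base and rng.random() < dd_fraction:
                counts[length] = counts.get(length, 0) + 1
                break
    return counts


if __name__ == "__main__":
    # The new strand is ATGCCTAG, so a ddATP reaction gives fragments of length 1 and 7.
    print(sorted(sanger_reaction("TACGGATC", "A")))
```

Because a molecule must escape termination at every earlier site of the same base to reach a later one, band counts fall off with length in this model, which is why the ratio of ddNTP to dNTP has to be kept small.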
The original version of the Sanger method utilized E. coli DNA polymerase I ("pol I"), which has a polymerization activity, a 3'-5' exonuclease proofreading activity, and a 5'-3' exonuclease activity. The method was later improved by using the Klenow fragment instead of pol I; Klenow lacks the 5'-3' exonuclease activity, which is detrimental to the sequencing reaction because it partially degrades template and product DNA. The Klenow fragment nevertheless has several limitations when used for enzymatic sequencing. One limitation is the low processivity of the enzyme, which generates a high background of fragments that terminate by random dissociation of the enzyme from the template rather than by the desired incorporation of a ddNTP. Low processivity also means that the enzyme cannot be used to sequence nucleotides that lie more than ~250 nucleotides from the 5' end of the primer. A second limitation is that Klenow cannot efficiently utilize templates that have homopolymer tracts or regions of high secondary structure. Problems caused by secondary structure in the template can be reduced by running the polymerization reaction at 55 °C (R. Gomer and R. Firtel, "Sequencing homopolymer regions," Bethesda Res. Lab. Focus 7:6, 1985).
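The effect of low processivity can be made concrete with a simple geometric model. This sketch assumes, purely for illustration, that the enzyme dissociates independently at each nucleotide with a constant probability `p_off`; real dissociation rates depend on sequence and reaction conditions, and the values used below are hypothetical, not measured Klenow parameters.

```python
def fraction_still_extending(n_nt: int, p_off: float) -> float:
    """Fraction of primed templates still being extended after n_nt additions.

    Hypothetical model: at each step the polymerase either adds a nucleotide
    (probability 1 - p_off) or falls off the template (probability p_off).
    """
    if n_nt < 0 or not 0.0 <= p_off < 1.0:
        raise ValueError("need n_nt >= 0 and 0 <= p_off < 1")
    return (1.0 - p_off) ** n_nt


def mean_nt_before_dissociation(p_off: float) -> float:
    """Expected number of nucleotides added before dissociation (geometric model)."""
    if not 0.0 < p_off <= 1.0:
        raise ValueError("need 0 < p_off <= 1")
    return (1.0 - p_off) / p_off


if __name__ == "__main__":
    for p_off in (0.01, 0.001):  # hypothetical per-nucleotide dissociation rates
        print(p_off,
              round(fraction_still_extending(250, p_off), 3),
              mean_nt_before_dissociation(p_off))
```

With a hypothetical `p_off` of 0.01, only about 8% of chains are still being extended 250 nucleotides from the primer, and every chain that dissociates earlier leaves a band that has nothing to do with ddNTP incorporation.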
Improvements to the original Sanger method include the use of polymerases other than the Klenow fragment. Reverse transcriptase has been used to sequence templates that have homopolymeric tracts (S. Karanthanasis, "M13 DNA sequencing using reverse transcriptase," Bethesda Res. Lab. Focus 4(3):6, 1982; Graham et al., "Direct DNA sequencing using avian myeloblastosis virus and Moloney murine leukemia virus reverse transcriptase," Bethesda Res. Lab. Focus 8(2):4, 1986). Reverse transcriptase is somewhat better than the Klenow fragment at utilizing templates containing homopolymer tracts.
The use of a modified T7 DNA polymerase (Sequenase™) was a significant improvement to the Sanger method. See Sambrook, J. et al., Molecular Cloning: A Laboratory Manual, 2d Ed., Cold Spring Harbor Laboratory Press, New York, 13.7-13.9; and Hunkapiller, M. W. (1991) Curr. Op. Gen. Devl. 1:88-92. T7 DNA polymerase has no inherent 5'-3' exonuclease activity and discriminates less strongly against incorporation of ddNTPs. However, its 3'-5' exonuclease activity degrades some of the oligonucleotide primers. Sequenase™ is a chemically modified T7 DNA polymerase with reduced 3'-5' exonuclease activity (Tabor et al. 1987, Proc. Natl. Acad. Sci. USA 84:4767). Sequenase™ version 2.0 is a genetically engineered form of T7 polymerase that completely lacks 3'-5' exonuclease activity. Sequenase™ has very high processivity and a high rate of polymerization. It can also efficiently incorporate nucleotide analogs such as dITP and 7-deaza-dGTP, which are used to resolve compressions in sequencing gels. In regions of DNA with a high G+C content, Hoogsteen base pairing can occur, which leads to compressions: aberrant migration patterns of the oligonucleotide strands on sequencing gels. Because these base analogs pair weakly with conventional nucleotides, they reduce intrastrand secondary structure during electrophoresis. Klenow, in contrast, does not incorporate these analogs as efficiently.
The use of Taq DNA polymerase and mutants thereof is a more recent improvement to the Sanger method [see U.S. Pat. No. 5,075,216 to Innis et al. (1993), hereby incorporated by reference]. Taq polymerase is a thermostable enzyme that works efficiently at 70-75 °C. Its ability to catalyze DNA synthesis at elevated temperature makes Taq polymerase useful for sequencing templates that have extensive secondary structure at 37 °C (the standard temperature for Klenow and Sequenase™ reactions). Like Sequenase™, Taq polymerase is highly processive, and like Sequenase™ version 2.0, it lacks 3'-5' exonuclease activity. The thermal stability of Taq and related enzymes (such as Tth and Thermosequenase™) gives them an advantage over T7 polymerase (and all mutants thereof): they can be used for cycle sequencing, which amplifies the DNA during the sequencing reaction and thus allows sequencing to be performed on smaller amounts of DNA. Optimization of Taq for the standard Sanger method has focused on modifying the enzyme to eliminate its intrinsic 5'-3' exonuclease activity and to increase its ability to incorporate ddNTPs, reducing incorrect termination due to secondary structure in the single-stranded template DNA. See Tabor and Richardson, EP 0 655 506 B1, hereby incorporated by reference.
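To illustrate why cycle sequencing helps when little DNA is available, the sketch below compares product yield under two idealized models. It assumes the common single-primer format of cycle sequencing, in which only the original template strands are copied in each thermal cycle, so product accumulates linearly; the PCR function is included only for contrast. The function names and the per-cycle `efficiency` parameter are illustrative assumptions.

```python
def cycle_sequencing_products(templates: float, cycles: int,
                              efficiency: float = 1.0) -> float:
    """Terminated products after `cycles` rounds of single-primer cycle sequencing.

    Idealized: each cycle, each original template yields `efficiency` new
    extension products. Products are never used as templates (only one primer
    is present), so the yield grows linearly with the number of cycles.
    """
    return templates * efficiency * cycles


def pcr_total_copies(templates: float, cycles: int,
                     efficiency: float = 1.0) -> float:
    """Idealized two-primer PCR, for contrast: every copy becomes a new template."""
    return templates * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    print(cycle_sequencing_products(100, 25), pcr_total_copies(100, 25))
```

In this idealized model, 25 cycles at full efficiency give 25 products per template molecule, so roughly 25-fold less template yields the same signal as a single round of extension; the gain is linear, not exponential.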
Both the Sanger and the Maxam-Gilbert methods produce populations of radiolabeled or fluorescently labeled polynucleotides of differing lengths, which are separated according to size by polyacrylamide gel electrophoresis (PAGE). The nucleotide sequence is determined by analyzing the pattern of size-separated labeled polynucleotides in the gel.
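Reading a four-lane ladder amounts to sorting the fragments by length. The sketch below assumes an idealized gel in which every length from the primer outward appears as exactly one band in exactly one ddNTP lane; the input format (a mapping from terminating base to fragment lengths) and the function names are hypothetical.

```python
def read_ladder(lanes: dict) -> str:
    """Sequence of the synthesized strand, 5'->3', from an ideal four-lane ladder.

    `lanes` maps the terminating base ('A', 'C', 'G', 'T') to the lengths of the
    fragments seen in that lane. Shorter fragments run farther in the gel, so
    reading bands from shortest to longest walks away from the primer.
    Assumes each length appears in exactly one lane (no gaps, no ambiguities).
    """
    base_at = {}
    for base, lengths in lanes.items():
        for length in lengths:
            base_at[length] = base
    return "".join(base_at[n] for n in sorted(base_at))


def template_5to3(read: str) -> str:
    """Template strand 5'->3': the reverse complement of the synthesized strand."""
    pair = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pair[b] for b in reversed(read))


if __name__ == "__main__":
    ladder = {"A": [1, 7], "C": [4, 5], "G": [3, 8], "T": [2, 6]}
    read = read_ladder(ladder)
    print(read, template_5to3(read))  # ATGCCTAG CTAGGCAT
```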
The current limitations of conventional applications of the Sanger method include: 1) the limited resolving power of polyacrylamide gel electrophoresis; 2) the formation of intermolecular and intramolecular secondary structure by the denatured template in the reaction mixture, which can cause any of the polymerases to terminate synthesis prematurely at specific sites or to misincorporate ddNTPs at inappropriate sites; 3) secondary structure of the DNA on sequencing gels, which can compress the electrophoretic ladder at specific locations in the sequence; 4) cleavage of the template, primers, and products by the 5'-3' or 3'-5' exonuclease activities of the polymerases; and 5) mispriming of synthesis due to hybridization of the oligonucleotide primers to multiple sites on the denatured template DNA. The formation of intermolecular and intramolecular secondary structure produces artificial terminations that are incorrectly "read" as the wrong base, gives rise to bands across four lanes (BAFLs) that make base reading ambiguous, and decreases the intensity, and thus the signal-to-noise ratio, of the bands. Secondary structure of the DNA on the gels can largely be overcome by incorporating dITP or 7-deaza-dGTP into the synthesized DNA; DNA containing such nucleotide analogs is less likely to form urea-resistant secondary structure during electrophoresis. Cleavage of the template, primers, or products reduces the intensity of bands terminating at the correct positions and increases the background. Mispriming also gives rise to background in the gel lanes.
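Several of the artifacts listed above appear in a four-lane ladder as lengths that cannot be called: a length with bands in more than one lane (a BAFL when all four lanes show it) or a length with no band at all. The sketch below flags both, using a hypothetical input format that maps each lane's terminating base to its fragment lengths; it is an illustrative check, not a base-calling algorithm.

```python
def ladder_problems(lanes: dict):
    """Find lengths that cannot be called unambiguously in a four-lane ladder.

    Returns (ambiguous, missing):
      ambiguous: {length: bases} for lengths with bands in two or more lanes;
                 bases == "ACGT" is a band across all four lanes (BAFL).
      missing:   lengths between 1 and the longest fragment with no band.
    """
    seen = {}
    for base, lengths in lanes.items():
        for length in lengths:
            seen.setdefault(length, set()).add(base)
    longest = max(seen, default=0)
    ambiguous = {n: "".join(sorted(bases))
                 for n, bases in sorted(seen.items()) if len(bases) > 1}
    missing = [n for n in range(1, longest + 1) if n not in seen]
    return ambiguous, missing


if __name__ == "__main__":
    # Hypothetical ladder: an artificial stop at length 5 appears in every lane,
    # and no band is seen at length 4.
    lanes = {"A": [1, 5], "C": [2, 5], "G": [3, 5], "T": [5, 6]}
    print(ladder_problems(lanes))  # ({5: 'ACGT'}, [4])
```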
The net result is that, although the inherent resolution of polyacrylamide gel electrophoresis alone is as much as 1000 nucleotides, typically only 400-600 nucleotides of a sequence (and sometimes far fewer) can be read correctly using the conventional Sanger method, even with optimized polymerases and reaction conditions. Some sequences, such as repetitive DNA, runs of identical bases (especially guanines), GC-rich sequences, and many unique sequences, cannot be sequenced without a high degree of error and uncertainty.
In the absence of any method to sequence DNA longer than 400-800 bases, investigators must subclone the DNA into small fragments and sequence those fragments. Procedures for doing this systematically are very labor intensive, cannot be automated, and are therefore impractical. The most popular technique for large-scale sequencing, the "shotgun" method, involves cloning and sequencing hundreds or thousands of overlapping DNA fragments. Many shotgun protocols are automated, but they require sequencing 5-10 times as many bases as minimally necessary, leave gaps in the sequence information that must be filled in manually, and have difficulty determining sequences that contain repetitive DNA.
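The cost of the 5-10-fold redundancy can be estimated with simple arithmetic. This sketch assumes reads of 500 bases (within the 400-600 range above) and uses the standard Poisson (Lander-Waterman) approximation, an idealization not taken from the text, for the fraction of the genome that randomly placed reads leave uncovered; it ignores repeats and cloning bias.

```python
import math


def reads_needed(genome_bp: int, read_len: int, coverage: float) -> int:
    """Number of reads of length read_len giving the requested fold coverage."""
    return math.ceil(genome_bp * coverage / read_len)


def expected_uncovered_fraction(coverage: float) -> float:
    """Poisson approximation: chance that a given base lies in no read is e^(-coverage)."""
    return math.exp(-coverage)


if __name__ == "__main__":
    genome = 3_000_000_000  # ~3 x 10^9 bp, as stated for the human genome
    for c in (5, 10):  # the 5-10-fold redundancy cited for shotgun sequencing
        print(c, reads_needed(genome, 500, c),
              round(genome * expected_uncovered_fraction(c)))
```

Under these assumptions, 5-fold coverage takes 3 × 10⁷ reads and still leaves on the order of 2 × 10⁷ bases uncovered; 10-fold takes 6 × 10⁷ reads and leaves roughly 1.4 × 10⁵ uncovered bases, consistent with gaps that must be closed by hand.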
Thus, the goal of placing rapid sequencing techniques in the hands of many researchers is yet to be achieved. New approaches are needed that eliminate the above-described limitations.