1. Field of the Invention
The present invention provides methods for DNA sequencing utilizing the thermostable DNA polymerase, Taq polymerase, of Thermus aquaticus. DNA sequencing methods are of great practical utility in the fields of molecular biology, genetics, medical diagnostic technology, and forensics. The importance of DNA sequencing is evidenced by the significant commercial activity centered about the production and marketing of reagents and automated instruments for sequencing nucleic acids.
2. Description of Related Disclosures
DNA sequencing by the Sanger dideoxynucleotide method (Sanger et al., 1977, Proc. Natl. Acad. Sci. USA 74:5463-5467) has undergone significant refinement in recent years, including the development of novel vectors (Yanisch-Perron et al., 1985, Gene 33:103-119), base analogs (Mills et al., 1979, Proc. Natl. Acad. Sci. USA 76:2232-2235, and Barr et al., 1986, BioTechniques 4:428-432), enzymes (Tabor et al., 1987, Proc. Natl. Acad. Sci. USA 84:4767-4771), and instruments for partial automation of DNA sequence analysis (Smith et al., 1986, Nature 321:674-679; Prober et al., 1987, Science 238:336-341; and Ansorge et al., 1987, Nuc. Acids Res. 15:4593-4602). The basic dideoxy sequencing procedure involves (i) annealing an oligonucleotide primer to a suitable single-stranded or denatured double-stranded DNA template; (ii) extending the primer with DNA polymerase in four separate reactions, each containing one α-labeled dNTP or ddNTP (alternatively, a labeled primer can be used), a mixture of unlabeled dNTPs, and one chain-terminating dideoxynucleoside-5'-triphosphate (ddNTP); (iii) resolving the four sets of reaction products on a high-resolution polyacrylamide-urea gel; and (iv) producing an autoradiographic image of the gel that can be examined to infer the DNA sequence. Alternatively, fluorescently labeled primers or nucleotides can be used to identify the reaction products. Known dideoxy sequencing methods utilize a DNA polymerase such as the Klenow fragment of E. coli DNA polymerase I, reverse transcriptase, or a modified T7 DNA polymerase. Protocols for sequencing with these enzymes, however, do not work with Taq polymerase.
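The logic of steps (ii) through (iv) can be illustrated with a short Python sketch. It assumes an idealized reaction in which chains terminate only upon incorporation of the ddNTP and every possible termination site is represented among the products; the function names dideoxy_ladder and read_gel and the example template are hypothetical and are not part of the disclosed protocols. In the reaction containing a given ddNTP, products end at each position where that base is incorporated, i.e., opposite its complement in the template, so reading the bands from shortest to longest recovers the synthesized strand.

```python
# Hypothetical, idealized model of the four dideoxy reactions.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def dideoxy_ladder(template_3to5: str) -> dict[str, list[int]]:
    """Return, for each ddNTP reaction, the lengths (nucleotides added
    beyond the primer) of the chain-terminated products.

    Assumption: the template is written 3'->5', starting at the base just
    past the primer's 3' end, and every site where the ddNTP's base is
    incorporated yields one terminated product.
    """
    synthesized = "".join(COMPLEMENT[b] for b in template_3to5.upper())
    ladder: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for pos, base in enumerate(synthesized, start=1):
        ladder[base].append(pos)
    return ladder


def read_gel(ladder: dict[str, list[int]]) -> str:
    """Reconstruct the synthesized strand 5'->3' by reading bands from the
    shortest fragment (bottom of the gel) to the longest."""
    bands = sorted(
        (length, base) for base, lengths in ladder.items() for length in lengths
    )
    return "".join(base for _, base in bands)


if __name__ == "__main__":
    template = "TACGGCAT"  # hypothetical template, written 3'->5'
    ladder = dideoxy_ladder(template)
    for base in "ACGT":
        print(f"dd{base}TP lane:", ladder[base])
    print("inferred sequence (5'->3'):", read_gel(ladder))
```

Running the script prints one lane per ddNTP and the inferred strand ATGCCGTA, the complement of the hypothetical template.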
Introduction of commercial kits has vastly simplified the art, making DNA sequencing a routine technique for any laboratory. However, there is still a need in the art for sequencing protocols that work well with nucleic acids that contain secondary structure, such as palindromic hairpin loops, and with G+C-rich DNA, which can produce band compressions through Hoogsteen base pairing. Such DNA typically performs poorly in prior art sequencing protocols and can exhibit aberrant gel migration patterns that also interfere with sequence determination. In addition, there is a need for sequencing methods that can generate DNA sequence information over a long segment of DNA from one sequencing reaction. Currently, different sequencing methods must be used to generate short and long sequence products. The present invention, as described more fully below, dramatically improves the art of DNA sequencing by, in one aspect, generating both short and long sequencing products in a single sequencing reaction.
Current commercial instruments address the "back end" of the sequencing process: non-isotopic detection and computerized data collection and analysis. Such developments have led many investigators to undertake large-scale sequencing projects and to consider sequencing the entire human genome. The ultimate success of large-scale sequencing projects will depend on further improvements in the speed and automation of the technology, including alternative methods for handling the "front end" of the process, i.e., automating the preparation of DNA templates and the performance of the sequencing reactions. The present method provides a means for fully automating this front end of the process.
One technique that appears to be ideally suited for automating DNA preparation is the selective amplification of DNA by the polymerase chain reaction (PCR), a method disclosed in U.S. Pat. No. 4,683,202. Methods for performing PCR are disclosed in pending Ser. No. 063,647, filed June 17, 1987, which is a continuation-in-part (CIP) of Ser. No. 899,513, filed Aug. 22, 1986, now abandoned, which is a CIP of Ser. No. 828,144, filed Feb. 7, 1986, which issued as U.S. Pat. No. 4,683,195, and which is a CIP of Ser. No. 791,308, filed Oct. 25, 1985, which issued as U.S. Pat. No. 4,683,202, and which is a CIP of abandoned Ser. No. 716,975, filed Mar. 28, 1985, all of which are incorporated herein by reference. PCR involves repeated cycles of (i) heat denaturation of the DNA, (ii) annealing of two oligonucleotide primers that flank the DNA segment to be amplified, and (iii) extension of the annealed primers with DNA polymerase. With this method, segments of single-copy genomic DNA can be amplified more than 10 million fold with very high specificity and fidelity. The PCR product can then be subcloned into a vector suitable for sequence analysis or, alternatively, purified and sequenced directly, as disclosed by Engelke et al., 1988, Proc. Natl. Acad. Sci. USA 85:544-548; Wong et al., 1987, Nature 330:384-386; and Stoflet et al., 1988, Science 239:491-494.
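The "more than 10 million fold" figure follows from the cycle structure: each cycle can at most double the number of target copies, so n cycles at a per-cycle efficiency E yield at most (1 + E)^n copies per starting molecule. The Python sketch below assumes a constant efficiency and no plateau phase, which is an idealization of a real reaction; the function names fold_amplification and cycles_needed are hypothetical. Under perfect doubling, 24 cycles suffice, since 2^24 = 16,777,216 exceeds 10^7 while 2^23 = 8,388,608 does not.

```python
def fold_amplification(cycles: int, efficiency: float = 1.0) -> float:
    """Idealized fold increase of a target after `cycles` PCR cycles.

    Assumes a constant per-cycle efficiency (1.0 = perfect doubling)
    and ignores the plateau reached when reagents become limiting.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return (1.0 + efficiency) ** cycles


def cycles_needed(target_fold: float, efficiency: float = 1.0) -> int:
    """Smallest number of cycles whose idealized yield reaches target_fold."""
    n = 0
    while fold_amplification(n, efficiency) < target_fold:
        n += 1
    return n


if __name__ == "__main__":
    print("2^23 =", fold_amplification(23))  # 8388608.0
    print("2^24 =", fold_amplification(24))  # 16777216.0
    print("cycles for 10^7 fold, perfect doubling:", cycles_needed(1e7))  # 24
    # 80% efficiency is an illustrative assumption, not a measured value.
    print("cycles for 10^7 fold, 80% efficiency:", cycles_needed(1e7, 0.8))  # 28
```

Under the same model, an assumed 80% per-cycle efficiency raises the requirement to 28 cycles, which shows why efficient extension at each cycle matters for reaching high amplification.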
Saiki et al., 1988, Science 239:487-491, demonstrate that Taq DNA polymerase greatly simplifies the PCR procedure. Because this polymerase has a broad temperature optimum centered around 75 °C and can survive repeated incubations at 95 °C, fresh enzyme need not be added after each PCR cycle. Use of Taq DNA polymerase at high annealing and extension temperatures increases the specificity, yield, and length of products that can be amplified, and thus increases the sensitivity of PCR for detecting rare target sequences. Methods for isolating and producing recombinant Taq polymerase are disclosed in pending U.S. patent application Ser. No. 143,441, filed Jan. 12, 1988, which is a CIP of Ser. No. 063,509, filed June 17, 1987, which issued as U.S. Pat. No. 4,889,818, and which is a CIP of Ser. No. 899,241, filed Aug. 22, 1986, now abandoned, each of which is incorporated herein by reference.
Inverse PCR, a variation of PCR in which the plasmid containing the target template is digested with a restriction endonuclease and recircularized to access flanking sequences for amplification, is fully disclosed in pending Ser. No. 203,000, filed June 6, 1988. PCR has been automated; PCR instruments are disclosed in pending Ser. No. 899,061, filed Aug. 22, 1986, which is a CIP of Ser. No. 833,368, filed Feb. 25, 1986, now abandoned. Methods for the structure-independent amplification of DNA by PCR using the structure-destabilizing base analog 7-deazaguanine are disclosed in pending U.S. Ser. No. 248,556, filed Sept. 23, 1988, and are especially useful in the practice of the present method. Methods for generating single-stranded DNA by a process termed asymmetric PCR are disclosed in pending U.S. Ser. No. 248,896, filed Sept. 23, 1988, and are especially useful in conjunction with the present method. The disclosures of these related patents and applications are incorporated herein by reference.
Prior to the present invention, however, Taq DNA polymerase had not been used in DNA sequencing methods. Taq DNA polymerase exhibits high processivity, a rapid rate of incorporation, and the ability to utilize nucleotide analogs both to terminate chain extension and to resolve gel compressions. These properties of Taq DNA polymerase are similar to those of a chemically modified bacteriophage T7 DNA polymerase recently described by Tabor et al., 1987, Proc. Natl. Acad. Sci. USA 84:4767-4771. In contrast to T7 DNA polymerase, however, Taq DNA polymerase is a single-chain enzyme that is highly thermostable, as described by Gelfand et al., European Patent Publication 258,017. Because Taq polymerase has no detectable 3'-5' exonuclease activity, and because its misincorporation rate is high unless certain dNTP and ddNTP concentrations are used, Taq polymerase has not previously been used for sequencing. The present invention provides efficient protocols for DNA sequencing with Taq DNA polymerase that can also be used for direct sequencing of PCR-amplified DNA.