DNA sequencing methods are important and powerful tools in the molecular biologist's repertoire of techniques for assessing and understanding gene expression and regulation. Methods for sequencing DNA molecules include chemical degradation sequencing (Maxam et al. (1977) Proc. Natl. Acad. Sci. USA 74: 560; see also Ambrose et al. (1987) Methods Enz. 152: 522) and chain termination sequencing (Sanger et al. (1977) Proc. Natl. Acad. Sci. USA 74: 5463). The Maxam-Gilbert sequencing method is a degradative method that relies on specific cleavage of DNA fragments. A fragment of DNA, which is radiolabeled at one end, is partially cleaved in five separate chemical reactions. Each reaction is specific for a single base or a type of base, so that five populations of labeled fragments of differing lengths are generated. The populations of fragments are resolved by polyacrylamide gel electrophoresis (PAGE).
The chain termination or Sanger method, which is presently the preferred method for sequencing DNA, relies on DNA synthesis. Single-stranded DNA is used as a template and labeled primers are used to initiate formation of complementary strands. The synthesis reaction is run in the presence of the four deoxynucleotides (dNTPs), dATP, dTTP, dCTP and dGTP, and one of the four dideoxynucleotides (ddNTPs), ddATP, ddTTP, ddCTP and ddGTP. The complementary strands are prematurely terminated along the chain by addition of dideoxynucleotides to the growing chains. By starting with four reaction mixtures, each of which contains one of the four ddNTPs, sets of strands terminating in A, C, G and T are generated. Each reaction mixture is electrophoresed in one lane of a polyacrylamide gel that resolves fragments that differ by a single base in length.
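The logic of the four-lane readout described above can be illustrated with a minimal sketch. It is an idealized model, not a description of any laboratory protocol: it assumes that termination occurs exactly once at every possible position, that the gel resolves every fragment perfectly, and that the template is written 3' to 5' so that the new strand grows left to right. The function names and the example template are hypothetical.

```python
# Idealized model of the four Sanger reactions (assumption: every possible
# termination product is present and every band is perfectly resolved).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def synthesized_strand(template_3to5):
    """Strand made by the polymerase, 5'->3', for a template written 3'->5'."""
    return "".join(COMPLEMENT[base] for base in template_3to5)


def sanger_lanes(template_3to5, primer_len):
    """Return {ddNTP base: [fragment lengths]} for the four reaction mixtures.

    A strand that terminates with ddA lands in the "A" lane, and so on.
    Positions covered by the primer are never terminated.
    """
    new_strand = synthesized_strand(template_3to5)
    lanes = {base: [] for base in "ACGT"}
    for i in range(primer_len, len(new_strand)):
        lanes[new_strand[i]].append(i + 1)  # fragment length in nucleotides
    return lanes


def read_gel(lanes):
    """Read bands from shortest to longest; the lane of each band gives the base."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


if __name__ == "__main__":
    lanes = sanger_lanes("TACGGATC", primer_len=3)
    print(lanes)            # {'A': [7], 'C': [4, 5], 'G': [8], 'T': [6]}
    print(read_gel(lanes))  # CCTAG
```

Reading the bands in order of increasing length across the four lanes recovers the sequence of the synthesized strand downstream of the primer, which is the complement of the template.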
These methods are primarily designed for sequencing single-stranded DNA, which is produced by denaturing the DNA and separating the single strands, or by other methods, such as cloning into single-stranded phage vectors. These sequencing methods rely on reactions that produce an array of fragments that differ in length by a single base and terminate in an identifiable base. The fragments are resolved by size using PAGE and are detected using a label, such as a radioisotope. Because the resolution of bands on an electrophoretic gel decreases exponentially as the length of the DNA fragments increases, these methods only permit DNA fragments of up to about 300 to 400 nucleotides to be sequenced.
Before sequencing, sufficient quantities of the single strands can be generated by cloning and replicating the target DNA in a bacterial host using a suitable vector, such as the single-stranded filamentous phage vector M13 (see, e.g., Messing (1983) Methods Enzymol. 101: 20), and other phage vectors (see, e.g., Barnes et al. (1983) Nucl. Acids Res. 11: 349-368; Dente et al. (1983) Nucl. Acids Res. 13: 1645-1655; Laughton et al. (1984) Nature 310: 25-31). Sequencing can then be effected using a so-called universal primer that is complementary to the vector near the site at which the target DNA is inserted.
The target DNA can also be amplified using the polymerase chain reaction (PCR) (see, e.g., Mullis et al. (1986) Cold Spring Harbor Symp. Quant. Biol. 51: 263; U.S. Pat. No. 4,683,202 to Mullis et al.), which results in an amplified concentration of the duplex target DNA. A number of methods have been used to generate single-stranded templates directly from PCR for subsequent sequencing. For example, a radiolabeled primer that is specific for only one strand may be used. Alternatively, PCR may be run under conditions such that one primer is at limited concentration. Once the primer that is at the limited concentration is exhausted, the second strand is amplified at a linear rate through succeeding cycles (Gyllensten et al. (1988) Proc. Natl. Acad. Sci. 85: 7652; Mihovilovic et al. (1989) BioTechniques 7: 14).
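The switch from exponential to linear amplification when one primer runs out can be seen in a small counting model. This is a sketch under stated assumptions only: every template is copied in every cycle, each new strand consumes exactly one primer molecule, and the other primer is never limiting. The function name, the parameter names and the example numbers are hypothetical.

```python
def asymmetric_pcr(cycles, start_duplexes, limiting_primer):
    """Count strands per cycle when one primer is in limited supply.

    l_strands: strands extended from the limiting primer
    e_strands: strands extended from the primer present in excess
    Assumes perfect efficiency; real reactions fall short of this.
    """
    l_strands = e_strands = start_duplexes
    primer_left = limiting_primer
    history = []
    for _ in range(cycles):
        # Limiting-primer strands are copied from e_strands templates,
        # but only while limiting primer molecules remain.
        new_l = min(e_strands, primer_left)
        # Excess-primer strands are copied from every l_strand template.
        new_e = l_strands
        primer_left -= new_l
        l_strands += new_l
        e_strands += new_e
        history.append((l_strands, e_strands))
    return history


if __name__ == "__main__":
    for cycle, counts in enumerate(asymmetric_pcr(6, 1, 7), start=1):
        print(cycle, counts)
    # Both strands double for three cycles: (2, 2), (4, 4), (8, 8).
    # After the limiting primer is used up, only one strand grows,
    # by a constant 8 per cycle: (8, 16), (8, 24), (8, 32).
```

In this model the strand made from the excess primer accumulates as single-stranded product once the limiting primer is exhausted, which is the material used as sequencing template.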
There are numerous sequencing strategies in use. The selected strategy depends upon the purpose for which the DNA is sequenced and the amount of information available about the DNA prior to sequencing. For example, if the target DNA is sequenced in order to confirm that a particular mutation has been introduced into the DNA, it may only be necessary to sequence a small region of DNA. If the DNA fragment is an unknown gene or portion of a gene for which a sequence must be accurately determined, then it may be necessary to sequence the entire fragment. Because the sizes of fragments that can be sequenced are limited to about 400 bases, DNA fragments longer than this size must be cleaved. Cleavage may be random, by subcloning segments of the target DNA. The subcloned fragments, which include overlapping fragments, are then sequenced and ordered using a computer program (see, e.g., Staden (1986) Nucl. Acids Res. 14: 217). Alternatively, the DNA may be systematically subcloned by generating and sequencing overlapping or nested mutants or by other ordered approaches.
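How overlapping fragment sequences can be ordered by computer is sketched below with a toy greedy merge. It is not the cited Staden program; it is a simplified illustration that assumes error-free reads, all taken from the same strand, with no read wholly contained in another. The function names, the minimum overlap of 3 bases and the example reads are hypothetical.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that matches a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap.

    Returns the list of contigs left when no overlap of at least
    min_len bases remains.
    """
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # nothing left to join
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads


if __name__ == "__main__":
    fragments = ["GTTAGC", "ATGCGTAC", "GTACGTTA"]
    print(greedy_assemble(fragments))  # ['ATGCGTACGTTAGC']
```

A greedy merge can be misled by repeated sequences, which is one reason ordered subcloning strategies such as nested deletions are used as an alternative.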
The Sanger chain termination method and other sequencing methods rely on the use of single-stranded template produced by cloning the target DNA into single-stranded phage vectors. The use of plasmids as vectors for the target DNA, however, is preferred over the use of phage DNA for reasons that include the variety of available plasmids, the ease with which plasmids are manipulated, and the greater stability of inserted DNA in plasmids compared to phage vectors. Consequently, methods for sequencing in which the target DNA is cloned into plasmid DNA, rather than into a single-stranded phage DNA, have been developed (see, e.g., Wallace et al. (1981) Gene 16: 21-26; Guo et al. (1982) Nucl. Acids Res. 10: 2065-2084; Vieira et al. (1982) Gene 19: 259-268).
These methods are designed to sequence only one strand of the target DNA at a time (see, e.g., Chen et al. (1985) DNA 4: 165-170; Hattori et al. (1986) Anal. Biochem. 152: 232-238; Mierendorf et al. (1987) Methods Enzymol. 152: 556; Mehra et al. (1986) Proc. Natl. Acad. Sci. U.S.A. 83: 7013-7017). Use of double-stranded DNA avoids the subcloning or isolation of single-stranded DNA fragments, which are used for the dideoxy chain terminator sequencing reactions. Its use, however, had been limited because of the poor template quality of denatured duplex DNA. As a result, these methods had not provided sequence data as accurate as that provided by methods in which the DNA is cloned into a single-stranded vector.
Recently, the problems associated with template quality have been solved by the development of methods that use plasmids that include sites, adjacent to both complementary strands of the inserted DNA, to which strand-specific primers may be efficiently hybridized. By virtue of these methods each of the single strands of double-stranded DNA can be sequenced directly from plasmid DNA without prior subcloning into phage vectors (see, e.g., Chen et al. (1985) DNA 4: 165-170 and Chi et al. (1988) Nucl. Acids Res. 16: 10382). A strand-specific synthetic primer is annealed to covalently closed circular DNA, which has been denatured by heat or alkali, before proceeding with dideoxy sequencing reactions. Alternatively, the primer can be annealed to open circle double-stranded plasmid DNA, which has been denatured by alkali, as a template (see Hattori et al. (1986) Anal. Biochem. 152: 232-238). The double-stranded DNA is denatured with alkali or heat prior to sequencing using the Sanger method, which is performed at 37° C. or higher. The use of different "forward" and "reverse" primers, which are each complementary to the lac Z sequences adjacent to the EcoRI site in λgt11, for separately sequencing each strand of DNA that has been cloned into λgt11 has also been described (see Mehra et al. (1986) Proc. Natl. Acad. Sci. USA 83: 7013-7017).
Plasmids with oppositely oriented promoter regions, which are used in methods that involve transcription, are also used as vehicles for target DNA that is to be sequenced. Each promoter region serves as a distinct specific priming site for sequencing the inserted DNA. Such plasmids are commercially available. For example, the twin promoter plasmid pGEM™ contains the bacteriophage SP6 and T7 RNA polymerase promoters in opposite orientations (Mierendorf et al. (1987) Meth. Enzymol. 152: 556-562).
Although methods for sequencing single-stranded DNA are superior to those for sequencing double-stranded DNA, unless both strands are sequenced, such methods do not allow for detection of errors generated in the sequencing process nor correction of those errors, since only the information from a single strand is available. Sequencing errors arise from a number of sources, including the quality of the template DNA and the type of DNA polymerase used. For example, premature dissociation of the DNA polymerase from the template before the terminating ddNTP is introduced into the replicating fragment is a common cause of error in the Sanger method. Each DNA polymerase has a characteristic tendency to dissociate, which is measured by the processivity, the average number of nucleotides synthesized before the enzyme dissociates from the template. DNA polymerase I of E. coli has an average processivity of 10-50, SEQUENASE™ and SEQUENASE™ 2.0 have processivities of approximately 2000 and 3000, respectively, and Taq DNA polymerase has a processivity of greater than 7600 nucleotides (Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., at p. 13.8). Thus E. coli DNA polymerase tends to generate a higher background of fragments because the enzyme often dissociates from the template before the terminating ddNTP is incorporated. Using a polymerase that terminates prematurely or using a damaged template, which produces a high "background" of inaccurate oligonucleotides, results in erroneous sequencing data. Additional sources of error include sequence anomalies, such as regions of dyad symmetry, which produce overlapping bands on a gel, and the formation of secondary structures between oligonucleotides or within an oligonucleotide, which causes the oligonucleotides to migrate improperly, which in turn produces compressed bands on a separating gel.
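The effect of processivity on background can be made concrete with a rough calculation. The sketch below is an illustration only and rests on a simplifying assumption that is not stated in the cited reference: the enzyme is taken to dissociate with a constant probability of 1/processivity at each nucleotide, so that the fraction of chains still attached after n nucleotides is (1 - 1/processivity)^n. The function name and the chosen chain length of 300 nucleotides are hypothetical.

```python
def fraction_still_bound(length, processivity):
    """Fraction of growing chains whose polymerase has not dissociated.

    Assumes a constant per-nucleotide dissociation probability equal to
    1 / processivity (a simple geometric model, used here only as a sketch).
    """
    p_dissociate = 1.0 / processivity
    return (1.0 - p_dissociate) ** length


if __name__ == "__main__":
    # Processivity values quoted in the text; 30 is taken as a midpoint of 10-50.
    enzymes = {
        "E. coli DNA polymerase I": 30,
        "SEQUENASE": 2000,
        "SEQUENASE 2.0": 3000,
        "Taq DNA polymerase": 7600,
    }
    for name, processivity in enzymes.items():
        print(f"{name}: {fraction_still_bound(300, processivity):.4f}")
```

Under this assumption almost no chain extended by a polymerase with a processivity of 30 reaches 300 nucleotides without at least one dissociation event, whereas most chains do for enzymes with processivities in the thousands.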
Because double-stranded DNA is composed of complementary strands, the accuracy with which a sequence is determined can be improved by sequencing both strands of DNA. It has been estimated (Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.) that an error rate of 0.1% can be achieved by sequencing a strand and its complement and resolving all discrepancies between the strands. Some methods for simultaneously sequencing both DNA strands have been developed. Thus far, these methods are less convenient than those for sequencing single strands.
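Comparing a strand with its complement amounts to checking one read against the reverse complement of the other. The following minimal sketch assumes that both reads are written 5' to 3' and cover exactly the same region, with no insertions or deletions; real reads would first have to be aligned. The function names and example sequences are hypothetical.

```python
# Translation table for complementing bases; N (unknown) maps to itself.
COMPLEMENT_TABLE = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the complementary strand, written 5'->3'."""
    return seq.translate(COMPLEMENT_TABLE)[::-1]


def find_discrepancies(forward_read, reverse_read):
    """List positions where the two strands disagree.

    Each entry is (index in forward read, base called on the forward strand,
    base implied by the reverse strand).
    """
    implied = reverse_complement(reverse_read)
    if len(implied) != len(forward_read):
        raise ValueError("reads must cover the same region")
    return [
        (i, f, r)
        for i, (f, r) in enumerate(zip(forward_read, implied))
        if f != r
    ]


if __name__ == "__main__":
    print(find_discrepancies("ATGCCTAG", "CTAGGCAT"))  # []
    print(find_discrepancies("ATGCCTAG", "CTAGACAT"))  # [(3, 'C', 'T')]
```

A discrepancy only flags a position for re-examination; deciding which strand is correct requires going back to the gel data for both strands.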
For example, one method for sequencing double-stranded DNA (Guo et al. (1982) Nucl. Acids Res. 10: 2065-2084) includes the steps of cleaving a plasmid that contains the inserted DNA with a restriction enzyme, which cuts at only one site to produce a linear molecule having either recessed 3' ends or blunt ends, followed by controlled digestion with exonuclease III to produce shortened 3' ends and long 5' ended single strands. In one variation of the method, samples of the digested molecules are removed at regular time intervals as the exonuclease reaction proceeds, and are pooled. In a second variation, the reaction is stopped after a predetermined time. Labeled nucleotides are then incorporated into the 3' termini of the exonuclease III-digested DNA to produce molecules that serve as template-primer systems in which the shortened strands with the 3' single-stranded ends serve as primers. In the first variation of the method, each of the four [α-³²P]dNTPs is added to one of four reactions. In the second variation, the four ddNTPs and the four dNTPs are used in each of four reactions with one [α-³²P]dNTP added to each reaction. In both variations of the method, the linearized plasmid is labeled at both ends. Upon digestion with a second restriction enzyme that cuts the fragments asymmetrically, two families of labeled fragments are produced. When these are run on a separating gel, the sequences of the strands of DNA from each of the 3' ends up to the points at which exonuclease III cleavage ended can be read. This method, however, requires a plasmid that has appropriate restriction sites so that asymmetric cleavage can be effected and also relies on the uniformity of the rate of digestion with exonuclease III. Other drawbacks include difficulties that arise from sequencing DNA that contains runs of identical bases, particularly using the first variation, and the appearance of extraneous bands when using the second variation.
In addition, although this method sequences both strands of double-stranded DNA, it does not readily provide overlapping sequence data. Rather, part of the target DNA sequence is obtained from one end of the fragment and another part of the sequence is obtained by sequencing the other end of the fragment.
Another method (Kambara et al. (1991) Biotechnology 9: 648-651) for simultaneously sequencing both strands of duplex DNA uses the Sanger chain termination method in conjunction with differential fluorophore dye labelling. In one variation, forward and reverse primers are labeled with two different fluorophore dyes. DNA fragment families ending in the four different base species are collected and placed in four different electrophoretic tracks. The fragments originating from the forward direction can be distinguished from the fragments originating from the reverse direction by the fluorescence pattern from each dye. This method, however, requires the prior labelling of terminators or primers with fluorescent dyes, a laser as an excitation light-source, and a multi-color detector attached to an automatic sequencer.
Thus, although it is desirable to sequence both strands of a DNA molecule in order to accurately determine the sequence, methods for simultaneously sequencing both strands of a DNA fragment have not as yet been perfected so that double-stranded sequencing can be performed as conveniently and routinely as single-stranded sequencing.
Therefore, it is an object of this invention to provide a straightforward, efficient method for simultaneously sequencing both strands of a DNA fragment. It is also an object of this invention to provide a method for detecting and correcting sequencing errors.