PCR techniques enable the amplification of DNA which lies between two regions of known sequence (K. B. Mullis et al., U.S. Pat. Nos. 4,683,202 and 4,683,195). Oligonucleotides complementary to these known sequences at both ends serve as “primers” in the PCR procedure. Double stranded target DNA is first melted to separate the DNA strands, and then oligonucleotide (oligo) primers complementary to the ends of a target segment whose amplification is desired are annealed to the template DNA. The oligos serve as primers for the synthesis of new complementary DNA strands, using a DNA polymerase enzyme and a process known as primer extension. The orientation of the primers with respect to one another is such that the 5′ to 3′ extension product from each primer contains, when extended far enough, a segment of sequence that is complementary to the other oligo. Thus, each newly synthesized DNA strand becomes a template for synthesis of another DNA strand beginning with the other oligo as primer. Repeated cycles of melting, annealing of oligo primers, and primer extension lead to a (near) doubling, with each cycle, of DNA strands containing the sequence of the template beginning with the sequence of one oligo and ending with the sequence of the other oligo.
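By way of illustration only, the primer orientation and the near doubling per cycle described above can be sketched in a few lines of Python. The function names, the example sequences, and the `efficiency` parameter are assumptions made for this sketch and form no part of the cited patents; the model is idealized (exact primer matches, one primer site per strand).

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    comp = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def amplicon(template: str, fwd_primer: str, rev_primer: str):
    """Return the segment of `template` bounded by the two primers, or None.

    `template` is the top strand written 5' to 3'. The forward primer has the
    same sequence as the top strand; the reverse primer is complementary to
    the top strand, so its reverse complement is what appears in `template`.
    """
    template = template.upper()
    start = template.find(fwd_primer.upper())
    if start == -1:
        return None
    rev_site = reverse_complement(rev_primer)
    # The reverse primer site must lie downstream of the forward primer so
    # that the two extension products proceed toward each other.
    end = template.find(rev_site, start + len(fwd_primer))
    if end == -1:
        return None
    return template[start:end + len(rev_site)]


def copies_after(cycles: int, start_copies: int = 1, efficiency: float = 1.0) -> float:
    """Idealized copy number after `cycles` rounds of amplification.

    efficiency = 1.0 corresponds to perfect doubling in every cycle; real
    reactions fall short of this, hence the "(near) doubling" in the text.
    """
    return start_copies * (1.0 + efficiency) ** cycles


template = "TTACGGATCAAAGGGCCCTTTGACTGAAA"
product = amplicon(template, "ACGGATC", "CAGTCAAA")
print(product)            # segment beginning with one primer, ending with the other
print(copies_after(30))   # about 1.07e9 copies from one molecule at perfect doubling
```

If either primer site is absent the function returns `None`, which mirrors the limitation noted in this background: without known sequence at both ends, a standard primer pair cannot be designed.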
The key requirement for this exponential increase of template DNA is that the two oligo primers are complementary to the ends of the sequence desired to be amplified, and are oriented such that their 3′ extension products proceed toward each other. If the sequence at both ends of the segment to be amplified is not known, complementary oligos cannot be made and standard PCR cannot be performed. Thus, this procedure is impractical for contiguously sequencing a long DNA strand, such as a chromosome. Accordingly, an object of the present invention is to overcome the need for sequence information at both ends of the segment to be amplified, i.e. to provide a method that allows PCR to be performed when sequence is known for only a single region, and to provide a method for the contiguous sequencing of a very long DNA without the need for subcloning of the DNA.
DNA sequencing is a technique by which the four DNA nucleotides (characters) in a linear DNA sequence are ordered by chemical and biochemical means. There are two techniques: 1) the chemical method of Maxam and Gilbert (A. M. Maxam, and W. Gilbert, P.N.A.S. USA, 74:560-564, 1977), and 2) the enzymatic method of Sanger and colleagues (F. Sanger, S. Nicklen, and A. R. Coulson, P.N.A.S. USA, 74:5463-5467, 1977). In the chemical method, the DNA strand is isotopically labeled on one end, broken down into smaller fragments at sequence locations ending with a particular nucleotide (A, T, C, or G) by chemical means, and the fragments ordered based on this information. The four nucleotide-specific reaction products are resolved on a polyacrylamide gel, and the autoradiographic image of the gel is examined to infer the DNA sequence.
In the enzymatic method, an oligonucleotide primer is annealed to a suitable single or denatured double stranded DNA template; the primer is extended with DNA polymerase in four separate reactions, each containing one α-labeled dNTP or dideoxynucleoside-5′-triphosphate (ddNTP) (alternatively a labeled primer can be used), a mixture of unlabeled dNTPs, and one chain-terminating ddNTP; the four sets of reaction products are resolved on a high resolution polyacrylamide-urea gel; and an autoradiographic image of the gel is produced that can be examined to infer the DNA sequence. Alternatively, fluorescently labeled primers or nucleotides can be used to identify the reaction products. Known dideoxy sequencing methods utilize a DNA polymerase such as the Klenow fragment of E. coli DNA polymerase, reverse transcriptase, a modified T7 DNA polymerase, or the Taq polymerase.
The PCR amplification procedure has been used to sequence DNA being amplified (e.g. using AmpliTaq™ DNA polymerase Cycle Sequencing (Perkin Elmer Cetus Corporation)). By this procedure, DNA can be first amplified and then sequenced using the two conventional DNA sequencing techniques. A modification of this procedure is disclosed by Bevan et al., PCR Meth. App. 4:222 (1992). The PCR method also enables the reduction of non-specific binding of the primers to the template DNA because the enzymes used in these protocols function at high temperatures, and thus allow “stringent” reaction conditions to be used to improve sequencing.
In the currently existing methods for sequencing DNA of millions of nucleotides, the DNA is fragmented into smaller, overlapping fragments, and sub-cloned to produce numerous clones containing overlapping DNA sequences. These clones are sequenced randomly (sometimes known as the “shot-gun sequencing method”) and the sequences are assembled by “overlap sequence-matching” to produce the contiguous sequence. In this shot-gun sequencing method, approximately ten times more sequencing than the length of the DNA being sequenced is required to assemble the contiguous sequence. In these sequencing methods, the linear order of the DNA clones has to be first determined by “physical mapping” of the clones.
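For illustration, the “overlap sequence-matching” step and the roughly tenfold sequencing redundancy mentioned above might be sketched as follows. This is a minimal toy example under stated assumptions (error-free reads, exact suffix-to-prefix overlaps, hypothetical function names and a hypothetical `min_len` threshold); it is not the assembly procedure of any particular sequencing project.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def merge(a: str, b: str, min_len: int = 3):
    """Join two reads into one contiguous sequence if they overlap."""
    k = overlap(a, b, min_len)
    return a + b[k:] if k else None


def coverage(read_lengths, target_length: int) -> float:
    """Total bases sequenced divided by the length of the target DNA."""
    return sum(read_lengths) / target_length


print(overlap("ATTAGACC", "GACCTGAA"))   # 4
print(merge("ATTAGACC", "GACCTGAA"))     # ATTAGACCTGAA
print(coverage([500] * 20, 1000))        # 10.0
```

A coverage of 10.0 in the last line corresponds to sequencing ten times the length of the target, the order of redundancy stated above for the shot-gun method.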
Among the currently known contiguous DNA sequencing methods is a procedure called the “primer-walking” or “chromosome walking” method, which uses Sanger's DNA polymerase enzymatic sequencing procedure. In this method, however, the DNA copying always has to occur from the template DNA during DNA sequencing (rather than from the target DNA amplified in the first rounds from the original input template DNA functioning as the template DNA for subsequent cycles of amplification, as in PCR sequencing). Thus, in the “primer-walking” or “chromosome walking” method, after a certain number of cycles of amplification, the DNA sequencing reaction is initiated by adding a sequencing “cocktail”. As a result, the “primer-walking” or “chromosome walking” method requires a larger amount of template DNA than does the PCR sequencing method. Also, when a very long DNA is being sequenced, the DNA has a tendency to re-anneal back to duplex DNA, so that the sequencing gel pattern obtained by the “primer-walking” method may not be as clean as in a PCR procedure. This disadvantage may limit the length of the DNA that can be contiguously sequenced by this method without breaking the DNA.
U.S. Pat. No. 5,994,058 discloses a method for sequencing long nucleotide molecules using the PCR procedure wherein the sequence of only one primer needs to be known. In this method a first primer is used that is fully complementary to a primer binding site on the target nucleic acid sequence and the second primer consists of 12-16 nucleotides of which 1-10 of the nucleotides anywhere within the primer are of fixed sequence while the remaining nucleotides of the second primer are of random sequence. By generating a large enough number of such second primers of various sequence, at least one of them will have a nucleotide sequence fully complementary to a second primer binding site. Using this technique, a long genomic DNA, such as a chromosome, can be contiguously sequenced without the need for subcloning it into smaller fragments. However, in this method, a very large number of second primers must be generated, only a few of which will prove useful.
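The size of the second-primer pool noted above follows from simple counting: each position of random sequence can hold any of four nucleotides, so a primer with r random positions has 4^r possible sequences. The sketch below illustrates this arithmetic; the function names and the use of "N" to mark a random position are assumptions made for the example and are not taken from U.S. Pat. No. 5,994,058.

```python
from itertools import product

BASES = "ACGT"


def primer_pool_size(length: int, n_fixed: int) -> int:
    """Number of distinct primers of `length` nt with `n_fixed` fixed positions."""
    if not 0 <= n_fixed <= length:
        raise ValueError("n_fixed must lie between 0 and length")
    return 4 ** (length - n_fixed)


def expand_primer(pattern: str):
    """Yield every concrete primer for a pattern in which 'N' marks a random position."""
    slots = [BASES if ch == "N" else ch for ch in pattern.upper()]
    for combo in product(*slots):
        yield "".join(combo)


print(primer_pool_size(12, 10))   # 16 (shortest primer, most fixed positions)
print(primer_pool_size(16, 1))    # 1073741824 (longest primer, fewest fixed positions)
print(len(list(expand_primer("ACNGN"))))   # 16
```

Across the stated ranges (12-16 nucleotides, 1-10 fixed), the pool therefore spans from 16 to more than a billion sequences, which is consistent with the drawback identified above.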
A combination of “shotgun sequencing” and “chromosome walking” is also currently used to enable the isolation of unknown DNA through use of the adjacent DNA's known sequence. These techniques are routinely applied during analysis of genomic DNA. For example, high-throughput “shotgun sequencing” invariably reaches a stage at which continued sequencing of random clones becomes an inefficient method to close gaps in assembled sequence. When these gaps are too large for standard PCR amplification, “chromosome walking” may be used to systematically obtain and sequence DNA spanning the gaps.
Vaccinia DNA topoisomerase has been used in procedures involving the joining of DNA fragments. Vaccinia DNA topoisomerase, a 314 aa virus-encoded eukaryotic type I topoisomerase (I), binds to duplex DNA and cleaves the phosphodiester backbone of one strand (Shuman, S., and Moss, B. (1987) Proc. Natl. Acad. Sci. USA 84: 7478-7482). The enzyme exhibits a high level of sequence specificity, akin to that of a restriction endonuclease. Cleavage occurs at a consensus pentapyrimidine element 5′-(C/T)CCTT-3′ in the scissile strand (Cheng, S., et al. (1994) Proc. Natl. Acad. Sci. USA 91: 5695-5699; Clark, J. M. (1988) Nucleic Acids Res. 16: 9677-9686; and Morham, S. G., and Shuman, S. (1992) J. Biol. Chem. 267: 15984-15992). In the cleavage reaction, bond energy is conserved via the formation of a covalent adduct between the 3′ phosphate of the incised strand and a tyrosyl residue (Tyr-274) of the protein. Vaccinia topoisomerase can religate the covalently held strand across the same bond originally cleaved (as occurs during DNA relaxation) or it can religate to a heterologous acceptor DNA and thereby create a recombinant molecule.
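As an illustration of the sequence specificity described above, the consensus element 5′-(C/T)CCTT-3′ can be located on either strand of a duplex with a short pattern search. The following is a sketch only; the function names, the coordinate convention (0-based position of the leftmost top-strand base covered by the site), and the example sequences are assumptions made for this example.

```python
import re

# Lookahead so that overlapping sites (e.g. CCCTTCCTT) are all reported.
SITE = re.compile(r"(?=([CT]CCTT))")
SITE_LEN = 5


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    comp = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_cleavage_sites(top_strand: str):
    """List (position, strand, site) for each (C/T)CCTT element in a duplex.

    `top_strand` is read 5' to 3'. Strand "+" means the element lies in the
    top strand; "-" means it lies in the bottom strand. Positions are given
    in top-strand coordinates.
    """
    seq = top_strand.upper()
    hits = [(m.start(), "+", m.group(1)) for m in SITE.finditer(seq)]
    bottom = reverse_complement(seq)
    for m in SITE.finditer(bottom):
        # Map the bottom-strand offset back to top-strand coordinates.
        hits.append((len(seq) - m.start() - SITE_LEN, "-", m.group(1)))
    return sorted(hits)


print(find_cleavage_sites("AACCCTTGG"))   # [(2, '+', 'CCCTT')]
print(find_cleavage_sites("AAGGGTT"))     # [(0, '-', 'CCCTT')]
```

A substrate with exactly one hit corresponds to the single-site duplexes used to study the joining reactions, while a substrate with two hits would correspond to a bivalent donor.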
The repertoire of DNA joining reactions catalyzed by Vaccinia topoisomerase has been studied in detail by Dr. Stewart Shuman using synthetic duplex DNA substrates containing a single CCCTT cleavage site. When the substrate is configured such that the scissile bond is situated near (e.g., within 10 bp of) the 3′ end of a DNA duplex, cleavage is accompanied by spontaneous dissociation of the downstream portion of the cleaved strand (Shuman, S., J. Biol. Chem. 267:8620-8627, 1992a; Shuman, S., J. Biol. Chem. 267:16755-16758, 1992b). The resulting topoisomerase-DNA complex, containing a 5′ single-stranded tail, can religate to an acceptor DNA if the acceptor molecule has a 5′ hydroxyl tail complementary to that of the activated donor complex. Sticky end-ligation by Vaccinia topoisomerase has also been demonstrated by Shuman, using plasmid DNA acceptors with four base overhangs created by restriction endonuclease digestion.
PCR fragments are naturally good surrogate substrates for the topoisomerase I religation step because they generally have 5′ hydroxyl residues from the primers used for the amplification reaction. The 5′ hydroxyl is the substrate for the religation reaction. U.S. Pat. No. 5,766,891 discloses a method utilizing this feature of topoisomerase religation to ligate duplex DNAs employing the modified tagged Vaccinia topoisomerase. In this method of ligation the donor duplex DNA substrate is a bivalent donor duplex DNA substrate, that is, it contains two topoisomerase cleavage sites. One embodiment comprises cleaving a donor duplex DNA substrate containing sequence-specific topoisomerase cleavage sites by incubating the donor duplex DNA substrate with a sequence-specific topoisomerase to form a topoisomerase-bound donor duplex DNA strand and incubating the topoisomerase-bound donor duplex DNA strand with a 5′ hydroxyl-terminated compatible acceptor DNA, resulting in the ligation of the topoisomerase-bound donor duplex DNA strand to the DNA acceptor strand.
Despite these advancements in the art, there is a need for new and better methods for isolating and sequencing long stretches of nucleic acid containing segments of unknown sequence. In particular, there is a need in the art for an efficient method for systematically obtaining and sequencing DNA spanning the gaps in a sequence assembled by high-throughput “shotgun sequencing.”