This application relates to DNA sequencing reactions, and in particular to improved bi-directional sequencing reaction protocols making use of thermally stable polymerase enzymes.
DNA sequencing can be performed in two distinct environments: a research environment in which each procedure is fairly unique and in which the sequence being determined is generally not known prior to completion of the sequence determination; and a diagnostic environment in which the same procedure is repeated on many samples and the sequences being determined are generally known. While the basic procedures used in these two environments can be the same, requirements for speed, cost-effectiveness and low risk of error in the diagnostic environment make many of the techniques actually employed in research too cumbersome to permit their effective utilization. This has limited the availability of sequencing-based diagnostics, and has indeed led some to question whether sequencing can ever be cost effective for routine diagnostic use.
The ideal DNA sequencing procedure for use in a diagnostic environment would have the following characteristics: (1) it would be able to utilize a DNA-containing sample which had been subjected to only minimal pretreatment to make the DNA accessible for sequencing; (2) it would require combining this sample with only a single reaction mixture, thus reducing risk of error and contamination, and increasing the ease with which the procedure can be automated; and (3) it would require a short amount of time to perform the sequence determination, thus decreasing the marginal costs in terms of equipment and labor for performing the test.
DNA sequencing, whether for research or diagnostics, is generally performed using techniques based on the "chain termination" method described by Sanger et al., Proc. Nat'l Acad. Sci. (USA) 74(12): 5463-5467 (1977). Basically, in this process, DNA to be tested is isolated, rendered single stranded, and placed into four vessels. Each vessel contains the components needed to replicate the DNA strand, i.e., a template-dependent DNA polymerase, a short primer molecule complementary to a known region of the DNA to be sequenced, and the standard deoxynucleotide triphosphates (dNTP's) commonly represented by A, C, G and T, in a buffer conducive to hybridization between the primer and the DNA to be sequenced and to chain extension of the hybridized primer. In addition, each vessel contains a small quantity of one type (i.e., one species) of dideoxynucleotide triphosphate (ddNTP), e.g. dideoxyadenosine triphosphate (ddA).
In each vessel, the primer hybridizes to a specific site on the isolated DNA. The primer is then extended, one base at a time, to form a new nucleic acid polymer complementary to the isolated DNA. When a dideoxynucleotide triphosphate is incorporated into the extending polymer, it terminates the polymer strand and prevents further extension, because the dideoxynucleotide lacks the 3'-hydroxyl group needed to attach the next base. Accordingly, each vessel yields a set of extended polymers of specific lengths, which indicate the positions of the nucleotide corresponding to the dideoxynucleotide in that vessel. These sets of polymers are then evaluated using gel electrophoresis, which separates them by length, to determine the sequence.
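The four-vessel logic described above can be illustrated with a toy simulation. This is a minimal sketch, not a model of real reaction chemistry: it assumes every possible termination event occurs in at least one molecule, measures fragment lengths from the 3' end of the primer, writes the template in the order the polymerase reads it (3' to 5'), and uses hypothetical function names (`terminated_lengths`, `read_gel`). Sorting the fragments from all four vessels by length plays the role of the gel.

```python
# Toy model of Sanger chain-termination sequencing (illustrative assumptions only).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def terminated_lengths(template: str, ddntp: str) -> list[int]:
    """Lengths of the extension products formed in the vessel containing `ddntp`.

    `template` is written 3'->5', so the new strand is synthesized left to right.
    A product of length i exists wherever the i-th base to be incorporated
    matches the vessel's dideoxynucleotide (assumed to terminate some molecules
    at every such position).
    """
    synthesized = [COMPLEMENT[base] for base in template]
    return [i + 1 for i, base in enumerate(synthesized) if base == ddntp]


def read_gel(template: str) -> str:
    """Pool the four vessels' products and read them shortest-first."""
    bands = []
    for ddntp in "ACGT":  # one vessel per dideoxynucleotide
        for length in terminated_lengths(template, ddntp):
            bands.append((length, ddntp))
    bands.sort()  # electrophoresis separates fragments by length
    return "".join(base for _, base in bands)  # new strand, 5'->3'
```

For example, `read_gel("TACGGA")` returns `"ATGCCT"`, the complement of the template read in the direction of synthesis.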
As Church and Gilbert observed, "in a mammalian cell, the DNA corresponding to any gene sequence is surrounded by DNA corresponding to some million other sequences." "The Genomic Sequencing Technique" in Medical Genetics: Past, Present and Future, Alan R. Liss, Inc., pp. 17-21 (1991). The same is true, to a greater or lesser extent, of any complex DNA sample, e.g. one containing microbial genetic material, plant genetic material, complete cDNA libraries, etc. In the past, DNA sequencing procedures have dealt with this complexity by adding steps which substantially purify the DNA of interest relative to other DNA species present in the sample. This purification has been accomplished by cloning the DNA to be sequenced prior to sequencing, or by amplifying a selected portion of the genetic material in a sample to enrich the concentration of a region of interest relative to other DNA. For example, it is possible to amplify a selected portion of a gene using a polymerase chain reaction (PCR) as described in U.S. Pat. Nos. 4,683,194, 4,683,195 and 4,683,202, which are incorporated herein by reference. This process involves the use of pairs of primers, one for each strand of the duplex DNA, that hybridize at sites located near a region of interest in a gene. Chain extension polymerization (without a chain-terminating nucleotide) is then carried out in repetitive cycles, ideally doubling the number of copies of the region of interest with each cycle. The amplified polynucleotides are then separated from the reaction mixture and used as the starting sample for the sequencing reaction. Gelfand et al. have described a thermostable enzyme, "Taq polymerase," derived from the organism Thermus aquaticus, which is useful in this amplification process. (See U.S. Pat. Nos. 4,889,818; 5,352,600 and 5,079,352, which are incorporated herein by reference.) Taq polymerase has also been disclosed as useful in sequencing DNA when certain special conditions are met. See U.S. Pat. No. 5,075,216, incorporated herein by reference.
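The repetitive cycling of PCR can be made concrete with the standard idealized growth model, in which each cycle multiplies the number of target copies by (1 + E), where E is the per-cycle efficiency. The sketch below is an assumption-laden illustration (the function name `pcr_copies` is hypothetical), and it ignores the plateau that real reactions reach as primers and dNTPs are consumed.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Expected copies of the target region after `cycles` rounds of PCR.

    Idealized model: each cycle multiplies the target by (1 + efficiency);
    efficiency = 1.0 means every template is copied once per cycle (doubling).
    Reagent depletion and the resulting plateau are not modeled.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles
```

With perfect efficiency, a single starting copy becomes 2^30 = 1,073,741,824 copies after 30 cycles, which is why the amplified region comes to dominate the other DNA in the sample.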
Improvements to the original technique described by Sanger et al. have included improvements to the enzyme used to extend the primer chain. For example, Tabor et al. have described enzymes such as T7 DNA polymerase which have increased processivity and increased levels of incorporation of dideoxynucleotides. (See U.S. Pat. No. 4,795,699 and EP-A-0 386 857, which are incorporated herein by reference.) More recently, Reeve et al. have described a thermostable enzyme preparation, called THERMO SEQUENASE™, with improved qualities for DNA sequencing. Nature 376: 796-797 (1995); EP-A-0 655 506, which is incorporated herein by reference. For sequencing with the THERMO SEQUENASE™ product, an amplified DNA sample containing 0.5-2 µg of single-stranded DNA (or 0.5-5 µg of double-stranded DNA) is divided into four aliquots, and each aliquot is combined with the THERMO SEQUENASE™ enzyme preparation, a dideoxynucleotide termination mixture containing one ddNTP and all four dNTP's, and a dye-labeled primer which will hybridize to the DNA to be sequenced. Each mixture is placed in a thermocycler and run for 20-30 cycles of annealing, extension and denaturation to produce measurable amounts of dye-labeled extension products of varying lengths, which are then evaluated by gel electrophoresis. EP-A-0 655 506 further asserts that THERMO SEQUENASE™ and similar enzymes can be used for amplification reactions.
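The four-aliquot setup of the THERMO SEQUENASE protocol can be expressed as a small planning routine. This is a minimal sketch under stated assumptions: the sample is split equally among the four tubes, the mass ranges are the ones quoted in the text, and the names `Aliquot`, `plan_aliquots` and `MASS_RANGE_UG` are hypothetical. The enzyme preparation and dye-labeled primer, which are the same in every tube, are omitted.

```python
from dataclasses import dataclass

# Total sample-mass ranges quoted in the text (micrograms), keyed by strandedness.
MASS_RANGE_UG = {"single": (0.5, 2.0), "double": (0.5, 5.0)}


@dataclass
class Aliquot:
    ddntp: str           # the single dideoxynucleotide in this tube's termination mix
    dna_ug: float        # template DNA placed in this tube
    dntps: str = "ACGT"  # every termination mix also contains all four dNTPs


def plan_aliquots(total_dna_ug: float, strandedness: str) -> list[Aliquot]:
    """Split one amplified sample into four tubes, one per ddNTP (equal split assumed)."""
    low, high = MASS_RANGE_UG[strandedness]
    if not low <= total_dna_ug <= high:
        raise ValueError(f"{total_dna_ug} ug is outside the {low}-{high} ug range")
    share = total_dna_ug / 4
    return [Aliquot(ddntp=base, dna_ug=share) for base in "ACGT"]
```

For instance, `plan_aliquots(2.0, "single")` yields four tubes (ddA, ddC, ddG, ddT), each holding 0.5 µg of template.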
Other improvements on the Sanger process have involved the use of fluorescent labels rather than radiolabels to permit real-time detection. See U.S. Pat. No. 5,171,534 of Smith et al. and U.S. Pat. No. 4,729,947 of Middendorf et al., which are incorporated herein by reference. Fluorescent labels have also been used to provide simultaneous sequencing of both strands of a DNA molecule. Wiemann et al., "Simultaneous On-Line DNA Sequencing on Both Strands with Two Fluorescent Dyes," Anal. Biochem. 224: 117-121 (1995).
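When both strands are sequenced at once with two dyes, each dye channel yields a read of one strand, and the two reads describe the same region from opposite directions. One way they could be reconciled is to reverse-complement the second-strand read and compare it base by base with the first. The sketch below assumes both reads span exactly the same region, uses `N` for an uncalled base, and uses a hypothetical merging rule (`combine_strand_reads`); it is not the procedure of the cited work.

```python
_COMPLEMENT_TABLE = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, written 5'->3'."""
    return seq.translate(_COMPLEMENT_TABLE)[::-1]


def combine_strand_reads(forward: str, reverse: str) -> tuple[str, list[int]]:
    """Merge reads of the two strands of one region.

    `forward` is the top-strand read (one dye channel) and `reverse` the
    bottom-strand read (the other channel), both 5'->3'. Where one read has
    an uncalled base (N), the other read's call is used; true disagreements
    become N and their positions are returned for review.
    """
    rev_on_top = reverse_complement(reverse)
    if len(rev_on_top) != len(forward):
        raise ValueError("reads must cover the same region")
    consensus, mismatches = [], []
    for i, (f, r) in enumerate(zip(forward, rev_on_top)):
        if f == r:
            consensus.append(f)
        elif f == "N":
            consensus.append(r)
        elif r == "N":
            consensus.append(f)
        else:
            consensus.append("N")
            mismatches.append(i)
    return "".join(consensus), mismatches
```

For example, `combine_strand_reads("ACNTTG", "CAACGT")` fills the uncalled third base from the other strand and returns `("ACGTTG", [])`.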
Notwithstanding the basic desirability of simplifying the sequencing reaction procedures to minimize risk of error and contamination, efforts to combine the amplification reaction and the sequencing reaction into a single step have been limited. One such technique has been called "cycle sequencing" or "linear amplification sequencing." In this technique, a thermostable polymerase and dideoxynucleotide triphosphates are used in a thermocycled reaction to produce sequencing fragments. The reaction differs from PCR amplification in that only one primer is used, so there is only a linear increase in the amount of DNA with each cycle. Kretz et al., in PCR Methods and Applications, Cold Spring Harbor Press, pp. S107-S112 (1994).
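The contrast between linear and exponential growth can be shown numerically. In cycle sequencing, the single primer means newly made fragments cannot serve as templates, so the template count stays fixed and products accumulate by a constant amount per cycle; in ideal PCR, every product becomes a template. This sketch assumes idealized yields, and the function names are hypothetical.

```python
def cycle_sequencing_products(templates: float, cycles: int,
                              yield_per_cycle: float = 1.0) -> float:
    """Linear amplification: one primer, so the template count never changes.

    Each cycle, each template yields at most one extension product
    (yield_per_cycle <= 1), so products grow as templates * yield * cycles.
    """
    return templates * yield_per_cycle * cycles


def exponential_products(templates: float, cycles: int) -> float:
    """Ideal two-primer PCR for comparison: every copy is copied each cycle."""
    return templates * 2 ** cycles


for n in (10, 20, 30):
    print(n, cycle_sequencing_products(1, n), exponential_products(1, n))
```

For one starting template, 30 cycles give 30 products in the linear case versus 2^30 (about 1.07 × 10^9) in the exponential case.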
Ruano and Kidd, Proc. Nat'l. Acad. Sci. (USA) 88: 2815-2819 (1991) and U.S. Pat. No. 5,427,911, which are incorporated herein by reference, describe a process which they call "coupled amplification and sequencing" (CAS) for sequencing of DNA. In this process, a sample is treated in a first reaction stage with two primers and amplified for a number of cycles to achieve 10,000 to 100,000-fold amplification. A ddNTP is then added during the exponential phase of the amplification reaction, and the reaction is processed for additional thermal cycles to produce chain-terminated sequencing fragments. Sequencing of each strand is done separately.
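The 10,000 to 100,000-fold amplification that CAS reaches before the ddNTP is added corresponds to a particular number of thermal cycles. Assuming the same idealized model in which each cycle multiplies the target by (1 + E), the required cycle count is the smallest n with (1 + E)^n at least the target fold. The sketch below (hypothetical name `cycles_for_fold`) computes it; real efficiencies vary by primer pair and cycle.

```python
import math


def cycles_for_fold(fold: float, efficiency: float = 1.0) -> int:
    """Smallest whole number of cycles giving at least `fold` amplification.

    Assumes each cycle multiplies the target by (1 + efficiency), 0 < efficiency <= 1.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    if fold <= 1:
        return 0
    return math.ceil(math.log(fold) / math.log(1.0 + efficiency))
```

At perfect efficiency, 10,000-fold needs 14 cycles and 100,000-fold needs 17; at 90% efficiency, 10,000-fold needs 15.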
It is an object of the present invention to provide an improved method for bi-directional sequencing of DNA samples which is well-suited for use in the diagnostic environment and for automation.
It is a further object of the invention to provide a method for bi-directional sequencing of DNA which utilizes a DNA-containing sample which has been subjected to only minimal pretreatment to make the DNA accessible for sequencing.
It is still a further object of the invention to provide a method for bi-directional sequencing of DNA which requires combining a complex DNA-containing sample with only a single reaction mixture, thus reducing risk of error and contamination, and increasing the ease with which the procedure can be automated.