This application relates to DNA sequencing reactions, and in particular to improved sequencing reaction protocols making use of thermally stable polymerase enzymes having reduced error rates.
DNA sequencing can be performed in two distinct environments: a research environment in which each procedure is fairly unique and in which the sequence being determined is generally not known prior to completion of the sequence determination; and a diagnostic environment in which the same procedure is repeated on many samples and the sequences being determined are generally known. While the basic procedures used in these two environments can be the same, requirements for speed, cost-effectiveness and low risk of error in the diagnostic environment make many of the techniques actually employed too cumbersome to permit their effective utilization. This has limited the availability of sequencing-based diagnostics, and has indeed led some to question whether sequencing can ever be cost effective for routine diagnostic use.
The ideal DNA sequencing procedure for use in a diagnostic environment would have the following characteristics: (1) it would be able to utilize a DNA-containing sample which had been subjected to only minimal pretreatment to make the DNA accessible for sequencing; (2) it would require combining this sample with only a single reaction mixture, thus reducing risk of error and contamination, and increasing the ease with which the procedure can be automated; and (3) it would require a short amount of time to perform the sequence determination, thus decreasing the marginal costs in terms of equipment and labor for performing the test.
DNA sequencing, whether for research or diagnostics, is generally performed using techniques based on the “chain termination” method described by Sanger et al., Proc. Nat'l Acad. Sci. (USA) 74(12):5463-5467 (1977). Basically, in this process, DNA to be tested is isolated, rendered single stranded, and placed into four vessels. Each vessel contains the components needed to replicate the DNA strand, i.e., a template-dependent DNA polymerase, a short primer molecule complementary to a known region of the DNA to be sequenced, and the standard deoxynucleotide triphosphates (dNTPs) commonly represented by A, C, G and T, in a buffer conducive to hybridization between the primer and the DNA to be sequenced and to chain extension of the hybridized primer. In addition, each vessel contains a small quantity of one type (i.e., one species) of dideoxynucleotide triphosphate (ddNTP), e.g. dideoxyadenosine triphosphate (ddA).
In each vessel, the primer hybridizes to a specific site on the isolated DNA. The primers are then extended, one base at a time, to form a new nucleic acid polymer complementary to the isolated pieces of DNA. When a dideoxynucleotide triphosphate is incorporated into the extending polymer, it terminates the polymer strand and prevents further extension, because the dideoxynucleotide lacks the 3'-hydroxyl group needed to attach the next nucleotide. Accordingly, each vessel yields a set of extended polymers of specific lengths, which indicate the positions of the nucleotide corresponding to the dideoxynucleotide in that vessel. These sets of polymers are then evaluated using gel electrophoresis to determine the sequence.
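The four-vessel logic can be illustrated with a toy model. This is a minimal sketch, not part of the described method: the function names are hypothetical, the template is written 3'→5' so that the synthesized strand is its base-by-base complement, and it assumes every site where a vessel's ddNTP can be incorporated yields one observable terminated fragment (in a real reaction only a fraction of chains terminate at each site).

```python
# Toy model of four-vessel Sanger chain termination (illustrative only).
# Assumption: every site where the vessel's ddNTP can be incorporated yields
# one observable terminated fragment; real reactions terminate only a fraction
# of chains at each site.

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def synthesized_strand(template_3to5: str) -> str:
    """Strand built by extending the primer along a template written 3'->5'."""
    return "".join(COMPLEMENT[base] for base in template_3to5)


def terminated_lengths(template_3to5: str, dd_base: str, primer_len: int) -> list[int]:
    """Lengths of the chains terminated by the ddNTP for `dd_base` in one vessel."""
    new_strand = synthesized_strand(template_3to5)
    # Position i (0-based) past the primer gives a fragment of primer_len + i + 1 nt.
    return [primer_len + i + 1 for i, base in enumerate(new_strand) if base == dd_base]


def read_ladder(lanes: dict[str, list[int]]) -> str:
    """Sort all fragments by length, as electrophoresis does, and read off the bases."""
    calls = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in calls)


if __name__ == "__main__":
    template = "TACGGAT"  # hypothetical template region, written 3'->5'
    lanes = {b: terminated_lengths(template, b, primer_len=20) for b in "ACGT"}
    print(lanes)               # {'A': [21, 27], 'C': [24, 25], 'G': [23], 'T': [22, 26]}
    print(read_ladder(lanes))  # ATGCCTA
```

Sorting the pooled (length, base) pairs plays the role of the gel: the shortest fragment corresponds to the first base added after the primer.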
As Church and Gilbert observed, “in a mammalian cell, the DNA corresponding to any gene sequence is surrounded by DNA corresponding to some million other sequences.” “The Genomic Sequencing Technique” in Medical Genetics: Past, Present and Future, Alan R. Liss, Inc., pp. 17-21 (1991). The same is true, to a greater or lesser extent, of any complex DNA sample, e.g. one containing microbial genetic material, plant genetic material, or a complete cDNA library. In the past, DNA sequencing procedures have dealt with this complexity by adding steps which substantially purify the DNA of interest relative to other DNA species present in the sample. This purification has been accomplished by cloning the DNA to be sequenced prior to sequencing, or by amplifying a selected portion of the genetic material in a sample to enrich the concentration of a region of interest relative to other DNA. For example, it is possible to amplify a selected portion of a gene using the polymerase chain reaction (PCR) as described in U.S. Pat. Nos. 4,683,194, 4,683,195 and 4,683,202, which are incorporated herein by reference. This process uses pairs of primers, one for each strand of the duplex DNA, which hybridize at sites located near a region of interest in a gene. Chain extension polymerization (without a chain-terminating nucleotide) is then carried out in repetitive cycles to increase the number of copies of the region of interest many times. The amplified polynucleotides are then separated from the reaction mixture and used as the starting sample for the sequencing reaction. Gelfand et al. have described a thermostable enzyme, “Taq polymerase,” derived from the organism Thermus aquaticus, which is useful in this amplification process (see U.S. Pat. Nos. 4,889,818; 5,352,600 and 5,079,352, which are incorporated herein by reference). Taq polymerase has also been disclosed as useful in sequencing DNA when certain special conditions are met (U.S. Pat. No. 5,075,216, incorporated herein by reference).
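Why amplification enriches the region of interest can be shown with the textbook idealized PCR model, in which the target count after n cycles is N0·(1 + E)^n for a per-cycle efficiency E. The sketch below is an assumption-laden illustration, not taken from the cited patents: it ignores the plateau phase, and it treats non-target DNA as not amplified at all. The function names and the sample composition are hypothetical.

```python
# Idealized PCR enrichment model (an assumption, not taken from the cited patents):
# each cycle copies a fraction `efficiency` of the target molecules, while
# non-target DNA, which the primers do not flank, is not amplified.


def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Expected target copies after `cycles` rounds: N0 * (1 + E) ** n (no plateau)."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


def target_fraction(target_copies: float, background_copies: float,
                    cycles: int, efficiency: float = 1.0) -> float:
    """Fraction of all DNA molecules that are target after amplification."""
    amplified = pcr_copies(target_copies, cycles, efficiency)
    return amplified / (amplified + background_copies)


if __name__ == "__main__":
    # Hypothetical sample: 1 target molecule per 999,999 background molecules.
    for n in (0, 10, 20, 30):
        print(n, f"{target_fraction(1, 999_999, n):.6f}")
```

In this model the target goes from one molecule in a million to the majority of molecules within about 20 ideal cycles, which is the enrichment that the purification-by-amplification step relies on.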
Improvements to the original technique described by Sanger et al. have included improvements to the enzyme used to extend the primer chain. For example, Tabor et al. have described enzymes such as T7 DNA polymerase which have increased processivity and increased levels of incorporation of dideoxynucleotides (see U.S. Pat. No. 4,795,699 and EP-A-0 386 857, which are incorporated herein by reference). More recently, Reeve et al. have described a thermostable enzyme preparation, called Thermo Sequenase™, with improved qualities for DNA sequencing. Nature 376: 796-797 (1995); EP-A-0 655 506, which is incorporated herein by reference. For sequencing with the Thermo Sequenase™ product, an amplified DNA sample containing 0.5-2 µg of single-stranded DNA (or 0.5-5 µg of double-stranded DNA) is divided into four aliquots, and each aliquot is combined with the Thermo Sequenase™ enzyme preparation, one dideoxynucleotide termination mixture containing one ddNTP and all four dNTPs, and one dye-labeled primer which will hybridize to the DNA to be sequenced. The mixture is placed in a thermocycler and run for 20-30 cycles of annealing, extension and denaturation to produce measurable amounts of dye-labeled extension products of varying lengths, which are then evaluated by gel electrophoresis. EP-A-0 655 506 further asserts that Thermo Sequenase™ and similar enzymes can be used for amplification reactions.
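Because this cycle-sequencing protocol uses a single dye-labeled primer, its products are not themselves primed in later cycles, so yield grows roughly linearly with cycle number rather than exponentially. The sketch below estimates template copies from the quoted microgram amounts and the resulting linear yield. It is a rough illustration under stated assumptions: about 330 g/mol per nucleotide of single-stranded DNA (a common approximation), at most one extension product per template strand per cycle, and a hypothetical 500-nt template; the function names are illustrative.

```python
# Rough sketch of single-primer cycle sequencing yield (assumptions, not from
# the source): ~330 g/mol per nucleotide of single-stranded DNA, and at most
# one extension product per template strand per cycle.

AVOGADRO = 6.02214076e23
SS_DNA_G_PER_MOL_PER_NT = 330.0  # common approximation for ssDNA


def template_copies(mass_ug: float, length_nt: int) -> float:
    """Approximate number of single-stranded template molecules in `mass_ug`."""
    grams = mass_ug * 1e-6
    moles = grams / (length_nt * SS_DNA_G_PER_MOL_PER_NT)
    return moles * AVOGADRO


def linear_products(templates: float, cycles: int, efficiency: float = 1.0) -> float:
    """Extension products with one primer: products are not re-primed, so growth is linear."""
    return templates * efficiency * cycles


if __name__ == "__main__":
    n0 = template_copies(mass_ug=1.0, length_nt=500)  # hypothetical 500-nt template
    print(f"{n0:.2e} templates")
    for cycles in (20, 30):
        print(cycles, f"{linear_products(n0, cycles):.2e} products")
```

Under these assumptions, 20-30 cycles raise the product amount at most 20- to 30-fold, compared with up to 2^20 to 2^30 for two-primer PCR. This is one way to see why such a protocol starts from microgram quantities of already-amplified DNA.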
Notwithstanding the observations in the art that enzymes useful for amplification can also be used for sequencing, and vice versa, efforts to combine the amplification reaction and the sequencing reaction into a single step have been limited. Ruano and Kidd, Proc. Nat'l Acad. Sci. (USA) 88: 2815-2819 (1991) and U.S. Pat. No. 5,427,911, which are incorporated herein by reference, describe a process which they call “coupled amplification and sequencing” (CAS) for sequencing of DNA. In this process, a sample is treated in a first reaction stage with two primers and amplified for a number of cycles to achieve 10,000- to 100,000-fold amplification. A ddNTP is then added during the exponential phase of the amplification reaction, and the reaction is processed for additional thermal cycles to produce chain-terminated sequencing fragments. The CAS process does not meet the criteria set forth above for an ideal diagnostic assay because it requires an intermediate addition of reagents (the ddNTP reagents). This introduces an opportunity for error or contamination and increases the complexity of any apparatus that would be used for automation.
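The 10,000- to 100,000-fold figure can be translated into a cycle count, which shows where in the cycling program the intermediate ddNTP addition of CAS has to occur. This is a minimal sketch assuming idealized exponential PCR with per-cycle efficiency E (real reactions run below E = 1 and eventually plateau); the function name is hypothetical.

```python
# Idealized exponential PCR (an assumption; real efficiencies fall below 1 and
# reactions plateau): how many cycles reach a given fold amplification?


def cycles_for_fold(fold: float, efficiency: float = 1.0) -> int:
    """Smallest n with (1 + E) ** n >= fold."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    n = 0
    # Integer loop avoids floating-point rounding in log(fold) / log(1 + E).
    while (1.0 + efficiency) ** n < fold:
        n += 1
    return n


if __name__ == "__main__":
    for fold in (1e4, 1e5):
        print(f"{fold:.0e}-fold: {cycles_for_fold(fold)} cycles at E=1.0, "
              f"{cycles_for_fold(fold, 0.9)} cycles at E=0.9")
```

With perfect doubling, 10,000-fold needs 14 cycles (2^14 = 16,384) and 100,000-fold needs 17 (2^17 = 131,072), so in this model the ddNTP is added after roughly the 14th to 17th cycle.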
One approach to the problem of errors occurring during amplification has been to incorporate into the extending polymers unusual nucleotides (for example dUTP) which are subject to enzymatic attack (for example by uracil-N-glycosylase) and degradation. See U.S. Pat. No. 5,418,149, which is incorporated herein by reference. Such polymers can be utilized in most of the same ways that conventional amplification products are used, but can be eliminated as contaminants from other reactions by including a pretreatment step that uses an appropriate enzyme to degrade the modified nucleic acid polymers.
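The selectivity of this approach can be modeled with strings. This is a deliberately simplified sketch: the enzymatic pretreatment is modeled as removing every strand that contains uracil (in practice the glycosylase excises uracil bases and the resulting abasic strands break down, for example on heating), and the sequences and function names are hypothetical.

```python
# Toy model of dUTP/UNG carryover prevention (simplified, illustrative):
# UNG treatment is modeled as removing every strand that contains uracil; in
# practice the enzyme excises uracil bases and the resulting abasic strands
# break down on heating.


def synthesize_with_dutp(new_strand_with_t: str) -> str:
    """A strand synthesized with dUTP in place of dTTP carries U wherever T would be."""
    return new_strand_with_t.replace("T", "U")


def ung_pretreatment(strands: list[str]) -> list[str]:
    """Keep only uracil-free strands: native sample DNA survives, carryover does not."""
    return [s for s in strands if "U" not in s]


if __name__ == "__main__":
    sample_dna = "GGATCCTTAG"                          # hypothetical native DNA (T only)
    carryover = synthesize_with_dutp("CTAAGGATCC")    # hypothetical earlier product
    print(ung_pretreatment([sample_dna, carryover]))  # ['GGATCCTTAG']
    # Caveat: a product whose sequence has no T positions would carry no U and
    # would escape this treatment; typical amplicons are long enough for this to be rare.
```

The design point is that the marker is introduced by the synthesis itself: native sample DNA never contains uracil, so only products of earlier dUTP-containing reactions are susceptible.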
It is an object of the present invention to provide a method for sequencing of high-complexity DNA samples which is well-suited for use in the diagnostic environment and for automation and which provides a means for minimizing errors caused by contamination and nucleic acid polymer carryover.
It is a further object of the invention to provide a method for sequencing of DNA which utilizes a DNA-containing sample which had been subjected to only minimal pretreatment to make the DNA accessible for sequencing and which provides a means for minimizing errors caused by contamination and nucleic acid polymer carryover.
It is still a further object of the invention to provide a method for sequencing of DNA which requires combining a complex DNA-containing sample with only a single reaction mixture, thus reducing risk of error and contamination, and increasing the ease with which the procedure can be automated.
The present invention provides a method for sequencing a region of interest in a DNA sample in which a single set of reagents is added to a minimally treated sample to produce useful sequencing results. The invention is based on the surprising observation and discovery that the addition of a reaction mixture containing the thermostable polymerase Thermo Sequenase™, two primers which bind to complementary strands of a target DNA molecule at sites flanking the region of interest, a mixture of nucleotide triphosphates (A, C, G and T) and one dideoxynucleotide triphosphate to a DNA sample which contains target and non-target DNA in substantially natural abundance, including highly complex DNA samples such as genomic human DNA, and the processing of the combination through multiple cycles of annealing, extension and denaturation results in the production of a mixture which can be loaded directly onto a gel for sequence analysis of the region of interest. The reaction mixture also includes an unconventional nucleotide and an appropriate enzyme for degradation of nucleic acid polymers containing the unconventional nucleotide.
One aspect of the present invention is a method for sequencing a selected region of a target nucleic acid polymer comprising the steps of
(a) combining a natural abundance sample containing the target nucleic acid polymer with a reaction mixture comprising three types of deoxynucleotide triphosphates, an unconventional nucleotide triphosphate corresponding to the fourth type of base, a dideoxynucleotide triphosphate, first and second primers, an enzyme which degrades nucleic acid polymers incorporating the unconventional nucleotide, and a thermally stable polymerase enzyme which incorporates dideoxynucleotides into an extending nucleic acid polymer at a rate which is no less than about 0.4 times the rate of incorporation of deoxynucleotides to form a reaction mixture, said first and second primers binding to the sense and antisense strands, respectively, of the target nucleic acid polymer at locations flanking the selected region;
(b) exposing the reaction mixture to an initial stage in which the enzyme that degrades nucleic acid polymers incorporating the unconventional nucleotide is active for a period of time sufficient to degrade nucleic acid polymers containing the unconventional nucleotide that may be present in the sample;
(c) exposing the reaction mixture to a plurality of temperature cycles each of which includes at least a high temperature denaturation phase and a lower temperature extension phase to produce a product mixture comprising sequencing fragments which are terminated by incorporation of the dideoxynucleotide; and
(d) evaluating the product mixture to determine the lengths of the sequencing fragments produced.
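Steps (c) and (d) can be illustrated for the fragments extended from one of the two primers. This is a minimal sketch under a simple competition model that the source does not specify: at each site calling for the dideoxy base, the polymerase is assumed to choose the ddNTP over the dNTP in proportion to concentration times relative incorporation rate, and the claimed lower bound of about 0.4 is treated as the `relative_rate` parameter. The function names, concentrations and sequences are hypothetical. With two primers, a second ladder would form from the other strand in the same way.

```python
# Sketch of steps (c) and (d) for fragments from one primer, under a simple
# competition model that the source does not specify: at each site calling for
# the dideoxy base, the polymerase picks ddNTP over dNTP in proportion to
# concentration times relative incorporation rate. All names and numbers are
# hypothetical.


def termination_probability(dd_conc: float, d_conc: float, relative_rate: float) -> float:
    """Chance that a chain terminates at a site where the dideoxy base pairs."""
    weighted_dd = relative_rate * dd_conc
    return weighted_dd / (weighted_dd + d_conc)


def fragment_distribution(new_strand: str, dd_base: str, primer_len: int,
                          p_term: float) -> dict[int, float]:
    """Map fragment length -> fraction of extended chains terminating at that length."""
    dist: dict[int, float] = {}
    surviving = 1.0  # fraction of chains not yet terminated
    for i, base in enumerate(new_strand):
        if base == dd_base:
            dist[primer_len + i + 1] = surviving * p_term
            surviving *= 1.0 - p_term
    return dist


def read_positions(dist: dict[int, float], primer_len: int) -> list[int]:
    """Step (d): fragment lengths give 1-based positions of the dideoxy base past the primer."""
    return sorted(length - primer_len for length in dist)


if __name__ == "__main__":
    strand = "ATGCCTAAGTAC" * 20  # hypothetical 240-nt extension region
    for rate in (0.4, 0.01):     # compare the claimed >= ~0.4 ratio with a lower one
        p = termination_probability(dd_conc=1.0, d_conc=100.0, relative_rate=rate)
        dist = fragment_distribution(strand, "A", primer_len=20, p_term=p)
        print(f"rate {rate}: p = {p:.5f}, terminated fraction = {sum(dist.values()):.3f}")
    print(read_positions(fragment_distribution("ATGCCTA", "A", 20, 0.5), 20))  # [1, 7]
```

In this model, a higher relative incorporation rate converts more of the extended chains into terminated fragments at a given ddNTP:dNTP ratio, which is the property the rate limit in step (a) is directed to.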