The present invention relates to a novel method for analyzing nucleic acid sequences based on real-time detection of DNA polymerase-catalyzed incorporation of each of the four deoxynucleoside monophosphates, supplied individually and serially as deoxynucleoside triphosphates in a microfluidic system, to a template system comprising a DNA fragment of unknown sequence and an oligonucleotide primer. Incorporation of a deoxynucleoside monophosphate (dNMP) into the primer can be detected by any of a variety of methods including but not limited to fluorescence and chemiluminescence detection. Alternatively, microcalorimetric detection of the heat generated by the incorporation of a dNMP into the extending primer using thermopile, thermistor and refractive index measurements can be used to detect extension reactions. The present invention further provides a method for monitoring and correction of sequencing errors due to misincorporation or extension failure.
The present invention provides a method for sequencing DNA that avoids electrophoretic separation of DNA fragments, thus eliminating the problems associated with anomalous migration of DNA due to repeated base sequences or other self-complementary sequences which can cause single-stranded DNA to self-hybridize into hairpin loops, and also avoids current limitations on the size of fragments that can be read. The method of the invention can be utilized to determine the nucleotide sequence of genomic or cDNA fragments, or alternatively, as a diagnostic tool for sequencing patient derived DNA samples.
Currently, two approaches are utilized for DNA sequence determination: the dideoxy chain termination method of Sanger (1977, Proc. Natl. Acad. Sci 74:5463-5467) and the chemical degradation method of Maxam (1977, Proc. Natl. Acad. Sci 74:560-564). The Sanger dideoxy chain termination method is the most widely used method and is the method upon which automated DNA sequencing machines rely. In the chain termination method, DNA polymerase enzyme is added to four separate reaction systems to make multiple copies of a template DNA strand in which the growth process has been arrested at each occurrence of an A, in one set of reactions, and a G, C, or T, respectively, in the other sets of reactions, by incorporating in each reaction system one nucleotide type lacking the 3′-OH on the deoxyribose at which chain extension occurs. This procedure produces a series of DNA fragments of different lengths, and it is the length of the extended DNA fragment that signals the position along the template strand at which each of the four bases occurs. To determine the nucleotide sequence, the DNA fragments are separated by high resolution gel electrophoresis and the order of the four bases is read from the gel.
A major research goal is to derive the DNA sequence of the entire human genome. To meet this goal the need has developed for new genomic sequencing technology that can dispense with the difficulties of gel electrophoresis, lower the costs of performing sequencing reactions, including reagent costs, increase the speed and accuracy of sequencing, and increase the length of sequence that can be read in a single step. Potential improvements in sequencing speed may be provided by a commercialized capillary gel electrophoresis technique such as that described in Marshall and Pennisi (1998, Science 280:994-995). However, a major problem common to all gel electrophoresis approaches is the occurrence of DNA sequence compressions, usually arising from secondary structures in the DNA fragment, which result in anomalous migration of certain DNA fragments through the gel.
As genomic information accumulates and the relationships between gene mutations and specific diseases are identified, there will be a growing need for diagnostic methods for identification of mutations. In contrast to the large scale methods needed for sequencing large segments of the human genome, what is needed for diagnostic methods are repetitive, low-cost, highly accurate techniques for resequencing of certain small isolated regions of the genome. In such instances, methods of sequencing based on gel electrophoresis readout become far too slow and expensive.
When considering novel DNA sequencing techniques, the possibility of reading the sequence directly, much as the cell does, rather than indirectly as in the Sanger dideoxynucleotide approach, is a preferred goal. This was the goal of early unsuccessful attempts to determine the shapes of the individual nucleotide bases with scanning probe microscopes.
Additionally, another approach for reading a nucleotide sequence directly is to treat the DNA with an exonuclease coupled with a detection scheme for identifying each nucleotide sequentially released as described in Goodwin et al., (1995, Experimental Techniques of Physics 41:279-294). However, researchers using this technology are confronted with the enormous problem of detecting and identifying single nucleotide molecules as they are digested from a single DNA strand. Simultaneous exonuclease digestion of multiple DNA strands to yield larger signals is not feasible because the enzymes rapidly get out of phase, so that nucleotides from different positions on the different strands are released together, and the sequences become unreadable. It would be highly beneficial if some means of external regulation of the exonuclease could be found so that multiple enzyme molecules could be compelled to operate in phase. However, external regulation of an enzyme that remains docked to its polymeric substrate is exceptionally difficult, if not impossible, because after each digestion the next substrate segment is immediately present at the active site. Thus, any controlling signal must be present at the active site at the start of each reaction.
A variety of methods may be used to detect the polymerase-catalyzed incorporation of deoxynucleoside monophosphates (dNMPs) into a primer at each template site. For example, the pyrophosphate released whenever DNA polymerase adds one of the four dNTPs onto a primer 3′ end may be detected using a chemiluminescence-based detection of the pyrophosphate as described in Hyman E. D. (1988, Analytical Biochemistry 174:423-436) and U.S. Pat. No. 4,971,903. This approach has been utilized most recently in a sequencing approach referred to as “sequencing by incorporation” as described in Ronaghi (1996, Analytical Biochem. 242:84) and Ronaghi (1998, Science 281:363-365). However, there exist two key problems associated with this approach: destruction of unincorporated nucleotides and detection of pyrophosphate. The solution to the first problem is to destroy the added, unincorporated nucleotides using a dNTP-digesting enzyme such as apyrase. The solution to the second is the detection of the pyrophosphate using ATP sulfurylase to reconvert the pyrophosphate to ATP, which can be detected by a luciferase chemiluminescent reaction as described in U.S. Pat. No. 4,971,903 and Ronaghi (1998, Science 281:363-365). Deoxyadenosine α-thiotriphosphate is used instead of dATP to minimize direct interaction of injected dATP with the luciferase.
Unfortunately, the requirement for multiple enzyme reactions to be completed in each cycle imposes restrictions on the speed of this approach while the read length is limited by the impossibility of completely destroying unincorporated, non-complementary, nucleotides. If some residual amount of one nucleotide remains in the reaction system at the time when a fresh aliquot of a different nucleotide is added for the next extension reaction, there exists a possibility that some fraction of the primer strands will be extended by two or more nucleotides, the added nucleotide type and the residual impurity type, if these match the template sequence, and so this fraction of the primer strands will then be out of phase with the remainder. This out of phase component produces an erroneous incorporation signal which grows larger with each cycle and ultimately makes the sequence unreadable.
A different direct sequencing approach, which uses dNTPs tagged at the 3′ OH position with four different colored fluorescent tags, one for each of the four nucleotides, is described in Metzker, M. L., et al. (1994, Nucleic Acids Research 22:4259-4267). In this approach, the primer/template duplex is contacted with all four dNTPs simultaneously. Incorporation of a 3′ tagged NMP blocks further chain extension. The excess and unreacted dNTPs are flushed away and the incorporated nucleotide is identified by the color of the incorporated fluorescent tag. The fluorescent tag must then be removed in order for a subsequent incorporation reaction to occur. Similar to the pyrophosphate detection method, incomplete removal of a blocking fluorescent tag leaves some primer strands unextended on the next reaction cycle, and if these are subsequently unblocked in a later cycle, once again an out-of-phase signal is produced which grows larger with each cycle and ultimately limits the read length. To date, this method has been demonstrated to work for only a single base extension. Thus, this method is slow and is likely to be restricted to very short read lengths due to the fact that 99% efficiency in removal of the tag is required to read beyond 50 base pairs. Incomplete removal of the label results in out of phase extended DNA strands.
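The dependence of read length on per-cycle efficiency can be illustrated with a small calculation. The following Python sketch assumes a deliberately simple model that is not taken from the cited work: each cycle succeeds independently with a fixed probability, and a strand that fails once is counted as out of phase from then on. The function name `in_phase_fraction` is hypothetical.

```python
def in_phase_fraction(step_efficiency: float, cycles: int) -> float:
    """Fraction of primer strands still in phase after `cycles` steps.

    Assumed model: every step succeeds independently with probability
    `step_efficiency`; a strand that fails once never rejoins the
    in-phase population.
    """
    if not 0.0 <= step_efficiency <= 1.0:
        raise ValueError("step_efficiency must lie between 0 and 1")
    return step_efficiency ** cycles


if __name__ == "__main__":
    # Compare two hypothetical tag-removal efficiencies over 50 cycles.
    for efficiency in (0.99, 0.999):
        fraction = in_phase_fraction(efficiency, 50)
        print(f"efficiency {efficiency}: in-phase fraction {fraction:.3f}")
```

Under this assumed model, roughly 60% of the strands remain in phase after 50 cycles at 99% efficiency, which suggests why efficiencies at or above that level are needed for reads of that length.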
Accordingly, it is an object of the present invention to provide a novel method for determining the nucleotide sequence of a DNA fragment which eliminates the need for electrophoretic separation of DNA fragments. The inventive method, referred to herein as “reactive sequencing”, is based on detection of DNA polymerase catalyzed incorporation of each of the four nucleotide types, when deoxynucleoside triphosphates (dNTPs) are supplied individually and serially to a DNA primer/template system. The DNA primer/template system comprises a single stranded DNA fragment of unknown sequence, an oligonucleotide primer that forms a matched duplex with a short region of the single stranded DNA, and a DNA polymerase enzyme. The enzyme may either be already present in the template system, or may be supplied together with the dNTP solution.
Typically a single deoxynucleoside triphosphate (dNTP) is added to the DNA primer template system and allowed to react. As used herein deoxyribonucleotide means and includes, in addition to dGTP, dCTP, dATP, dTTP, chemically modified versions of these deoxyribonucleotides or analogs thereof. Such chemically modified deoxyribonucleotides include but are not limited to those deoxyribonucleotides tagged with a fluorescent or chemiluminescent moiety. Analogs of deoxyribonucleotides that may be used include but are not limited to 7-deazapurine. The present invention additionally provides a method for improving the purity of deoxynucleotides used in the polymerase reaction.
An extension reaction will occur only when the incoming dNTP base is complementary to the next unpaired base of the DNA template beyond the 3′ end of the primer. While the reaction is occurring, or after a delay of sufficient duration to allow a reaction to occur, the system is tested to determine whether an additional nucleotide derived from the added dNTP has been incorporated into the DNA primer/template system. A correlation between the dNTP added to the reaction cell and detection of an incorporation signal identifies the nucleotide incorporated into the primer/template. The amplitude of the incorporation signal identifies the number of nucleotides incorporated, and thereby quantifies single base repeat lengths where these occur. By repeating this process with each of the four nucleotides individually, the sequence of the template can be directly read in the 5′ to 3′ direction one nucleotide at a time.
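The read-out logic described above can be sketched in Python for an idealized, error-free case. The sketch assumes a fixed probe order, a signal of exactly one unit per incorporated base, and a template given in the order in which the polymerase encounters its bases. The names `incorporation_signal` and `reactive_read` are hypothetical and serve only as illustration.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def incorporation_signal(downstream: str, pos: int, dntp: str) -> int:
    """Idealized signal amplitude for one dNTP addition.

    Counts the consecutive template bases, starting at `pos`, that are
    complementary to the supplied dNTP (one signal unit per base).
    """
    n = 0
    while pos + n < len(downstream) and COMPLEMENT[downstream[pos + n]] == dntp:
        n += 1
    return n


def reactive_read(downstream: str, order: str = "ACGT"):
    """Probe the template with one dNTP at a time until it is fully copied.

    `downstream` lists the unpaired template bases in the order the
    polymerase meets them. Returns the synthesized primer extension and
    the trace of (dNTP, signal amplitude) pairs, including zero signals.
    """
    pos = 0
    synthesized = []
    trace = []
    while pos < len(downstream):
        for dntp in order:
            n = incorporation_signal(downstream, pos, dntp)
            trace.append((dntp, n))
            if n:
                # Amplitude n means a single-base repeat of length n.
                synthesized.append(dntp * n)
                pos += n
    return "".join(synthesized), trace


if __name__ == "__main__":
    strand, trace = reactive_read("TTGCA")
    print(strand)
    print(trace)
```

In this sketch an amplitude of 2 on the first dATP probe of the template `TTGCA` corresponds to a two-base repeat, so the primer is extended by `AA` in a single step.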
Detection of the polymerase mediated extension reaction and quantification of the extent of reaction can occur by a variety of different techniques, including but not limited to, microcalorimetric detection of the heat generated by the incorporation of a nucleotide into the extending duplex. Optical detection of an extension reaction by fluorescence or chemiluminescence may also be used to detect incorporation of nucleotides tagged with fluorescent or chemiluminescent entities into the extending duplex. Where the incorporated nucleotide is tagged with a fluorophore, excess unincorporated nucleotide is removed, and the template system is illuminated to stimulate fluorescence from the incorporated nucleotide. The fluorescent tag may then be cleaved and removed from the DNA template system before a subsequent incorporation cycle begins. A similar process is followed for chemiluminescent tags, with the chemiluminescent reaction being stimulated by introducing an appropriate reagent into the system, again after excess unreacted tagged dNTP has been removed; however, chemiluminescent tags are typically destroyed in the process of readout and so a separate cleavage and removal step following detection may not be required. For either type of tag, fluorescent or chemiluminescent, the tag may also be cleaved after incorporation and transported to a separate detection chamber for fluorescent or chemiluminescent detection. In this way, fluorescent quenching by adjacent fluorophore tags incorporated in a single base repeat sequence may be avoided. In addition, this may protect the DNA template system from possible radiation damage in the case of fluorescent detection or from possible chemical damage in the case of chemiluminescent detection. Alternatively, the fluorescent tag may be selectively destroyed by a chemical or photochemical reaction.
This process eliminates the need to cleave the tag after each readout, or to detach and transport the tag from the reaction chamber to a separate detection chamber for fluorescent detection. The present invention provides a method for selective destruction of a fluorescent tag by a photochemical reaction with diphenyliodonium ions or related species.
The present invention further provides a reactive sequencing method that utilizes a two cycle system. An exonuclease-deficient polymerase is used in the first cycle and a mixture of exonuclease-deficient and exonuclease-proficient enzymes is used in the second cycle. In the first cycle, the template-primer system together with an exonuclease-deficient polymerase will be presented sequentially with each of the four possible nucleotides. In the second cycle, after identification of the correct nucleotide, a mixture of exonuclease-proficient and exonuclease-deficient polymerases, or a polymerase containing both types of activity, will be added together with the correct dNTP identified in the first cycle to complete and proofread the primer extension. In this way, an exonuclease-proficient polymerase is only present in the reaction cell when the correct dNTP is present, so that exonucleolytic degradation of correctly extended strands does not occur, while degradation and correct re-extension of previously incorrectly extended strands does occur, thus achieving extremely accurate strand extension.
The present invention also provides a method for monitoring reactive sequencing reactions to detect and correct sequencing reaction errors resulting from misincorporation, i.e., incorrectly incorporating a non-complementary base, and extension failure, i.e., failure to extend a fraction of the DNA primer strands. The method is based on the ability to (i) determine the size of the trailing strand population (trailing strands are those primer strands which have undergone an extension failure at any extension prior to the current reaction step); (ii) determine the downstream sequence of the trailing strand population between the 3′ terminus of the trailing strands and the 3′ terminus of the corresponding leading strands (“downstream” refers to the template sequence beyond the current 3′ terminus of a primer strand; correspondingly, “upstream” refers to the known template and complementary primer sequence towards the 5′ end of the primer strand; “leading strands” are those primer strands which have not previously undergone extension failure); and (iii) predict at each extension step the signal to be expected from the extension of the trailing strands through simulation of the occurrence of an extension failure at any point upstream from the 3′ terminus of the leading strand. Subtraction of the predicted signal from the measured signal yields a signal due only to valid extension of the leading strand population.
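The subtraction step can be illustrated with a minimal Python sketch. It assumes a single trailing population of known fractional size, a signal of one unit per incorporated base per unit of strand fraction, and trailing strands that extend only within the gap already read from the leading strands. The names `trailing_extension` and `corrected_signal` are hypothetical and are not part of the specification.

```python
def trailing_extension(read_so_far: str, trail_pos: int, dntp: str) -> int:
    """Bases the trailing strands would add when `dntp` is supplied.

    `read_so_far` is the primer-strand sequence already called from the
    leading strands; `trail_pos` is how far the trailing strands have
    advanced along it. Extension is counted only inside the known gap.
    """
    n = 0
    while trail_pos + n < len(read_so_far) and read_so_far[trail_pos + n] == dntp:
        n += 1
    return n


def corrected_signal(measured: float, read_so_far: str, trail_pos: int,
                     trail_fraction: float, dntp: str):
    """Subtract the predicted trailing-strand signal from the measurement.

    Returns (predicted trailing signal, leading-strand signal, estimated
    number of bases added to the leading strands).
    """
    predicted = trail_fraction * trailing_extension(read_so_far, trail_pos, dntp)
    leading = measured - predicted
    # Rescale by the leading fraction to recover a whole-base count.
    leading_bases = round(leading / (1.0 - trail_fraction))
    return predicted, leading, leading_bases


if __name__ == "__main__":
    # Leading strands have read "AACG"; 10% of strands stalled after "AA".
    # Supplying dCTP yields a small signal that comes only from the
    # trailing strands catching up by one base.
    print(corrected_signal(0.1, "AACG", 2, 0.1, "C"))
```

In the usage example the measured signal of 0.1 is fully explained by the trailing population, so the estimated leading-strand extension is zero and no base is called.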
In a preferred embodiment of the invention, the monitoring for reactive sequencing reaction errors is computer-aided. The ability to monitor extension failures permits determination of the point to which the trailing strands for a given template sequence have advanced and the sequence in the 1, 2 or 3 base gap between these strands and the leading strands. Knowing this information the dNTP probe cycle can be altered to selectively extend the trailing strands for a given template sequence while not extending the leading strands, thereby resynchronizing the populations.
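One way such a computer-aided resynchronization step might be planned is sketched below in Python. The sketch assumes that the gap sequence is known and that a set of "safe" dNTPs is available, meaning dNTPs already known not to match the next template base of the leading strands (for example because they were just probed without a leading-strand signal). Both the notion of a safe set and the name `resync_plan` are assumptions made for illustration only.

```python
from itertools import groupby


def resync_plan(gap: str, safe_dntps: set):
    """Plan dNTP additions that close the gap without extending leaders.

    `gap` is the primer-strand sequence (1 to 3 bases in the text above)
    that the trailing strands still lack. `safe_dntps` holds the dNTPs
    assumed not to extend the leading strands. Returns the ordered list
    of dNTPs to supply, or None if any required dNTP is not safe.
    """
    # A single-base repeat in the gap is filled by one dNTP addition.
    plan = [base for base, _run in groupby(gap)]
    if all(base in safe_dntps for base in plan):
        return plan
    return None


if __name__ == "__main__":
    print(resync_plan("CCG", {"C", "G"}))
    print(resync_plan("AT", {"A"}))
```

When the function returns None, selective extension cannot be guaranteed under these assumptions, and the trailing signal would instead have to be handled by prediction and subtraction.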
The present invention further provides an apparatus for DNA sequencing comprising: (a) at least one chamber including a DNA primer/template system which produces a detectable signal when a DNA polymerase enzyme incorporates a deoxyribonucleotide monophosphate onto the 3′ end of the primer strand; (b) means for introducing into, and evacuating from, the reaction chamber at least one selected from the group consisting of buffers, electrolytes, DNA template, DNA primer, deoxyribonucleotides, and polymerase enzymes; (c) means for amplifying said signal; and (d) means for converting said signal into an electrical signal.