Currently, two approaches are used for DNA sequence determination: the dideoxy chain termination method of Sanger (1977, Proc. Natl. Acad. Sci. 74:5463-5467) and the chemical degradation method of Maxam (1977, Proc. Natl. Acad. Sci. 74:560-564). The Sanger dideoxy chain termination method is the more widely used and is the method upon which automated DNA sequencing machines rely. In the chain termination method, DNA polymerase is added to four separate reaction systems to make multiple copies of a template DNA strand. In each reaction system, growth of the copies is arrested at each occurrence of one base (A in one set of reactions, and G, C, or T, respectively, in the other sets) by incorporating one nucleotide type lacking the 3′-OH on the deoxyribose at which chain extension occurs. This procedure produces a series of DNA fragments of different lengths, and it is the length of each extended DNA fragment that signals the position along the template strand at which each of the four bases occurs. To determine the nucleotide sequence, the DNA fragments are separated by high resolution gel electrophoresis and the order of the four bases is read from the gel.
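The read-out logic of the chain termination method can be illustrated with a minimal sketch. It assumes an idealized case in which every possible terminated fragment is present and perfectly resolved on the gel. The function names `chain_termination_lanes` and `read_gel` are hypothetical and chosen only for illustration. The sequence read is that of the newly synthesized strand, which is complementary to the template.

```python
BASES = "ACGT"

def chain_termination_lanes(new_strand):
    """For each base, list the fragment lengths expected in the reaction
    that contains the chain-terminating analog of that base (idealized)."""
    return {
        base: [i + 1 for i, b in enumerate(new_strand) if b == base]
        for base in BASES
    }

def read_gel(lanes):
    """Read the sequence by ordering all fragments from shortest to longest
    and noting which of the four lanes each fragment appears in."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)

lanes = chain_termination_lanes("GATTACA")
print(lanes)            # fragment lengths per lane
print(read_gel(lanes))  # sequence recovered from the fragment lengths
```

In this idealized picture each fragment length occurs in exactly one lane, which is why ordering by length alone recovers the sequence. Sequence compressions break that ordering.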
A major research goal is to derive the DNA sequence of the entire human genome. To meet this goal, a need has developed for new genomic sequencing technology that can dispense with the difficulties of gel electrophoresis, lower the costs of performing sequencing reactions (including reagent costs), increase the speed and accuracy of sequencing, and increase the length of sequence that can be read in a single step. Potential improvements in sequencing speed may be provided by a commercialized capillary gel electrophoresis technique such as that described in Marshall and Pennisi (1998, Science 280:994-995). However, a major problem common to all gel electrophoresis approaches is the occurrence of DNA sequence compressions, usually arising from secondary structures in the DNA fragment, which result in anomalous migration of certain DNA fragments through the gel.
As genomic information accumulates and the relationships between gene mutations and specific diseases are identified, there will be a growing need for diagnostic methods for identifying mutations. In contrast to the large scale methods needed for sequencing large segments of the human genome, diagnostic methods call for repetitive, low-cost, highly accurate techniques for resequencing certain small isolated regions of the genome. In such instances, methods of sequencing based on gel electrophoresis readout become far too slow and expensive.
When considering novel DNA sequencing techniques, the possibility of reading the sequence directly, much as the cell does, rather than indirectly as in the Sanger dideoxynucleotide approach, is a preferred goal. This was the goal of early unsuccessful attempts to determine the shapes of the individual nucleotide bases with scanning probe microscopes.
Another approach for reading a nucleotide sequence directly is to treat the DNA with an exonuclease coupled with a detection scheme for identifying each nucleotide as it is sequentially released, as described in Goodwin et al. (1995, Experimental Techniques of Physics 41:279-294). However, researchers using this technology are confronted with the enormous problem of detecting and identifying single nucleotide molecules as they are digested from a single DNA strand. Simultaneous exonuclease digestion of multiple DNA strands to yield larger signals is not feasible because the enzymes rapidly get out of phase, so that nucleotides from different positions on the different strands are released together and the sequences become unreadable. It would be highly beneficial if some means of external regulation of the exonuclease could be found so that multiple enzyme molecules could be compelled to operate in phase. However, external regulation of an enzyme that remains docked to its polymeric substrate is exceptionally difficult, if not impossible, because after each digestion the next substrate segment is immediately present at the active site. Thus, any controlling signal must be present at the active site at the start of each reaction.
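How quickly unregulated enzymes drift out of phase can be illustrated with a minimal sketch under a deliberately simple assumption that is not taken from the cited work: time is divided into discrete steps, and in each step every exonuclease molecule independently releases one nucleotide with probability `p_cut`. Under that assumption the number of nucleotides released per strand follows a binomial distribution. The function names and the parameter `p_cut` are hypothetical.

```python
from math import comb

def position_distribution(steps, p_cut):
    """Probability that a strand has released k nucleotides after `steps`
    time steps, for k = 0..steps (binomial model, an assumption)."""
    return [
        comb(steps, k) * p_cut**k * (1 - p_cut) ** (steps - k)
        for k in range(steps + 1)
    ]

def largest_synchronized_fraction(steps, p_cut):
    """Largest fraction of strands that sit at the same position."""
    return max(position_distribution(steps, p_cut))

for steps in (1, 10, 100):
    print(steps, round(largest_synchronized_fraction(steps, 0.5), 3))
```

Even the best-populated position holds a steadily shrinking share of the strands as digestion proceeds, which is the loss of phase described above.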
A variety of methods may be used to detect the polymerase-catalyzed incorporation of deoxynucleoside monophosphates (dNMPs) into a primer at each template site. For example, the pyrophosphate released whenever DNA polymerase adds one of the four dNTPs onto a primer 3′ end may be detected by chemiluminescence-based detection of the pyrophosphate, as described in Hyman E. D. (1988, Analytical Biochemistry 174:423-436) and U.S. Pat. No. 4,971,903. This approach has been utilized most recently in a sequencing approach referred to as “sequencing by incorporation” as described in Ronaghi (1996, Analytical Biochem. 242:84) and Ronaghi (1998, Science 281:363-365). However, there exist two key problems associated with this approach: destruction of unincorporated nucleotides and detection of pyrophosphate. The first problem is addressed by destroying the added, unincorporated nucleotides using a dNTP-digesting enzyme such as apyrase. The second is addressed by using ATP sulfurylase to convert the pyrophosphate to ATP, which can be detected by a luciferase chemiluminescent reaction as described in U.S. Pat. No. 4,971,903 and Ronaghi (1998, Science 281:363-365). Deoxyadenosine α-thiotriphosphate is used instead of dATP to minimize direct interaction of injected dATP with the luciferase.
Unfortunately, the requirement for multiple enzyme reactions to be completed in each cycle restricts the speed of this approach, while the read length is limited by the impossibility of completely destroying unincorporated, non-complementary nucleotides. If some residual amount of one nucleotide remains in the reaction system when a fresh aliquot of a different nucleotide is added for the next extension reaction, some fraction of the primer strands may be extended by two or more nucleotides (the added nucleotide type and the residual impurity type) if these match the template sequence. This fraction of the primer strands will then be out of phase with the remainder. The out-of-phase component produces an erroneous incorporation signal which grows larger with each cycle and ultimately makes the sequence unreadable.
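The mechanism described above can be followed step by step in a minimal sketch. It rests on several assumptions made only for illustration: nucleotides are dispensed in the fixed cyclic order A, C, G, T; only the immediately preceding nucleotide can persist as a residual; and in every cycle a fixed fraction `leak` of the strands at each position is exposed to that residual. The names `extend`, `in_phase_fractions`, and `leak` are hypothetical.

```python
from collections import defaultdict

FLOW_ORDER = "ACGT"  # assumed dispensing order, for illustration only

def extend(seq, pos, allowed):
    """Advance pos while the next base to incorporate is an allowed type."""
    while pos < len(seq) and seq[pos] in allowed:
        pos += 1
    return pos

def in_phase_fractions(seq, n_flows, leak):
    """Fraction of primer strands in step with an ideal strand after each flow.

    seq  : bases to be incorporated, in order (complement of the template)
    leak : assumed fraction of strands exposed to the residual nucleotide
    """
    population = {0: 1.0}  # bases added so far -> fraction of strands
    ideal = 0              # progress of a strand that never sees a residual
    history = []
    for flow in range(n_flows):
        added = FLOW_ORDER[flow % 4]
        residual = FLOW_ORDER[(flow - 1) % 4] if flow > 0 else ""
        ideal = extend(seq, ideal, added)
        nxt = defaultdict(float)
        for pos, frac in population.items():
            # Strands that see only the freshly added nucleotide.
            nxt[extend(seq, pos, added)] += frac * (1.0 - leak)
            # Strands that also see the residual nucleotide may run ahead.
            nxt[extend(seq, pos, added + residual)] += frac * leak
        population = dict(nxt)
        history.append(population.get(ideal, 0.0))
    return history

demo = "CAGTTGACCATGCAGTCAAGCTTGACGTAC"
for flow, fraction in enumerate(in_phase_fractions(demo, 40, 0.05), start=1):
    if flow % 8 == 0:
        print(flow, round(fraction, 3))
```

A strand runs ahead only where the residual nucleotide happens to match the next template position, so the loss of phase depends on the sequence as well as on how much residual remains.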
A different direct sequencing approach, which uses dNTPs tagged at the 3′ OH position with four differently colored fluorescent tags, one for each of the four nucleotides, is described in Metzker, M. L., et al. (1994, Nucleic Acids Research 22:4259-4267). In this approach, the primer/template duplex is contacted with all four dNTPs simultaneously. Incorporation of a 3′ tagged NMP blocks further chain extension. The excess and unreacted dNTPs are flushed away and the incorporated nucleotide is identified by the color of the incorporated fluorescent tag. The fluorescent tag must then be removed in order for a subsequent incorporation reaction to occur. As in the pyrophosphate detection method, incomplete removal of a blocking fluorescent tag leaves some primer strands unextended on the next reaction cycle, and if these are unblocked in a later cycle, an out-of-phase signal is again produced which grows larger with each cycle and ultimately limits the read length. To date, this method has been demonstrated to work for only a single base extension. Thus, this method is slow and is likely to be restricted to very short read lengths, because 99% efficiency in removal of the tag is required to read beyond 50 base pairs and incomplete removal of the label results in out-of-phase extended DNA strands.
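The sensitivity of read length to tag-removal efficiency can be illustrated with a short calculation. It is a sketch under an assumed model, not a derivation from the cited work: tag removal succeeds independently in each cycle with the same efficiency, and a strand counts as in phase only if removal has succeeded in every cycle so far. The in-phase fraction after n cycles is then the efficiency raised to the power n. The function names are hypothetical.

```python
import math

def in_phase_fraction(efficiency, cycles):
    """Fraction of strands whose tag was removed in every cycle so far."""
    return efficiency ** cycles

def cycles_above(efficiency, threshold):
    """Largest whole number of cycles for which the in-phase fraction is
    still at least `threshold` (requires 0 < efficiency < 1)."""
    return int(math.log(threshold) / math.log(efficiency))

for efficiency in (0.90, 0.99, 0.999):
    print(
        efficiency,
        round(in_phase_fraction(efficiency, 50), 3),
        cycles_above(efficiency, 0.5),
    )
```

Under this model, 99% removal efficiency leaves roughly 60% of the strands in phase after 50 cycles, whereas 90% efficiency leaves well under 1%.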