Analysis of DNA with currently available techniques provides a spectrum of information ranging from the confirmation that a test DNA is the same as or different from a standard sequence or an isolated fragment, to the express identification and ordering of each nucleotide of the test DNA. Not only are such techniques crucial for understanding the function and control of genes and for applying many of the basic techniques of molecular biology, but they have also become increasingly important as tools in genomic analysis and a great many non-research applications, such as genetic identification, forensic analysis, genetic counseling, medical diagnostics and many others. In these latter applications, both techniques providing partial sequence information, such as fingerprinting and sequence comparisons, and techniques providing full sequence determination have been employed (Gibbs et al., Proc. Natl. Acad. Sci. USA 1989; 86:1919-1923; Gyllensten et al., Proc. Natl. Acad. Sci. USA 1988; 85:7652-7656; Carrano et al., Genomics 1989; 4:129-136; Caetano-Anolles et al., Mol. Gen. Genet. 1992; 235:157-165; Brenner and Livak, Proc. Natl. Acad. Sci. USA 1989; 86:8902-8906; Green et al., PCR Methods and Applications 1991; 1:77-90; and Versalovic et al., Nucleic Acids Res. 1991; 19:6823-6831).
DNA sequencing methods currently available require the generation of a set of DNA fragments that are ordered by length according to nucleotide composition. The generation of this set of ordered fragments occurs in one of two ways: chemical degradation at specific nucleotides using the Maxam-Gilbert method (Maxam A M and W Gilbert, Proc Natl Acad Sci USA 1977; 74:560-564) or dideoxy nucleotide incorporation using the Sanger method (Sanger F, S Nicklen, and A R Coulson, Proc Natl Acad Sci USA 1977; 74:5463-5467). In either case, the type and number of required steps inherently limit both the number of DNA segments that can be sequenced in parallel and the number of operations that may be carried out in sequence. Furthermore, both methods are prone to error due to the anomalous migration of DNA fragments in denaturing gels. Time and space limitations inherent in these gel-based methods have fueled the search for alternative methods.
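The way a length-ordered fragment set encodes a sequence can be illustrated with a minimal Python sketch. It assumes an idealized dideoxy experiment in which every position of the synthesized strand yields exactly one terminated fragment and every fragment migrates strictly according to its length, which is precisely what anomalous migration in denaturing gels violates. The function names `termination_lengths` and `read_ladder` and the seven-base example strand are hypothetical and are not taken from the cited references.

```python
def termination_lengths(synthesized):
    """Group fragment lengths by the terminating base (one 'lane' per base).

    A fragment of length i ends at position i of the synthesized strand,
    so it is assigned to the lane of the base found at that position.
    """
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(synthesized, start=1):
        lanes[base].append(length)
    return lanes


def read_ladder(lanes):
    """Read the sequence by visiting fragments from shortest to longest."""
    base_at_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(base_at_length[n] for n in sorted(base_at_length))


strand = "GATTACA"  # hypothetical synthesized strand
lanes = termination_lengths(strand)
read = read_ladder(lanes)
print(lanes)
print(read)
```

In this idealized model the read equals the synthesized strand. Any fragment that migrates out of order would swap bases in the read, which is one way to picture the error source noted above.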
Several methods are under development that are designed to sequence DNA in a solid state format without a gel resolution step. The method that has generated the most interest is sequencing by hybridization. In sequencing by hybridization, the DNA sequence is read by determining the overlaps between the sequences of hybridized oligonucleotides. This strategy is possible because a long sequence can be deduced by matching up distinctive overlaps between its constituent oligomers (Strezoska Z, T Paunesku, D Radosavljevic, I Labat, R Drmanac, R Crkvenjakov, Proc Natl Acad Sci USA 1991; 88:10089-10093; Drmanac R, S Drmanac, Z Strezoska, T Paunesku, I Labat, M Zeremski, J Snoddy, W K Funkhouser, B Koop, L Hood, R Crkvenjakov, Science 1993; 260:1649-1652). This method uses hybridization conditions for oligonucleotide probes that distinguish between complete complementarity with the target sequence and a single nucleotide mismatch, and does not require resolution of fragments on polyacrylamide gels (Jacobs K A, R Rudersdorf, S D Neill, J P Dougherty, E L Brown, and E F Fritsch, Nucleic Acids Res. 1988; 16:4637-4650). Recent versions of sequencing by hybridization add a DNA ligation step in order to increase the ability of this method to discriminate between mismatches, and to decrease the length of the oligonucleotides necessary to sequence a given length of DNA (Broude N E, T Sano, C L Smith, C R Cantor, Proc. Natl. Acad. Sci. USA 1994; 91:3072-3076; Drmanac R T, International Business Communications, Southborough, Mass.).
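The overlap principle behind sequencing by hybridization can be illustrated with a minimal Python sketch. It assumes an idealized experiment: every oligomer of length k present in the target is detected without error, the first oligomer is known, and each (k-1)-base overlap occurs only once. The function names `kmer_spectrum` and `reconstruct`, the 12-base target, and the probe length of 4 are hypothetical choices for illustration, not the procedure of the cited references.

```python
def kmer_spectrum(seq, k):
    """Return the set of all length-k substrings (the idealized probe hits)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reconstruct(spectrum, start):
    """Extend `start` one base at a time by following (k-1)-base overlaps.

    Extension stops as soon as the current end overlaps no unused oligomer,
    or overlaps more than one (an ambiguity this sketch does not resolve).
    """
    k = len(start)
    unused = set(spectrum) - {start}
    seq = start
    while True:
        suffix = seq[-(k - 1):]
        candidates = [oligo for oligo in unused if oligo.startswith(suffix)]
        if len(candidates) != 1:
            return seq
        seq += candidates[0][-1]  # the overlap contributes one new base
        unused.remove(candidates[0])


target = "ATGCCGTAAGCT"  # hypothetical target sequence
spectrum = kmer_spectrum(target, 4)
result = reconstruct(spectrum, target[:4])
print(result)
```

For this target every 3-base overlap is unique, so the nine 4-mers chain into the original 12-base sequence. A repeated overlap would yield more than one candidate and halt the extension.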
Significant obstacles with this method are its inability to accurately position repetitive sequences in DNA fragments, inhibition of probe annealing by the formation of internal duplexes in the DNA fragments, and the influence of nearest neighbor nucleotides within and adjacent to an annealing domain on the melting temperature for hybridization (Riccelli P V, A S Benight, Nucleic Acids Res 1993; 21:3785-3788; Williams J C, S C Case-Green, K U Mir, E M Southern, Nucleic Acids Res 1994; 22:1365-1367). Furthermore, sequencing by hybridization cannot determine the length of tandem short repeats, which are associated with several human genetic diseases (Warren S T, Science 1996; 271:1374-1375). These limitations have prevented its use as a primary sequencing method.
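The tandem repeat limitation can be made concrete with a short Python sketch. It assumes that hybridization reports only the presence or absence of each oligomer, not how many times it occurs, and that the probes are shorter than the repeat tract. The flanking sequences, the CAG repeat unit, the repeat counts, and the probe length of 8 are hypothetical values chosen for illustration.

```python
def kmer_spectrum(seq, k):
    """Return the set of all length-k substrings (presence/absence only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def with_repeats(n):
    """Build a hypothetical locus: fixed flanks around n copies of CAG."""
    return "GATTC" + "CAG" * n + "TTGCA"


short_allele = with_repeats(10)
long_allele = with_repeats(40)

# Every probe that hybridizes to one allele also hybridizes to the other,
# so the two spectra cannot be told apart under the stated assumptions.
same_spectrum = kmer_spectrum(short_allele, 8) == kmer_spectrum(long_allele, 8)
print(len(short_allele), len(long_allele), same_spectrum)
```

The two alleles differ in length by 90 bases, yet they produce the same set of 8-mers, because the oligomers inside the repeat tract are identical regardless of how many copies are present.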
The base addition DNA sequencing scheme uses fluorescently labeled reversible terminators of polymerase extension, with a distinct and removable fluorescent label for each of the four nucleotide analogs (Metzker M L, Raghavachari R, Richards S, Jacutin S E, Civitello A, Burgess K and R A Gibbs, Nucleic Acids Res. 1994; 22:4259-4267; Canard B and R S Sarfati, Gene 1994; 148:1-6). Incorporation of one of these base analogs into the growing primer strand allows identification of the incorporated nucleotide by its fluorescent label. This is followed by removal of the protecting/fluorescent group, creating a new substrate for template-directed polymerase extension. Iteration of these steps is designed to permit sequencing of a multitude of templates in a solid state format. Technical obstacles include a relatively low efficiency of extension and deprotection, and interference with primer extension caused by single-strand DNA secondary structure. A fundamental limitation to this approach is inherent in iterative methods that sequence consecutive nucleotides. That is, in order to sequence more than a handful of nucleotides, each cycle of analog incorporation and deprotection must approach 100% efficiency. Even if the base addition sequencing scheme is refined so that each cycle occurs at 95% efficiency, one will have less than 75% of the product of interest after only 6 cycles (0.95^6 = 0.735). This will severely limit the ability of this method to sequence anything but very short DNA sequences. Only one cycle of template-directed analog incorporation and deprotection appears to have been demonstrated so far (Metzker M L, Raghavachari R, Richards S, Jacutin S E, Civitello A, Burgess K and R A Gibbs, Nucleic Acids Res. 1994; 22:4259-4267; Canard B and R S Sarfati, Gene 1994; 148:1-6).
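The compounding loss described above can be checked with a few lines of Python. The sketch assumes that each cycle succeeds independently with the same efficiency, so the fraction of templates still in step after n cycles is the efficiency raised to the power n. The function names `in_step_fraction` and `cycles_until_below`, and the 50% threshold used in the second calculation, are hypothetical and chosen only for illustration.

```python
def in_step_fraction(efficiency, cycles):
    """Fraction of templates that completed every one of `cycles` cycles."""
    return efficiency ** cycles


def cycles_until_below(efficiency, threshold):
    """Count cycles until the in-step fraction first drops below `threshold`.

    Assumes 0 < efficiency < 1 so that the loop terminates.
    """
    cycles = 0
    fraction = 1.0
    while fraction >= threshold:
        fraction *= efficiency
        cycles += 1
    return cycles


six_cycle_yield = in_step_fraction(0.95, 6)
half_life_cycles = cycles_until_below(0.95, 0.5)
print(round(six_cycle_yield, 3))  # 0.735, matching the figure in the text
print(half_life_cycles)           # 14
```

Under this model, at 95% efficiency per cycle fewer than half of the templates remain in step by the fourteenth cycle, which illustrates why read lengths beyond a few nucleotides require per-cycle efficiency very close to 100%.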
A related earlier method, which is designed to sequence only one nucleotide per template, uses radiolabeled nucleotides or conventional non-reversible terminators attached to a variety of labels (Sokolov B P, Nucleic Acids Research 1989;18:3671; Kuppuswamy M N, J W Hoffman, C K Kasper, S G Spitzer, S L Groce, and S P Bajaj, Proc. Natl. Acad. Sci. USA 1991; 88:1143-1147). Recently, this method has been called solid-phase minisequencing (Syvanen A C, E Ikonen, T Manninen, M Bengstrom, H Soderlund, P Aula, and L Peltonen, Genomics 1992; 12:590-595; Kobayashi M, Rappaport E, Blasband A, Semeraro A, Sartore M, Surrey S, Fortina P, Molecular and Cellular Probes 1995; 9:175-182) or genetic bit analysis (Nikiforov T T, R B Rendle, P Goelet, Y H Rogers, M L Kotewicz, S Anderson, G L Trainor, and M R Knapp, Nucleic Acids Research 1994; 22:4167-4175), and it has been used to verify the parentage of thoroughbred horses (Nikiforov T T, R B Rendle, P Goelet, Y H Rogers, M L Kotewicz, S Anderson, G L Trainor, and M R Knapp, Nucleic Acids Research 1994; 22:4167-4175).
An alternative method for DNA sequencing that remains in the development phase entails the use of flow cytometry to detect single molecules. In this method, one strand of a DNA molecule is synthesized using fluorescently labeled nucleotides, and the labeled DNA molecule is then digested by a processive exonuclease, with identification of the released nucleotides in real time using flow cytometry. Technical obstacles to the implementation of this method include the fidelity of incorporation of the fluorescently labeled nucleotides and turbulence created around the microbead to which the single molecule of DNA is attached (Davis L M, F R Fairfield, C A Harger, J H Jett, R A Keller, J H Hahn, L A Krakowski, B L Marrone, J C Martin, H L Nutter, R L Ratliff, E B Shera, D J Simpson, S A Soper, Genetic Analysis, Techniques, and Applications 1991; 8:1-7). Furthermore, this method is not amenable to sequencing numerous DNA segments in parallel.
Another DNA sequencing method has recently been developed that uses class-IIS restriction endonuclease digestion and adaptor ligation to sequence at least some nucleotides offset from a terminal nucleotide. Using this method, four adjacent nucleotides have reportedly been sequenced and read following the gel resolution of DNA fragments. However, a limitation of this sequencing method is that it has built-in product losses, and requires many iterative cycles (International Application PCT/US95/03678).
Another problem exists with currently available technologies in the area of diagnostic sequencing. An ever-widening array of disorders, susceptibilities to disorders, prognoses of disease conditions, and the like, have been correlated with the presence of particular DNA sequences, or the degree of variation (or mutation) in DNA sequences, at one or more genetic loci. Examples of such phenomena include human leukocyte antigen (HLA) typing, cystic fibrosis, tumor progression and heterogeneity, p53 proto-oncogene mutations, and ras proto-oncogene mutations (Gyllensten et al., PCR Methods and Applications, 1:91-98 (1991); International Application PCT/US92/01675; and International Application PCT/CA90/00267). A difficulty in determining DNA sequences associated with such conditions to obtain diagnostic or prognostic information is the frequent presence of multiple subpopulations of DNA, e.g., allelic variants, multiple mutant forms, and the like. Distinguishing the presence and identity of multiple sequences with current sequencing technology is impractical due to the amount of DNA sequencing required.