This invention relates to the field of nucleic acid sequence analysis. The analysis of nucleic acid sequences can be used, e.g., to determine the presence or absence of a particular genetic element. Variant forms of a nucleic acid sequence usually exist. Exemplary variant genetic elements may include, but are by no means limited to, genetic mutations or polymorphisms such as single nucleotide polymorphisms (“SNP's”), base deletions, base insertions, and heterozygous as well as homozygous polymorphisms. Accordingly, many techniques have been developed to compare homologous segments of nucleic acid sequence to determine whether the segments are identical or differ at one or more nucleotides. Practical applications of these techniques include genetic disease diagnosis, infectious disease diagnosis, forensic analysis, paternity determination, and genome mapping.
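To make the comparison of homologous segments concrete, the following minimal Python sketch reports the positions at which two segments differ and classifies each difference as a substitution, deletion, or insertion. It assumes the two segments have already been aligned to equal length, with '-' marking a gap (the alignment step itself is not shown). The function name `compare_segments` is hypothetical and is not part of any cited method.

```python
def compare_segments(reference: str, sample: str) -> list[tuple[int, str, str, str]]:
    """Report each aligned column where two homologous segments differ.

    Assumes both segments are pre-aligned to the same length, with '-'
    marking a gap; a gap in the sample is read as a deletion and a gap in
    the reference as an insertion.
    """
    if len(reference) != len(sample):
        raise ValueError("segments must be pre-aligned to the same length")
    differences = []
    for position, (ref_base, sample_base) in enumerate(zip(reference, sample)):
        if ref_base == sample_base:
            continue  # identical column, nothing to report
        if sample_base == "-":
            kind = "deletion"
        elif ref_base == "-":
            kind = "insertion"
        else:
            kind = "substitution"
        differences.append((position, ref_base, sample_base, kind))
    return differences


print(compare_segments("GATTACA", "GACTA-A"))
# [(2, 'T', 'C', 'substitution'), (5, 'C', '-', 'deletion')]
print(compare_segments("GATTACA", "GATTACA"))  # [] means the segments are identical
```

An empty result indicates identical segments; any non-empty result lists the nucleotides at which the segments differ.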
In general, the detection of nucleic acids in a sample, and of subtypes thereof, depends on the technique of specific nucleic acid hybridization, in which an oligonucleotide probe is annealed under conditions of high stringency to nucleic acids in the sample and the successfully annealed probes are subsequently detected (see, e.g., Spiegelman, S., 1964, Scientific American 210:48).
The most definitive method for comparing DNA segments is to determine the complete nucleotide sequence of each segment. Examples of how sequencing has been used to study mutations in human genes are included in the publications of Engelke et al. (1988, Proc. Natl. Acad. Sci. U.S.A. 85:544-548) and Wong et al. (1987, Nature 330:384-386). The most commonly used methods of nucleic acid sequencing include the dideoxy-mediated chain termination method, also known as the “Sanger Method” (Sanger, F. et al., 1975, J. Molec. Biol. 94:441; Prober, J. et al., 1987, Science 238:336-340), and the chemical degradation or “Maxam-Gilbert” method (Maxam, A. M. et al., 1977, Proc. Natl. Acad. Sci. U.S.A. 74:560).
Both the Sanger and Maxam-Gilbert methods comprise a series of four chemical reactions, one for each of the nucleotide bases (e.g., A, C, G, and T for DNA), consisting of either primer extension (Sanger) or partial cleavage (Maxam-Gilbert) reactions. The reactions produce four sets of nested nucleic acid molecules whose lengths are determined by the location of a particular base along the length of the nucleic acid molecule being sequenced. The nested reaction products are then resolved by gel electrophoresis.
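The nested-set principle can be illustrated with a minimal, idealized Python sketch of a chain-termination readout. It assumes that every possible termination event occurs, that fragment lengths are counted from the primer, and that the input is the newly synthesized strand (i.e., the complement of the template). The function names are hypothetical. Partial cleavage in the Maxam-Gilbert method yields an analogous set of nested lengths per base.

```python
def chain_termination_fragments(synthesized: str) -> dict[str, list[int]]:
    """For each base, the lengths of the nested fragments that the reaction
    containing that base's terminator would produce (idealized: a fragment
    ends at every position where that base is incorporated)."""
    fragments: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for length, base in enumerate(synthesized, start=1):
        fragments[base].append(length)
    return fragments


def read_ladder(fragments: dict[str, list[int]]) -> str:
    """Order all fragments by length, as a gel would, and read off the
    reaction (lane) in which each successive band appeared."""
    bands = sorted(
        (length, base) for base, lengths in fragments.items() for length in lengths
    )
    return "".join(base for _, base in bands)


lanes = chain_termination_fragments("GATTACA")
print(lanes)               # {'A': [2, 5, 7], 'C': [6], 'G': [1], 'T': [3, 4]}
print(read_ladder(lanes))  # GATTACA
```

Reading the bands from shortest to longest recovers the sequence one base at a time, which is why the four reactions must be resolved side by side at single-base resolution.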
The separation and analysis of reaction products on electrophoretic gels is a laborious and time-consuming step. Accordingly, alternative methods have been developed to sequence nucleic acid molecules. For example, there is considerable interest in developing methods of de novo sequencing using solid phase arrays (see, e.g., Chetverin, A. B. et al., 1994, Bio/Technology 12:1093-1099; Macevicz, U.S. Pat. No. 5,002,867; Beattie, W. G. et al., 1995, Molec. Biotech. 4:213-225; Drmanac, R. T., EP 797683; Gruber, L. S., EP 787183; each of which is incorporated herein by reference in its entirety). These methods consist primarily of the hybridization of template nucleic acids to arrayed primers containing combinatorial sequences that hybridize to complementary sequences on the template strand. The methods combine capture of the template, through formation of stable duplex structures, with sequence discrimination arising from the instability of mismatches between the template and the primer.
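The idea that the set of array positions forming perfect duplexes encodes the template sequence can be shown with a minimal Python sketch of idealized sequencing by hybridization. It assumes that perfect matches always bind and mismatches never do, and it represents each probe by the k-mer it detects rather than by its complement. The reconstruction is a simple overlap-chaining walk chosen for illustration; it is not the method of any cited reference, and the function names are hypothetical.

```python
from typing import Optional


def hybridization_spectrum(target: str, k: int) -> set[str]:
    """Idealized set of array positions reporting a perfect-match duplex:
    every k-mer present in the target (each k-mer stands for its probe)."""
    return {target[i:i + k] for i in range(len(target) - k + 1)}


def reconstruct(spectrum: set[str], k: int) -> Optional[str]:
    """Chain k-mers that overlap by k-1 bases into a single sequence.
    Returns None when the spectrum does not determine one sequence."""
    by_prefix: dict[str, list[str]] = {}
    for kmer in spectrum:
        by_prefix.setdefault(kmer[:-1], []).append(kmer)
    suffixes = {kmer[1:] for kmer in spectrum}
    # The first k-mer is the only one whose prefix is no other k-mer's suffix.
    starts = [kmer for kmer in spectrum if kmer[:-1] not in suffixes]
    if len(starts) != 1:
        return None
    sequence = current = starts[0]
    for _ in range(len(spectrum) - 1):
        candidates = by_prefix.get(current[1:], [])
        if len(candidates) != 1:
            return None  # dead end or branch: ambiguous
        current = candidates[0]
        sequence += current[-1]
    # Accept the result only if it reproduces exactly the observed spectrum.
    return sequence if hybridization_spectrum(sequence, k) == spectrum else None


print(reconstruct(hybridization_spectrum("GATTACAGC", 3), 3))  # GATTACAGC
print(reconstruct(hybridization_spectrum("ATATA", 3), 3))      # None
```

The second example shows why such arrays need long probes: a short repeat collapses the spectrum, so the hybridization pattern alone no longer determines the sequence.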
Such methods must typically employ arrays of primers at least twelve bases in length, which entail approximately 16 million (4^12) sequence combinations. Such arrays are very complex and time-consuming both to construct and to analyze. Thus, at present it is not practical to use extensive sequencing methods, such as those described above, to compare more than a few DNA segments, because determining, interpreting, and comparing complete sequence information is too time-consuming.
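The figure of approximately 16 million follows from the four possible bases at each of twelve positions. The short Python sketch below, using a hypothetical helper `probe_combinations`, shows how quickly the number of required array positions grows with probe length.

```python
def probe_combinations(probe_length: int, alphabet_size: int = 4) -> int:
    """Number of distinct probe sequences of a given length (four bases per position)."""
    return alphabet_size ** probe_length


for length in (8, 10, 12):
    print(length, f"{probe_combinations(length):,}")
# 8 65,536
# 10 1,048,576
# 12 16,777,216
```

Each additional base in the probe multiplies the size of a complete array by four.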
Restriction fragment length polymorphism (“RFLP”) mapping is another commonly used screen for DNA polymorphisms arising from DNA sequence variation. RFLP analysis consists of digesting DNA with restriction endonucleases and analyzing the resulting fragments by means of Southern blots, as described by Botstein et al. (1980, Am. J. Hum. Genet. 32:314-331) and White et al. (1988, Sci. Am. 258:40-48). Mutations that alter the recognition sequence of the endonuclease will preclude enzymatic cleavage at that site, thereby altering the cleavage pattern of the DNA. DNAs are compared by looking for differences in restriction fragment lengths. However, a major problem with RFLP mapping is its inability to detect mutations that do not affect cleavage by a restriction endonuclease. Thus, many mutations are missed with this method. Further, the methods used to detect restriction fragment length polymorphisms are very labor-intensive, particularly the techniques involved in Southern blot analysis.
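Both the strength and the blind spot of RFLP analysis can be seen in a minimal Python sketch of an in silico digest. The defaults model EcoRI, which recognizes GAATTC and cuts between the G and the first A; only one strand of a linear molecule is modeled, and the sequences and the function name `digest` are hypothetical.

```python
def digest(sequence: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Fragment lengths after cutting a linear sequence at every copy of `site`,
    `cut_offset` bases into the site (default: EcoRI, G^AATTC; one strand only)."""
    cuts = []
    hit = sequence.find(site)
    while hit != -1:
        cuts.append(hit + cut_offset)
        hit = sequence.find(site, hit + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


wild_type = "AAAA" + "GAATTC" + "AAAAAA" + "GAATTC" + "AAA"
site_lost = "AAAA" + "AAATTC" + "AAAAAA" + "GAATTC" + "AAA"    # G->A inside first site
site_intact = "TAAA" + "GAATTC" + "AAAAAA" + "GAATTC" + "AAA"  # A->T outside both sites

print(digest(wild_type))    # [5, 12, 8]
print(digest(site_lost))    # [17, 8]: altered pattern, mutation detected
print(digest(site_intact))  # [5, 12, 8]: unchanged pattern, mutation missed
```

The third case is the limitation described above: a substitution outside any recognition sequence leaves the fragment lengths, and therefore the blot, unchanged.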
Alternative, simpler methods have been developed which use solid phase arrays to analyze single nucleotide polymorphisms (SNP's). These techniques rely on the fact that analysis of SNP's, which constitute sites of variation flanked by regions of invariant sequence, requires no more than the determination of the identity of the single nucleotide present at the site of variation.
For example, several primer-guided nucleotide incorporation procedures for assaying polymorphic sites in DNA have been described (e.g., Kornher, J. S. et al., 1989, Nucl. Acids Res. 17:7779-7784; Sokolov, B. P., 1990, Nucl. Acids Res. 18:3671; Syvanen, A.-C. et al., 1990, Genomics 8:684-692; Kuppuswamy, M. N. et al., 1991, Proc. Natl. Acad. Sci. U.S.A. 88:1143-1147; Prezant, T. R. et al., 1992, Hum. Mutat. 1:159-164; Ugozzoli, L. et al., 1992, GATA 9:107-112; Nyren, P. et al., 1993, Anal. Biochem. 208:171-175; and Wallace WO89/10414). Each of these methods relies on the incorporation of labeled deoxynucleotides to discriminate between bases at a polymorphic site. An alternative microsequencing method, the Genetic Bit Analysis (GBA™) method, has been disclosed by Goelet, P. et al. (WO 92/15712) and avoids many of the problems of the above-identified microsequencing assays. In GBA™, the nucleotide sequence information surrounding a predetermined site of interrogation is used to design an oligonucleotide primer that is complementary to the region immediately adjacent to, but not including, the predetermined site. The target DNA template is selected from the biological sample and hybridized to the interrogating primer. This primer is extended by a single labeled dideoxynucleotide using DNA polymerase in the presence of at least two, and most preferably all four, chain-terminating nucleoside triphosphate precursors.
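The primer-design and single-base-extension logic can be sketched in Python as follows. This is an idealized illustration, not the GBA™ protocol itself: it assumes both template and primer are written 5' to 3', that the primer anneals perfectly to the bases immediately 3' of the site on the template, and that polymerase adds exactly one terminator complementary to the template base at the site. The sequences and the names `design_primer` and `interrogate` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def design_primer(template: str, site: int, length: int = 10) -> str:
    """Primer complementary to the `length` template bases lying immediately
    3' of `site`, so that the primer's 3' end abuts, but does not cover,
    the interrogated base."""
    flank = template[site + 1:site + 1 + length]
    if len(flank) < length:
        raise ValueError("not enough flanking sequence for the primer")
    return reverse_complement(flank)


def interrogate(templates: list[str], site: int, primer: str) -> set[str]:
    """Labeled terminators incorporated across all template copies in a sample.
    With all four ddNTPs present, each annealed primer gains exactly one base,
    complementary to the template base at `site`."""
    incorporated = set()
    for template in templates:
        if reverse_complement(primer) != template[site + 1:site + 1 + len(primer)]:
            raise ValueError("primer does not anneal next to the site")
        incorporated.add("dd" + template[site].translate(COMPLEMENT) + "TP")
    return incorporated


allele_1 = "TTGCA" + "C" + "GGATCCAGTA"  # site 5 holds C
allele_2 = "TTGCA" + "T" + "GGATCCAGTA"  # site 5 holds T
primer = design_primer(allele_1, site=5)
print(primer)                                                # TACTGGATCC
print(interrogate([allele_1, allele_1], 5, primer))          # {'ddGTP'}: homozygous
print(sorted(interrogate([allele_1, allele_2], 5, primer)))  # ['ddATP', 'ddGTP']: heterozygous
```

Note that `design_primer` needs the flanking sequence in advance, which is the requirement for prior sequence knowledge that the following paragraph identifies as a limitation.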
Several variations of the GBA™ method have been developed, as well as other microsequencing methods (see, e.g., Mundy, U.S. Pat. No. 4,656,127; Vary and Diamond, U.S. Pat. No. 4,851,331; Cohen, D. et al., PCT Application No. WO91/02087; Chee, M. et al., WO95/11995; Landegren, U. et al., 1988, Science 241:1077-1080; Nickerson, D. A. et al., 1990, Proc. Natl. Acad. Sci. U.S.A. 87:8923-8927; Pastinen, T. et al., 1997, Genome Res. 7:606-614; Pastinen, T. et al., 1996, Clin. Chem. 42:1391-1397; Jalanko, A. et al., 1992, Clin. Chem. 38:39-43; Shumaker, J. M. et al., 1996, Hum. Mutation 7:346-354; Caskey, C. et al., WO 95/00669). Although they are simpler to perform and analyze than de novo sequencing, such microsequencing methods require primers that hybridize to the target nucleic acid molecule at a site immediately adjacent to a polymorphism (or a site suspected of being next to a polymorphism). Hence, such techniques require prior knowledge of a “wild type” nucleic acid sequence. Further, the techniques are limited to identifying a specific mutation or polymorphism, typically a SNP, at a specific location in a specific nucleic acid sequence. Finally, such techniques also typically require multiple interrogations per target base.