I. The Determination of the Nucleotide Present at a Polymorphic Site
The genomes of viruses, bacteria, plants and animals naturally undergo spontaneous mutation in the course of their continuing evolution (Gusella, J. F., Ann. Rev. Biochem. 55:831-854 (1986)). Since such mutations are not immediately transmitted throughout all of the members of a species, the evolutionary process creates polymorphic alleles that co-exist in the species populations. In some instances, such co-existence is in stable or quasi-stable equilibrium. In other instances, the mutation confers a survival or evolutionary advantage to the species, and accordingly, it may eventually (i.e. over evolutionary time) be incorporated into the DNA of every member of that species.
Several classes of polymorphisms have been identified. Variable number tandem repeat polymorphisms ("VNTRs"), for example, arise from spontaneous tandem duplications of di- or trinucleotide repeated motifs of nucleotides (Weber, J. L., U.S. Pat. No. 5,075,217; Armour, J. A. L. et al., FEBS Lett. 307:113-115 (1992); Jones, L. et al., Eur. J. Haematol. 39:144-147 (1987); Horn, G. T. et al., PCT Application WO91/14003; Jeffreys, A. J., European Patent Application 370,719; Jeffreys, A. J., U.S. Pat. No. 5,175,082; Jeffreys, A. J. et al., Amer. J. Hum. Genet. 39:11-24 (1986); Jeffreys, A. J. et al., Nature 316:76-79 (1985); Gray, I. C. et al., Proc. R. Acad. Soc. Lond. 243:241-253 (1991); Moore, S. S. et al., Genomics 10:654-660 (1991); Jeffreys, A. J. et al., Anim. Genet. 18:1-15 (1987); Hillel, J. et al., Anim. Genet. 20:145-155 (1989); Hillel, J. et al., Genet. 124:783-789 (1990)). If such a variation alters the lengths of the fragments that are generated by restriction endonuclease cleavage, the variations are referred to as restriction fragment length polymorphisms ("RFLPs"). RFLPs have been widely used in human and animal genetic analyses (Glassberg, J., UK patent application 2135774; Skolnick, M. H. et al., Cytogen. Cell Genet. 32:58-67 (1982); Botstein, D. et al., Amer. J. Hum. Genet. 32:314-331 (1980); Fischer, S. G. et al., PCT Application WO90/13668; Uhlen, M., PCT Application WO90/11369).
Most polymorphisms arise from the replacement of only a single nucleotide in the initially present gene sequence. In rare cases, such a substitution can create or destroy a particular restriction site, and thus may comprise an RFLP polymorphism. In many cases, however, the substitution of a nucleotide in such single nucleotide polymorphisms cannot be determined by restriction fragment analysis. In some cases, such polymorphisms comprise mutations that are the determinative characteristic in a genetic disease. Indeed, such mutations may affect a single nucleotide in a protein-encoding gene in a manner sufficient to actually cause the disease (e.g., hemophilia or sickle-cell anemia). Despite the central importance of such polymorphisms in modern genetics, few methods have been developed that could permit the comparison of the alleles of two individuals at many such polymorphisms in parallel.
II. The Attributes of the Single Nucleotide Polymorphisms of the Present Invention and the Advantages of Their Use in Genetic Analysis
A "polymorphism" is a variation in the DNA sequence of some members of a species. A polymorphism is thus said to be "allelic," in that, due to the existence of the polymorphism, some members of a species may have the unmutated sequence (i.e. the original "allele") whereas other members may have a mutated sequence (i.e. the variant or mutant "allele"). In the simplest case, only one mutated sequence may exist, and the polymorphism is said to be diallelic. In the case of diallelic diploid organisms, three genotypes are possible: an individual can be homozygous for one allele, homozygous for the other allele, or heterozygous. In the case of diallelic haploid organisms, an individual can have one allele or the other, and thus only two genotypes are possible. Diallelic polymorphisms are the preferred polymorphisms of the present invention. The occurrence of alternative mutations can give rise to triallelic, etc. polymorphisms. An allele may be referred to by the nucleotide(s) that comprise the mutation. The present invention is directed to a particular class of allelic polymorphisms, and to their use in genotyping a plant or animal. Such allelic polymorphisms are referred to herein as "single nucleotide polymorphisms," or "SNPs." "Single nucleotide polymorphisms" are defined by their characteristic attributes. A central attribute of such a polymorphism is that it contains a polymorphic site, "X," most preferably occupied by a single nucleotide, which is the site of the polymorphism's variation (Goelet, P. and Knapp, M., U.S. patent application Ser. No. 08/145,145, herein incorporated by reference).
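The genotype counts stated above (three for a diallelic diploid, two for a diallelic haploid) can be checked by enumeration. The following is a minimal illustrative sketch, not part of the described invention; the function name `genotypes` and the allele labels "A" and "G" are hypothetical choices made only for this example.

```python
from itertools import combinations_with_replacement

def genotypes(alleles, ploidy=2):
    """Return every unordered genotype that can be formed from the given alleles."""
    combos = combinations_with_replacement(sorted(alleles), ploidy)
    return ["/".join(combo) for combo in combos]

# Hypothetical diallelic site with alleles A and G.
diallelic = ["A", "G"]
print(genotypes(diallelic, ploidy=2))              # ['A/A', 'A/G', 'G/G']
print(genotypes(diallelic, ploidy=1))              # ['A', 'G']
print(len(genotypes(["A", "C", "G"], ploidy=2)))   # 6 genotypes for a triallelic diploid
```

The triallelic case shows why diallelic polymorphisms are the simplest to score: each additional allele increases the number of genotype classes that an assay must resolve.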
SNPs have several salient advantages over RFLPs and VNTRs. First, SNPs are more stable than other classes of polymorphisms. Their spontaneous mutation rate is approximately 10^-9 (Kornberg, A., DNA Replication, W. H. Freeman & Co., San Francisco, 1980), approximately 1,000 times lower than that of VNTRs. Significantly, VNTR-type polymorphisms are characterized by high mutation rates.
Second, SNPs occur at greater frequency, and with greater uniformity, than RFLPs and VNTRs. The characterization of VNTRs and RFLPs is highly dependent upon the method used to detect the polymorphism. In contrast, because SNPs result from sequence variation, new polymorphisms can be identified by sequencing random genomic or cDNA molecules. VNTRs and RFLPs can also be considered a subset of SNPs because variation in the region of a VNTR or RFLP can result in a single-base change in the region. SNPs can also result from deletions, point mutations and insertions. Any single base alteration, whatever the cause, can be a SNP. The greater frequency of SNPs means that they can be more readily identified than the other classes of polymorphisms. The greater uniformity of their distribution permits the identification of SNPs "nearer" to a particular trait of interest. The combined effect of these two attributes makes SNPs extremely valuable. For example, if a particular trait (e.g. predisposition to cancer) reflects a mutation at a particular locus, then any polymorphism that is linked to the particular locus can be used to predict the probability that an individual will exhibit that trait.
SNPs can be characterized using any of a variety of methods. Such methods include the direct or indirect sequencing of the site, the use of restriction enzymes where the respective alleles of the site create or destroy a restriction site, the use of allele-specific hybridization probes, the use of antibodies that are specific for the proteins encoded by the different alleles of the polymorphism, or by other biochemical interpretation. However, no assay yet exists that is both highly accurate and easy to perform.
III. Methods of Analyzing Polymorphic Sites
A. DNA Sequencing
The most obvious method of characterizing a polymorphism entails direct DNA sequencing of the genetic locus that flanks and includes the polymorphism. Such analysis can be accomplished using either the "dideoxy-mediated chain termination method," also known as the "Sanger Method" (Sanger, F., et al., J. Molec. Biol. 94:441 (1975)), or the "chemical degradation method," also known as the "Maxam-Gilbert method" (Maxam, A. M., et al., Proc. Natl. Acad. Sci. (U.S.A.) 74:560 (1977)). Although genomic sequence-specific amplification technologies, such as the polymerase chain reaction (Mullis, K. et al., Cold Spring Harbor Symp. Quant. Biol. 51:263-273 (1986); Erlich H. et al., European Patent Appln. 50,424; European Patent Appln. 84,796, European Patent Application 258,017, European Patent Appln. 237,362; Mullis, K., European Patent Appln. 201,184; Mullis, K. et al., U.S. Pat. No. 4,683,202; Erlich, H., U.S. Pat. No. 4,582,788; and Saiki, R. et al., U.S. Pat. No. 4,683,194), may be employed to facilitate the recovery of the desired polynucleotides, direct sequencing methods are technically demanding, relatively expensive, and have low throughput rates. As a result, there has been a demand for techniques that simplify repeated and parallel analysis of SNPs.
B. Exonuclease Resistance
Mundy, C. R. (U.S. Pat. No. 4,656,127) discusses alternative methods for determining the identity of the nucleotide present at a particular polymorphic site. Mundy's methods employ a specialized exonuclease-resistant nucleotide derivative. A primer complementary to the allelic sequence immediately 3' to the polymorphic site is permitted to hybridize to a target molecule obtained from a particular animal or human. If the polymorphic site on the target molecule contains a nucleotide that is complementary to the particular exonuclease-resistant nucleotide derivative present, then that derivative will be incorporated by a polymerase onto the end of the hybridized primer. Such incorporation renders the primer resistant to exonuclease, and thereby permits its detection. Since the identity of the exonuclease-resistant derivative of the sample is known, a finding that the primer has become resistant to exonucleases reveals that the nucleotide present in the polymorphic site of the target molecule was complementary to that of the nucleotide derivative used in the reaction. The Mundy method has the advantage that it does not require the determination of large amounts of extraneous sequence data. It has the disadvantages of destroying the amplified target sequences and unmodified primer, and of being extremely sensitive to the rate of polymerase incorporation of the specific exonuclease-resistant nucleotide being used.
C. Microsequencing Methods
Recently, several primer-guided nucleotide incorporation procedures for assaying polymorphic sites in DNA have been described (Kornher, J. S. et al., Nucl. Acids. Res. 17:7779-7784 (1989); Sokolov, B. P., Nucl. Acids Res. 18:3671 (1990); Syvanen, A. -C., et al., Genomics 8:684-692 (1990); Kuppuswamy, M. N. et al., Proc. Natl. Acad. Sci. (U.S.A.) 88:1143-1147 (1991); Prezant, T. R. et al., Hum. Mutat. 1:159-164 (1992); Ugozzoli, L. et al., GATA 9:107-112 (1992); Nyren, P. et al., Anal. Biochem. 208:171-175 (1993)). These methods differ from Genetic Bit Analysis™ ("GBA™," discussed extensively below) in that they all rely on the incorporation of labeled deoxynucleotides to discriminate between bases at a polymorphic site. In such a format, since the signal is proportional to the number of deoxynucleotides incorporated, polymorphisms that occur in runs of the same nucleotide can result in signals that are proportional to the length of the run (Syvanen, A. -C., et al., Amer. J. Hum. Genet. 52:46-59 (1993)). Such a range of locus-specific signals could be more complex to interpret, especially for heterozygotes, compared to the simple, ternary (2:0, 1:1, or 0:2) class of signals produced by the GBA™ method. In addition, for some loci, incorporation of an incorrect deoxynucleotide can occur even in the presence of the correct dideoxynucleotide (Kornher, J. S. et al., Nucl. Acids. Res. 17:7779-7784 (1989)). Such deoxynucleotide misincorporation events may be due to the Km of the DNA polymerase for the mispaired deoxy- substrate being comparable, in some sequence contexts, to the relatively poor Km of even a correctly base paired dideoxy- substrate (Kornberg, A., et al., In: DNA Replication, Second Edition (1992), W. H. Freeman and Company, New York; Tabor, S. et al., Proc. Natl. Acad. Sci. (U.S.A.) 86:4076-4080 (1989)). This effect would contribute to the background noise in the polymorphic site interrogation.
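The ternary (2:0, 1:1, or 0:2) signal classes mentioned above lend themselves to a simple classification rule. The following is a minimal sketch of how two allele-specific signals from one diallelic site might be assigned to one of those classes. The function name `call_genotype` and every threshold (`min_total`, `hom_cut`, `het_band`) are hypothetical values chosen for illustration; none of them comes from the cited references.

```python
def call_genotype(signal_1, signal_2, min_total=0.1, hom_cut=0.8, het_band=(0.35, 0.65)):
    """Assign two allele-specific signals to a ternary class.

    All thresholds are illustrative assumptions, not published values.
    """
    total = signal_1 + signal_2
    if total < min_total:
        return "no call"          # too little signal to interpret
    fraction_1 = signal_1 / total
    if fraction_1 >= hom_cut:
        return "2:0"              # homozygous for allele 1
    if fraction_1 <= 1 - hom_cut:
        return "0:2"              # homozygous for allele 2
    if het_band[0] <= fraction_1 <= het_band[1]:
        return "1:1"              # heterozygous
    return "ambiguous"            # falls between the expected classes

print(call_genotype(1.00, 0.02))  # 2:0
print(call_genotype(0.50, 0.48))  # 1:1
print(call_genotype(0.03, 0.90))  # 0:2
```

When the signal also scales with the length of a nucleotide run, as in the deoxynucleotide-based formats, the expected ratios become locus-specific and a fixed rule of this kind would no longer apply without per-locus calibration.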
D. Extension in Solution Using ddNTPs
Cohen, D. et al. (French Patent 2,650,840; PCT Appln. No. WO91/02087) discuss a solution-based method for determining the identity of the nucleotide of a polymorphic site. As in the Mundy method of U.S. Pat. No. 4,656,127, a primer is employed that is complementary to allelic sequences immediately 3'-to a polymorphic site. The method determines the identity of the nucleotide of that site using labeled dideoxynucleotide derivatives, which, if complementary to the nucleotide of the polymorphic site will become incorporated onto the terminus of the primer.
The method of Cohen has the significant disadvantage of being a solution-based extension method that uses labeled dideoxynucleoside triphosphates. The target DNA template is usually prepared by a DNA amplification reaction, such as the PCR, that uses a high concentration of deoxynucleoside triphosphates, the natural substrates of DNA polymerases. These monomers will compete in the subsequent extension reaction with the dideoxynucleoside triphosphates. Therefore, following the PCR, an additional purification step is required to separate the DNA template from the unincorporated dNTPs. Because it is a solution-based method, the unincorporated dNTPs are difficult to remove and the method is not suited for high volume testing.
E. Solid-Phase Extension Using ddNTPs
An alternative method, known as Genetic Bit Analysis™ or GBA™, is described by Goelet, P. et al. (PCT Appln. No. 92/15712). In a preferred embodiment, the method of Goelet, P. et al. uses mixtures of labeled terminators and a primer that is complementary to the sequence 3' to a polymorphic site. The labeled terminator that is incorporated is thus determined by, and complementary to, the nucleotide present in the polymorphic site of the target molecule being evaluated. In contrast to the method of Cohen et al. (French Patent 2,650,840; PCT Appln. No. WO91/02087), the method of Goelet, P. et al. is preferably a heterogeneous phase assay, in which the primer or the target molecule is immobilized to a solid phase. It is thus easier to perform and more accurate than the method discussed by Cohen.
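The terminator-incorporation principle shared by the solution-phase and solid-phase extension methods can be illustrated with a short sketch. It assumes an idealized polymerase that always adds the correctly base-paired terminator and a primer that anneals only by perfect complementarity. The sequences and the function names `reverse_complement` and `incorporated_terminator` are hypothetical and serve only as an example.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def incorporated_terminator(template, primer):
    """Return the terminator base added to the primer's 3' end, or None.

    The primer anneals to the template sequence immediately 3' of the
    polymorphic site, so the first template base 5' of the annealing
    region is the base being interrogated.
    """
    annealing_region = reverse_complement(primer)
    start = template.find(annealing_region)
    if start <= 0:
        return None  # primer does not anneal, or no template base to copy
    return COMPLEMENT[template[start - 1]]

# Hypothetical target strands (5' to 3') that differ only at the polymorphic site.
flank_5, flank_3 = "GATTACA", "CCGTAGGCT"
primer = reverse_complement(flank_3)

print(incorporated_terminator(flank_5 + "G" + flank_3, primer))  # C
print(incorporated_terminator(flank_5 + "A" + flank_3, primer))  # T
```

Because exactly one terminator is added per primer, the label that is recovered identifies the nucleotide at the site regardless of the sequence that follows it.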
F. Oligonucleotide Ligation Assay
Another solid phase method that uses different enzymology is the "Oligonucleotide Ligation Assay" ("OLA") (Landegren, U. et al., Science 241:1077-1080 (1988)). The OLA protocol uses two oligonucleotides which are designed to be capable of hybridizing to abutting sequences of a single strand of a target. One of the oligonucleotides is biotinylated, and the other is detectably labeled. If the precise complementary sequence is found in a target molecule, the oligonucleotides will hybridize such that their termini abut, and create a ligation substrate. Ligation then permits the labeled oligonucleotide to be recovered using avidin, or another biotin ligand. OLA is capable of detecting point mutations. Nickerson, D. A. et al. have described a nucleic acid detection assay that combines attributes of PCR and OLA (Nickerson, D. A. et al., Proc. Natl. Acad. Sci. (U.S.A.) 87:8923-8927 (1990)). In this method, PCR is used to achieve the exponential amplification of target DNA, which is then detected using OLA. Assays such as the OLA require that each candidate nucleotide of a polymorphism be separately examined, using a separate set of oligonucleotides for each candidate nucleotide. The major drawback of OLA is that ligation is not a highly discriminating process and non-specific signals can be a significant problem.
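The OLA logic described above can be sketched as a small model. It is deliberately idealized: it assumes that ligation occurs only when both oligonucleotides are perfectly complementary to abutting target sequences, which ignores the non-specific ligation that the text identifies as the assay's main drawback. The sequences and the names `reverse_complement`, `ola_signal`, `labeled_probe` and `biotin_probes` are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def ola_signal(target, upstream_probe, downstream_probe):
    """Return True if the two probes abut on the target and could be ligated.

    Idealized model: the joined probes must be perfectly complementary
    to one continuous stretch of the target strand.
    """
    joined = upstream_probe + downstream_probe
    return reverse_complement(joined) in target

# Hypothetical target (5' to 3') carrying a G at the polymorphic site.
target = "GATTACA" + "G" + "CCGTAGGCT"

# Labeled probe pairs with the target sequence just 5' of the site.
labeled_probe = reverse_complement("ATTACA")

# One biotinylated probe per candidate nucleotide; its 3' end sits on the site.
biotin_probes = {allele: reverse_complement(allele + "CCGTAG") for allele in "AG"}

for allele, probe in biotin_probes.items():
    print(allele, ola_signal(target, probe, labeled_probe))
# A False
# G True
```

The loop over `biotin_probes` mirrors the requirement that each candidate nucleotide be examined with its own set of oligonucleotides.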
IV. Conclusions
As will be appreciated, most of the above-described methods require a polymerase to incorporate a nucleotide derivative onto the 3'-terminus of a primer molecule. It would be desirable to develop a more selective process for discriminating single nucleotide polymorphisms. The present invention satisfies this need by providing a ligase/polymerase-mediated method of determining the identity of the nucleotide present at a polymorphic site. The addition of a ligase to the process means that two events are required to generate a signal, extension and ligation. This grants the present invention a higher specificity and lower "noise" than methods using either extension or ligation alone. Unlike the oligonucleotide ligation assay, in the present invention, the distinguishing step of extension is mediated by polymerase and polymerases are more specific in their activity than ligases. Unlike the polymerase-based assays, this method enhances the specificity of the polymerase step by combining it with a second hybridization and a ligation step for a signal to be attached to the solid phase.