The genomes of all organisms undergo spontaneous mutation in the course of their continuing evolution, generating variant forms of progenitor nucleic acid sequences (Gusella, (1986) Ann. Rev. Biochem. 55:831-854). The variant form can confer an evolutionary advantage or disadvantage relative to a progenitor form, or can be neutral. In some instances, a variant form confers a lethal disadvantage and is not transmitted to subsequent generations of the organism. In other instances, a variant form confers an evolutionary advantage to the species and is eventually incorporated into the DNA of many or most members of the species and effectively becomes the progenitor form. In many instances, both the progenitor and variant form(s) survive and co-exist in a species population. The coexistence of multiple forms of a sequence gives rise to polymorphisms.
Several different types of polymorphisms have been reported. For example, a restriction fragment length polymorphism (RFLP) is a variation in DNA sequence that alters the length of a restriction fragment (Botstein et al., (1980) Am. J. Hum. Genet. 32:314-331). The sequence variation can create or delete a restriction site, thus changing the length of the restriction fragment. RFLPs have been used in human and animal genetic analyses (see, e.g., PCT Publications WO 90/13668 and WO 90/11369; Donis-Keller, (1987) Cell 51:319-337; Lander et al., (1989) Genetics 121:85-99). When a heritable trait can be linked to a particular RFLP, the presence of the RFLP in an individual can be used to predict the likelihood that the individual will also exhibit the trait.
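The effect of creating or deleting a restriction site on fragment length can be illustrated with a minimal sketch. The sequences below are invented for illustration, and the function name `fragment_lengths` is hypothetical; the only outside fact assumed is that EcoRI recognizes GAATTC and cuts after the first G. The sketch treats the DNA as a linear, single-stranded string and is not a model of any particular assay.

```python
def fragment_lengths(seq, site="GAATTC", cut_offset=1):
    """Return fragment lengths from a complete digest of a linear sequence.

    site       -- recognition sequence (EcoRI by default)
    cut_offset -- number of bases into the site at which the cut falls
                  (EcoRI cuts G^AATTC, so the offset is 1)
    """
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    # Fragment boundaries are the two ends of the molecule plus every cut.
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Illustrative (made-up) progenitor sequence with two EcoRI sites.
progenitor = "ACGTACGTAC" + "GAATTC" + "TTGACCTTGA" + "GAATTC" + "CCATGG"
# Variant form: a single base change (A -> G) destroys the second site.
variant = "ACGTACGTAC" + "GAATTC" + "TTGACCTTGA" + "GAGTTC" + "CCATGG"

print(fragment_lengths(progenitor))  # [11, 16, 11]
print(fragment_lengths(variant))     # [11, 27]
```

In this sketch the loss of one site merges the 16-base and 11-base fragments into a single 27-base fragment, which is the kind of length difference an RFLP analysis detects.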
Other polymorphisms take the form of short tandem repeats (STRs) that include tandem di-, tri- and tetra-nucleotide repeated motifs. These tandem repeats are also referred to as “variable number tandem repeat” (VNTR) polymorphisms. VNTRs have been used in identity and paternity analysis (see, e.g., Armour et al., (1992) FEBS Lett. 307:113-115; U.S. Pat. No. 5,075,217; PCT Publication WO 91/14003; EP 370,719), and in a large number of genetic mapping studies.
Yet other polymorphisms take the form of single nucleotide variations between individuals of the same species. Such polymorphisms are far more frequent than RFLPs, STRs and VNTRs. Some single nucleotide polymorphisms (SNPs) occur in protein-coding nucleic acid sequences and are referred to as coding sequence SNPs (cSNPs). In these cases, one of the polymorphic forms can give rise to the expression of a defective or otherwise variant protein and, potentially, a genetic disease. Examples of genes in which polymorphisms within coding sequences give rise to genetic disease include hemoglobin S (βS; sickle cell anemia), apoE4 (Alzheimer's Disease), Factor V Leiden (thrombosis), and CFTR (cystic fibrosis). cSNPs can alter the codon sequence of the gene and therefore specify an alternative amino acid. Such changes are called “missense” when another amino acid is substituted and “nonsense” when the alternative codon specifies a stop signal in protein translation. When the cSNP does not alter the amino acid specified, the cSNP is referred to as “silent”.
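The missense, nonsense and silent categories described above can be sketched as a simple codon comparison. This is an illustrative sketch only: the function name `classify_csnp` is hypothetical, the standard genetic code is assumed, and codons are written as DNA (T rather than U). The GAG to GTG change used in the example is the well-known codon change underlying hemoglobin S; the other two codon pairs are chosen purely for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order; "*" marks a stop.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_csnp(ref_codon, alt_codon):
    """Classify a coding-sequence SNP by its effect on the encoded amino acid."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    differences = sum(a != b for a, b in zip(ref_codon, alt_codon))
    if len(ref_codon) != 3 or len(alt_codon) != 3 or differences != 1:
        raise ValueError("codons must be 3 bases long and differ at exactly one position")
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "silent"      # same amino acid (or stop) is specified
    if alt_aa == "*":
        return "nonsense"    # variant codon is a stop signal
    return "missense"        # a different amino acid is specified


print(classify_csnp("GAG", "GTG"))  # missense (Glu -> Val, as in hemoglobin S)
print(classify_csnp("CAG", "TAG"))  # nonsense (Gln -> stop)
print(classify_csnp("GAA", "GAG"))  # silent   (Glu -> Glu)
```

The sketch uses only the three categories named in the text, so a change that converts a stop codon into an amino acid codon is reported here as missense.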
Other single nucleotide polymorphisms occur in noncoding regions. Some of these polymorphisms can also result in defective protein expression (e.g., as a result of defective splicing). Still other single nucleotide polymorphisms have no phenotypic effects. Single nucleotide polymorphisms can be employed in the same manner as RFLPs and VNTRs, but offer several advantages.
Single nucleotide polymorphisms occur with greater frequency and are spaced more uniformly throughout the genome than other forms of polymorphism. The greater frequency and uniformity of single nucleotide polymorphisms mean that there is a greater probability that such a polymorphism will be found in close proximity to a genetic locus of interest than would be the case for other polymorphisms. The different forms of characterized single nucleotide polymorphisms are sometimes easier to distinguish than other types of polymorphism (e.g., by the use of assays employing allele-specific hybridization probes or primers).
Only a small percentage of the total repository of polymorphisms in humans and other organisms has been identified. The limited number of polymorphisms identified to date is due, in part, to the large amount of work required to detect the polymorphisms by conventional methods. For example, one conventional approach for identifying polymorphisms is to sequence the same stretch of DNA in a population of individuals by dideoxy sequencing. In this approach, the amount of work required to identify polymorphisms increases in proportion to both the length of sequence and the number of individuals in the population; thus, such techniques become impractical for large stretches of DNA or large numbers of individuals.
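The conventional comparison described above, and the way its cost scales, can be sketched as follows. The sketch assumes the sequences have already been determined and aligned to equal length; the function names `polymorphic_sites` and `bases_to_sequence` and the example sequences are hypothetical and chosen only for illustration.

```python
def polymorphic_sites(aligned):
    """Map each variable position (0-based) to the sorted bases observed there.

    `aligned` is a list of equal-length sequences, one per individual.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must have the same length")
    sites = {}
    for pos in range(length):
        alleles = {seq[pos] for seq in aligned}
        if len(alleles) > 1:          # more than one form observed at this position
            sites[pos] = sorted(alleles)
    return sites


def bases_to_sequence(seq_length, n_individuals):
    """Total bases that must be read: proportional to both quantities."""
    return seq_length * n_individuals


# Four made-up individuals sequenced over the same 6-base stretch.
individuals = ["ACGTAC", "ACGTAC", "ACATAC", "ACGTAT"]
print(polymorphic_sites(individuals))             # {2: ['A', 'G'], 5: ['C', 'T']}
print(bases_to_sequence(len(individuals[0]), 4))  # 24
```

Doubling either the length of the stretch or the number of individuals doubles the number of bases to be read, which is the scaling that makes the approach impractical at large sizes.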