The vast majority of DNA in higher organisms is identical in sequence among different individuals (or more accurately among the chromosomes of those individuals). A small fraction of DNA, however, is variable or polymorphic in sequence among individuals, with the formal definition of polymorphism being that the most frequent variant (or allele) has a population frequency which does not exceed 99% (Gusella, J. F. (1986), Ann. Rev. Biochem. 55:831-854). In the past, polymorphisms were usually detected as variations in gene products or phenotypes such as human blood types. Currently, almost all polymorphisms are detected directly as variations in genomic DNA.
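The formal definition above reduces to a simple frequency test. The following is a minimal sketch only; the function name `is_polymorphic`, the dictionary layout, and the example frequencies are hypothetical and are not taken from the cited reference.

```python
def is_polymorphic(allele_frequencies, threshold=0.99):
    """Return True if the most frequent allele does not exceed the threshold.

    allele_frequencies: dict mapping an allele label to its population
    frequency (hypothetical layout, chosen for illustration).
    """
    total = sum(allele_frequencies.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    return max(allele_frequencies.values()) <= threshold


# Hypothetical loci: the first is too uniform to count as polymorphic.
print(is_polymorphic({"A": 0.995, "a": 0.005}))  # False
print(is_polymorphic({"A": 0.90, "a": 0.10}))    # True
```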
Analysis of DNA polymorphisms has relied on variations in the lengths of DNA fragments produced by restriction enzyme digestion. Most of these restriction fragment length polymorphisms (RFLPs) involve sequence variations in one of the recognition sites for the specific restriction enzyme used. This type of RFLP has only two alleles and hence is relatively uninformative.
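One common way to quantify how informative a marker is, assumed here for illustration and not named in the text above, is expected heterozygosity: one minus the sum of the squared allele frequencies. The sketch below shows why a two-allele marker is limited, since its heterozygosity cannot exceed 0.5 however the two frequencies are divided. The allele frequencies are hypothetical.

```python
def expected_heterozygosity(frequencies):
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in frequencies)


# Hypothetical markers: a two-allele site RFLP at its best case
# (equal frequencies) versus a marker with ten equally frequent alleles.
two_allele = expected_heterozygosity([0.5, 0.5])
ten_allele = expected_heterozygosity([0.1] * 10)
print(round(two_allele, 3), round(ten_allele, 3))
```

Under this measure the ten-allele marker distinguishes far more pairs of chromosomes than the two-allele marker can.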
A second type of RFLP is more informative and involves variable numbers of tandemly repeated DNA sequences between the restriction enzyme sites. These polymorphisms, called minisatellites or VNTRs (for variable numbers of tandem repeats), were first developed by Jeffreys (Jeffreys et al. (1985), Nature 314:67-73). Jeffreys has filed two European patent applications, 186,271 and 238,329, dealing with the minisatellites. The first Jeffreys application ('271) identified the existence of DNA regions containing hypervariable tandem repeats of DNA. Although the tandem repeat sequences generally varied between minisatellite regions, Jeffreys noted that many minisatellites had repeats that contain core regions of highly similar sequences. Jeffreys isolated or cloned, from genomic DNA, polynucleotide probes comprised essentially of this core sequence (i.e., wherein the probe had at least 70% homology with one of his defined cores). These probes were found to hybridize with multiple minisatellite regions (or loci). The probes were found to be useful in forensic or paternity testing by the identification of unique or characteristic minisatellite profiles. The later Jeffreys European patent application proposed the use of probes that were specific for individual minisatellites located at specific loci in the genome. One problem with the Jeffreys approach is that some of the most highly variable, and hence most useful, minisatellites are susceptible to significant frequencies of random mutation (Jeffreys et al., 1988, Nature 332:278-281).
Other tandemly repeated DNA families, different in sequence from the Jeffreys minisatellites, are known to exist. In particular, (dC-dA)n·(dG-dT)n sequences have been found in all eukaryotes that have been examined. In humans there are 50,000-100,000 blocks of (dC-dA)n·(dG-dT)n sequences, with n ranging from about 15 to 30 (Miesfeld et al. (1981), Nucleic Acids Res. 9:5931-5947; Hamada and Kakunaga (1982), Nature 298:396-398; Tautz and Renz (1984), Nucleic Acids Res. 12:4127-4138).
Prior to the work of this invention, a number of different human blocks of (dC-dA)n·(dG-dT)n repeats had been cloned and sequenced, mostly unintentionally along with other sequences of interest. Several of these characterized sequences were analyzed independently from two or more alleles. In arriving at this invention, sequences from these different alleles were compared. Variations in the number of repeats per block of repeats were found in several cases (Weber and May, 1989, Am. J. Hum. Genet. 44:388-396, incorporated herein by reference in its entirety).
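The comparison of repeat numbers between alleles can be illustrated with a short sketch. It is a hedged example only and not the procedure used in the cited work: the function name `longest_ca_run`, the flanking sequences, and the repeat counts are all hypothetical. The function reports the repeat count of the longest uninterrupted CA run (or GT run, as read from the complementary strand) in a sequence.

```python
import re


def longest_ca_run(sequence):
    """Return the repeat count n of the longest uninterrupted (CA)n or (GT)n run."""
    runs = re.findall(r"(?:CA)+|(?:GT)+", sequence.upper())
    # Each repeat unit is two bases long, so n is the run length divided by 2.
    return max((len(run) // 2 for run in runs), default=0)


# Two hypothetical alleles of one locus: identical flanks, different n.
allele_1 = "GGATC" + "CA" * 17 + "TTGAC"
allele_2 = "GGATC" + "CA" * 19 + "TTGAC"

n1, n2 = longest_ca_run(allele_1), longest_ca_run(allele_2)
print(n1, n2, "length difference in bases:", 2 * abs(n1 - n2))
```

In this sketch the two alleles differ by two repeat units, that is, by four bases in overall length.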
Although three separate research groups published observations of site-specific differences in sequence length (Das et al., 1987, J. Biol. Chem. 262:4787-4793; Slightom et al., 1980, Cell 21:627-638; Shen and Rutter, 1984, Science 224:168-171), none of the groups recognized or appreciated the extent of this variability or its usefulness, and none generalized the observation. These groups also did not consider the use of (dC-dA)n·(dG-dT)n sequences as genetic markers and did not offer a method by which such polymorphisms might be analyzed.