DNA typing is commonly used to identify the parentage of human children, and to confirm the lineage of horses, dogs, and other prize animals. DNA typing is also commonly employed to identify the source of blood, saliva, semen, and other tissue found at a crime scene, and in clinical settings to determine the success or failure of bone marrow transplantation and the presence of particular cancerous tissues. DNA typing methods in use today are designed to detect and analyze differences in the length and/or sequence of one or more regions of DNA known to appear in at least two different forms in a population. Such length and/or sequence variation is referred to as "polymorphism." Any region (i.e., "locus") of DNA in which such a variation occurs is referred to as a "polymorphic locus." Most DNA typing techniques employ at least one "marker" containing at least one such polymorphic locus. Each individual marker contains a single allele of genomic DNA ultimately derived from a single individual in a population. The methods and materials of the present invention are all designed for use in the detection of a particular class of polymorphisms in DNA characterized primarily by variation in length.
Genetic markers which are sufficiently polymorphic with respect to length or sequence have long been sought for use in identity applications, such as paternity testing and identification of tissue samples collected for forensic analysis. The discovery and development of such markers, and of methods for analyzing them, have gone through several phases over the last several years. In recent years, the discovery and development of polymorphic short tandem repeats (STRs) as genetic markers has stimulated progress in the development of linkage maps, the identification and characterization of disease genes, and the simplification and precision of DNA typing. The term "short tandem repeat" or "STR" as used herein refers to all sequences between two and seven nucleotides long which are repeated perfectly, or nearly perfectly, in tandem within the genomic DNA of any organism. See, for example, the definition of "short tandem repeat" applied to human genomic DNA in U.S. Pat. No. 5,364,759, column 4, line 58 et seq.
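The definition of an STR given above (a unit of two to seven nucleotides repeated in tandem) can be illustrated with a minimal Python sketch. It is not part of the disclosed methods. The function name `find_strs`, the `min_repeats` cutoff of three copies, and the example sequence are assumptions made only for illustration, and the sketch detects perfect repeats only, not the "nearly perfect" repeats the definition also covers.

```python
import re

def find_strs(seq, min_unit=2, max_unit=7, min_repeats=3):
    """Return (start, unit, copies) for perfect tandem repeats in seq.

    Illustrative only: min_repeats is an assumed cutoff, and
    imperfect (nearly perfect) repeats are not detected.
    """
    seq = seq.upper()
    hits = []
    for unit_len in range(min_unit, max_unit + 1):
        # One unit of unit_len bases followed by at least
        # (min_repeats - 1) further identical copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip units that are themselves built from a shorter motif,
            # e.g. "ATAT" is reported as the dinucleotide "AT" instead.
            if any(unit == unit[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((m.start(), unit, len(m.group(0)) // unit_len))
    return hits

# Hypothetical sequence: five copies of a five-base unit with short flanks.
example = "GGC" + "AAAGA" * 5 + "TTC"
print(find_strs(example))
```

Each hit gives the zero-based start position, the repeat unit, and the number of tandem copies, so alleles that differ in copy number differ in length by a multiple of the unit length.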
The first identified DNA variant markers were simple base substitutions, i.e., simple sequence polymorphisms, which were most often detected by Southern hybridization assays. For examples of references describing the identification of such markers, designed to be used to analyze restriction endonuclease-digested DNA with radioactive probes, see: Southern, E. M. (1975), J. Mol. Biol. 98(3):503-507; Schumm, et al. (1988), American Journal of Human Genetics 42:143-159; and Wyman, A. and White, R. (1980) Proc. Natl. Acad. Sci. U.S.A. 77:6754-6758.
The next generation of markers were size variants, i.e., length polymorphisms, specifically "variable number of tandem repeat" (VNTR) markers (Nakamura Y., et al. (1987), Science 235:1616-1622; U.S. Pat. No. 4,963,663 issued to White et al. (1990); and U.S. Pat. No. 5,411,859, a continuation of U.S. Pat. No. 4,963,663, issued to White et al. (1995)) and "minisatellite" markers (Jeffreys et al. (1985a), Nature 314:67-73; Jeffreys et al. (1985b) Nature 316:76-79; U.S. Pat. No. 5,175,082 for an invention by Jeffreys). Both VNTR and minisatellite markers contain regions of nearly identical sequences repeated in tandem fashion. The core repeat sequence is 10 to 70 bases in length, with shorter core repeat sequences referred to as "minisatellite" repeats and longer repeats referred to as VNTRs. Different individuals in a human population contain different numbers of these repeats. These markers are more highly polymorphic than base substitution polymorphisms, sometimes displaying up to forty or more alleles at a single genetic locus. However, the tedious process of restriction enzyme digestion and subsequent Southern hybridization analysis is still required to detect and analyze most such markers.
The next advance involved joining polymerase chain reaction (PCR) technology (U.S. Pat. No. 4,683,202 by Mullis, K. B.) with the analysis of VNTR loci (Kasai K, et al. (1990) Journal Forensic Science 35(5):1196-1200). Amplifiable VNTR loci were discovered, which could be detected without the need for Southern transfer. The amplified products are separated through agarose or polyacrylamide gels and detected by incorporation of radioactivity during the amplification or by post-staining with silver or ethidium bromide. However, PCR can only be used to amplify relatively small DNA segments reliably, i.e., segments under 3,000 bases in length (Ponce, M & Micol, L. (1992) NAR 20(3):623; Decorte R, et al. (1990) DNA Cell Biol. 9(6):461-469). Consequently, very few amplifiable VNTRs have been developed, making them, as a class, impractical for linkage mapping.
With the recent development of markers containing polymorphic dinucleotide repeats (Litt and Luty (1989) Am J. Hum Genet 3(4):599-605; Tautz, D (1989) NAR 17:6463-6471; Weber and May (1989) Am J Hum Genet 44:388-396; German Pat. No. DE 38 34 636 C2, inventor Tautz, D; U.S. Pat. No. 5,582,979 filed by Weber, L.) and polymorphic short tandem repeats (STRs) (Edwards, A., et al. (1991) Am. J. Hum. Genet. 49:746-756; Hammond, H. A., et al. (1994) Am. J. Hum. Genet. 55:175-189; Fregeau, C. J., and Fourney, R. M. (1993) BioTechniques 15(1):100-119; Schumm, J. W. et al. (1994) in The Fourth International Symposium on Human Identification 1993, pp. 177-187; and U.S. Pat. No. 5,364,759 by Caskey et al.; German Pat. No. DE 38 34 636 C2 by Tautz, D.), many of the deficiencies of previous methods have been overcome. The two types of markers, those containing dinucleotide or STR repeats (which by definition include 2-7 bp repeats), are generally referred to as "microsatellite" markers. Often considered to be the best available markers, the microsatellite loci are similar to amplifiable VNTRs in that their alleles may be differentiated based on length variation. However, unlike VNTRs, these loci contain perfect or imperfect repeat sequences two, three, four, or, rarely, five bases long. They display from just a few alleles to more than forty at a single locus. Amplification protocols can be designed to produce small products, generally from 60 to 400 base pairs long, and alleles from each locus are often contained within a range of less than 50 bp. This allows simultaneous electrophoretic analysis of several systems on the same gel, provided the PCR primers are carefully designed so that no potential amplification product from one system overlaps the range of alleles of any other system in the same gel.
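The requirement that allele size ranges of co-analyzed systems not overlap on one gel can be checked with a short Python sketch. It is an illustration under stated assumptions, not a described procedure: the function name `overlapping_loci`, the locus names, and the size ranges are all hypothetical, and the sizes are chosen only to fall within the 60 to 400 bp window mentioned above.

```python
from itertools import combinations

def overlapping_loci(size_ranges):
    """Return pairs of loci whose expected product size ranges overlap.

    size_ranges maps a locus name to (smallest, largest) expected
    amplification product size in base pairs. Names and sizes used
    with this function are hypothetical examples.
    """
    clashes = []
    for (a, (a_lo, a_hi)), (b, (b_lo, b_hi)) in combinations(sorted(size_ranges.items()), 2):
        # Two closed intervals overlap when each starts before the other ends.
        if a_lo <= b_hi and b_lo <= a_hi:
            clashes.append((a, b))
    return clashes

# Hypothetical three-locus panel; each allele range spans less than 50 bp.
panel = {
    "locus_A": (100, 140),
    "locus_B": (150, 195),
    "locus_C": (190, 230),
}
print(overlapping_loci(panel))
```

Here `locus_B` and `locus_C` share the 190 to 195 bp region, so primers for one of them would need to be redesigned to shift its product sizes before the three systems could be read unambiguously in one lane.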
Three significant drawbacks relate to the use of microsatellite loci. First, the presence of stutter artifacts, that is, one or more minor fragments in addition to the major fragment representing each allele, is often seen following amplification. This deficiency is much more severely displayed with dinucleotide repeat loci than with tri- or tetranucleotide repeat markers (Edwards et al., 1991. Am J Hum Genet 49:746-756; Edwards et al., 1992. Genomics 12:241-253; Weber & May, 1989. Am J Hum Genet 44:388-396). The presence of these artifacts, presumed to result from a DNA polymerase-related phenomenon called repeat slippage (Levinson & Gutman, 1987. Mol. Biol. Evol. 4(3):203-221; Schlotterer & Tautz, 1992. NAR 20:211-215), complicates the interpretation of allelic content of the loci. While complicating all interpretations, the presence of major and minor fragments to represent each allele especially limits the usefulness of these markers in forensic analysis, which often requires determination of whether more than one source of DNA sample is present. Many of the markers described in this work represent a new class of markers which produce significantly less stutter artifact than known markers.
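To make the interpretation problem concrete, the following Python sketch flags peaks that could be stutter rather than true alleles. It rests on assumptions not stated in the text above: that a stutter fragment is exactly one repeat unit shorter than its parent allele, that fragment sizes are whole base pairs, and that a minor peak at or below 15% of the parent peak height is suspect. The function name `flag_stutter`, the threshold, and the peak data are hypothetical.

```python
def flag_stutter(peaks, repeat_len, max_ratio=0.15):
    """Return sizes of peaks that look like stutter of a neighbouring peak.

    peaks is a list of (size_bp, height) with integer sizes.
    Assumptions (illustrative only): stutter is one repeat unit shorter
    than its parent, and max_ratio is an arbitrary height cutoff.
    """
    heights = dict(peaks)
    flagged = []
    for size, height in peaks:
        # Look for a candidate parent allele one repeat unit longer.
        parent = heights.get(size + repeat_len)
        if parent is not None and height <= max_ratio * parent:
            flagged.append(size)
    return flagged

# Hypothetical tetranucleotide locus with two major peaks.
observed = [(196, 90), (200, 1000), (208, 60), (212, 900)]
print(flag_stutter(observed, repeat_len=4))
```

A minor peak flagged this way could equally be a true allele from a second, minor contributor, which is the ambiguity that makes stutter troublesome when deciding whether a sample has more than one source.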
A second drawback to current STR and microsatellite marker systems relates to the difficulty in separating multiple loci in a single gel. This occurs because there is spatial compression of fragments of different sizes in the upper regions of the gels most commonly used for separation of DNA fragments by those skilled in the art. Development of the markers described in this work, based on larger repeat units, extends the useful range within these gels, allowing simultaneous analysis of more loci.
A third drawback is that, prior to the invention disclosed herein, only a few loci of human genomic DNA had been described in the literature with length polymorphisms based on variations in the number of five- to seven-base repeats at each such locus. See, e.g., Edwards et al. (1991) Nucleic Acids Res. 19:4791; Chen et al. (1993) Genomics 15(3):621-5; Harada et al. (1994) Am. J. Hum. Genet. 55:175-189; Comings et al. (1995), Genomics 29(2):390-6; and Utah Marker Development Group (1995), Am. J. Hum. Genet. 57:619-628. In 1995, Jurka and Pethiyagoda published an article describing a study in which they had used the GenBank database to determine the relative abundance and variability of pentameric and hexameric tandem repeats in the primate genome (Jurka and Pethiyagoda (1995) J. Mol. Evol. 40:120-126). However, variability was only indirectly estimated, and polymorphism levels at individual loci were not demonstrated. Id. We have developed materials and methods for identifying and analyzing DNA loci which contain highly polymorphic repeats five to seven bases long.
The materials and methods of the present invention are designed for use in identifying and analyzing particular polymorphic loci of DNA of various types, including single-stranded and double-stranded DNA from a variety of different sources. The present invention represents a significant improvement over existing technology, bringing increased power and precision to DNA profiling for linkage analysis, criminal justice, paternity testing, and other forensic and medical uses.