DNA Structure
The genetic framework (i.e., the genome) of an organism is encoded in the double-stranded sequence of nucleotide bases in the deoxyribonucleic acid (DNA) which is contained in the somatic and germ cells of the organism. The genetic content of a particular segment of DNA, or gene, is only manifested upon production of the protein which the gene ultimately encodes. There are additional sequences in the genome that do not encode a protein (i.e., "noncoding" regions) which may serve a structural, regulatory, or unknown function. Thus, the genome of an organism or cell is the complete collection of protein-encoding genes together with intervening noncoding DNA sequences. Importantly, each somatic cell of a multicellular organism contains the full complement of genomic DNA of the organism, except in cases of focal infections or cancers, where one or more xenogeneic DNA sequences may be inserted into the genomic DNA of specific cells and not into other, non-infected, cells in the organism.
Minisatellite and Microsatellite DNA
Interspersed throughout the genomic DNA of most eukaryotic organisms are short stretches of polymorphic repetitive nucleotide sequences known as "minisatellite DNA" sequences or fragments (Jeffreys, A. J., et al., Nature 314:67-73 (1985)). These repeating sequences often appear in tandem and in variable numbers within the genome, and they are thus sometimes referred to as "short tandem repeats" ("STRs") or "variable numbers of tandem repeats" ("VNTRs") (see U.S. Pat. No. 5,075,217; Nakamura et al., Science 235:1616-1622 (1987)). Typically, minisatellite repeat units are about 9 to 60 bases in length (Nakamura et al., Science 235:1616-1622 (1987); Weber and May, Am. J. Hum. Genet. 44:388-396 (1989)) and are repeated in tandem about 20-50 times (Watson, J. D., et al., eds., Recombinant DNA, 2nd ed., New York: Scientific American Books, p. 146 (1992)). Other short, simple sequences which are analogous to minisatellite DNAs, termed "microsatellite DNAs" (Litt, M., and Luty, J. A., Am. J. Hum. Genet. 44:397-401 (1989); Weber and May, Am. J. Hum. Genet. 44:388-396 (1989)), usually have repeat units of about 1-6 bases in length and thus give rise to monomeric (Economou, E. T., et al., Proc. Natl. Acad. Sci. USA 87:2951-2954 (1990)), dimeric, trimeric, tetrameric, pentameric or hexameric repeat units (Litt, M., and Luty, J. A., Am. J. Hum. Genet. 44:397-401 (1989); Weber and May, Am. J. Hum. Genet. 44:388-396 (1989)). The most prevalent of these highly polymorphic microsatellite sequences in the human genome is the dinucleotide repeat (dC-dA)n·(dG-dT)n (where n is the number of repetitions in a given stretch of nucleotides), which is present in a copy number of about 50,000-100,000 (Tautz, D., and Renz, M., Nucl. Acids Res. 12:4127-4138 (1984); Dib, C., et al., Nature 380:152-154 (1996)), although the existence of a variety of analogous repeat sequences in the genomes of evolutionarily diverse eukaryotes has been reported (Hamada, H., et al., Proc. Natl. Acad. Sci. USA 79:6465-6469 (1982)).
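As an illustration of what a microsatellite looks like at the sequence level, the following minimal Python sketch scans a DNA string for perfect tandem repeats with repeat units of 1 to 6 bases. It is not part of any method cited above; the function name find_tandem_repeats, the min_copies threshold and the example sequence are assumptions chosen for illustration, and imperfect or interrupted repeats are not detected.

```python
import re

def find_tandem_repeats(seq, min_unit=1, max_unit=6, min_copies=5):
    """Return sorted (start, unit, copies) tuples for perfect tandem repeats.

    Hypothetical helper: only exact repeats on the given strand are reported.
    """
    seq = seq.upper()
    hits = []
    for unit_len in range(min_unit, max_unit + 1):
        # A unit of unit_len bases followed by at least (min_copies - 1) copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            # Skip units that are themselves built from a shorter unit (e.g. CACA from CA).
            if any(unit == unit[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return sorted(hits)

# Invented example: a (CA)12 tract between two unique flanking sequences.
example = "GGATCC" + "CA" * 12 + "TTGACG"
print(find_tandem_repeats(example))
```

In this sketch the value of n in (dC-dA)n corresponds to the third element of each reported tuple; two alleles of the same locus would differ only in that count.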
The actual in vivo function of minisatellite and microsatellite sequences is unknown. However, because these tandemly repeated sequences are dispersed throughout the genome of most eukaryotes, exhibit size polymorphism, and are often heterozygous (Weber, J. L., Genomics 7:524-530 (1990)), they have been explored as potential genetic markers in assays attempting to distinguish closely related individuals, and in forensic and paternity testing (see, e.g., U.S. Pat. No. 5,075,217; Jeffreys, A. J., et al., Nature 332:278-281 (1988)). The finding that mutations often are observed in microsatellite DNA regions in cancer cells (Loeb, L. A., Cancer Res. 54:5059-5063 (1994)), potentially linking genomic instability to the carcinogenic process and providing useful genetic markers of cancer, lends additional significance to methods facilitating the rapid analysis and genotyping of polymorphisms in these genomic DNA regions.
Methods of Genotyping Minisatellite or STR DNA Sequences
To analyze minisatellite, microsatellite or STR DNA sequence polymorphisms, a variety of molecular biological techniques have been employed. These techniques include restriction fragment length polymorphism (RFLP) or "DNA fingerprinting" analysis (Wong, Z., et al., Nucl. Acids Res. 14:4605-4616 (1986); Wong, Z., et al., Ann. Hum. Genet. 51:269-288 (1987); Jeffreys, A. J., et al., Nature 332:278-281 (1988); U.S. Pat. Nos. 5,175,082; 5,413,908; 5,459,039; and 5,556,955). Far more commonly employed for STR genotyping than RFLP and hybridization, however, are amplification-based methods, such as those relying on the polymerase chain reaction (PCR) method invented by Mullis and colleagues (see U.S. Pat. Nos. 4,683,195; 4,683,202; and 4,800,159). These methods use "primer" sequences which are complementary to opposing regions flanking the polymorphic DNA sequence to be amplified from the sample of genomic DNA to be analyzed. These primers are added to the DNA target sample, along with excess deoxynucleotides and a DNA polymerase (e.g., Taq polymerase; see below), and the primers bind to their target via base-specific pairing interactions (i.e., adenine pairs with thymine, and cytosine with guanine). By repeatedly passing the reaction mixture through cycles of increasing and decreasing temperatures (to allow dissociation of the two strands of the target DNA, synthesis of complementary copies of each strand by the polymerase, and re-annealing of the new complementary strands), the copy number of the minisatellite or STR sequence of DNA may be rapidly increased, and the amplified product detected by size separation methods such as gel electrophoresis.
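The primer-flanking arrangement and the cycle-by-cycle increase in copy number described above can be sketched in Python as follows. This is an idealized illustration only: the function names, the primer and template sequences, and the assumption of perfect doubling in every cycle are all hypothetical, and real reactions fall short of 100% efficiency.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]

def in_silico_amplicon(template, forward_primer, reverse_primer):
    """Return the top-strand region bounded by the two primers, or None.

    The reverse primer anneals to the top strand, so its binding site
    appears in the template as the primer's reverse complement.
    """
    start = template.find(forward_primer)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse_primer)
    end = template.find(reverse_site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]

def ideal_copies(initial_copies, cycles):
    """Copy number after the given cycles, assuming perfect doubling (an idealization)."""
    return initial_copies * 2 ** cycles

# Invented locus: unique flanks around a (CA)n tract; two alleles with n = 10 and n = 12.
fwd = "AGGCTAGC"
rev = reverse_complement("GGTACCGA")
for n in (10, 12):
    allele = "TTAGGCTAGC" + "CA" * n + "GGTACCGATT"
    print(n, len(in_silico_amplicon(allele, fwd, rev)))
print(ideal_copies(1, 30))
```

Because the primers sit in the unique flanking sequence, the amplicon lengths of the two invented alleles differ by exactly the length of the extra repeat units, which is the size difference that electrophoresis is used to resolve.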
PCR and related amplification approaches have been used in attempts to develop methods for typing and analyzing STRs or minisatellite regions. For example, PCR has been employed to analyze polymorphisms in microsatellite sequences from different individuals, including (dC-dA)n·(dG-dT)n (Weber, J. L., and May, P. E., Am. J. Hum. Genet. 44:388-396 (1989); Weber, J. L., Genomics 7:524-530 (1990); U.S. Pat. Nos. 5,075,217; 5,369,004; and 5,468,613). Similar methods have been applied to a variety of medical and forensic samples to perform DNA typing and to detect polymorphisms between individual samples (U.S. Pat. Nos. 5,306,616; 5,364,759; 5,378,602; and 5,468,610).
In Vitro Use of DNA Polymerases
The above-described amplification-based techniques require the use of DNA polymerases, which catalyze the incorporation of nucleotides, supplied as deoxynucleoside triphosphates (dNTPs), into the newly forming DNA strands. Together with other enzymes (e.g., helicases, ligases and ATPases), the DNA polymerases ensure rapid and relatively faithful replication of DNA in preparation for proliferation in vivo in prokaryotes, eukaryotes and viruses.
DNA polymerases catalyze the formation of DNA molecules which are complementary to a DNA template. Upon hybridization of a primer to the single-stranded DNA template, polymerases synthesize DNA in the 5' to 3' direction, successively adding nucleotides to the 3'-hydroxyl group of the growing strand. Thus, in the presence of deoxyribonucleoside triphosphates (dNTPs) and a primer, a new DNA molecule, complementary to the single-stranded DNA template, can be synthesized.
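The template-directed, 5' to 3' extension just described can be sketched in a few lines of Python. This is a conceptual model only, not an enzymatic simulation; the function name extend_primer and the example sequences are assumptions, and both strands are written 5' to 3'.

```python
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def extend_primer(template, primer):
    """Extend a primer annealed to the 3' end of a template.

    Both sequences are written 5'->3'. Each step appends, at the 3' end of
    the growing strand, the base complementary to the next template base.
    """
    # Template region (5'->3') to which the primer anneals.
    site = "".join(PAIR[base] for base in reversed(primer))
    if not template.endswith(site):
        raise ValueError("primer does not anneal to the 3' end of the template")
    new_strand = primer
    # Read the remaining template from the annealing site toward its 5' end.
    for base in reversed(template[:len(template) - len(site)]):
        new_strand += PAIR[base]
    return new_strand

# Invented example: the primer TGC anneals to the GCA at the template's 3' end.
print(extend_primer("AAGCTTGGCA", "TGC"))
```

The finished strand is the reverse complement of the template, which is what "complementary to the single-stranded DNA template" means once both strands are written in the conventional 5' to 3' orientation.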
In addition to an activity which adds nucleotides to DNA in the 5' to 3' direction (i.e., "polymerase" activity), many DNA polymerases also possess activities which remove nucleotides in the 5' to 3' and/or the 3' to 5' direction (i.e., "exonuclease" activity). This dual activity of certain DNA polymerases is, however, a drawback for some in vitro applications. For example, the in vitro synthesis of an intact copy of a DNA fragment by the polymerase activity, an elongation process in which the new strand grows in the 5' to 3' direction, is jeopardized by the exonuclease activities, which may simultaneously or subsequently degrade the newly formed DNA.
Limitations of PCR-based Genotyping of Minisatellite, Microsatellite and STR DNA Sequences
Application of PCR-based methods to analysis of minisatellite or STR DNA sequences has a number of significant limitations. It has been shown, for example, that use of Taq and other thermostable DNA polymerases commonly employed in PCR and related automated amplification methods causes the accumulation of amplification products containing non-templated 3' terminal nucleotides (Clark, J. M., et al., J. Molec. Biol. 198:123-127 (1987); Clark, J. M., Nucl. Acids Res. 16:9677-9686 (1988); Hu, G., DNA Cell Biol. 12:763-770 (1993)). That is, some of the newly synthesized DNA strands produced in each round of amplification have had an extra nucleotide added to their 3' termini, such that the newly synthesized strands may be longer by one base.
Non-templated nucleotide addition is a slow process compared to template-directed synthesis (Clark, J. M., Nucl. Acids Res. 16:9677-9686 (1988)), and its extent is sequence-dependent (Hu, G., DNA Cell Biol. 12:763-770 (1993); Brownstein, M. J., et al., BioTechniques 20:1004-1010 (1996)). Consequently, the PCR product is often heterogeneous with regard to extra nucleotide addition, depending upon the primers and the reaction conditions used by the investigator (Magnuson, V. L., et al., BioTechniques 21:700-709 (1996)). Extra nucleotide addition, in combination with "stutter" due to slippage during PCR amplification (Levinson, G., and Gutman, G. A., Molec. Biol. Evol. 4:203-221 (1987); Schlotterer, C., and Tautz, D., Nucl. Acids Res. 20:211-215 (1992)), often results in complex DNA fragment patterns which are difficult to interpret, especially by automated methods. This can result in incorrect genotype calls, particularly if between 30% and 70% of the PCR product carries a non-templated nucleotide (Smith, J. R., et al., Genome Res. 5:312-317 (1995)).
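To make the interpretive difficulty concrete, the following Python sketch computes the relative abundance of fragment lengths expected from a single allele under a deliberately simple model. Every modeling choice here is an assumption made for illustration and is not drawn from the cited references: one stutter product exactly one repeat unit shorter, a fixed fraction of strands carrying one extra 3' nucleotide, and independence of the two effects.

```python
def expected_peaks(allele_len, repeat_unit=2, plus_one_fraction=0.5, stutter_fraction=0.1):
    """Return {fragment_length: relative_abundance} for one allele.

    Hypothetical model: a stutter product one repeat unit shorter, and a
    fixed fraction of all strands extended by one non-templated nucleotide.
    """
    peaks = {}
    sources = ((allele_len, 1.0 - stutter_fraction),
               (allele_len - repeat_unit, stutter_fraction))
    for base_len, weight in sources:
        for extra, fraction in ((0, 1.0 - plus_one_fraction), (1, plus_one_fraction)):
            length = base_len + extra
            peaks[length] = peaks.get(length, 0.0) + weight * fraction
    return dict(sorted(peaks.items()))

# A 100-base allele of a dinucleotide repeat with half the strands extended.
for length, share in expected_peaks(100).items():
    print(length, round(share, 3))
```

Under these assumptions a single allele yields four fragment lengths, and when the extended fraction is near one half the two main peaks are of similar height, so a homozygote can resemble a heterozygote whose alleles differ by one base.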
Thus, a need currently exists for a rapid, automated method for identifying, analyzing and typing polymorphic DNA fragments, particularly minisatellite, microsatellite or STR DNA fragments, that avoids the problems described above. The present invention provides such a method.