A need exists for characterized human nuclear loci that can be analyzed for diversity in a reasonably rapid way. Such diversity, or polymorphism, is useful for establishing human identity, often for forensic purposes. In addition, diversity is useful for establishing parentage. The use of polymorphism for these purposes and the methods to do so are described by the Committee on DNA Forensic Science, The Evaluation of Forensic DNA Evidence, National Academy Press, Washington, D.C. (1996) and by Walker, Inclusion Probabilities in Parentage Testing, American Association of Blood Banks, Arlington, Va. (1983). An in-depth analysis of genetic, ethnic and geographical variation is provided by Cavalli-Sforza et al. (The History and Geography of Human Genes. Princeton University Press, Princeton, N.J. (1994)). Nei (Molecular Population Genetics. Columbia University Press, New York (1987)) provides a general discussion of the principles of population genetics.
Most human nuclear regions, however, show so little diversity that analysis requires sequencing of very long genomic regions to be informative. Regions of the genome that are hypervariable overcome this difficulty by allowing a significant amount of sequence variation in a shorter DNA sequence, providing a tremendous benefit for studies of human diversity and molecular anthropology.
Additional information can be derived from linkage disequilibrium of polymorphisms. Disequilibrium among polymorphisms can be correlated with ethnic origins and thus used to provide information about the ethnic descent of an individual from his or her DNA.
Others have used length polymorphisms (e.g., VNTRs and microsatellites (Nakamura et al, 1987; Bowcock et al, 1994; Deka et al, 1995) or minisatellites (Amour et al, 1996)), or combinations of markers for linkage disequilibrium studies (Tishkoff et al, 1996). These existing technologies are limited by low levels of polymorphism and complex analytical methods.
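By way of illustration only, linkage disequilibrium between two biallelic sites is commonly summarized by the coefficient D (the observed haplotype frequency minus the product of the allele frequencies) and by r². The following minimal sketch computes both from a list of phased two-site haplotypes. The function name, the input encoding, and the sample data are assumptions made for this example and do not appear in the cited references.

```python
def linkage_disequilibrium(haplotypes):
    """Return (D, r_squared) for two biallelic sites.

    `haplotypes` is a list of 2-character strings, one per sampled
    chromosome: character 0 is the allele at site 1 and character 1
    is the allele at site 2 (a hypothetical encoding for this sketch).
    """
    if not haplotypes:
        raise ValueError("at least one haplotype is required")
    n = len(haplotypes)
    # Use the alleles of the first haplotype as the reference alleles.
    a, b = haplotypes[0][0], haplotypes[0][1]
    p_a = sum(h[0] == a for h in haplotypes) / n    # frequency of a at site 1
    p_b = sum(h[1] == b for h in haplotypes) / n    # frequency of b at site 2
    p_ab = sum(h == a + b for h in haplotypes) / n  # frequency of haplotype ab
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d, r_squared


# Illustrative data: complete association versus no association.
print(linkage_disequilibrium(["AG", "AG", "CT", "CT"]))
print(linkage_disequilibrium(["AG", "AT", "CG", "CT"]))
```

In the first sample the two sites are always inherited together, so r² is at its maximum; in the second all four haplotypes are equally frequent and D is zero.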
Analyses that utilize DNA sequence of point mutations directly avoid these problems, but only if the polymorphism density within the selected sequence is high. By analyzing a region with a high density of sequence variation, a large amount of useful information can be obtained from a short sequence.
All of the terms used in the specification and the claims are known to one skilled in the art. Nevertheless, in order to provide a clear and consistent understanding of the specification and the claims, including the scope given to such terms, the following definitions are provided.
Polymorphic locus or gene: A nucleic acid sequence localized in the diploid genome wherein the homologous copies are not identical.
Nucleotide diversity is the average number of nucleotide differences per site between any two randomly chosen sequences. This term and related concepts are further explained in Li and Graur, Fundamentals of Molecular Evolution, Sinauer Associates, Inc., Pub., (1991).
The term “haplotype” means the set of alleles linked on a single chromosome.
The term “genotype” means the set of alleles present in an individual.
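As a worked illustration of this definition, nucleotide diversity can be estimated from a set of aligned sequences by counting the differences in every pair of sequences and dividing by the number of pairs and by the number of sites. The sketch below assumes equal-length, gap-free sequences; the function name and the sample sequences are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Average number of pairwise differences per site.

    Assumes the sequences are already aligned, of equal length,
    and contain no gaps (assumptions made for this sketch).
    """
    if len(sequences) < 2:
        raise ValueError("at least two sequences are required")
    length = len(sequences[0])
    if length == 0 or any(len(s) != length for s in sequences):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(combinations(sequences, 2))
    # Total differing positions summed over every pair of sequences.
    total_diffs = sum(
        sum(x != y for x, y in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diffs / (len(pairs) * length)


# Three hypothetical 4-bp sequences: pairwise differences are 1, 2 and 1.
print(nucleotide_diversity(["ACGT", "ACGA", "TCGA"]))
```

Here the three pairs differ at 1, 2 and 1 positions, giving 4 differences over 3 pairs and 4 sites, or one third of a difference per site.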
Alu sequences comprise a family of generally nonfunctional processed pseudogenes. Alu elements are DNA sequences approximately 300 bp long that belong to a family of repeated sequences. Alu family members appear more than 500,000 times in the human genome, constituting 5-6% of the genome (see Li and Graur, Fundamentals of Molecular Evolution, Sinauer Associates, Inc., Pub., (1991)).
The term “amplifying,” which typically refers to an “exponential” increase in target nucleic acid, is used herein to describe both linear and exponential increases in the number of copies of a selected target nucleic acid sequence.
The term “amplification” refers to any in vitro means for multiplying the copies of a target sequence of nucleic acid. Such methods include but are not limited to those discussed herein. Sequence-based amplification systems such as the polymerase chain reaction (PCR), nucleic acid sequence-based amplification (NASBA) (see Sooknanan and Malek, 1995, Biotechnology, 13: 563-564), and strand displacement amplification (SDA) (see Walker et al., 1994, Nucleic Acids Res.) amplify a target nucleic acid sequence. Signal-based amplification methods such as oligonucleotide ligation assay (OLA), Qβ RNA replicase (Lizardi and Kramer. 1991. TIB 9: 53-58), cycling probe reaction (CPR) (Duck et al. 1991. Biotechniques 9: 142-147) and branched DNA (bDNA) (Urdea. 1993. Clin. Chem. 39: 725-726) amplify or alter a signal from a detection reaction that is target dependent.
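The difference between linear and exponential increases can be made concrete with an idealized model. In the sketch below, exponential amplification multiplies the copy number by (1 + efficiency) each cycle, while linear amplification adds a fixed number of new copies per cycle. The function name and the per-cycle `efficiency` parameter are assumptions introduced for this example; real reactions plateau and do not follow this model indefinitely.

```python
def copies_after(cycles, initial_copies=1, mode="exponential", efficiency=1.0):
    """Idealized copy number after a given number of cycles.

    `efficiency` is the fraction of templates copied per cycle
    (1.0 means every template is copied once per cycle). Both the
    parameter and the model are simplifying assumptions.
    """
    if cycles < 0 or not 0.0 <= efficiency <= 1.0:
        raise ValueError("cycles must be >= 0 and efficiency within [0, 1]")
    if mode == "exponential":
        # Products of each cycle serve as templates in the next cycle.
        return initial_copies * (1 + efficiency) ** cycles
    if mode == "linear":
        # Only the original templates are copied in each cycle.
        return initial_copies * (1 + efficiency * cycles)
    raise ValueError("mode must be 'exponential' or 'linear'")


print(copies_after(10, mode="exponential"))
print(copies_after(10, mode="linear"))
```

Under these assumptions, ten cycles starting from one copy yield 1024 copies exponentially but only 11 copies linearly.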
The term “primer” as used herein refers to an oligonucleotide, whether occurring naturally as in a purified restriction digest or produced synthetically, which is capable of acting as a point of initiation of synthesis when placed under conditions in which synthesis of a primer extension product which is complementary to a nucleic acid strand is initiated, i.e., in the presence of four different nucleoside triphosphates and a DNA polymerase in an appropriate buffer (“buffer” includes pH, ionic strength, cofactors, etc.) and at a suitable temperature. Several methods of amplification that use primers have been devised, the best known being PCR. For example, in Strand Displacement Amplification (SDA), the 3′ end of the amplification primer (the target binding sequence) hybridizes at the 3′ end of the target sequence and comprises a recognition site for a restriction enzyme near its 5′ end.
The term “thermocycling profile” as used herein refers to the temperature parameters selected for “n” cycles of amplification. The thermocycling profile includes at least two temperatures: a high denaturation temperature, adequate for denaturation of the sample template and of subsequent product, and a low temperature appropriate for primer annealing and polymerase extension. Accordingly, particular thermocycling parameters are selected to control primer annealing and product denaturation and thus regulate accessibility and primer extension.
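One possible way to represent such a profile in software is sketched below, with a check that at least two temperatures are present. The class names, step names, temperatures, hold times and cycle count are all illustrative assumptions and are not values taken from this specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str        # e.g. "denature" (hypothetical label)
    temp_c: float    # hold temperature in degrees Celsius
    seconds: int     # hold time


@dataclass(frozen=True)
class ThermocyclingProfile:
    steps: tuple     # ordered Step objects repeated every cycle
    cycles: int      # the "n" cycles of amplification

    def __post_init__(self):
        # A profile needs a denaturation temperature and a lower
        # annealing/extension temperature, i.e. at least two values.
        if len({s.temp_c for s in self.steps}) < 2:
            raise ValueError("a profile needs at least two temperatures")
        if self.cycles < 1:
            raise ValueError("cycles must be at least 1")

    def total_seconds(self):
        """Total hold time, ignoring ramp times between temperatures."""
        return self.cycles * sum(s.seconds for s in self.steps)


# Illustrative two-temperature profile; the numbers are assumptions.
profile = ThermocyclingProfile(
    steps=(Step("denature", 94.0, 30), Step("anneal_extend", 60.0, 60)),
    cycles=30,
)
print(profile.total_seconds())
```

The validation mirrors the definition above: a profile with a single temperature is rejected because it could not provide both denaturation and annealing.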
The choice of primers for use in PCR determines the specificity of the amplification reaction. Primers used in the present invention are generally oligonucleotides, usually deoxyribonucleotides several nucleotides in length, that can be extended in a template-specific manner by the polymerase chain reaction. The primer is sufficiently long to prime the synthesis of extension products in the presence of the agent for polymerization and typically contains 10-30 nucleotides, although that exact number is not critical to the successful amplification.
A primer is selected to be “substantially” complementary to a strand of the template having a specific sequence. For primer extension to occur, the primer must be sufficiently complementary to anneal to the nucleic acid template under the reaction conditions. Not every nucleotide of the primer must anneal to the template for primer extension to occur. The primer sequence need not reflect the exact sequence of the template. For example, in one embodiment of the invention, a non-complementary nucleotide fragment or tail is attached to the 5′ end of the primer with the remainder of the primer sequence being complementary to the template.
Alternatively, non-complementary bases can be interspersed into the primer, provided that the primer sequence has sufficient complementarity with the template for hybridization to occur and to allow synthesis of a complementary DNA strand.
As used herein in referring to primers, probes, or secondary oligonucleotides, the term “oligonucleotide” refers to a molecule comprised of two or more deoxyribonucleotides or ribonucleotides, and preferably more than three. Its exact size is not critical (except as noted herein), but the size depends upon many factors including the ultimate use or function of the oligonucleotide. The oligonucleotide may be derived synthetically, by cloning or by other methods known in the art.
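The two cases just described, a non-complementary 5′ tail and interspersed non-complementary bases, can be illustrated with a short sketch that compares a primer against the perfectly matched primer for a binding site. The function names and sequences are hypothetical, and counting mismatches is only a rough proxy; whether a given primer actually anneals depends on the reaction conditions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def tail_and_mismatches(primer, template_site):
    """Return (tail_length, mismatches) for a primer on its binding site.

    Both arguments are written 5'->3'. The primer's 3' end is aligned
    with the binding site, and any extra bases at the primer's 5' end
    are treated as a non-complementary tail (an assumption of this sketch).
    """
    perfect = reverse_complement(template_site)  # perfectly matched primer
    if len(primer) < len(perfect):
        raise ValueError("primer is shorter than the binding site")
    tail_length = len(primer) - len(perfect)
    annealing_part = primer[tail_length:]
    mismatches = sum(p != q for p, q in zip(annealing_part, perfect))
    return tail_length, mismatches


site = "AAGCTTGC"                                # hypothetical binding site
print(tail_and_mismatches("GCAAGCTT", site))     # perfectly matched primer
print(tail_and_mismatches("GGGCAAGCTT", site))   # two-base 5' tail
print(tail_and_mismatches("GCATGCTT", site))     # one interspersed mismatch
```

A tailed primer still matches the template over its full annealing portion, whereas an interspersed mismatch lowers the complementarity of the annealing portion itself.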
In some situations where mismatches between the targeted nucleic acid and a primer are suspected, the effect of the mismatch may be overcome using specialized primer compositions, such as those described for example in EP-A-0 393 743 (published Oct. 24, 1990) and EP-A-0 417 842 (published Mar. 20, 1991).
Competitive oligonucleotide priming (COP) distinguishes closely related DNA sequences by comparing competitive annealing of two or more DNA sequences closely matched to the DNA sequence of interest. While the selected primer need not reflect the exact sequence of the template in the competitive oligonucleotide primer assay, it is important that the different sequences used as competitive oligonucleotide primers have different denaturation temperatures. For example, in the detection of a normal genetic sequence, the competitive oligonucleotide primers could include a primer which is an exact copy of the complementary strand to the normal genetic sequence and a primer which is a copy of the complementary strand with one base pair mismatched. Both a perfectly matched primer and a primer with a single DNA base mismatch are able to bind to the template. However, when the two closely related primers are incubated together with the DNA template, the binding of the perfectly matched primer will be favored over a primer with a single base mismatch. Alternatively, one of the primers can contain one base mismatch to the known genetic sequence and the other oligonucleotides would contain at least two mismatches. Thus, the requirements are that one of the sequences have N mismatches and the other sequence or sequences have greater than N mismatches, where N can be from zero to any number of mismatches which will still provide a substantially similar sequence able to bind. When two oligonucleotides differing by a single DNA base are supplied as primers in a reaction containing a single DNA or RNA template, the perfectly matched oligonucleotide primer will be highly favored over the primer with the single base mismatch. Similarly, if neither primer is a perfect match, the more closely matched primer will be favored. The greater the difference between the sequence of interest and the other sequences, the more efficiently the competitive oligonucleotide primer assay functions. However, when the difference is too great, it may no longer function as a competitive assay.
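The N versus greater-than-N mismatch rule described above can be sketched as a simple comparison. The example below counts mismatches of each competing primer against the perfectly matched primer for a given template and reports which primer would be favored. All names and sequences are hypothetical, and mismatch counting stands in for the actual annealing competition, which depends on reaction conditions.

```python
def count_mismatches(primer, perfect_primer):
    """Number of positions at which two equal-length primers differ."""
    if len(primer) != len(perfect_primer):
        raise ValueError("primers must have the same length")
    return sum(a != b for a, b in zip(primer, perfect_primer))


def favored_primer(primers, perfect_primer):
    """Name of the primer with the fewest mismatches, or None on a tie.

    `primers` maps a label to a primer sequence; `perfect_primer` is the
    exact complement of the template at the binding site. A tie means no
    primer has N mismatches while all others have more than N.
    """
    scored = sorted(
        (count_mismatches(seq, perfect_primer), name)
        for name, seq in primers.items()
    )
    if len(scored) > 1 and scored[0][0] == scored[1][0]:
        return None
    return scored[0][1]


# Two hypothetical competing primers differing at a single base.
primers = {"normal": "GCAAGCTTACGT", "variant": "GCAAGCATACGT"}
print(favored_primer(primers, "GCAAGCTTACGT"))  # template matches "normal"
print(favored_primer(primers, "GCAAGCATACGT"))  # template matches "variant"
```

When the template matches neither primer better than the other, the function returns None, reflecting that the comparison is no longer informative.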
Some detection technologies require perfect complementarity between oligonucleotides, whether used as probes or primers. In particular, allele discriminating nucleic acid sequence detection technologies are those which discriminate against sequences that vary by as little as one internally located nucleotide. Any mismatched sequences, even by a single nucleotide, are non-targets for these technologies.
By “homogeneous” is meant that the process does not require a separation of the detected target nucleic acid from nontargeted materials.
Fluorescence resonance energy transfer (FRET) is a well known spectroscopic phenomenon that has been exploited in oligonucleotide labeling schemes involving pairs of fluorescent dyes. In all of these schemes, pairs of dyes are coupled in such a way that a signal is generated indicating whether the dyes are coupled by FRET or a FRET-like mechanism. The use of such pairs and FRET coupling is well known to those skilled in the art. Several homogeneous genotyping methods based on PCR amplification and FRET detection have recently been described. The design and synthesis of FRET dye-labeled oligonucleotides is described by Ju et al (1995, Anal. Biochem. 231: 131-140) and Benson et al (1995, 231: 247-255).