Population variation can occur at the level of the organism, cell or a cellular component. Variability relating to a nucleic acid or protein is referred to as polymorphism. Traditionally, the study of polymorphism focused on cell membrane proteins, serum-borne proteins and intracellular proteins. Applications included blood group testing, for example ABO or HLA, and isozyme analysis (Lewontin & Hubby, 1966, Genetics 54:595). These methods relied on amino acid changes that altered the molecular charge of the protein or an antigenic determinant. A limitation of protein polymorphism is that available methods may fail to detect an amino acid substitution if it occurs at an unexposed site.
Nucleic acid polymorphism can occur in both coding and noncoding segments. The characterization of polymorphism in coding sequences has been enhanced by the availability of HLA cDNA clones. Polymerase Chain Reaction (PCR) amplification coupled with allele specific probes has been used to analyze genetic polymorphism at several HLA loci. Potential problems with this methodology include PCR contamination and ambiguous typing results when the sequence detected by a given probe could be assigned to either of two alleles.
Traditional methods of nucleic acid hybridization have also been successfully applied to the study of genetic identification. One of the most powerful techniques is the procedure described by E. M. Southern (Southern, E. M. (1975) J. Mol. Biol. 98:503-517). This method involves the fractionation of the complex genetic material to be analyzed prior to hybridization. Such a genetic analysis can reveal not only the presence or absence of complementary target nucleic acid sequences, but also the size of the restriction fragment(s) containing the target sequence. Genetic variations within a species may be reflected by variations among individuals in the size of the restriction fragments containing a particular target sequence. Conversely, genetic relatedness of a group of individuals may be reflected by a deviation from random variations that exist among unrelated individuals. This aspect of genetic analysis has been called Restriction Fragment Length Polymorphism (RFLP).
A typical RFLP analysis involves digesting target genomic DNA with a restriction endonuclease, separating the digested DNA by gel electrophoresis, transferring the fractionated DNA in a denatured state to a binding surface, hybridizing the transferred DNA with a suitable probe, and detecting the signals generated by the probe molecules which have become hybridized to the target DNA. The pattern of the signals generated provides information about the target DNA. The pattern of signals can also be stored for later use, for instance, to determine or confirm an individual's identification (i.e., the pattern would be the individual's genetic fingerprint).
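The digestion and probing steps above can be sketched in code. The following is a minimal illustration only, under several simplifying assumptions that are not part of the original description: the DNA is a linear string, the enzyme is assumed to cut at the first base of its recognition site, and hybridization is modeled as an exact substring match. The function names and the toy sequences are hypothetical.

```python
def digest(sequence: str, site: str) -> list[str]:
    """Split a linear sequence at every occurrence of a recognition site.

    Simplification (assumption): the cut is placed immediately before
    the site, which is not how any particular real enzyme is modeled.
    """
    fragments = []
    start = 0
    pos = sequence.find(site, 1)  # a site at position 0 yields no empty fragment
    while pos != -1:
        fragments.append(sequence[start:pos])
        start = pos
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])
    return fragments


def probe_pattern(sequence: str, site: str, probe: str) -> list[int]:
    """Return the sizes of the fragments containing the probe target,
    largest first, as a stand-in for the band pattern on a blot."""
    sizes = [len(f) for f in digest(sequence, site) if probe in f]
    return sorted(sizes, reverse=True)


# Two toy alleles: the second has lost the downstream restriction site.
allele_a = "AAAA" + "GAATTC" + "CCCCTTTGGG" + "GAATTC" + "AAAA"
allele_b = "AAAA" + "GAATTC" + "CCCCTTTGGG" + "GAGTTC" + "AAAA"

print(probe_pattern(allele_a, "GAATTC", "CTTTG"))
print(probe_pattern(allele_b, "GAATTC", "CTTTG"))
```

In this toy case the loss of one restriction site changes the size of the fragment carrying the probe target, which is the kind of difference an RFLP pattern records.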
More commonly, two or more target DNAs are processed for RFLP analysis. Depending on the sources of the target DNA, the information generated by comparison of the patterns can be used immediately, as in cases of genetic identity (e.g., identification of a suspect in a crime) or cases where a high degree of genetic relatedness is present (e.g., paternity testing, sib analysis and the like). In other cases, the information derived from pattern comparison may form a part of a larger information-gathering effort. Pedigree analysis of distant relatives and correlation of a gene or genotype with a trait or medical condition are two examples.
The genetic information which can be adduced using "single-copy" DNA probes depends on the number of probes used, the number of genetic loci each probe is capable of detecting, and the heterozygosities and allele frequencies of the relevant genetic loci. To date, "single-copy" DNA sequences are known to detect only a single locus per sequence. Moreover, heterozygosity of DNA in higher organisms is low. In man, it is about 0.001 per base pair. Finally, most polymorphic states detected are only dimorphic (i.e., there are only two representational states: absence or presence of a relevant restriction site on the restriction fragment in question). As a result, critical individuals in a genetic analysis are often homozygous, and the genetic analysis may then be uninformative.
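The effect of allele number on informativeness can be illustrated with a short calculation. The sketch below uses the standard expected-heterozygosity formula, one minus the sum of squared allele frequencies, which assumes random mating; that formula and the example frequencies are assumptions introduced here for illustration and are not taken from the original text.

```python
def expected_heterozygosity(allele_freqs: list[float]) -> float:
    """Expected heterozygosity at one locus: 1 - sum(p_i ** 2).

    Assumes random mating; the frequencies must sum to 1.
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


# A dimorphic locus can never exceed 0.5, reached with two equally common alleles.
dimorphic = expected_heterozygosity([0.5, 0.5])

# A hypothetical locus with ten equally common alleles is far more informative.
multi_allelic = expected_heterozygosity([0.1] * 10)

print(dimorphic, multi_allelic)
```

Under these assumptions, at least half of all individuals are homozygous at any dimorphic locus, which is why such loci are frequently uninformative.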
Genetic analysis in higher organisms has been simplified considerably by the availability of probes for hypervariable regions of genomic DNA. These hypervariable regions show multi-allelic variation and high heterozygosities. These regions also appear to be widely interspersed within the genome. In each case, the hypervariable region comprises a variable number of tandem repeats of a short sequence (thus, Variable Tandem Repeats or VTR), and polymorphism results from allelic differences in the number of repeats at a given locus. This type of polymorphism, a subclass of RFLP, has been called VTR Polymorphism. It is believed that the variation in repeat number arises by mitotic or meiotic unequal exchanges or by DNA "slippage" during replication. Therefore, if genomic DNA is digested with a restriction endonuclease which does not cut within the repeat unit, and if a genetic locus encompasses a VTR, allelic markers would exist for that locus. It should be noted that the so-called repeat unit is a hypothetical consensus sequence, and any VTR sequence in the genome is actually a string of short "core" sequences, each of which is very highly homologous, but usually not identical, to the consensus sequence. Indeed, a "core" sequence may differ in length from the consensus sequence. The consensus sequence is derived by examining and "averaging" a large number of "core" sequences, each of which is typically at least 70%, and often more than 70%, homologous to the consensus sequence.
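One way to picture the "averaging" of core sequences is a column-by-column majority vote. The sketch below is only an illustration under a strong simplifying assumption: all core sequences have the same length, whereas the text notes that real cores may differ in length and would need an alignment step first. The core sequences shown are hypothetical, and "homology" is approximated here as the fraction of identical positions.

```python
from collections import Counter


def consensus(cores: list[str]) -> str:
    """Majority-vote consensus over equal-length core sequences (assumption)."""
    if len({len(core) for core in cores}) != 1:
        raise ValueError("this sketch requires cores of equal length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*cores))


def identity(core: str, reference: str) -> float:
    """Fraction of positions at which a core matches the reference."""
    matches = sum(a == b for a, b in zip(core, reference))
    return matches / len(reference)


# Hypothetical core sequences from one tandem array.
cores = ["GGAGGTGGGC", "GGAGGTGGGA", "GGTGGTGGGC", "GGAGGCGGGC"]
unit = consensus(cores)
for core in cores:
    print(core, identity(core, unit))
```

Each toy core differs from the derived consensus at no more than one position, so every core remains well above the 70% level mentioned above.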
The utility of clones that detect highly polymorphic loci is evidenced by the White et al. patent (U.S. Pat. No. 4,963,663) and the references cited therein.
However the RFLP analysis is to be used, the pattern of signals is controlled in large part by the probe or probes used in the analysis. A polynucleotide probe may be useful for any of a number of features.
First, a probe may be able to detect polymorphism at a locus that other probes cannot detect. The locus may be particularly useful for genetic analysis in the general population because it has many evenly distributed alleles. Alternatively, the locus may be particularly useful for genetic analysis in a highly restricted segment of the population because it has a rare allele.
Second, a probe may be able to detect many loci simultaneously and unambiguously when a particular restriction endonuclease is used to digest the target DNA. In this connection, it is useful to note that certain restriction endonucleases may be preferred because of the history of the target DNA samples, e.g. forensic samples which have been exposed to the elements for an extended period of time.
Third, probes are often used in combination simultaneously because their resolving power may be compounded. Compounding is obtained when the signals produced by the several probes do not overlap and permit unambiguous assignment of each (or substantially each) signal to an allele of a locus. See, e.g., Baird et al. (1987) "The Application Of DNA-Print For The Estimation Of Paternity," in Advances in Forensic Haemogenetics 2:354-358, Springer-Verlag, N.Y.
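The non-overlap condition for compounding can be expressed as a simple check. The sketch below assumes, purely for illustration, that each locus detected by the probe mixture produces bands within a known size range; the locus names and ranges are hypothetical. A band is assigned only if it falls within exactly one range.

```python
def assign_bands(band_sizes: list[int],
                 locus_ranges: dict[str, tuple[int, int]]) -> dict[int, str]:
    """Map each band size (bp) to the single locus whose range contains it.

    Raises ValueError when a band fits no locus or more than one locus,
    i.e. when the signals cannot be assigned unambiguously.
    """
    assignment = {}
    for size in band_sizes:
        hits = [name for name, (low, high) in locus_ranges.items()
                if low <= size <= high]
        if len(hits) != 1:
            raise ValueError(f"band of {size} bp is ambiguous: {hits}")
        assignment[size] = hits[0]
    return assignment


# Hypothetical, non-overlapping size ranges for two loci probed together.
ranges = {"locus_A": (1000, 3000), "locus_B": (4000, 9000)}
print(assign_bands([1500, 2200, 5000, 7500], ranges))
```

If the two ranges overlapped, a band in the shared region would raise an error, corresponding to the situation in which the probes could not be usefully combined.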
A polymorphic DNA locus recognized by a probe can be defined by the unique identification number that is assigned by the Human Gene Mapping Library or Genome Data Base. The probe described in this invention recognizes the DNA locus of chromosome 4 which has been assigned the number D4S163. In other words, the polynucleotide sequence utilized in the probe reacts with, or has a DNA sequence complementary to, the genomic DNA at the D4S163 locus of human chromosome 4.
How RFLP phenotypes can be practically applied for paternity and forensic determinations has been discussed in Baird et al., supra; Baird et al. (1987) (II) "The Application Of DNA-PRINT For Identification From Forensic Biological Materials," in Advances in Forensic Haemogenetics 2:396-402, Springer-Verlag, N.Y.; and Baird et al. (1986) Am. J. Hum. Genet. 39:489-501.
An alternative approach for the genetic analysis of this type of Variable Tandem Repeat-containing region may utilize a method that replicates the DNA in the laboratory, in a cell-free system. This replication process can be repeated multiple times so as to produce the specific DNA in amounts sufficient for analysis. When polynucleotide sequences complementary to the opposite strands of the DNA sequence flanking the VTR are used as primers for replication, the size of the product is determined by the total length of the VTR region. Cetus Corporation has been assigned patents that cover certain forms of this procedure (U.S. Pat. Nos. 4,683,202 and 4,683,195). Examples of the use of this process for replicating nucleic acids from regions of DNA containing VTR can be found in Boerwinkle et al. (1989) Proc. Nat. Acad. Sci. USA 86:212; Horn et al. (1989) Nucleic Acid Res. 17:2140; Wu et al. (1990) Nucleic Acid Res. 18:3102; and Kasai et al. (1990) J. Foren. Sci. 35:1196.
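The relationship between product size and repeat number can be written out as simple arithmetic. The sketch below assumes a fixed amount of non-repeat sequence between the outer ends of the two primers (primers included) and a uniform repeat unit length; since actual core sequences may vary slightly in length, the estimate is rounded to the nearest whole repeat. All names and numbers are hypothetical.

```python
def amplicon_length(flank_bp: int, repeat_unit_bp: int, n_repeats: int) -> int:
    """Product size: flanking sequence (primers included) plus the repeat array."""
    return flank_bp + repeat_unit_bp * n_repeats


def repeats_from_amplicon(amplicon_bp: int, flank_bp: int, repeat_unit_bp: int) -> int:
    """Estimate the repeat number from an observed product size,
    rounded to the nearest whole repeat."""
    return round((amplicon_bp - flank_bp) / repeat_unit_bp)


# Hypothetical locus: 120 bp of flanking sequence and a 16 bp repeat unit.
for n in (18, 25, 31):
    size = amplicon_length(120, 16, n)
    print(n, size, repeats_from_amplicon(size, 120, 16))
```

Under these assumptions, two alleles differing by a single repeat yield products that differ by one repeat unit length, so alleles can be distinguished by product size alone.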