Double-stranded DNA is the most common repository of the genetic information of organisms. It has two complementary strands. Each strand is a polynucleotide sequence, and the base sequences on the two complementary strands form Watson-Crick base pairs. The duplex structure of DNA can be disrupted in a number of ways, for example, by heating a solution of duplex DNA in 0.1 M NaCl to 100°C for a few minutes. At this temperature, the two strands of the duplex separate. If the solution is then gradually cooled, the two strands can re-associate to reform the duplex structure.
The process of duplex formation from complementary polynucleotide or oligonucleotide sequences has been used advantageously for genetic analysis. Typically, a labeled polynucleotide or oligonucleotide sequence is used in a reassociation process whereby it forms a duplex structure with a substantially complementary sequence from a genetic source of interest. Because the labeled polynucleotide or oligonucleotide sequence is normally, though not necessarily, obtained from a source other than the source of interest, the process of association between complementary sequences has come to be known as nucleic acid hybridization, or simply hybridization. The association event provides genetic information about the source of interest through detection of the label on the labeled polynucleotide or oligonucleotide sequence. For this reason, the labeled polynucleotide or oligonucleotide sequence is called a probe. The label can be any suitable signal-generating moiety, and many such moieties are well known in the art.
Nucleic acid hybridization has been successfully applied in the study of DNA structure, gene purification, gene localization, the establishment of paternity and other familial relationships, genetic identity for forensic purposes, genetic identity of transplants, and detection and diagnosis of diseases and genetic traits.
One very powerful technique in the application of nucleic acid hybridization involves fractionating the complex genetic material to be analyzed prior to hybridization. E. M. Southern's procedure is the most celebrated and the most widely used of such techniques. See Southern, J. Mol. Biol. 98: 503-517 (1975). Such a genetic analysis can reveal not only the presence or absence of complementary target nucleic acid sequences, but also the size of the restriction fragment(s) containing the target sequence. Genetic variation within a species may be reflected by variation among individuals in the size of the restriction fragments containing a particular target sequence. Conversely, genetic relatedness within a group of individuals may be reflected by a deviation from the random variation that exists among unrelated individuals. This aspect of genetic analysis has been called Restriction Fragment Length Polymorphism (RFLP).
"Single-copy" DNA probes have been used in this approach with success. For example, certain genetic traits and disease states have been identified this way. See Gusella et al., Nature, 306: 234-238 (1983); Orkin, Cell 47:845-850 (1986).
The genetic information which can be adduced using "single-copy" DNA probes depends on the number of probes used, the number of genetic loci each probe is capable of detecting, and the heterozygosities and allele frequencies of the relevant genetic loci. To date, "single-copy" DNA sequences are known to detect only a single locus per sequence. Moreover, the heterozygosity of DNA in higher organisms is low; in man, it is about 0.001 per base pair. Finally, most polymorphic states detected are only dimorphic (i.e., there are only two representational states: absence or presence of a relevant restriction site on the restriction fragment in question). Critical individuals in a genetic analysis are often homozygous at such a locus, in which case the analysis may be uninformative.
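The limitation of dimorphic loci can be made concrete with the standard population-genetics measure of expected heterozygosity, H = 1 − Σ pᵢ², where pᵢ are allele frequencies. This measure and the Hardy-Weinberg assumption behind it are not taken from this disclosure, and the allele frequencies below are hypothetical; the sketch only shows why a two-allele locus leaves many individuals homozygous while a multi-allelic locus rarely does.

```python
def expected_heterozygosity(freqs):
    """Probability that a randomly chosen individual carries two different alleles,
    assuming Hardy-Weinberg proportions. `freqs` are allele frequencies summing to 1."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)


# Hypothetical loci, for illustration only.
loci = {
    "dimorphic, balanced (0.5/0.5)": [0.5, 0.5],
    "dimorphic, skewed (0.9/0.1)": [0.9, 0.1],
    "multi-allelic, 20 equal alleles": [1 / 20] * 20,
}
for name, freqs in loci.items():
    h = expected_heterozygosity(freqs)
    print(f"{name}: H = {h:.2f}, homozygous fraction = {1 - h:.2f}")
```

Even a perfectly balanced dimorphic locus cannot exceed H = 0.5, so at least half of all individuals are homozygous there; with many evenly distributed alleles, H approaches 1.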
Genetic analysis in higher organisms has been simplified considerably by the availability of probes for hypervariable regions of genomic DNA. These hypervariable regions show multi-allelic variation and high heterozygosities, and they appear to be widely interspersed within the genome. In each case, the hypervariable region comprises a variable number of tandem repeats of a short sequence (hence the term Variable Tandem Repeats, or VTR), and polymorphism results from allelic differences in the number of repeats at a given locus. This type of polymorphism, a subclass of RFLP, has been called VTR Polymorphism. The variation in repeat number is believed to arise by mitotic or meiotic unequal exchange or by DNA "slippage" during replication. Therefore, if genomic DNA is digested with a restriction endonuclease that does not cut within the repeat unit, and if a genetic locus encompasses a VTR, the length of the resulting restriction fragment reflects the number of repeats, and allelic markers will exist for that locus. (It should be noted that the so-called repeat unit is a hypothetical consensus sequence. Any actual VTR sequence in the genome is really a string of short "core" sequences, each of which is very highly homologous, but usually not identical, to the consensus sequence; a "core" sequence may even differ in length from the consensus sequence. The consensus sequence is derived by examining and "averaging" over a large number of "core" sequences. A "core" sequence is typically at least 70%, and often more, homologous to the consensus sequence.)
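Because a "core" sequence may differ in length from the consensus, a plain position-by-position comparison does not measure the homology described above. The sketch below assumes one simple measure: identity = 1 − (edit distance / length of the longer sequence), using Levenshtein distance. This choice is an illustrative assumption, not the method of this disclosure, and the consensus and core sequences are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: substitutions, insertions and deletions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match (0) or substitution (1)
            ))
        prev = curr
    return prev[-1]


def identity(core: str, consensus: str) -> float:
    """Fraction identity normalized by the longer sequence, so a length difference
    between core and consensus lowers the score."""
    longest = max(len(core), len(consensus))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(core.upper(), consensus.upper()) / longest


CONSENSUS = "ACGTACGTAC"  # hypothetical consensus repeat unit
cores = ["ACGTACGTAA", "ACGTACGAC", "TTGAACCTAC"]  # hypothetical core sequences
for core in cores:
    score = identity(core, CONSENSUS)
    print(core, f"{score:.0%}", "meets 70%" if score >= 0.70 else "below 70%")
```

The second core is one base shorter than the consensus, yet still scores 90% because a single deletion accounts for the difference.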
Jarman et al. have described a hypervariable region of DNA located 8 kb downstream of the human alpha-globin complex. See Jarman et al., The EMBO J. 5: 1857-63 (1986). This hypervariable region is composed of an array of imperfect 17-bp tandem repeats, the number of which differs considerably (from about 70 to 450) from one allele to another. Thus, this locus is highly polymorphic. Genetic polymorphism which reflects variation among individuals in the number of such tandem repeats has been called Variable Tandem Repeat Length Polymorphism.
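The arithmetic linking repeat number to fragment size can be sketched as follows: if the restriction enzyme cuts outside the array, fragment length ≈ flanking DNA + (number of repeats × 17 bp). The 500-bp total flanking length is a hypothetical value chosen for illustration, and because the repeats are imperfect and cores vary in length, the inverse estimate is only approximate.

```python
REPEAT_UNIT_BP = 17  # repeat length reported for the alpha-globin VTR
FLANK_BP = 500       # hypothetical total flanking DNA between the cut sites and the array


def fragment_length(n_repeats: int, flanking_bp: int = FLANK_BP) -> int:
    """Restriction fragment size (bp) for an allele with n_repeats tandem copies."""
    return flanking_bp + n_repeats * REPEAT_UNIT_BP


def estimated_repeats(fragment_bp: int, flanking_bp: int = FLANK_BP) -> int:
    """Approximate repeat count inferred from a measured fragment size."""
    return round((fragment_bp - flanking_bp) / REPEAT_UNIT_BP)


# Span of allele sizes for 70 to 450 repeats under the assumed flank.
print(fragment_length(70), fragment_length(450))
# Inverting a hypothetical 3,900-bp band.
print(estimated_repeats(3900))
```

Under these assumptions the alleles span roughly 1.7 to 8.2 kb, which is why a single locus of this kind can resolve many alleles on a gel.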
The VTR described by Jarman et al., supra, cross-hybridizes with other hypervariable genetic loci at low stringency. Thus a polynucleotide probe prepared from this region is potentially a very powerful probe, capable of probing many genetic loci in a single try.
A typical RFLP analysis involves digesting target genomic DNA with a restriction endonuclease, separating the digested DNA by gel electrophoresis, transferring the fractionated DNA in a denatured state to a binding surface, hybridizing the transferred DNA with a suitable probe, and detecting the signals generated by the probe molecules which have become hybridized to the target DNA. The pattern of the signals generated provides information about the target DNA. The pattern of signals can also be stored for later use, for instance, to determine or confirm an individual's identity (i.e., the pattern would be the individual's genetic fingerprint).
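The digestion and detection steps above can be mimicked in silico. The sketch below cuts a linear sequence at every EcoRI site (G^AATTC, so the cut falls one base into the site) and reports, largest first as on a gel, the sizes of the fragments that contain a probe target. It is a simplification under stated assumptions: only the top strand is scanned, the probe is treated as an exact substring, and the sequence is hypothetical; real hybridization tolerates mismatches and detects either strand.

```python
def digest(sequence, site, cut_offset):
    """Cut a linear sequence at every occurrence of `site`.
    `cut_offset` is where the top strand is cut within the site (1 for EcoRI)."""
    seq = sequence.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def probe_pattern(fragments, target):
    """Sizes of fragments containing the probe target, largest first (gel order)."""
    return sorted((len(f) for f in fragments if target in f), reverse=True)


# Hypothetical sequence with two EcoRI sites.
dna = "AAA" + "GAATTC" + "CCCCC" + "GAATTC" + "TT"
fragments = digest(dna, "GAATTC", cut_offset=1)
print([len(f) for f in fragments])
print(probe_pattern(fragments, "CCCCC"))
```

Only the fragment carrying the target produces a signal, so the detected pattern is a small, probe-dependent subset of all the fragments generated by digestion.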
More commonly, two or more target DNAs are processed for RFLP analysis. Depending on the sources of the target DNA, the information generated by comparing the patterns can be used immediately, as in the case of genetic identity (e.g., identification of a suspect in a crime) or where a high degree of genetic relatedness is present (e.g., paternity testing, sib analysis and the like). In other cases, the information derived from pattern comparison may form part of a larger information-gathering effort. Pedigree analysis of distant relatives and correlation of a gene or genotype with a trait or medical condition are but two examples.
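For paternity testing, pattern comparison usually rests on simple inheritance logic: any band in the child that is absent from the mother must have come from the father. The sketch below flags such obligate paternal bands that the alleged father lacks. The 2.5% relative sizing tolerance and all band sizes are hypothetical, and the sketch ignores new mutations, which a real analysis would need to consider.

```python
def same_band(a, b, tol):
    """True if two band sizes co-migrate within a relative tolerance `tol`."""
    return abs(a - b) <= tol * max(a, b)


def unexplained_paternal_bands(child, mother, alleged_father, tol=0.025):
    """Child bands that match neither the mother nor the alleged father.
    An empty list means the alleged father is not excluded by this probe."""
    # Bands the child cannot have inherited from the mother must be paternal.
    obligate = [c for c in child if not any(same_band(c, m, tol) for m in mother)]
    return [c for c in obligate
            if not any(same_band(c, f, tol) for f in alleged_father)]


# Hypothetical band sizes in kb for a single-locus probe.
mother = [4.2, 6.1]
child = [6.1, 3.3]
print(unexplained_paternal_bands(child, mother, alleged_father=[3.3, 5.0]))
print(unexplained_paternal_bands(child, mother, alleged_father=[2.0, 5.0]))
```

The first alleged father carries the obligate 3.3-kb band and is not excluded; the second lacks it and is excluded by this probe.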
However an RFLP analysis is to be used, the pattern of signals is controlled in large part by the probe or probes used in the analysis. A polynucleotide probe may be valuable for any of several reasons.
First, a probe may be able to detect polymorphism at a locus that other probes cannot detect. The locus may be particularly useful for genetic analysis in the general population because it has many evenly distributed alleles. Alternatively, the locus may be particularly useful for genetic analysis in a highly restricted segment of the population because it has a rare allele.
Second, a probe may be able to detect many loci simultaneously and unambiguously when a particular restriction endonuclease is used to digest the target DNA. In this connection, it is useful to note that certain restriction endonucleases may be preferred because of the history of the target DNA samples, e.g., forensic samples which have been exposed to the elements for an extended period of time.
Third, probes are often used simultaneously in combination because their resolving power can then be compounded. Compounding is obtained when the signals produced by the several probes do not overlap and permit unambiguous assignment of each (or substantially each) signal to an allele of a locus. See, e.g., "The Application Of DNA-Print For The Estimation Of Paternity", Baird et al. in Advances in Forensic Haemogenetics 2: 354-358, Springer-Verlag, New York (1987).
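Both conditions for compounding can be sketched in a few lines: first check that no band from one probe co-migrates with a band from another, then combine per-locus match probabilities. The product rule used here assumes the loci are inherited independently; that assumption, the 2.5% sizing tolerance, and all band sizes and probabilities are hypothetical and do not come from this disclosure.

```python
from itertools import combinations
from math import prod


def overlapping_bands(patterns, tol=0.025):
    """Return (probe_a, size_a, probe_b, size_b) for bands from different probes
    that co-migrate within relative tolerance `tol` and so cannot be assigned
    unambiguously."""
    clashes = []
    for (pa, sizes_a), (pb, sizes_b) in combinations(patterns.items(), 2):
        for x in sizes_a:
            for y in sizes_b:
                if abs(x - y) <= tol * max(x, y):
                    clashes.append((pa, x, pb, y))
    return clashes


def combined_match_probability(per_locus):
    """Chance that an unrelated person matches at every locus, assuming independence."""
    return prod(per_locus)


# Hypothetical band sizes (kb) detected by three probes used together.
patterns = {"probe1": [2.1, 4.0], "probe2": [6.5, 9.8], "probe3": [4.05, 12.0]}
print(overlapping_bands(patterns))
print(combined_match_probability([0.05, 0.1, 0.02]))
```

Here the 4.0-kb and 4.05-kb bands would be confounded, so that pair of probes would not compound cleanly; where bands are separable, the per-locus probabilities multiply into a much smaller combined value.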
How RFLP phenotypes can be practically applied for paternity and forensic determinations has been discussed in Baird et al., supra; Baird et al. (II), "The Application Of DNA-PRINT™ For Identification From Forensic Biological Materials", in Adv. in Forensic Haemogenetics 2: 396-402, Springer-Verlag, New York (1987); and Baird et al. (III), Am. J. Hum. Genet. 39: 489-501 (1986) and citations therein. These papers are hereby incorporated by reference.
As noted earlier in the instant disclosure, hybrid formation can take place even where there is a certain degree of mismatch between a probe and its substantially complementary target sequence. This is particularly important for the utility of multilocus probes. Such probes generally form well-matched hybrids with target sequences from the same genetic locus as the probe, but less well-matched "cross-hybrids" with target sequences from other loci. As a result, the loci that can be analyzed with a given probe may vary significantly with the reaction (association and washing) conditions of the hybridization test. Typically, many loci are detectable under low stringency conditions, but only a single locus is detectable under high stringency conditions. Therefore, a polynucleotide sequence which is capable of probing multiple polymorphic loci even under high stringency conditions offers an additional advantage.
Strictly speaking, stringency of conditions has two components: conditions which govern formation of the hybrids and conditions which govern the stability of (the duplex structure of) the hybrids. Typically, however, a hybridization test is performed under low-stringency (relaxed) conditions during the hybrid formation phase to speed up the process of association. Therefore, for the purpose of this application, stringency of conditions refers solely to conditions which govern the stability of the hybrids. (Of course, if a hybrid is not stable under a given set of conditions, it would not be formed in the first place under those conditions.)
The factors which govern the stability of a hybrid are many, including, but not limited to, the temperature, the ionic strength, the molecular species of the salts used, the degree of modification or elimination of bases on a polynucleotide sequence, the degree and nature of mismatch, and the length and type of polynucleotide sequence. Variation in one factor may be compensated or aggravated by variation in other factors. These and other relevant facts are well known to a person of ordinary skill in the art of molecular genetics.
For the purpose of this invention, low stringency conditions mean an aqueous environment containing about 2× SSC at about 50-65°C, or the equivalents thereof; and high stringency conditions mean an aqueous environment containing about 0.1× SSC or less at about 65°C, or the equivalents thereof. [For formulation of 1× SSC, see Example 3 in Section 6 below.]
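Why a mismatched cross-hybrid survives the low-stringency wash but not the high-stringency one can be illustrated with a commonly cited empirical melting-temperature approximation for long DNA duplexes, Tm ≈ 81.5 + 16.6·log10[Na⁺] + 0.41·(%GC) − 600/L, lowered by roughly 1°C per 1% mismatch. This formula is not part of this disclosure; its coefficients vary between sources, so treat it as a rough sketch. It also assumes the standard 1× SSC recipe (0.15 M NaCl, 0.015 M trisodium citrate), and the formulation given in this disclosure should take precedence if it differs. The probe length, GC content, 20% mismatch, and the "Tm above wash temperature" stability test are all hypothetical simplifications.

```python
import math

# Assumed standard 1x SSC: 0.15 M NaCl + 0.015 M trisodium citrate (3 Na+ each).
NA_PER_1X_SSC = 0.15 + 3 * 0.015  # mol/L Na+


def sodium_molarity(ssc_x: float) -> float:
    """Na+ concentration (M) of an SSC solution at strength `ssc_x`."""
    return ssc_x * NA_PER_1X_SSC


def estimated_tm(na_molar, percent_gc, length_bp, percent_mismatch=0.0):
    """Approximate Tm (deg C) of a long DNA duplex; empirical, coefficients vary by source."""
    return (81.5 + 16.6 * math.log10(na_molar) + 0.41 * percent_gc
            - 600.0 / length_bp - 1.0 * percent_mismatch)


def is_stable(ssc_x, wash_c, percent_gc, length_bp, percent_mismatch=0.0, margin_c=0.0):
    """Crude criterion: the hybrid survives if its estimated Tm exceeds the wash temperature."""
    tm = estimated_tm(sodium_molarity(ssc_x), percent_gc, length_bp, percent_mismatch)
    return tm - wash_c > margin_c


washes = [("low (2x SSC, 60 C)", 2.0, 60.0), ("high (0.1x SSC, 65 C)", 0.1, 65.0)]
for label, ssc_x, wash_c in washes:
    for mismatch in (0.0, 20.0):  # perfect match vs. hypothetical cross-hybrid
        tm = estimated_tm(sodium_molarity(ssc_x), 50.0, 400, mismatch)
        ok = is_stable(ssc_x, wash_c, 50.0, 400, mismatch)
        print(f"{label}: {mismatch:.0f}% mismatch, Tm ~ {tm:.1f} C, stable={ok}")
```

For a hypothetical 400-bp probe with 50% GC, the perfectly matched hybrid survives both washes, whereas the 20%-mismatched cross-hybrid survives only the 2× SSC wash, mirroring the loss of cross-hybridizing loci at high stringency.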
For the purpose of this invention, a "discrete polynucleotide sequence or subsequence" means a polynucleotide sequence or subsequence of greater than 15 nucleotides, but preferably greater than 50 nucleotides, and very preferably greater than 100 nucleotides; and a polynucleotide means a chain of about 15 nucleotides or more, and embraces the upper range of what sometimes passes as oligonucleotides.
Many "single copy" DNA probes are known in the art. These probes do not relate closely to the present invention because their utility is generally (1) limited to providing genetic information at a single locus; and (2) limited to detecting polymorphism caused by alteration of a restriction site in the neighborhood of the target genomic sequences. The polymorphic probes of the present invention are of the VTR type and do not suffer from these limitations.
Polymorphic probes of the VTR type have also been described. However, the hybrids formed between a VTR probe and its target genomic sequences tend to be stable only under low to moderate stringency conditions, except for hybrids between the probe and target sequences from a single genetic locus. See Nakamura et al. (I), Science 235: 1616-1622 (1987); Jeffreys et al., Nature 314:67-73 (1985). The exceptional hybrids are stable even under high stringency conditions, possibly reflecting the fact that the probes originated from this locus.
Sometimes a genetic locus detectable with a VTR-type probe may be very large, spanning several hundred kilobases. In a restriction fragment length polymorphism analysis of such a large locus, a VTR probe can yield many polymorphic bands under high stringency conditions. However, the information which can be derived from such an analysis remains confined to the one locus. In fact, VTR probes for loci of this kind have disadvantages. First, recombination within the large locus (which is expected to occur more frequently than within a similar but smaller locus) can complicate data analysis. Second, to obtain more extensive information than a single locus can provide, two or more probes are preferably used in combination, and the multiplicity of non-informative bands from the large locus may obscure bands detected by the other probes, making data analysis very difficult. The alternative to using probes in combination would be multiple, more costly restriction fragment length polymorphism analyses. Therefore, the difference between a probe for a large single locus and a probe for multiple loci is substantive, and not merely semantic.