The analysis of polymorphisms in the number of repeated DNA sequence elements (“repeats”) at designated genetic loci has a variety of applications in molecular diagnostics and biomedical research. These applications include the molecular determination of identity for parentage and forensic analysis; the diagnosis of genetic diseases caused by the expansion of trinucleotide repeats, such as Huntington's disease and fragile X syndrome; the analysis of genetic markers relating to gene regulation, as in the case of dinucleotide repeat polymorphisms in the transcription region of several cytokines; and mapping and linkage analysis.
Variable number tandem repeats (“VNTRs”) represent one type of repeat length polymorphism (see Nakamura Y., et al. (1987), Science 235: 1616–1622; U.S. Pat. Nos. 4,963,663; 5,411,859), resulting from the tandem insertion of multiple copies of identical DNA segments, typically 10 bp to 100 bp in length; such repeat arrays are known as minisatellites. VNTR markers are highly polymorphic, more so than base substitution polymorphisms, sometimes displaying forty or more alleles at a single genetic locus. Determining the number of repeats in VNTRs would provide a means of identity typing, were it not for the scarcity of fast and accurate methods for this purpose. The commonly used method involves enzymatic digestion of the VNTR-containing nucleic acid segment followed by Southern blotting, a labor-intensive and time-consuming procedure. Alternative methods invoking the polymerase chain reaction (PCR) (U.S. Pat. No. 4,683,202) are of limited utility for VNTR analysis because PCR does not reliably amplify segments longer than 3,000 bases. Only a few amplifiable VNTRs have been developed, making VNTRs, as a class, impractical for linkage mapping and identity typing.
Microsatellite loci, consisting of repeating units typically only a few bases long, are more frequent and more polymorphic than VNTRs. Short tandem repeats (“STRs”) are one example of “microsatellite” markers. As with amplifiable VNTRs, alleles of microsatellite loci differ in length; in contrast to VNTRs, however, STRs contain only two to seven perfect or imperfect repeated sequence elements, each two, three, four or, rarely, five bases long. Available amplification protocols produce small products, generally 60 to 400 base pairs long, so that STR repeat numbers can be determined by amplification followed by gel analysis. PCR primers must be selected with care to avoid amplification products containing alleles of more than one locus.
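The arithmetic behind sizing an STR allele from its amplicon length can be sketched as follows. This is a minimal illustration, not a protocol described here: it assumes the total non-repeat (flanking plus primer) length of the amplicon is known for the locus, and the function names and example lengths are hypothetical. Bases left over after dividing out whole repeat units are reported in the common "full.partial" allele style (e.g. 9.3).

```python
def str_repeat_count(amplicon_bp: int, flank_bp: int, unit_bp: int) -> tuple[int, int]:
    """Estimate the repeat count of an STR allele from its amplicon size.

    amplicon_bp: product length read off the gel (bp)
    flank_bp:    total non-repeat length of the amplicon, primers included
                 (a hypothetical, locus-specific value)
    unit_bp:     length of one repeat unit (2 to 5 bases for STRs)

    Returns (full_repeats, extra_bases); extra_bases > 0 signals a partial repeat.
    """
    if unit_bp <= 0:
        raise ValueError("unit_bp must be positive")
    repeat_region = amplicon_bp - flank_bp
    if repeat_region < 0:
        raise ValueError("amplicon is shorter than its flanking sequence")
    return divmod(repeat_region, unit_bp)


def allele_label(amplicon_bp: int, flank_bp: int, unit_bp: int) -> str:
    """Format the estimate as an allele name, e.g. "13" or "9.3"."""
    full, extra = str_repeat_count(amplicon_bp, flank_bp, unit_bp)
    return f"{full}.{extra}" if extra else str(full)


# Hypothetical tetranucleotide locus with 100 bp of total flanking sequence.
print(allele_label(152, 100, 4))  # 52 = 13 * 4      -> "13"
print(allele_label(139, 100, 4))  # 39 = 9 * 4 + 3   -> "9.3"
```

For the division to be meaningful, the sizing method must resolve differences of one repeat unit, and of single bases where partial repeats occur.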
Hybridization, widely used for the analysis of polymorphisms in general, has also been applied to variations in the number of repeats. Hybridization-mediated analysis of point mutations (i.e., base substitutions), deletions and insertions typically involves a pair of allele-specific oligonucleotides (ASOs), one designed to be complementary to the normal (“wild-type”) sequence and the other to the variant (“mutant”) sequence. However, in “multiplexed” configurations, which call for the concurrent analysis of multiple polymorphisms, cross-hybridization often limits the reliability of the analysis.
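A pair of ASOs of the kind just described can be sketched as reverse complements of a short window centred on the polymorphic base. This is a minimal sketch under stated assumptions: the 17-base probe length, the centring of the site and the function names are hypothetical choices, and practical probe design also balances melting temperature and secondary structure, which are ignored here.

```python
# Base-pairing table for uppercase DNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def aso_pair(ref: str, pos: int, alt: str, half_window: int = 8) -> tuple[str, str]:
    """Build (wild_type_aso, mutant_aso) for a base substitution.

    Each probe is the reverse complement of a window of 2 * half_window + 1
    bases centred on 0-based index `pos`, so the polymorphic site sits in the
    middle of the probe, where a mismatch generally destabilizes the duplex
    most. half_window = 8 (17-mers) is an assumed, illustrative default.
    """
    if not (half_window <= pos < len(ref) - half_window):
        raise ValueError("window extends past the end of the reference")
    if alt == ref[pos]:
        raise ValueError("alt must differ from the reference base")
    start, end = pos - half_window, pos + half_window + 1
    wild = ref[start:end]
    mutant = ref[start:pos] + alt + ref[pos + 1:end]
    return reverse_complement(wild), reverse_complement(mutant)


# Hypothetical reference with a G>A substitution at index 12.
print(aso_pair("ACGTTGCATGCAGGCTAAGTCCGATCGA", 12, "A"))
```

The two probes differ only at their central base, which is what lets one of them form a perfect duplex with each allele.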
One hybridization-mediated method of analyzing variations in the number of repeats requires a large number of allele-specific oligonucleotides, each matching one of the alleles in length. For example, U.S. Pat. No. 6,307,039 discloses a method in which ASOs are designed to permit template-mediated probe extension if, and only if, the number of repeats in the probe sequence is equal to or less than the number of repeats in the target sequence. Using a set of such ASO probes, containing at least one probe for each anticipated configuration of target repeats, the number of target repeats is determined by monitoring the outcome of the extension reaction for all probes so as to identify the probe whose repeat count matches that of the target, that is, the longest probe that is extended.
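Under the extension rule stated above, the readout reduces to finding the longest probe that was extended. The sketch below is a hypothetical illustration, not the procedure of the cited patent: it models each probe only by its repeat count, assumes a single target allele, and flags when the answer is only a lower bound.

```python
from typing import Mapping, Optional


def infer_repeat_count(extended: Mapping[int, bool]) -> tuple[Optional[int], bool]:
    """Infer a target's repeat count N from probe-extension outcomes.

    `extended` maps each probe's repeat count n to whether that probe was
    extended. Assuming extension occurs iff n <= N, N is the largest n whose
    probe extended. Returns (N, exact); exact is False when no probe with
    N + 1 repeats was tested, so N is then only a lower bound.
    """
    hits = [n for n, ok in extended.items() if ok]
    if not hits:
        # No probe extended, e.g. the target has fewer repeats than the shortest probe.
        return None, False
    n_max = max(hits)
    # Under the rule, every probe shorter than n_max must also have extended.
    bad = sorted(n for n, ok in extended.items() if n < n_max and not ok)
    if bad:
        raise ValueError(f"outcomes violate the extension rule at n = {bad}")
    # Exact only if the next-longer probe was tested and (necessarily) failed.
    return n_max, (n_max + 1) in extended


# Simulated outcomes for a hypothetical probe set carrying 5 to 15 repeats.
probes = range(5, 16)
print(infer_repeat_count({n: n <= 9 for n in probes}))   # (9, True)
print(infer_repeat_count({n: n <= 20 for n in probes}))  # (15, False)
```

With two different alleles in one sample, probes longer than the shorter allele would still be extended on the longer template, so this readout would report only the longer allele.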
This approach has several disadvantages that seriously impair its practical utility. First, to eliminate errors in the measurement of repeat length due to “slippage”, that is, shifts in probe-target alignment, probes must contain an “anchoring” sequence long enough to ensure predictable alignment with the flanking sequence located upstream of the target repeat. Second, targets and probes of increasing length form duplexes containing increasing numbers of repeats, and these duplexes therefore display widely varying thermodynamic stabilities; this renders an isothermal assay protocol impractical and instead requires careful real-time temperature control. Third, a probe must be provided for each possible target polymorphism, a requirement that implies large probe sets, complex assay protocols and hence considerable cost. For example, a typical application such as a 13-marker STR panel commonly used for forensic analysis will require on the order of 100 probes. In cases such as Huntington's disease, characterized by forty or more triplet repeats, each probe in the large requisite set must be precisely aligned with the target by way of a long anchoring sequence, implying an assay design of considerable complexity. Finally, this approach does not accommodate polymorphisms with an unknown range of repeat numbers, such as the FGA marker commonly employed for parentage analysis.
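The role of the anchoring sequence can be illustrated with a simple exact-match model of probe-target alignment. The sketch below is a hypothetical toy: the flanking sequences are invented, hybridization is reduced to exact string matching of the target-strand site the probe pairs with, and duplex stability is ignored. A repeat-only site of n units fits a tract of N units at N - n + 1 register-shifted positions, whereas prefixing part of the upstream flank leaves a single position, provided n <= N.

```python
def match_offsets(site: str, target: str) -> list[int]:
    """Return every 0-based offset at which `site` occurs exactly in `target`,
    counting overlapping occurrences."""
    return [i for i in range(len(target) - len(site) + 1)
            if target.startswith(site, i)]


UNIT = "CAG"
UPSTREAM = "GATTCGAATC"    # hypothetical upstream flank
DOWNSTREAM = "TTGACCTGAT"  # hypothetical downstream flank


def make_target(n_repeats: int) -> str:
    """Build a toy target: upstream flank + repeat tract + downstream flank."""
    return UPSTREAM + UNIT * n_repeats + DOWNSTREAM


target = make_target(12)

# A repeat-only site of 5 units can pair at several shifted registers ("slip").
print(len(match_offsets(UNIT * 5, target)))  # 12 - 5 + 1 = 8

# Anchoring on the last 6 upstream bases pins the register to one position,
# and no alignment exists once the probe carries more repeats than the target.
anchor = UPSTREAM[-6:]
print(match_offsets(anchor + UNIT * 5, target))   # [4]
print(match_offsets(anchor + UNIT * 13, target))  # []
```

The toy also shows why each anchored probe must carry a full-length flank match: the anchor alone fixes the register, so every probe in the set needs one.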
A method for the concurrent analysis of multiple tandem repeats that uses an array of a minimal number of probes, regardless of the possible number of target repeats, known or unknown, and that simplifies assay design, for example by eliminating the requirement for an anchoring sequence, clearly would be desirable.