1. Field of the Invention
The invention relates to polynucleotides which can be labelled to serve as probes useful in probing the human or animal genome, and to a method of identifying genomic DNA using such probes. The method of identification is useful, for example, in paternity and maternity testing, forensic medicine and in the diagnosis of genetic diseases and cancer.
2. Description of the Prior Art
The main prior method of identifying genetic variation in genomic DNA is by detecting restriction fragment length polymorphisms (RFLPs). See, for example, the identification of the locus of the DNA defect responsible for Huntington's chorea, by J. F. Gusella et al., Nature 306, 234-238 (1983), and the analysis of predisposition to retinoblastoma by W. K. Cavenee et al., Nature 305, 779-784 (1983).
Most RFLPs result from small-scale changes in DNA, usually base substitutions, which create or destroy specific restriction endonuclease cleavage sites. Since the mean heterozygosity of human DNA is low (approximately 0.001 per base pair), restriction endonucleases will seldom detect an RFLP at a given locus. Even when detected, most RFLPs are only dimorphic (presence and absence of a restriction endonuclease cleavage site) with a heterozygosity, determined by allele frequencies, which can never exceed 50% and which is usually much less. As a result, all such RFLPs will be uninformative in pedigree analysis whenever critical individuals are homozygous.
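The 50% ceiling on the heterozygosity of a dimorphic RFLP can be illustrated numerically. The following is a minimal sketch, assuming random mating (Hardy-Weinberg proportions), under which the expected heterozygosity is one minus the sum of the squared allele frequencies. That assumption, the function name and the example frequencies are illustrative and are not taken from the cited work.

```python
def expected_heterozygosity(allele_frequencies):
    """Expected heterozygosity H = 1 - sum(p_i ** 2).

    Assumes random mating (Hardy-Weinberg proportions); this is an
    illustrative assumption, not a statement from the cited references.
    """
    if abs(sum(allele_frequencies) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_frequencies)


# Dimorphic site (cleavage site present or absent): scan the frequency p
# of one allele from 0 to 1 and record the largest heterozygosity seen.
dimorphic_max = max(
    expected_heterozygosity([i / 100, 1 - i / 100]) for i in range(101)
)

# Hypothetical multiallelic locus with ten equally frequent alleles.
multiallelic = expected_heterozygosity([0.1] * 10)
```

Under this assumption the two-allele case peaks when both alleles are equally frequent, whereas a locus with many alleles can approach a heterozygosity of 1.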
Genetic analysis could be considerably simplified by the availability of probes for hypervariable regions of DNA which show multiallelic variation and correspondingly high heterozygosities. The first such region was isolated by A. R. Wyman et al., Proc. Nat. Acad. Sci. U.S.A. 77, 6754-6758 (1980), by chance from a library of random segments of human DNA. The structural basis for multiallelic variation at this locus is not yet known. Subsequently, and again by chance, several other highly variable regions have been discovered near the human insulin gene [G. I. Bell et al., Nature 295, 31-35 (1982)], zeta-globin genes [N. J. Proudfoot et al., Cell 31, 553-563 (1982) and S. E. Y. Goodbourn et al., Proc. Nat. Acad. Sci. U.S.A. 80, 5022-5026 (1983)] and c-Ha-ras-1 oncogene [D. J. Capon et al., Nature 302, 33-37 (1983)]. In each case, the variable region consists of tandem repeats of a short sequence (a "minisatellite") and polymorphism is due to allelic differences in the number of repeats, arising presumably by mitotic or meiotic unequal exchanges or by DNA slippage during replication. The resulting minisatellite length variation can be detected using any restriction endonuclease which does not cleave the repeat unit.
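The relationship between repeat number and restriction fragment length can be sketched as follows. This is an illustration only: the repeat unit sequence, the flanking length and the repeat counts are hypothetical values chosen for the example, and the function names are not from any cited work. EcoRI (GAATTC) and HaeIII (GGCC) are used as example recognition sites.

```python
def enzyme_cuts_repeat(repeat_unit: str, recognition_site: str) -> bool:
    """Return True if the site occurs anywhere in a tandem array of the unit.

    Only one strand is searched, which is sufficient for palindromic
    sites such as those used below; a non-palindromic site would also
    need its reverse complement checked.
    """
    # Use enough copies to expose sites spanning the junction between units.
    copies = len(recognition_site) // len(repeat_unit) + 2
    return recognition_site in repeat_unit * copies


def fragment_length(flank_bp: int, repeat_unit_bp: int, n_repeats: int) -> int:
    """Fragment length = flanking DNA between the cut sites + tandem array."""
    return flank_bp + repeat_unit_bp * n_repeats


REPEAT_UNIT = "ACGGCCTTAGCATGG"      # hypothetical 15 bp repeat unit
FLANK_BP = 500                       # hypothetical total flanking length
ALLELE_REPEAT_COUNTS = [10, 14, 25]  # hypothetical alleles

# An enzyme is usable for detecting length variation only if it does not
# cleave within the repeat unit.
ecori_usable = not enzyme_cuts_repeat(REPEAT_UNIT, "GAATTC")
haeiii_usable = not enzyme_cuts_repeat(REPEAT_UNIT, "GGCC")

# Each allele yields a fragment whose length depends on its repeat count.
allele_fragments = [
    fragment_length(FLANK_BP, len(REPEAT_UNIT), n) for n in ALLELE_REPEAT_COUNTS
]
```

With these hypothetical values, an enzyme whose site is absent from the repeat unit leaves each tandem array intact, so alleles differing in repeat number give fragments of different lengths, while an enzyme that cuts within the unit would reduce every allele to the same small pieces.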
The present inventor and his colleagues have previously described a short minisatellite composed of four tandem repeats of a 33 bp sequence in an intron of the human myoglobin gene; see P. Weller et al., EMBO J. 3, 439-446 (1984). It was noticed that the 33 bp repeat showed weak similarity in sequence to the above-mentioned other human minisatellites previously characterised. The paper speculated that the minisatellite regions might arise by transposition. If the 33 bp repeat in the human myoglobin gene were transposable, then it might provide a probe for tandem repetitive regions of the human genome which are frequently associated with multiallelic polymorphism due to repeat number variation.
3. Additional, Unpublished, Background Information
Human genomic DNA was probed with a DNA probe comprising tandem repeats of the 33 bp sequence from the myoglobin gene. Polymorphic variation was observed at several different regions in the genomic DNA of three individuals (father, mother and daughter), the variation occurring in the size of larger fragments (2-6 kb). The data were consistent with stably inherited polymorphism due to length variation of more than one minisatellite region.