Population variation arises through processes such as recombination and mutation. Variation can be found at the level of the organism, the cell or a cell component. Biological redundancy permits variation without detriment to survival. For example, divergence of related nucleic acid sequences does not imply variability of the polypeptides those sequences encode.
The state of variability, particularly with reference to a protein or a nucleic acid, is known as polymorphism. Until recently, study of polymorphism focused on cell membrane proteins, serum-borne proteins or intracellular proteins. Applications include blood group typing, for example ABO or HLA, and isozyme analysis (Lewontin & Hubby, Genetics (1966) 54, 595). Those methods rely on amino acid changes that alter, for example, an antigenic determinant or molecular charge. But protein polymorphism may be "silent" because the amino acid substitution does not cause a change that is recognizable by those methods. That could occur if the substitution occurs at a site that is normally unexposed in the tertiary or quaternary structure of the protein.
Those limitations have been circumvented with the observation of polymorphism in nucleic acids. The eukaryotic genome comprises three classes of sequences: unique, moderately repeated and highly repeated. Generally, the three classes are interspersed throughout the genome. Many of those sequences are of unknown function and may not encode requisite RNAs or proteins. Of those sequences known to encode proteins, most are composites of coding and noncoding segments. Nucleic acid variation can occur, for example, because of codon degeneracy or in areas where a lack of selection pressure permits a high degree of polymorphism, as in pseudogenes, highly repeated sequences or noncoding regions of structural genes.
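The effect of codon degeneracy can be illustrated with a minimal sketch. The two allele sequences below are hypothetical and chosen only for illustration, and the codon table is a small excerpt of the standard genetic code. The sequences differ at four nucleotide positions yet translate to the same peptide, whereas a third hypothetical allele with a single nonsynonymous change does not.

```python
# Excerpt of the standard genetic code; only the codons used below are listed.
CODON_TABLE = {
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",  # alanine
    "CTT": "L", "CTC": "L", "CTA": "L", "CTG": "L",  # leucine
    "GGT": "G", "GGC": "G", "GGA": "G", "GGG": "G",  # glycine
    "GAT": "D", "GAC": "D",                          # aspartate
    "GAA": "E", "GAG": "E",                          # glutamate
}


def translate(dna):
    """Translate a coding-strand DNA string codon by codon (one-letter amino acids)."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def count_differences(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical alleles, for illustration only.
allele_1 = "GCTCTTGGTGAT"
allele_2 = "GCCCTGGGAGAC"  # differs from allele_1 at every third codon position
allele_3 = "GCTCTTGGTGAA"  # one nonsynonymous change (Asp -> Glu)

print(count_differences(allele_1, allele_2), translate(allele_1), translate(allele_2))
print(count_differences(allele_1, allele_3), translate(allele_3))
```

Alleles 1 and 2 would be indistinguishable by any protein-based method, although they are readily distinguished at the nucleic acid level.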
Nucleic acid polymorphism may be detectable as single base changes within restriction sites. Wyman and White (Proc Natl Acad Sci USA (1980) 77, 6754) found a locus in the human genome in which polymorphism resulted from DNA rearrangement. Subsequently, it was determined that the mechanism responsible for the polymorphism was variation in the number of copies of a repeat sequence at that locus. Other sites where polymorphism is due to copy number differences were found near the α-globin genes (Higgs et al., Nucl Acids Res (1981) 9, 4213), insulin gene (Bell et al., Nature (1982) 295, 31), zeta-globin genes (Proudfoot et al., Cell (1982) 31, 553; Goodbourn et al., Proc Natl Acad Sci USA (1983) 80, 5022), c-Ha-ras-1 locus (Capon et al., Nature (1983) 302, 33) and myoglobin gene (Weller et al., EMBO J (1984) 3, 439).
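Both kinds of polymorphism described above change the lengths of the fragments produced by a restriction digest. The following is a minimal sketch, not a description of any cited method: it uses the EcoRI recognition sequence GAATTC (cut between G and AATTC), and the flanking sequence, the mutated site and the 5-base repeat unit are all hypothetical values chosen for illustration.

```python
SITE = "GAATTC"   # EcoRI recognition sequence
CUT_OFFSET = 1    # EcoRI cuts between G and AATTC


def fragment_lengths(dna, site=SITE, cut_offset=CUT_OFFSET):
    """Return the lengths of the fragments left after cutting at every site."""
    cuts = []
    start = dna.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = dna.find(site, start + 1)
    bounds = [0] + cuts + [len(dna)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


FLANK = "ACGT" * 5   # hypothetical 20-base flanking sequence with no site
REPEAT = "CAGGT"     # hypothetical 5-base repeat unit


def site_allele(first_site):
    """Two restriction sites separated by flanking DNA; the first may be mutated."""
    return FLANK + first_site + FLANK + SITE + FLANK


def repeat_allele(copies):
    """Two intact sites flanking a tandem array with the given copy number."""
    return FLANK + SITE + REPEAT * copies + SITE + FLANK


# A single base change (A -> G) abolishes the first site and fuses two fragments.
print(fragment_lengths(site_allele("GAATTC")))  # both sites intact
print(fragment_lengths(site_allele("GAGTTC")))  # first site lost

# A copy number difference changes the length of the fragment between the sites.
print(fragment_lengths(repeat_allele(2)))
print(fragment_lengths(repeat_allele(5)))
```

In the first pair the polymorphism is seen as the presence or absence of a fragment boundary; in the second pair every allele has the same sites and differs only in the size of the intervening fragment.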
Glassberg (GB2135774A) disclosed a method for genetic identification of individuals by analysis of DNA length polymorphism detected with clones developed by Wyman & White, supra, and Capon et al., supra. With regard to the Wyman & White clone, the repeat unit is 3 bp, which increases the likelihood of cross-reactivity, and the genomic fragments detected are 14 kb or larger. Size differences among large fragments are sometimes difficult to resolve by agarose gel electrophoresis. The fragment sizes detected with the Capon clone are smaller, but there are few alleles at that locus, and a third of those alleles are each present at a frequency of less than 1% in a population of 268 random individuals. (The value of a locus for identification purposes is related directly to the number of alleles and the frequency of each allele in the population.)
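One common way to express how the number and frequency of alleles affect the value of a locus is expected heterozygosity, H = 1 - Σ p_i², where p_i is the frequency of allele i. The choice of this measure is an assumption made here for illustration; the source does not name a particular statistic, and the allele frequencies below are hypothetical rather than taken from any cited study.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity H = 1 - sum(p_i ** 2) for allele frequencies p_i."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)


# Hypothetical locus with six alleles, one common and several rare.
skewed = [0.90, 0.05, 0.03, 0.01, 0.005, 0.005]

# Hypothetical locus with six equally frequent alleles.
even = [1 / 6] * 6

print(round(expected_heterozygosity(skewed), 5))
print(round(expected_heterozygosity(even), 5))
```

Both loci have the same number of alleles, but the locus dominated by a single common allele is far less informative, because most individuals carry the same genotype.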
Jeffreys (GB2166445A) found that many polymorphic regions have a sequence in common, which he called the core sequence. He constructed concatamers of core or quasi-core sequences which were then cloned. When used as probes of genomic DNA, those clones hybridized to a plurality of fragments. The difficulty associated with those clones is interpretation of the complex hybridization pattern.
The utility of clones that detect highly polymorphic loci is evidenced by papers describing their applicability in the study of cancer (Thein et al., Br J Cancer (1987) 55, 353); in forensic medicine (Gill et al., Electrophoresis (1987) 8, 38; Kanter et al., J For Sci (1986) 31, 403; Giusti et al., J For Sci (1986) 31, 409); in zygosity determination (Motomura et al., Jpn J Hum Genet (1987) 32, 9; Jones et al., Eur J Haematol (1987) 39, 144; Hill et al., Lancet (1985) ii, 1394); in detection of chimerism (Wallace et al., Cold Spring Harb Symp Quant Biol (1986) 51, 257; Knowlton et al., Blood (1986) 68, 378); in veterinary medicine (Morton et al., Vet Rec (1987) 121, 592; Jeffreys et al., Anim Genet (1987) 18, 1); in population studies (Wetton et al., Nature (1987) 327, 149); in linkage studies (Ponder et al., Henry Ford Hosp Med J (1987) 35, 161; Matthew et al., J Med Genet (1987) 24, 524); and in paternity testing (Jeffreys et al., Nature (1985) 316, 76).
Donis-Keller et al. (EP 0221633) disclosed methods and compositions useful for genotyping by analyzing RFLPs. The claimed clones carry unique sequence inserts and hybridize to unlinked loci. The clones fall into two categories. The first consists of clones that reveal base changes within a restriction site, and the second consists of clones that yield fragment length differences with at least three enzymes.
The above-described art suffers from shortcomings. Generally, the clones hybridize to multiple fragments per sample, which complicates interpretation. Some clones arose from loci that are not highly multiallelic or at which many of the alleles are rare, which reduces their usefulness. Some of the clones hybridize with genomic fragments that tend to be large and difficult to resolve using standard filter hybridization, which demands genomic DNA with minimal degradation. Furthermore, assay conditions must be specific to minimize spurious cross-reactivity. Finally, each laboratory has only a small set of clones to work with, and any one clone may be uninformative in a specific case. Moreover, in order to rule out variability due to de novo mutation, it is important to analyze more than one locus. It is preferable that the succeeding loci analyzed be unlinked to the first locus. Ideally, all loci analyzed are on different chromosomes. Thus, an improvement on the prior art would be a larger set of mapped, single locus clones that detect multiallelic, highly heterozygous loci.
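One reason unlinked loci are preferred is that their genotypes can be treated as approximately independent, so per-locus probabilities multiply. The following minimal sketch rests on assumptions not stated in the source: Hardy-Weinberg proportions at each locus, independence between loci, and hypothetical allele frequencies. It computes the chance that two unrelated individuals share a genotype at one locus and then at several loci together.

```python
from itertools import combinations
from math import prod


def genotype_match_probability(freqs):
    """Chance that two unrelated individuals share a genotype at one locus.

    Assumes Hardy-Weinberg proportions: homozygote frequency p**2 and
    heterozygote frequency 2*p*q; the match probability is the sum of the
    squared genotype frequencies.
    """
    homozygotes = sum(p ** 4 for p in freqs)
    heterozygotes = sum((2 * p * q) ** 2 for p, q in combinations(freqs, 2))
    return homozygotes + heterozygotes


def combined_match_probability(loci):
    """Product of per-locus match probabilities, assuming independent loci."""
    return prod(genotype_match_probability(freqs) for freqs in loci)


# Hypothetical loci: a two-allele locus and a ten-allele locus.
two_allele = [0.5, 0.5]
ten_allele = [0.1] * 10

print(genotype_match_probability(two_allele))
print(genotype_match_probability(ten_allele))
print(combined_match_probability([ten_allele] * 4))
```

Under these assumptions, each additional multiallelic locus reduces the combined match probability multiplicatively, which is why a larger set of mapped, unlinked clones is more useful than any single clone.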
A modification was disclosed by Jeffreys (GB2188323A), in which he excised a band from a gel, cloned it and used the insert as a probe. That clone hybridizes to a single multiallelic locus and yields a blot that is easier to interpret. Jeffreys also screened a size-restricted library for recombinants carrying "core" sequences. Inserts from 5 positive plaques hybridized under high stringency to unique loci and provided blots that were easily interpretable. Nevertheless, the genomic fragments recognized were often large, leading to problems with resolution and DNA quality.
Nakamura et al. (Science (1987) 235, 1616) coined the term variable number of tandem repeats (VNTR) locus for a genetic sequence that contains tandem repeats at a single locus. They used oligonucleotides to screen a cosmid library for recombinants carrying VNTR sequences. Positive clones were used as probes to determine whether they hybridize to VNTR loci in the human genome and, if so, to assess the degree of heterozygosity at those loci.