This invention is generally related to a method of characterizing macromolecules composed of complementary strands. More specifically, the invention concerns a method for subtractive comparison of populations of representative fragments (hereinafter PRFs) representing two related complex macromolecules, such as genomic DNA and RNA, and for partial purification of the polymorphic PRF components in which the two macromolecules differ.
Polymorphisms are genetic differences between two related genomes which are inheritable and contribute to the diversity within a species. They correspond to subunit structural differences in the DNAs (or RNAs) which encode the genome. Many DNA polymorphisms are without manifest physiological effects, while others are causal factors for inherited traits, whether the effects be positive, neutral or causative for genetic disease. Therefore, the isolation of fragments of the total genomic DNA which represent polymorphism sites is an important task of biological and medical research. For medical genetics, these fragment isolations constitute one step in the development of capacities to diagnose genetic diseases. More generally, it is a common constituent of biological research programs to isolate genes and characterize their functions.
Previously, the detection of genetic differences in genomic DNA and the isolation of genes has been limited by the complexity of genomes which could be analyzed by conventional procedures without resorting to laborious comparative probing techniques. The following is a discussion of some of those procedures and their drawbacks.
Subtraction hybridization was one of the first approaches used in the isolation of genes or their corresponding RNA. This process relies on the duplex or double stranded structure of DNA and RNA/DNA hybrids. DNA duplexes can be denatured, i.e., separated into their complementary strands, by treatment with heat or with destabilizing agents, such as formamide or a high pH solution. Annealing conditions can be established under which strands pair up and reform duplexes. The stability of the duplexes is highly dependent on proper pairing of constituent bases across the strands. The four constituent bases found in DNA molecules are adenine, thymine, guanine, and cytosine (hereinafter, abbreviated A, T, G, and C, respectively). Proper subunit pairings across the strands are A with T and G with C. In the first subtraction hybridization experiments, viral subject DNA and host cell DNA were utilized. The viral component of the total RNA extracted from the virus infected cells was selectively bound to viral, but not to host cell, DNA. (Bautz and Hall, The Isolation of T4-Specific RNA on a DNA-Cellulose Column, 48 Proc. Nat. Acad. Sci. 400 (1962)). Hybridization will occur between two complementary single strands even if one of the strands is stably attached to a matrix.
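The pairing rules described above can be illustrated with a short sketch. It assumes that strands are written 5' to 3' and pair in antiparallel orientation (a standard convention not stated in the text); the function names `reverse_complement` and `mismatches` are hypothetical and chosen only for illustration.

```python
# Pairing rules from the text: A pairs with T, and G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the strand that pairs perfectly with `strand`.

    Assumption: both strands are written 5'->3', so the partner is the
    complement read in reverse order.
    """
    return "".join(PAIRS[base] for base in reversed(strand))


def mismatches(strand: str, partner: str) -> int:
    """Count positions at which `partner` fails to pair with `strand`."""
    if len(strand) != len(partner):
        raise ValueError("strands must have equal length")
    ideal = reverse_complement(strand)
    return sum(1 for a, b in zip(ideal, partner) if a != b)


if __name__ == "__main__":
    print(reverse_complement("ATGC"))   # perfect partner of ATGC
    print(mismatches("ATGC", "GCAT"))   # properly paired duplex
    print(mismatches("ATGC", "GCAA"))   # duplex with one mispaired base
```

A count of zero corresponds to a properly paired duplex; each mispaired position is expected to reduce duplex stability, although the sketch makes no attempt to model stability quantitatively.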
During conventional subtraction hybridization, DNAs of a subject genome and a related reference genome are utilized. The duplex DNAs of both are fragmented and then denatured. Fragmented reference strands are bound to a matrix, such as agarose, cellulose or nylon. Fragmented subject strands are annealed with a large molar excess of the bound reference strands. During the annealing process, most of the subject strands pair with reference complements and are entrapped in hybrid duplexes of subject and reference strands. Subject strands without reference complements cannot pair off in a stable duplex with, and thereby be entrapped by, the reference DNAs. After the annealing step, removal of the matrix eliminates reference DNAs and the entrapped homologous subject DNAs. The free subject DNAs are comprised of the sought unique subject DNAs and common DNAs which have escaped entrapment in hybrid duplexes. The former comprise a much greater proportion of the free DNAs than they did of the input subject DNA population, since the majority of the common DNAs have been subtracted out. The net subtraction hybridization process thus provides a partial purification for the sought unique DNAs lacking reference complements.
The extent of elimination of the unwanted subject DNAs during a conventional subtraction hybridization process depends on the molar ratio of the input materials. The annealing of strands into duplexes is a bimolecular reaction obeying conventional chemical mass action laws. With an input ratio of one subject DNA to ten reference DNAs, the annealed products are in the ratio of 0.1 (subject): 2 (hybrid): 9 (reference duplexes). Thus, with respect to the input subject DNA population, the elimination of matrix bound DNAs eliminates 90% of the subject DNA with reference homologies.
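The 0.1 : 2 : 9 ratio can be reproduced with a minimal sketch of random strand pairing. It assumes that annealing goes to completion, that subject and reference strands pair without preference, and that quantities are expressed in duplex equivalents; the function name `annealing_products` is hypothetical.

```python
def annealing_products(subject: float, reference: float) -> dict:
    """Duplex equivalents formed when homologous subject and reference
    strands anneal at random (simple mass-action assumption)."""
    total = subject + reference
    # Probability that the partner of any given strand is reference-derived.
    frac_ref = reference / total
    return {
        # Subject strands that find a subject complement.
        "subject_duplex": subject * (1 - frac_ref),
        # Each subject duplex yields two strands; each hybridizes with
        # probability frac_ref, giving one hybrid duplex per such strand.
        "hybrid": 2 * subject * frac_ref,
        # Reference strands that find a reference complement.
        "reference_duplex": reference * frac_ref,
        # Share of input subject DNA entrapped in hybrids on the matrix.
        "subject_removed_fraction": frac_ref,
    }


if __name__ == "__main__":
    for name, value in annealing_products(1, 10).items():
        print(f"{name}: {value:.3f}")
```

For a 1:10 input the sketch gives approximately 0.09 subject duplexes, 1.82 hybrids, and 9.09 reference duplexes, which rounds to the 0.1 : 2 : 9 ratio stated above, with about 91% of the homologous subject DNA removed. Under the same assumptions, raising the reference excess to 100-fold would remove about 99%.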
Conventional subtraction hybridization technology has limited applicability: it is effective only for polymorphisms corresponding to deletions in the genomes of simple organisms such as viruses and bacteria. The technique fails for point mutations and rearrangement polymorphisms, because the subject DNA polymorphisms being sought still have homologies with the reference DNAs and would consequently be entrapped and eliminated during a subtraction hybridization procedure. Moreover, genomic DNA of higher species contains numerous base pair sequences which are repeated and dispersed throughout the chromosomes. For example, about 80% of human genomic DNA is comprised of several families of repeated (reiterated) DNA sequences, the largest families having hundreds to thousands of copies. Single copy genes or sequences comprise the remaining 20% of human genomic DNA. The reiterated sequences cause an undesirable complication. During an annealing of DNA strands of a complex genome, the reiterated sequences make more rapid contacts than the single copy sequences, which are present at much lower concentration. Consequently, reiterated regions form stable duplex regions, regardless of non-homology between adjacent single copy gene regions. As a result, extended "promiscuous" tangles of DNA form that are stabilized by the duplex regions. The formation of promiscuous tangles hinders the purification in conventional subtraction hybridization.
Alternative approaches to conventional subtraction hybridization utilize restriction nucleases. A restriction nuclease is an enzyme that has the capacity to recognize a specific target sequence, several base pairs in length, in double-stranded DNA molecules, and to cleave both strands of the DNA molecule at the locations of the target sequence. The DNA molecules defined by digestion with a restriction nuclease are referred to as restriction fragments. Any given genomic DNA digested by a particular restriction nuclease is converted into a discrete PRF.
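Digestion into a PRF can be sketched as follows. This is a simplified illustration that models only one strand of a linear molecule; the function name `digest`, the `cut_offset` parameter, and the example sequence are hypothetical. The example uses the EcoRI recognition sequence GAATTC, which is cleaved after the first G.

```python
from typing import List


def digest(sequence: str, site: str, cut_offset: int = 0) -> List[str]:
    """Cut a linear `sequence` at every occurrence of `site`.

    `cut_offset` is the number of bases into the site at which the
    strand is cleaved (1 for EcoRI, which cuts G^AATTC).
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        if cut > start:  # skip a cut that would give an empty fragment
            fragments.append(sequence[start:cut])
            start = cut
        pos = sequence.find(site, pos + 1)
    if start < len(sequence):
        fragments.append(sequence[start:])
    return fragments


if __name__ == "__main__":
    dna = "AAGAATTCTTTTGAATTCCC"  # made-up sequence with two EcoRI sites
    for fragment in digest(dna, "GAATTC", cut_offset=1):
        print(len(fragment), fragment)
```

Because the set of fragments depends only on the positions of the target sequence, the same DNA digested with the same nuclease yields the same population of fragments each time, which is the sense in which a PRF is discrete.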
A restriction fragment length polymorphism (hereinafter, RFLP) is a particular type of polymorphism manifested as a difference in the lengths of some genetically related fragments of the two PRFs compared. The underlying genetic manifestations can be as subtle as a single base pair change, which creates or eliminates a cleavage site, or as gross as a genetic deletion which changes the length of DNA between cleavage sites. To detect an RFLP, an analytical method for fractionating double-stranded DNA molecules on the basis of size is required. The most commonly used technique for achieving such a fractionation is agarose gel electrophoresis. In that method, DNA molecules migrate through the gel, which acts as a sieve that retards the movement of the largest molecules to the greatest extent and the movement of the smallest molecules to the least extent. A comparison of gel electrophoretically fractionated PRFs reveals the fragments unique to each genome among those common to the subject and reference PRFs compared. The unique fragments represent the RFLP. Fractionated PRFs can also be denatured and annealed within the confines of the fractionation gel. Such in situ annealings have been employed previously in a strategy to selectively detect reiterated PRF members. (Roninson, Detection and mapping of homologous, repeated and amplified DNA sequences by DNA renaturation in agarose gels, 11 Nucleic Acids Res. 5413-31 (1983)).
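The comparison of two fractionated PRFs can be sketched as a comparison of fragment-length lists. This is an idealized illustration that assumes every fragment length is resolved exactly, which a real gel does not guarantee; the function name `unique_fragment_lengths` and the example lengths are hypothetical.

```python
from collections import Counter


def unique_fragment_lengths(subject_lengths, reference_lengths):
    """Return (subject-only, reference-only) fragment lengths.

    Lengths shared by both PRFs cancel out; what remains on each
    side is the candidate RFLP.
    """
    subject = Counter(subject_lengths)
    reference = Counter(reference_lengths)
    subject_only = sorted((subject - reference).elements())
    reference_only = sorted((reference - subject).elements())
    return subject_only, reference_only


if __name__ == "__main__":
    # Reference has two cleavage sites; the subject has lost the second
    # one through a single base change, so two fragments fuse into one.
    reference = [3, 10, 7]
    subject = [3, 17]
    print(unique_fragment_lengths(subject, reference))
```

In this example the shared 3-unit fragment cancels, leaving a 17-unit fragment unique to the subject and the 10- and 7-unit fragments unique to the reference, the pattern expected when one cleavage site is eliminated.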
Fractionations which distinguish compared DNAs by the stability of the base pairing have also been used (Fischer and Lerman, Length-Independent Separation of DNA Restriction Fragments in Two-Dimensional Gel Electrophoresis, 16 Cell 191-200 (Jan. 1979)). They can reveal some polymorphisms between DNAs of the same length.
So long as a fractionation procedure can resolve the constituents of each PRF, differences between PRFs are easily detectable. For example, desired resolution can be achieved with one dimensional fractionations for many viral PRFs, or with two dimensional fractionations responsive to fragment length and thermal stability, for bacterial PRFs. However, for higher organisms, even if the best fractionation techniques are used, resolution of the sought polymorphic PRF constituents is not achieved. With such higher organisms, separation of any single member from the majority of the PRF membership occurs, but there are so many members that there is a continuum of overlapping fragment bands which prevents resolution and detection of members within the continuum.
When there is a continuum of fragment bands, probing techniques have been used to display positions of particular genes. A cloned form of the gene which is sought is given a radioactive or biochemical label that can be later employed to reveal its position. It serves as a probe to locate its homologues. The fractionated subject DNAs are denatured into constituent strands and then transferred and stably bound to a membrane, i.e., blotted onto a stable membrane. Single stranded probe and blotted subject DNAs are then annealed. The probe binds in a stable manner by base pairing only at the positions of its genetic homologues, and the positions of homologous fragments on the blot are thereby detected. With most single gene probes, the compared PRFs show no differences for the fragments selectively displayed. Nevertheless, laborious comparative probings of related PRFs can be sequentially performed and, with a large enough population of probes, polymorphisms useful for genetic diagnostic purposes can eventually be detected. (Gusella et al., A Polymorphic DNA Marker Genetically Linked to Huntington's Disease, 306 Nature 234 (1983)).
Another technique has been used to selectively display a sub-population of polymorphisms of viral genomic DNA. In this technique, the PRFs of the genomic DNA of two genomes to be compared are prepared. They are pooled in equal amounts and hybridized. Hybridization products are then treated with nuclease S1, which cleaves at distortions in DNA duplexes. (Shenk et al., Biochemical Method for Mapping Mutational Alterations in DNA with S1 Nuclease: The Location of Deletions and Temperature-Sensitive Mutations in Simian Virus 40, 72 Proc. Nat. Acad. Sci. 3:989-993 (1975)). Some hybrid duplexes comprised of polymorphic DNA strands have a sufficient degree of distortion and are consequently cleaved at these sites. Secondary fragments thus generated are detected through a fractionation, during which S1 cleavage fragments migrate faster than intact predecessor fragments. It is essential for this distortion cleavage technique that a control, consisting of a hybridization of each PRF against itself, be conducted for comparative analysis of the products. Such controls do yield secondary S1 fragments, which arise because of partial homologies and reiterated sequences within the genomic DNA. The fragments encoding them form distorted duplexes and partially duplex complexes during hybridizations. Thus, secondary fragments arise during the S1 nuclease digestion. These secondary fragments must be identified in order to distinguish polymorphisms between genomes from internal homologies within a genome. This distortion cleavage technology has also been used with bacterial genomic DNA. (Yee and Inouye, Two-dimensional S1 nuclease heteroduplex mapping: Detection of rearrangements in bacterial genomes, 81 Proc. Nat. Acad. Sci. 2723-2727 (1984)). Internal homologies are a small fraction of the total genomic DNA in bacterial genomes and are identifiable from the control. By contrast, internal homologies are extensive in the genomic DNA of higher organisms. As a consequence, when this technique is used with PRFs of higher organisms, the sought polymorphisms are obscured by the great abundance of secondary fragments arising from the extensive internal homologies.
Other polymorphism identification techniques have been used, but with a very limited domain of utility. These techniques require the prior cloning of the genome fragment whose polymorphisms will subsequently be sought. The most refined of these methodologies is comparative nucleotide sequencing, through which the particular subunit differences of the polymorphism are identified.