The ability to detect differences between two populations of nucleic acid sequences is important to characterizing the molecular basis of various pathological states, for example neoplasia, infectious and degenerative diseases, viral infections and hereditary predisposition to disease. Increasingly, the technique of subtractive hybridization is being used to identify polynucleotides comprising sequences that are present in a first population of nucleic acid sequences but absent, present in a different concentration, or arranged differently in a second population.
Sargent and Dawid, Science 222:135-139 (1983) used subtractive hybridization to isolate cDNAs representing mRNA molecules preferentially expressed at the gastrula stage of development of the frog embryo. Gastrula cDNA was hybridized to RNA from unfertilized eggs and the cDNA that failed to hybridize was cloned. These cloned sequences represented mRNAs that were differentially expressed in the frog gastrula. Similarly, Hedrick et al., Nature (London) 308:149-153 (1984) cloned a T-cell receptor molecule by hybridizing cDNA from antigen-specific T-cells with RNA from B-cells and collecting the non-hybridized cDNA. Despite these early successes, it soon became evident that this method is limited in practice to detection of differentially-expressed mRNA representing 0.01% or more of the total mRNA population. Furthermore, in cases where the method is practical, selection of the differentially expressed cDNA (as single-stranded material) is achieved by hydroxyapatite chromatography, which is cumbersome and results in losses of valuable material. Finally, this technique did not provide a method to detect differences in genome organization, such as deletion, gene amplification, or rearrangement.
Adaptations of the subtractive hybridization technique have been developed which allow the identification and isolation of polynucleotides representing sequence differences between different genomes. Lamar and Palmer, Cell 37:171-177 (1984) used a selective cloning approach to isolate Y chromosome-specific sequences in the mouse. Hybridizations were conducted using restriction enzyme-digested male DNA as tracer and sonicated female DNA as driver. Of the duplexes obtained after annealing, only those with both strands derived from male DNA contain sequences unique to the Y chromosome and possess a restriction enzyme recognition site at each end. Such duplexes were cloned preferentially into a vector containing compatible restriction enzyme-generated ends.
Kunkel et al., Proc. Natl. Acad. Sci. USA 82:4778-4782 (1985) and Nussbaum et al., Proc. Natl. Acad. Sci. USA 84:6521-6525 (1987) described the isolation of fragments containing sequences deleted from the human X chromosome by hybridization of restriction enzyme-digested DNA from cells that were polysomic for the X chromosome with an excess of sheared DNA from cells harboring one or more X chromosome deletions, using conditions in which the rate of reassociation was enhanced. Selective cloning using a vector with compatible restriction enzyme-generated ends was used for the isolation of sequences absent in the X chromosome deletions.
Strauss and Ausubel, Proc. Natl. Acad. Sci. USA 87:1889-1893 (1990) described a technique for isolating a polynucleotide comprising DNA that is absent in a yeast deletion mutant. In this method, denatured wild-type DNA is allowed to anneal with biotin-labeled DNA from the deletion mutant, and biotin-containing duplexes (which contain sequences common to the mutant and wild-type) are removed from solution by binding to avidin-coated beads. The process is repeated for several cycles, with addition of fresh biotinylated mutant DNA to the wild-type DNA remaining unbound at the end of each cycle. Finally, single-stranded material is amplified by a polymerase chain reaction to generate a probe enriched in sequences missing in the deletion mutant. Of course, this method can only be used to isolate a genomic region that is defined by a deletion mutant, and its applicability to genomes more complicated than that of yeast has not been tested. A similar procedure using biotin-based separation for isolation of differentially expressed cDNAs was described by Lebeau et al., Nucleic Acids Research 19:4778 (1991).
Wieland et al., Proc. Natl. Acad. Sci. USA 87: 2720-2724 (1990) described a method for isolating polynucleotides comprising sequences present in a "tester" DNA population that are absent in a "driver" population. In this method, the tester DNA is labeled with biotin, then subjected to several rounds of hybridization with excess driver DNA. After each round, single-stranded DNA is collected by hydroxyapatite chromatography. After the final round, the small amount of nonhybridized biotinylated DNA (unique to the tester population) is purified by avidin affinity chromatography, amplified by a polymerase chain reaction and cloned to generate a probe for sequences unique to the tester population.
Recently, a technique known as Representational Difference Analysis (RDA) has been developed, which allows the isolation of DNA fragments that are present in one population of DNA sequences but absent in another population of DNA sequences. Lisitsyn et al., Science 259:946-951 (1993); Lisitsyn et al., Meth. Enzymology 254:291-304 (1995); U.S. Pat. No. 5,436,142; U.S. Pat. No. 5,501,964; Lisitsyn et al., Nature Genetics 6:57-63 (1994). This method allows one to search for fragments present in a "tester" population of DNA sequences that are not present in a related "driver" population. Such unique fragments are denoted "target" sequences. In the first step of RDA, "representations" of both populations are obtained. These representations consist of lower-complexity subsets of the original sequence populations. In the most widely-practiced embodiment of the technique, a representation is obtained by separately subjecting both populations to digestion with a restriction endonuclease, ligating a first set of adapters to the ends of the fragments so generated, and amplifying by a polymerase chain reaction (PCR) using primers complementary to the first set of adapters, under conditions in which only relatively short fragments (less than 2 kilobase pairs) are amplified. The first adapters are then removed from the amplified fragments of both populations by restriction enzyme digestion and a second set of adapters (having a different sequence than the first set) is attached, by ligation, to amplified fragments from the tester DNA population only.
The adapter-containing amplified fragments from the tester population are then combined with an excess of amplified fragments from the driver population (which lack adapters), and the mixture is incubated under denaturing and annealing conditions, followed by another round of PCR amplification using primers complementary to the second set of adapters. During the annealing step, several types of duplex will be formed. Because driver fragments are present in excess, the vast majority of fragments containing sequences common to both tester and driver populations will form either driver-driver duplexes (containing no adapter) or tester-driver duplexes (containing a single adapter on the strand derived from the tester fragment). Fragments containing sequences that are unique to the tester population are capable of self-annealing to generate duplexes possessing an adapter at each end. Consequently, during the PCR step subsequent to annealing, tester-tester duplexes will be amplified exponentially. On the other hand, tester-driver duplexes, possessing only a single adapter, will be amplified in a linear fashion and will thus come to form only a small fraction of the population of amplified sequences. Driver-driver duplexes, lacking adapters, will not be amplified at all. Selective amplification of fragments containing target sequences is thus achieved by virtue of the fact that, prior to annealing, only fragments from the tester population possess adapters, bestowing tester-tester duplexes with the potential for exponential amplification.
The steps of removing the adapters present on the enriched target fragments obtained from a previous step, adding new adapters, incubating under denaturing and annealing conditions with an excess of fragments from the driver population, and amplifying by PCR are repeated until a desired degree of enrichment is attained.
An adaptation of RDA called cDNA-RDA has recently been described (Hubank & Schatz, Nucleic Acids Research 22:5640-5648 (1994)), in which two populations of cDNA are compared for the presence of a cDNA fragment representing either an mRNA unique to one of the two populations or an mRNA that is differentially expressed in the two populations. cDNA-RDA differs from the original RDA protocol in the following respects. 1) Since the complexity of the mRNA population of a typical mammalian cell is only approximately 1-2% of genome complexity, generation of a representation is not required for the practice of cDNA-RDA. Hence, a more complete analysis of differences can be obtained in a single experiment. 2) Amplification of fragments already known to differ between the two populations can be minimized by addition of such fragments to the driver. 3) Amplification of fragments representing mRNAs present at different levels in the two populations (rather than absent in one population) can be achieved by depleting the populations of low-abundance sequences (by hybridization to low C₀t) prior to amplification, and lowering the ratio of driver to tester during hybridizations subsequent to the generation of the first difference product. This effectively converts an up-regulated sequence into a unique sequence, for the purposes of the assay. A limitation of cDNA-RDA is the inability to detect differences due to point mutations, small deletions or small insertions, unless they affect a particular restriction enzyme recognition site. cDNA-RDA has been used to detect transcripts of a transfected gene in cultured cells and to clone cDNAs representing genes whose transcription is up-regulated in response to an environmental stimulus.
RDA and cDNA-RDA depend upon selective amplification for enrichment of polynucleotides containing sequences unique to, or enriched in, a particular nucleic acid sequence population. Selective amplification of unique sequences was combined with the selective degradation of sequences common to both populations in the technique of enzymatic degrading subtraction. Zeng et al., Nucleic Acids Research 22:4381-4385 (1994); U.S. Pat. No. 5,525,471. In this procedure, the ends of the amplified cDNA fragments comprising the tester population are blocked by the enzymatic addition of α-phosphorothioate-modified nucleotides. Hybridization of blocked tester fragments with an excess of unblocked driver fragments is then conducted under conditions that accelerate the annealing rate, allowing the use of relatively low driver concentrations. After hybridization, treatment with exonuclease III (a double strand-specific nuclease which attacks from the 3' end) and exonuclease VII (a single strand-specific nuclease) will destroy driver-driver and tester-driver duplexes. However, the phosphorothioate-blocked ends of tester-tester hybrids will render these duplexes resistant to the combined nuclease treatment. Tester-tester duplexes which survive nuclease treatment undergo a second round of subtraction and are then amplified by a polymerase chain reaction. Additional rounds of subtraction and amplification may be conducted, as necessary.
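The difference between exponential and linear amplification described above can be illustrated with a minimal Python sketch. It assumes an idealized PCR with perfect efficiency in every cycle; the function name `copies_after_pcr` and its parameters are hypothetical and are not part of the RDA protocol.

```python
def copies_after_pcr(adapters_per_duplex: int, cycles: int, start: float = 1.0) -> float:
    """Idealized copy number after `cycles` PCR cycles (assumes 100% efficiency).

    2 adapters (tester-tester): both strands can be primed, so copies double each cycle.
    1 adapter  (tester-driver): only one strand can be copied, so one copy is added per cycle.
    0 adapters (driver-driver): no primer site, so no amplification.
    """
    if adapters_per_duplex == 2:
        return start * 2 ** cycles
    if adapters_per_duplex == 1:
        return start * (1 + cycles)
    if adapters_per_duplex == 0:
        return start
    raise ValueError("adapters_per_duplex must be 0, 1 or 2")


if __name__ == "__main__":
    for label, adapters in [("tester-tester", 2), ("tester-driver", 1), ("driver-driver", 0)]:
        print(label, copies_after_pcr(adapters, cycles=10))
```

Under these assumptions, ten cycles separate the three duplex types by roughly two orders of magnitude, which is the basis of the selective enrichment of tester-tester duplexes.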
As the technique of RDA has come to be practiced more widely in recent years, several disadvantages have become apparent. A major problem results from the inefficiency of the multiple restriction digestion and ligation reactions that are utilized in the technique. Lack of complete restriction digestion will lead to incomplete removal of the first set of adapters from the tester fragment population, resulting in an inability to attach the second set of adapters. Similarly, an inefficient ligation step would lead to incomplete attachment of the second set of adapters, even at sites from which the first set had been removed. Since the amplification primers are complementary to the second set of adapters, incomplete attachment of the second adapter set will reduce the degree of amplification of target sequences that can be achieved. In addition, the necessity to process samples through multiple steps, and possibly purify material between steps, leads to losses of already-scarce experimental material. One possible consequence of inefficient restriction digestion and/or ligation is the generation of false positives, wherein the loss of a particular driver sequence, through failure to be amplified, leads to the inappropriate identification of its complement in the tester as a target sequence.
Another disadvantage of RDA as it is commonly practiced stems from the use of a large number of polymerase chain reaction cycles during the amplification step. Typically, 20 cycles of PCR are used to generate the representations and 25-30 cycles of PCR are used during each hybridization/amplification round of RDA. If, as is common, three rounds of hybridization/amplification are conducted, target nucleic acids will have undergone 95-110 rounds of amplification by the time they are isolated. Additional rounds of amplification are commonly used to clone and sequence the difference product isolated by RDA. It has been known for some time that, at high cycle numbers of a PCR amplification, a "plateau effect" is observed. Innis and Gelfand in "PCR Protocols: A guide to methods and applications" ed. Innis et al., Academic Press (1990) pp. 3-12. This effect is characterized by a decline in the exponential rate of accumulation of amplification product that occurs during late cycles. Potential causes of the plateau effect include 1) depletion of substrates, 2) loss of activity of enzyme, 3) degradation of substrates, 4) end-product inhibition, 5) competition for reactants by nonspecific products, 6) incomplete denaturation of product at high product concentration and 7) reannealing of product at high product concentration (which may block primer annealing and/or extension).
These last two features of the later cycles of a polymerase chain reaction are especially important for RDA and related techniques because, besides leading to less-than-exponential amplification, they also result in a skewing of the representation of products in reactions, such as RDA, in which multiple fragments are being amplified. In particular, Mathieu-Daude et al., Nucleic Acids Research 24:2080-2086 (1996) have shown that, in later cycles, the rate of amplification of abundant products decreases more rapidly than that of less abundant products in the same reaction. This is due to preferential reannealing of the more abundant products, which prevents primer binding and/or extension for these abundant species. This phenomenon is consistent with the fact that the rate of annealing is proportional to the concentrations of the reacting strands. The consequence of this effect for the practice of cDNA-RDA is that the ability to detect mRNAs present in different concentrations in two populations (as opposed to mRNAs that are unique to one of the populations) will be diminished for mRNAs whose cDNAs are present at high concentrations in the starting population.
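The cumulative cycle count quoted above follows from simple arithmetic, sketched below in Python. The function name `total_pcr_cycles` is hypothetical; the cycle numbers are those given in the text.

```python
def total_pcr_cycles(representation_cycles: int, cycles_per_round: int, rounds: int) -> int:
    """Cycles used to make the representation plus cycles used in every RDA round."""
    return representation_cycles + cycles_per_round * rounds


# 20 cycles for the representations, then three rounds of 25 or 30 cycles each.
low = total_pcr_cycles(20, 25, 3)
high = total_pcr_cycles(20, 30, 3)
print(low, high)
```

With the figures stated in the text, the lower and upper bounds are 95 and 110 cycles, before any further amplification used for cloning and sequencing.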
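The concentration dependence of reannealing can be illustrated with the standard ideal second-order reassociation equation, C/C₀ = 1/(1 + k·C₀·t). The Python sketch below is only an illustration under that assumption; the rate constant `k`, the time `t`, the concentrations, and the function name `fraction_reannealed` are hypothetical values chosen for the example and are not taken from Mathieu-Daude et al.

```python
def fraction_reannealed(c0: float, k: float, t: float) -> float:
    """Fraction of strands reannealed after time t under ideal second-order kinetics.

    Uses C/C0 = 1 / (1 + k * c0 * t), where C is the concentration of strands
    still single-stranded; k and t are in arbitrary, mutually consistent units.
    """
    return 1.0 - 1.0 / (1.0 + k * c0 * t)


if __name__ == "__main__":
    k, t = 1.0, 1.0  # hypothetical rate constant and annealing time
    for label, c0 in [("abundant product", 10.0), ("rare product", 0.1)]:
        print(label, round(fraction_reannealed(c0, k, t), 3))
```

In this idealized model, the abundant product reanneals to a much greater extent than the rare product in the same time, leaving proportionally fewer of its strands available for primer binding.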
A further potential source of artifact in the current procedure for RDA is the utilization of ten cycles of PCR immediately following the first hybridization step. Only after these ten amplification cycles have been conducted is the material treated with nuclease to degrade unhybridized material. This sequence of events has the potentially undesirable effect of subjecting tester:tester duplexes (i.e., the desired product) to ten denaturation steps, with the attendant risk that some of these duplexes will fail to reform, due, for example, to degradation of their constituent strands while in the denatured state.
Finally, the presence of excess driver DNA during the ten PCR cycles prior to nuclease treatment can result in a reduced efficiency of amplification of tester:tester hybrids, due to the potential for the residual driver:driver and driver:tester duplexes to act as a sink for primers, substrates, counterions and enzyme.
In the practice of the present invention, these disadvantages are surmounted by methods that use fewer PCR cycles, nuclease digestion prior to amplification, and a single adapter designed for use with multiple primers. Additional advantages are also presented by the invention, as set forth infra.