1. Field of the Invention
The present invention relates to methods and kits for generating or analyzing nucleic acid populations or desired nucleic acids based upon replication or amplification reactions.
2. Description of Related Art
In the related methods discussed herein, the nucleic acid sample containing or believed to contain one or more desired or target nucleic acids is often referred to as the "tester" or "tracer." The nucleic acid sample believed to specifically lack the desired nucleic acid is referred to as the "driver." The reaction products of an amplification reaction are sometimes referred to as "amplicons" or a "representation." Hybrids formed after combining the tester and driver nucleic acid samples are denoted, for example, "driver:driver," "tester:driver," and "tester:tester." In this disclosure, nucleic acids refer to DNA, cDNA, RNA, and mRNA molecules, combinations thereof, or the like from any source, with or without modified nucleotides and nucleotide analogs.
Scientists have employed procedures to separate and identify nucleic acid molecules from different sources for a variety of purposes. Nucleic acid libraries, for example, contain all or part of the genome or of the expressed sequences of a particular biological source. Certain libraries have proven to be valuable tools for characterizing and isolating nucleic acid sequences and for deciphering their importance. However, isolating a desired clone from a library containing a million or more clones is both time-consuming and labor-intensive. Extracting a desired clone from a library begins with selecting an appropriate library. For cDNA libraries, the nucleic acid source used to construct the library should contain more copies of the desired sequence than the sources of other available libraries. The number of copies of a desired clone in a library can be predicted from the characteristic expression of the desired nucleic acid in a biological source. Certain libraries are therefore enriched for particular nucleic acid sequences because of the source of the nucleic acids used to construct them. Another way to enrich for rare or desired nucleic acid species is to construct normalized or subtracted libraries.
Subtraction libraries are produced by methods generally referred to as subtractive methods or subtractive hybridization methods. These methods generally rely on the hybridization of complementary nucleic acid strands from differing sources. A first source of nucleic acids (tester nucleic acids) is believed to comprise desired or target nucleic acids. A second source of nucleic acids (driver nucleic acids), generally the source phenotypically closest to the first but presumed to lack the desired or target nucleic acids, is used to "subtract" common and undesired nucleic acids through hybridization, followed by separation of the hybridized nucleic acids from the non-hybridized nucleic acids. A subtraction library thus allows one to identify desired nucleic acids by their differential abundance or presence in two cells or cell types.
There are numerous examples of the power of subtraction libraries. One skilled in the art can isolate sex-specific genes or DNA fragments (1) and detect differences in gene expression, for example, between cells at different developmental stages, between differentiated cells and stem cells, between activated cells and their resting counterparts, and between mutant and normal cells. Lamar and Palmer (1), for instance, described a subtractive hybridization method to enrich for Y-chromosome-specific genomic DNA. In this method, female genomic DNA (which contains X-chromosome DNA but no Y-chromosome DNA) was used to "drive" the hybridization of the non-Y DNA strands in a sample of male genomic DNA, and the hybridized non-Y DNA was then physically separated from the single-stranded Y DNA by hydroxyapatite chromatography. The non-Y DNA was thereby subtracted from the sample, yielding Y-chromosome-specific DNA. Subtractive hybridization of cDNA with mRNA, or of cDNA libraries, has also been used to identify mRNAs responsible for certain developmental events (10). Subtractive hybridizations were also reported by Kunkel et al. (2) and Nussbaum et al. (3).
However, the methods described above employ physical separation techniques, which are designed to isolate sequences having a particular affinity for a substrate. For example, hydroxyapatite gel chromatography is used to separate double-stranded nucleic acids from single-stranded nucleic acids. As used by Lamar and Palmer (1), the double-stranded nucleic acids are "subtracted" from the single-stranded nucleic acids to produce a sample enriched in the unique single-stranded nucleic acids. Physical separation techniques can thus enrich a sample for particular nucleic acids believed to be present in it. However, the degree to which the relative abundance of particular nucleic acids can be increased with respect to all other nucleic acids in the library is limited by the effectiveness of the physical separation technique.
In addition, subtraction hybridization is a technically difficult and time-consuming procedure that is often impractical or unreliable. The subtraction generally involves a physical separation method as noted above, such as hydroxyapatite chromatography (1) or streptavidin binding to biotin-labeled sequences. The efficiency of the physical separation process, in both binding and release, necessarily controls the degree of enrichment for a particular desired sequence. Thus, while subtraction hybridization methods yield the desired products (target nucleic acids) in certain applications, the limited degree of enrichment for unique and/or rare species precludes universal or even general application. Furthermore, the technical demands of the physical separation step common to subtraction techniques have created a need for improved methods to identify rare nucleic acids.
The introduction of the polymerase chain reaction (PCR) into library construction techniques was an attempt to address the problem of isolating a rare nucleic acid from a particular source. PCR theoretically results in an exponential increase in all appropriately primed sequences present. Thus, even rare sequences would be present in much higher absolute numbers following PCR. However, if all sequences in a sample are amplified with equal efficiency, the relative abundance of a particular rare sequence compared with the other sequences present does not change as a result of the PCR amplification. Specifically amplifying a particular sequence requires a primer specific for that sequence.
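The point that uniform amplification raises absolute copy number without changing relative abundance can be checked with a minimal sketch. It assumes, hypothetically, that every sequence is primed and copied with the same per-cycle efficiency and that the reaction never plateaus; the function names are illustrative only.

```python
# Toy model (illustrative assumption): every template is primed and copied
# with the same per-cycle efficiency, and the reaction never plateaus.


def amplify(copies: dict[str, float], cycles: int, efficiency: float = 1.0) -> dict[str, float]:
    """Copy numbers after `cycles` rounds of ideal, uniform amplification."""
    factor = (1.0 + efficiency) ** cycles
    return {name: n * factor for name, n in copies.items()}


def relative_abundance(copies: dict[str, float]) -> dict[str, float]:
    """Fraction of all molecules contributed by each sequence."""
    total = sum(copies.values())
    return {name: n / total for name, n in copies.items()}


# Hypothetical sample: 10 copies of a rare sequence among 1,000,000 common copies.
sample = {"rare": 10.0, "common": 1_000_000.0}
after = amplify(sample, cycles=25, efficiency=0.9)

print("rare copies before:", sample["rare"], "after:", f"{after['rare']:.3g}")
print("rare fraction before:", f"{relative_abundance(sample)['rare']:.3g}")
print("rare fraction after: ", f"{relative_abundance(after)['rare']:.3g}")
```

Under these assumptions the rare fraction printed before and after amplification is the same; only a sequence-specific primer, or unequal amplification efficiencies, could change it.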
One effort to enrich for particular sequences or desired subsets of sequences is PCR-based subtractive cDNA cloning (15). However, this technique, which subjects the products of two separate and differing PCR amplifications (amplicons) to subtraction hybridization, employs the same physical separation step mentioned above. More specifically, a first amplified sample contains the desired or target nucleic acid (tester) and a second amplified sample does not (driver). During amplification of the driver, biotin-labeled nucleotides are incorporated into the reaction products (amplicons). After the products of the two amplifications are combined and complementary nucleic acids are annealed, the undesired hybrids are subtracted from the sample by binding streptavidin to the biotin-labeled nucleotides, followed by chemical extraction. Because streptavidin binds only to hybrids containing biotin-labeled driver nucleic acids, hybrids of two tester nucleic acids (tester:tester) are not subtracted. The degree of subtraction or enrichment therefore depends on the efficiency with which the driver-containing hybrids are extracted.
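A minimal sketch of why extraction efficiency caps the enrichment obtainable from one biotin/streptavidin subtraction round. Its parameters (a driver excess R, an extraction efficiency e, and the simplification that 1/R of non-target tester never meets driver) are illustrative assumptions, not values from the cited work.

```python
# Toy model of one biotin/streptavidin subtraction round.
# Assumptions (hypothetical, for illustration only):
#   * the target is present in tester, absent from driver, and never removed;
#   * with driver in R-fold excess, a fraction 1/R of non-target tester
#     escapes hybridization with driver;
#   * hybrids containing biotin-labeled driver are extracted with efficiency e.


def nontarget_survival(driver_excess: float, extraction_eff: float) -> float:
    """Fraction of non-target tester left after one subtraction round."""
    escaped = 1.0 / driver_excess                  # never hybridized to driver
    hybridized = 1.0 - escaped                     # now in biotinylated hybrids
    missed = hybridized * (1.0 - extraction_eff)   # hybrids the extraction failed to remove
    return escaped + missed


def enrichment_factor(driver_excess: float, extraction_eff: float) -> float:
    """Fold increase in the target:non-target ratio from one round."""
    return 1.0 / nontarget_survival(driver_excess, extraction_eff)


for excess in (10, 100, 1_000, 1_000_000):
    print(f"driver excess {excess:>9}: enrichment {enrichment_factor(excess, 0.9):.2f}-fold")
```

With e = 0.9, raising the driver excess ever higher only pushes the per-round enrichment toward 1/(1 - e) = 10-fold, which is the sense in which extraction efficiency limits the method.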
Lisitsyn et al. introduced an adaptation of the PCR-based method when they described representational difference analysis (RDA) for genomic DNA (4). RDA uses PCR to enrich for unique species in one of the samples after hybridization and polymerization steps. RDA does not rely on physical separation methods; instead, it uses two separate ligations of two different adaptors to enrich for unique species. After an initial PCR amplification of both tester and driver samples with a first adaptor, a second adaptor is attached to the ends of the tester DNA but not the driver DNA. Then, after the second-adaptor-treated tester DNA is mixed with driver DNA and the mixture is denatured, hybridized, and filled in at the overhanging ends, only tester:tester hybrids should amplify exponentially with PCR primers specific for the second adaptor sequences. In theory, tester:driver hybrids should amplify linearly and driver:driver hybrids should not amplify at all.
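The expected behavior of the three hybrid classes can be sketched with an idealized copy-number model. It assumes perfect per-cycle efficiency and counts a tester:driver hybrid as having a single primable template strand; the `Hybrid` enum and `copies_after` function are hypothetical names for illustration.

```python
from enum import Enum

# Idealized copy numbers during PCR with the second-adaptor primer.
# Assumptions (illustrative): primed strands are copied with perfect efficiency;
# tester:tester duplexes are primable on both strands (exponential),
# tester:driver hybrids on one fixed template strand (linear),
# and driver:driver hybrids on neither.


class Hybrid(Enum):
    TESTER_TESTER = "tester:tester"
    TESTER_DRIVER = "tester:driver"
    DRIVER_DRIVER = "driver:driver"


def copies_after(kind: Hybrid, start: int, cycles: int) -> int:
    """Product molecules after `cycles` cycles, starting from `start` hybrids."""
    if kind is Hybrid.TESTER_TESTER:
        return start * 2 ** cycles      # every product can template the next cycle
    if kind is Hybrid.TESTER_DRIVER:
        return start * (1 + cycles)     # one new copy per cycle from the same template
    return start                        # no priming site, so no new copies


for kind in Hybrid:
    print(f"{kind.value}: {copies_after(kind, start=1_000, cycles=20):,}")
```

After 20 cycles from 1,000 hybrids of each class, this model gives about 1.05 × 10^9 tester:tester products versus 21,000 tester:driver and 1,000 driver:driver molecules, which is the separation RDA relies on.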
To be effective, RDA requires starting material of reduced complexity (4, 5). To reduce complexity, RDA generally digests total genomic DNA with a restriction enzyme having a six-base-pair recognition site and amplifies the digested DNA by PCR. A high proportion of the digested fragments fall outside what Lisitsyn et al. defined as the amplifiable range of 150-1000 base pairs. Larger fragments are not amplified, which reduces the complexity of the amplicon so that the resulting representation contains only about 2-10% of the total genome (12, 13). Such a representation cannot encompass all of the sequence information in the genome. Consequently, desired sequences may be absent from the subtracted library, while undesired species may be present in it.
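The scale of this complexity reduction can be estimated under a strong simplifying assumption: that genomic DNA behaves like random sequence with equal base frequencies, so a six-base-pair site occurs about once per 4^6 = 4,096 bp and fragment lengths are roughly exponentially distributed. The sketch computes the fraction of bases lying in 150-1000 bp fragments both analytically and by an in silico digest of random sequence at the BglII site AGATCT, which is chosen only as an example of a six-base-pair site; the function names are hypothetical.

```python
import math
import random

# Assumption (illustrative): random sequence, equal base frequencies, so
# fragment lengths between six-base-pair sites are ~exponential, mean 4**6 bp.
SITE_LEN = 6


def fraction_in_range_analytic(lo: int, hi: int, site_len: int = SITE_LEN) -> float:
    """Fraction of all bases lying in fragments with lo <= length <= hi."""
    mu = 4 ** site_len  # mean fragment length for random sequence

    def tail(x: float) -> float:
        # Fraction of bases in fragments of length >= x for exponential lengths.
        return (1 + x / mu) * math.exp(-x / mu)

    return tail(lo) - tail(hi)


def fraction_in_range_simulated(genome_len: int, site: str, lo: int, hi: int, seed: int = 0) -> float:
    """Digest a random sequence in silico and measure the same fraction."""
    rng = random.Random(seed)
    genome = "".join(rng.choices("ACGT", k=genome_len))
    cuts = [0]
    pos = genome.find(site)
    while pos != -1:               # record every occurrence of the site
        cuts.append(pos)
        pos = genome.find(site, pos + 1)
    cuts.append(genome_len)
    lengths = [b - a for a, b in zip(cuts, cuts[1:])]
    kept = sum(n for n in lengths if lo <= n <= hi)   # bases in "amplifiable" fragments
    return kept / genome_len


print(f"analytic:  {fraction_in_range_analytic(150, 1000):.3f}")
print(f"simulated: {fraction_in_range_simulated(2_000_000, 'AGATCT', 150, 1000):.3f}")
```

Under this idealization both estimates come out at roughly 2-3% of the sequence, the same order as the 2-10% cited above; real genomes are not random, so actual representations vary with the enzyme and organism.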
RDA has been applied to cDNA subtraction by Hubank and Schatz (5). The method is very similar to the RDA described by Lisitsyn et al., with cDNA used as the starting material instead of genomic DNA. As in genomic RDA, there are two adaptor ligation steps. The method is designed so that only tester:tester hybrids contain PCR primer binding sites at both ends of the DNA strands and are thus the only species amplified exponentially. In contrast to genomic DNA, the cDNA population of a typical cell derives from some 15,000 different genes and represents only about 1-2% of the total genome (14). RDA can therefore apparently be applied to cDNA without first reducing complexity.
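As a rough check on the 1-2% figure, assume (hypothetically, for illustration only) an average cDNA length of about 2 kb and a haploid genome of about 3 × 10^9 bp, typical of mammals:

```latex
\[
\frac{1.5\times10^{4}\ \text{genes}\times 2\times10^{3}\ \text{bp}}{3\times10^{9}\ \text{bp}}
  = \frac{3\times10^{7}\ \text{bp}}{3\times10^{9}\ \text{bp}}
  = 10^{-2} = 1\%
\]
```

An assumed average length of about 4 kb would double the estimate to about 2%.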
Hou et al. describe a recent attempt to address the problem of complexity in genomic PCR-based methods while including the entire genome in the method (6). This method involves identifying deleted sequences in a particular genome. It is an abbreviated version of RDA in which certain steps, such as the initial preparation of driver representations (amplicons) and the single-stranded nuclease step, are omitted (6). Instead, Hou et al. sonicate genomic DNA to produce driver DNA. This method, in its current form, is likely to be useful for techniques using genomic DNA but not cDNA. In most instances, producing sufficient amounts of driver DNA will require an initial amplification step. Moreover, since introns contained in genomic driver DNA can cause problems during the priming step of RDA, it is unclear how successful the method will be in general applications, as opposed to the identification of deleted sequences described by Hou et al.
In addition to the above drawbacks, RDA selects only for the most abundant target sequences in the tester or tracer population. This results from the "kinetic enrichment" phenomenon associated with the procedure (4). Kinetic enrichment involves a hybridization step that is too short to allow relatively rare tester:tester hybrids to form. The unhybridized rare nucleic acids then amplify only linearly or are digested in the single-stranded nuclease step of the RDA procedure. Linear amplification occurs when only one strand can act as a template for amplification, whereas exponential amplification involves the amplification of two complementary strands. Abundant nucleic acids within the population form tester:tester hybrids faster and at a higher frequency than rare nucleic acids. Consequently, the more abundant nucleic acids have a higher probability of subsequent exponential amplification than the rare nucleic acids. The linearly amplified rare nucleic acids, often the desired targets, are effectively lost from the amplified population.
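The kinetic argument can be illustrated with the idealized second-order reassociation relation, under which the fraction of a sequence still single-stranded after time t is 1/(1 + k·C0·t). The rate constant, time, and concentrations below are hypothetical values in arbitrary units, chosen only to show the effect of a 1000-fold abundance difference.

```python
# Idealized second-order reassociation (illustrative assumption): a sequence at
# starting concentration c0 is a fraction k*c0*t / (1 + k*c0*t) reannealed by
# time t. All numbers are hypothetical and in arbitrary units.


def fraction_reannealed(c0: float, k: float, t: float) -> float:
    """Fraction of a sequence that has formed tester:tester duplexes by time t."""
    x = k * c0 * t
    return x / (1.0 + x)


K = 1.0         # hypothetical rate constant shared by all sequences
T_SHORT = 10.0  # hypothetical short hybridization time

for label, c0 in (("abundant", 1.0), ("rare", 0.001)):  # 1000-fold apart
    print(label, round(fraction_reannealed(c0, K, T_SHORT), 4))

# The rare sequence needs a 1000-fold longer hybridization to catch up.
print("rare, 1000x longer:", round(fraction_reannealed(0.001, K, 1000 * T_SHORT), 4))
```

In this model the abundant sequence is about 91% reannealed while the rare one is under 1% reannealed; the rare sequence reaches the same 91% only if hybridization is extended 1000-fold, which a short kinetic-enrichment step does not allow.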
Another drawback of RDA is its dependence on an appropriate concentration ratio of driver to tester nucleic acid. Even if a desired target is not lost from the population through kinetic enrichment, it is amplified exponentially along with all of the other tester:tester hybrids in the population. The other tester nucleic acids must somehow be removed in order to identify the desired target. Undesired nucleic acids in the tester population (i.e., non-target nucleic acids) are removed by driver only in linear proportion to the concentration of driver nucleic acid used in the subtractive hybridization. That is, if driver is present in 100-fold excess, then 1/100 of the non-target nucleic acids in the tester population escapes hybridization with driver. These non-target nucleic acids then amplify exponentially along with the target nucleic acids in the tester population. Thus, the achievable enrichment for the desired target is limited by the driver:tester nucleic acid ratio. Multiple rounds of hybridization and subtraction are generally needed to achieve the desired enrichment. As noted previously (15), 5-20 repeated subtractions are sometimes required, and RDA procedures also require repeated subtractions. Such repeated subtractions are cumbersome and time-consuming.
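A minimal sketch of how the driver:tester ratio bounds enrichment per round. It follows the paragraph's simplification (R-fold driver excess leaves 1/R of non-target tester unsubtracted) and further assumes, hypothetically, that the target is never subtracted and that subsequent PCR amplifies target and surviving non-target tester equally; the function names are illustrative.

```python
# Assumptions (illustrative): each round leaves 1/R of non-target tester when
# driver is in R-fold excess; the target is never subtracted; PCR afterwards
# amplifies target and surviving non-target equally, so only ratios matter.


def target_fraction(initial_fraction: float, driver_excess: float, rounds: int) -> float:
    """Fraction of tester that is target after `rounds` subtraction rounds."""
    odds = initial_fraction / (1.0 - initial_fraction)
    odds *= driver_excess ** rounds  # non-target shrinks R-fold per round
    return odds / (1.0 + odds)


def rounds_needed(initial_fraction: float, driver_excess: float, goal: float) -> int:
    """Smallest number of rounds for the target fraction to reach `goal`."""
    if not 0.0 < initial_fraction < 1.0 or not 0.0 < goal < 1.0:
        raise ValueError("fractions must lie strictly between 0 and 1")
    if driver_excess <= 1.0:
        raise ValueError("driver must be in excess (R > 1) for any enrichment")
    rounds = 0
    while target_fraction(initial_fraction, driver_excess, rounds) < goal:
        rounds += 1
    return rounds


# A rare target at 1 in 100,000 tester molecules, driver in 100-fold excess.
for r in range(4):
    print(r, f"{target_fraction(1e-5, 100, r):.4g}")
print("rounds to reach 50% at 100x:", rounds_needed(1e-5, 100, 0.5))
print("rounds to reach 50% at 10x: ", rounds_needed(1e-5, 10, 0.5))
```

For a target starting at 1 in 100,000, this model needs three rounds at 100-fold driver excess, and five rounds at 10-fold excess, before the target makes up half of the tester population.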
Yet another drawback of RDA arises from the linear amplification of undesired nucleic acids, such as tester:driver hybrids, and the concomitant reduction in amplification of desired target nucleic acids. If a particular undesired nucleic acid is relatively abundant in both the tester and driver samples, it is amplified linearly during RDA. During this linear amplification, it competes with the other nucleic acids present for primers, enzyme, and nucleotides. This is especially problematic early in the amplification process, when such undesired nucleic acids are highly abundant relative to the target, and it can limit the amplification efficiency of the desired nucleic acids. Moreover, linear amplification of the undesired tester:driver hybrids leaves a higher-than-desired concentration of such hybrids following PCR.
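The competition described here can be illustrated with a deliberately simple toy model. Its single per-cycle `capacity` stands in for limiting primers, enzyme, and nucleotides and is shared in proportion to demand; this pooling, the function name `run_pcr`, and all numbers used are hypothetical assumptions rather than measured PCR behavior.

```python
# Toy model (hypothetical): each cycle can complete at most `capacity` strand
# extensions in total, shared in proportion to demand. The target doubles when
# unconstrained; each tester:driver hybrid yields one new copy per cycle.


def run_pcr(target: float, undesired_hybrids: float, cycles: int, capacity: float) -> tuple[float, float]:
    """Return (target copies, linear by-product copies) after `cycles` cycles."""
    linear_products = 0.0
    for _ in range(cycles):
        demand = target + undesired_hybrids      # extensions requested this cycle
        if demand == 0:
            break
        share = min(1.0, capacity / demand)      # fraction of requests that can be met
        target += target * share                 # target grows by its met demand
        linear_products += undesired_hybrids * share
    return target, linear_products


alone, _ = run_pcr(10, 0, cycles=25, capacity=1e6)
crowded, byproduct = run_pcr(10, 1e7, cycles=25, capacity=1e6)
print(f"target alone:          {alone:.3g}")
print(f"target with hybrids:   {crowded:.3g}")
print(f"linear by-product:     {byproduct:.3g}")
```

With 10 target molecules competing against 10^7 tester:driver hybrids, the target in this model grows only about 1.1-fold per cycle instead of doubling, and the linear by-product ends far more abundant than the target.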
Thus, RDA, while very powerful, still has certain drawbacks. Its applicability is limited, for example, by the complexity of the samples it can accommodate. RDA also requires two separate ligation procedures with two different adaptors. Moreover, RDA is most effective when the desired nucleic acid is relatively abundant in the sample. Clearly, alternative methods for generating enriched nucleic acid samples are needed.
Suzuki et al. have attempted to address some of the drawbacks of RDA with a method referred to as ESD (Equalization of cDNAs, Subtractive hybridization, and Differential display) (7). The method attempts to equalize, or normalize, the content of the tester and driver samples by performing an initial subtraction with the target-containing tester cDNA. A physical subtraction hybridization step, with tester cDNA acting as "driver," is relied on to equalize the contents of each cDNA population. PCR is performed after the equalization. In theory, this helps ensure the exponential amplification of nucleic acids that were rare in the starting cDNA population and reduces the relative abundance of common, undesired nucleic acids by avoiding the kinetic enrichment problem.
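Equalization by hybridization is commonly explained with idealized second-order kinetics: abundant sequences reanneal faster, so the single-stranded fraction that remains becomes more uniform. A minimal sketch, assuming a shared hypothetical rate constant and arbitrary units, shows a 1000-fold abundance range collapsing to about 1.1-fold in what remains single-stranded.

```python
# Idealized second-order reassociation (illustrative assumption): the amount
# left single-stranded after time t is c0 / (1 + k*c0*t). When k*c0*t >> 1 this
# approaches 1/(k*t), independent of the starting concentration c0.


def single_stranded_left(c0: float, k: float, t: float) -> float:
    """Concentration of a sequence still single-stranded after time t."""
    return c0 / (1.0 + k * c0 * t)


K, T = 1.0, 100.0                    # hypothetical rate constant and time
starting = [0.1, 1.0, 10.0, 100.0]   # a 1000-fold abundance range
left = [single_stranded_left(c, K, T) for c in starting]

print("abundance range before:", round(max(starting) / min(starting)))
print("abundance range after: ", round(max(left) / min(left), 2))
```

Keeping only the single-stranded fraction therefore compresses differences in abundance, which is the effect an equalization step aims for.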
While apparently advantageous, ESD is primarily a physical subtraction method: both the hydroxyapatite gel chromatography and biotin-streptavidin procedures mentioned above are used. Reliance on these physical subtraction steps makes ESD technically challenging and introduces the drawbacks indicated above. In addition, PCR is used only to regenerate the non-subtracted population. The exponential enrichment offered by PCR, or by any amplification reaction, does not itself play a role in increasing the relative abundance of the desired nucleic acid during the ESD procedure.
Other PCR-based techniques have also been employed to enrich for desired nucleic acids, for example in methods for generating subtracted libraries, but each of these methods has its own drawbacks. One method, the "chemical cross-linking subtraction" method (32), specifically requires an mRNA-cDNA hybrid in order to subtract nucleic acids. This requirement necessarily limits the method to situations where both an mRNA sample and a cDNA sample are available. Another method, discussed in Riley et al. (33), employs a "vectorette" adaptor in PCR. However, that method requires partial sequence information, which is not always known, and is therefore limited to situations where such information is available. A final method, discussed in Chenchnik et al. (34), uses a "pan-like" hybridization structure to prevent certain nucleic acids from being amplified. In effect, the method relies on the greater efficiency of one type of hybridization over another to selectively suppress amplification of certain nucleic acids. Additionally, the method selects for reannealed tester hybrids and, as previously discussed, may therefore favor tester nucleic acids of higher abundance and select against rare tester nucleic acids.
Thus, there remains a need in the art for new and improved methods to generate specific populations of nucleic acids, as used, for example, in producing subtracted libraries. By providing methods for preferentially replicating or amplifying nucleic acids, the disclosed invention fulfills these needs. The invention represents a significant advance over previous methods because, inter alia, physical separation techniques are not required, only user-friendly laboratory procedures are used, and the preferential replication and amplification of desired nucleic acids is simplified and made more efficient.