In many biological systems, changes in the expression of certain genes can, and do, lead to dramatic phenotypic changes, which may include developmental fate determination, cell death, and oncogenesis. Hence, identification of these genes is clearly of great interest; however, such identification is often difficult, because the level of expression may be very low, or changes in expression may be very subtle. For example, certain mRNA molecules representative of transcription factors may be present as a single copy per 10,000 mRNA molecules.
The issues and interest touched upon supra have led to development of many methodologies. While these methods all have certain benefits, they also exhibit drawbacks which limit their efficacy and applicability. One major drawback of all of the methods is that they do not offer convenient procedures for identification of low abundance molecules whose abundance differs by 10-fold or less relative to controls.
The first class of prior art methodologies is the "non-selective" systems. Differential screening is one such approach. It involves screening a library, in duplicate, using labelled cDNA from two different RNA populations. Relative signal intensity of plaques or colonies after probe hybridization, theoretically, represents the abundance of cDNA in the probe population, and, hence, differences in signal should represent clones which are differentially expressed.
The problem with this system is that it is functional only with high abundance RNA. Low abundance species are underrepresented in both library and probe populations. As a result, there are problems with screening large numbers of plaques or colonies and one is forced to use probes of extremely high specific activity, with long incubation times, in order to secure a significant signal.
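To give a sense of scale for the screening burden described above, the following is a minimal sketch of the standard sampling estimate for how many independent clones must be screened to encounter at least one copy of a species of a given fractional abundance. It assumes each clone is an independent random draw from the mRNA population. The function name and the 99% confidence target are illustrative assumptions, not part of any cited method.

```python
import math


def clones_to_screen(fraction: float, confidence: float = 0.99) -> int:
    """Number of clones N such that P(at least one hit) >= confidence.

    Assumes independent random sampling, so that
    P(no hit in N clones) = (1 - fraction) ** N.
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    # Solve (1 - fraction) ** N <= 1 - confidence for the smallest integer N.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - fraction))


# A transcript present as one copy per 10,000 mRNA molecules
# requires tens of thousands of clones under these assumptions.
rare_species_clones = clones_to_screen(1 / 10_000)
print(rare_species_clones)
```

Under this model the required number of clones scales roughly inversely with abundance, which is why rare species make duplicate screening impractical.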
A second method of this type is the differential display system, described by, e.g., Liang, et al., Science 257:967 (1992). In this system, random PCR products are generated for display on polyacrylamide gels. A simple visual comparison of patterns between gels should permit identification of species which differ in abundance. The problem with differential display is that it is extremely labor-intensive, due to the number of primer pairs and sequencing gel runs needed to cover statistically significant portions of the population. Further, no selection is used against relatively abundant, non-differentially expressed species, and these species may obscure the detection of less abundant species of interest.
In contrast to the methods discussed supra, so-called selective methods afford some ability to separate out molecules of interest. In subtractive hybridization, single-stranded cDNA or RNA from a population of interest (the "tester" population) is hybridized with an excess, generally 100- to 1,000-fold, of complementary, single-stranded cDNA or RNA from a control population (the "driver" population). Double-stranded hybrids, which represent species shared between driver and tester, are "subtracted" from the mixed population. Generally, this is accomplished by hydroxyapatite column chromatography or by tagging cDNA with biotin, followed by removal of biotin-containing complexes with streptavidin. Any remaining, unhybridized molecules are then used for subsequent analyses. See Milner, et al., Nucleic Acids Res. 23:176 (1995), for a review of this technology.
This methodology, however, is of limited application. First, species in the tester population whose difference in abundance from the driver population is less than the excess of driver-to-tester in the mixed population will be lost prior to subtractions. Second, "subtraction" of hybrid molecules by column chromatography/biotin extraction is a "negative" purification, relying on removal of unwanted molecules, rather than recovery of the desired forms. Any unwanted molecules which are not removed will interfere with subsequent experimentation. Further, as the kinetics of hybridization approach but do not reach completion, any unwanted molecules which have not hybridized will contaminate the tester population. When the desired species are low in abundance, even low absolute amounts of contamination may obscure detection.
Competitive hybridization, in contrast to subtractive hybridization, uses competition between two denatured populations of cDNA. Either hetero-hybrids of driver/tester strands, or homo-hybrids, result. To carry out these assays, two double-stranded populations are mixed, completely denatured, and allowed to re-associate. Random assortment presumes re-association with a probability based upon relative abundance. Hence, if a cDNA species is present at a higher concentration in the tester population than in the driver population, tester homo-hybrids make up a greater proportion of the hybrids than do homo-hybrids of a cDNA species present in equal quantities in both populations.
Any tester homo-hybrids are removed, and represent the selected population. This differential enrichment is the basis for subsequent enrichment by competitive hybridization.
The main problem with this methodology is that it relies on efficient recovery of tester homo-hybrids for effective enrichment. If the driver:tester ratio is high, the fraction of molecules found as tester homo-hybrids is very low, and purification steps must be very precise, and/or repeated frequently.
Competitive hybridization methods have found wide use. One method, disclosed by Wang, et al., Proc. Natl. Acad. Sci. USA 88:11505 (1991), utilized biotinylated driver cDNA, with separation by streptavidin binding and organic extraction. Zeng, et al., Nucleic Acids Res. 22:7381 (1994), disclose a method where tester molecules are tagged with thiolated nucleotides. As homo-hybrids, these molecules resist digestion by exonucleases III and VII. Yet a further method, that of Klickstein, in Ausubel, et al., ed. Current Protocols In Molecular Biology (Wiley & Sons, N.Y., 1995), pages 5.8.9 to 5.8.15, uses compatible restriction site overhang sequences which are present only on tester molecules, thereby permitting cloning of only tester homo-hybrids. Lisitsyn, et al., Science 259:946 (1993), teach a method where PCR primers are ligated to tester molecules, such that only tester homo-hybrids are exponentially amplified.
The methods of Wang and Zeng are not satisfactory, because both are negative enrichments. Even reaction efficiencies of greater than 99% may not be sufficient to prevent contamination by unwanted species. The methods of Klickstein and Lisitsyn are positive selection methods, but are labor-intensive.
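The point about 99% efficiency can be checked with simple arithmetic. The sketch below assumes that the desired species is fully retained and that all unwanted species are removed with the same efficiency; both are simplifying assumptions, and the function name is hypothetical. The 1-in-10,000 starting abundance is the transcription factor example given at the outset.

```python
def purity_after_negative_selection(target_fraction: float, removal_efficiency: float) -> float:
    """Fraction of the recovered material that is the desired species.

    target_fraction:    starting abundance of the desired species (0 to 1)
    removal_efficiency: fraction of unwanted molecules actually removed (0 to 1)
    """
    if not 0.0 < target_fraction <= 1.0:
        raise ValueError("target_fraction must be in (0, 1]")
    if not 0.0 <= removal_efficiency <= 1.0:
        raise ValueError("removal_efficiency must be in [0, 1]")
    # Unwanted molecules that escape removal remain as contaminants.
    contaminant = (1.0 - target_fraction) * (1.0 - removal_efficiency)
    return target_fraction / (target_fraction + contaminant)


# A species at 1 in 10,000, after a 99% efficient negative selection,
# still makes up only about 1% of what is recovered.
print(purity_after_negative_selection(1 / 10_000, 0.99))
```

Under these assumptions the residual 1% of unwanted molecules still outnumbers the desired species by roughly one hundred to one.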
Competitive hybridization suffers from two additional problems. The first, the so-called "Cot problem", stems from the fact that only double-stranded tester homo-hybrids are selected after hybridization. As a result, if hybridization does not proceed beyond a Cot₁/₂ value sufficient for low abundance species, the fraction, or a large portion of it, is lost during selection. Further, there is the problem of "preferential amplification", discussed infra.
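The "Cot problem" can be illustrated with the ideal second-order re-association relationship, in which the fraction of a species that has re-annealed is x / (1 + x), where x is the Cot experienced by that species divided by its Cot₁/₂. The sketch below assumes ideal kinetics and that a species making up a given fraction of the total nucleic acid experiences that same fraction of the total Cot. The function name and the unit value chosen for the Cot₁/₂ of a pure species are illustrative assumptions.

```python
def fraction_reassociated(total_cot: float, species_fraction: float,
                          cot_half_pure: float = 1.0) -> float:
    """Fraction of one species that is double-stranded at a given total Cot.

    total_cot:        Cot of the whole mixture (arbitrary units)
    species_fraction: share of the total nucleic acid made up by this species
    cot_half_pure:    Cot at which a pure preparation of the species is half
                      re-associated (hypothetical unit value by default)
    """
    if total_cot < 0 or not 0.0 < species_fraction <= 1.0 or cot_half_pure <= 0:
        raise ValueError("invalid kinetic parameters")
    # The species only experiences its own share of the total concentration.
    species_cot = total_cot * species_fraction
    return species_cot / (species_cot + cot_half_pure)


# At the same total Cot, a species at 1% of the mixture is half re-associated,
# while a species at 0.01% has barely begun to form duplexes.
print(fraction_reassociated(100.0, 0.01))
print(fraction_reassociated(100.0, 0.0001))
```

Because only duplexes are selected, the strands of the rarer species that are still single-stranded at that point are discarded along with everything else that failed to re-anneal.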
Hence, there is clearly a need for an improved method for identifying and/or quantifying nucleic acid molecules in a sample, especially those mRNA molecules which are present in low abundance (about 100 molecules or fewer per cell). The particulars of the inventive methods, which address these problems, are set forth below.