Methods of preparing cDNA libraries have been disclosed and are well known in the prior art. For example, they are described by Edery I., et al., 1995, Mol Cell Biol, 15:3363–3371; Kato S., et al., 1994, Gene, 150:243–250; and Maruyama K., et al., 1994, Gene, 138:171–174.
In prior art, Carninci et al., 1996, Genomics 37:327–336; Carninci et al., 1997, DNA Research, 4:61–66; and Carninci and Hayashizaki, 1999, Methods Enzymol, 303: 1–44, describe efficient methods for the preparation of cDNAs. These methods, comprising a modified “tagged cap trapper” to select long-strand, full-coding and/or full-length cDNA libraries after tagging of the cap structure, allow the preparation of long, full-coding and/or full-length cDNA libraries containing all of a particular coding sequence and its 3′ and 5′ untranslated regions (UTRs). Such libraries are particularly useful for large-scale sequencing projects in which the recovery of long, full-coding and/or full-length (full-coding/length) clones is required from among truncated clones (EST sequences).
However, the preparation of long, full-coding/length cDNA libraries entails certain problems. The preparation of long or full-coding/length cDNA is more efficient for short-strand mRNAs than for long-strand mRNAs (transcripts). In addition, cloning and amplification are more difficult for long-strand cDNAs than for short-strand cDNAs, which introduces further size bias. Using truncated cDNAs to recover full-length cognates is impractical at the genomic scale; however, cDNAs in a standard library can be cloned in either their long, full-coding/length form or their truncated form, which favors the discovery of at least one EST for any gene, regardless of length.
Another problem relates to the nature of cellular mRNA. mRNA can be classified into superprevalent (or abundant), intermediate and rare mRNA based on expression level. In a typical cell, 5 to 10 species of superprevalent mRNA make up at least 20 percent of the total amount of mRNA, 500 to 2,000 species of intermediately expressed mRNA make up 40 to 60 percent, and 10,000 to 20,000 species of rare mRNA make up less than 20 to 40 percent. This average distribution may vary markedly between tissue sources, and the presence of numerous highly expressed genes may further alter it. Sequencing cDNA from standard cDNA libraries is ineffective for discovering rarely expressed genes because intermediately and highly expressed cDNAs end up being sequenced redundantly.
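The effect of this distribution on random sequencing can be illustrated with a short calculation. The following Python sketch is an illustration only, not part of any cited method. It assumes that every species within a class is equally abundant and that clones are drawn independently. The class sizes and mass fractions are single assumed values picked from within the ranges given above, and the names CLASSES, expected_fraction_seen and survey are hypothetical.

```python
# Illustrative sketch: expected fraction of mRNA species in each abundance
# class that is seen at least once when clones are picked at random.
# Assumed values (chosen from within the ranges stated in the text):
#   class name -> (fraction of total mRNA, number of species in the class)
CLASSES = {
    "superprevalent": (0.20, 10),
    "intermediate": (0.50, 1000),
    "rare": (0.30, 15000),
}


def expected_fraction_seen(class_mass, n_species, n_clones):
    """Expected fraction of a class's species sampled at least once.

    Assumes all species in the class are equally abundant, so each clone
    belongs to a given species with probability class_mass / n_species.
    """
    p = class_mass / n_species
    return 1.0 - (1.0 - p) ** n_clones


def survey(n_clones):
    """Return the expected discovered fraction for every class."""
    return {
        name: expected_fraction_seen(mass, count, n_clones)
        for name, (mass, count) in CLASSES.items()
    }


if __name__ == "__main__":
    for name, fraction in survey(5000).items():
        print(f"{name:15s} {fraction:.1%}")
```

Under these assumptions, a sequencing run that has already recovered nearly all superprevalent and most intermediate species has still sampled only a small share of the rare species, which is the redundancy problem described above.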
Once most mRNAs of the superprevalent and intermediate frequency classes have been identified, redundancy levels are expected to exceed 60 percent. Thus, the use of a hybridization normalization method has been proposed to solve this problem. The principle behind normalization is to decrease the frequency of the most abundant clones while increasing the frequency of less prevalent cDNAs. A normalization method for the preparation of EST sequences is disclosed by Soares et al., 1994, Proc. Natl. Acad. Sci. 91:9228–9232. This method is based on the reassociation of nucleic acids cloned in amplified plasmid libraries. However, amplified plasmid libraries subjected to normalization are not useful for the preparation of long-strand, full-coding/length cDNAs. This is because plasmid libraries carry a cloning bias: short-strand cDNAs are cloned efficiently, and cloning efficiency decreases as strand length increases. In fact, in Soares et al., 1994, DNA must be cloned into a plasmid and then converted to single-stranded tester DNA. The ligation to plasmids reduces the strand length of the cDNA that is recovered (that is, long strands of cDNA tend to be lost).
Additionally, during library amplification prior to normalization, the ease with which cDNA clones are grown varies with plasmid length. Thus, long-strand, full-coding/length clones tend to be underrepresented following bulk amplification of the library. In amplified plasmid libraries, the recovery of full-coding/length clones becomes even more difficult.
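The principle of normalization by reassociation stated above can be sketched numerically. The following Python example is a minimal illustration under an idealized assumption, not the procedure of Soares et al.: each species is taken to reassociate with ideal second-order kinetics, so that the amount remaining single-stranded after time t is c0 / (1 + k * c0 * t). Effects of strand length and sequence complexity are ignored. The rate constant k, the time t, the starting concentrations and the function names are all hypothetical.

```python
# Illustrative sketch: normalization by reassociation kinetics.
# Abundant species reanneal faster, so the single-stranded fraction that is
# recovered afterwards has a much flatter abundance distribution.


def single_stranded_remaining(c0, k, t):
    """Amount of a species still single-stranded after time t.

    Assumes ideal second-order reassociation: c(t) = c0 / (1 + k * c0 * t).
    """
    return c0 / (1.0 + k * c0 * t)


def normalize(initial, k, t):
    """Return relative abundances in the single-stranded fraction.

    initial maps a species name to its starting concentration
    (arbitrary units); the result is rescaled to sum to 1.
    """
    remaining = {
        name: single_stranded_remaining(c0, k, t)
        for name, c0 in initial.items()
    }
    total = sum(remaining.values())
    return {name: value / total for name, value in remaining.items()}


if __name__ == "__main__":
    # Hypothetical per-species starting concentrations.
    initial = {"abundant": 0.02, "intermediate": 5e-4, "rare": 2e-5}
    after = normalize(initial, k=1.0, t=1e4)
    print("abundant/rare before:", initial["abundant"] / initial["rare"])
    print("abundant/rare after: ", after["abundant"] / after["rare"])
```

In this model the ratio between abundant and rare species shrinks as incubation proceeds, because every species tends toward the same residual single-stranded amount. The sketch does not capture the length-dependent losses described above, which are what make plasmid-based normalization unsuitable for long clones.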
Other literature, such as Tanaka et al., 1996, Genomics, 35:231–235, discloses methods for the preparation of EST sequences in which mRNA is first annealed to oligo-dT conjugated to a solid matrix. This method is not suitable for preparing normalized long-strand, full-coding/length cDNAs because of mRNA degradation before cDNA synthesis. Further, the hybridization rate of nucleic acids immobilized on a solid phase is slower than that of hybridization in solution.
Libraries created with PCR- and solid matrix-based normalization technologies known in the art exhibit sequence redundancy similar to that of non-normalized cDNA libraries used in EST projects.
An additional problem arises in the preparation of cDNA libraries or encyclopedias (for example, a mammalian full-length cDNA encyclopedia) with the aim of collecting at least one long-strand, full-coding/length cDNA for each expressed gene irrespective of the tissue source. In this case it is desirable to remove not only cDNAs that are redundant within the library, but also cDNAs that have already appeared in previous libraries, so as to accelerate the discovery of new long-strand, full-coding/length cDNAs.
To solve this problem, hybridization subtraction methods have been proposed.
Sagerstrom et al., Annu. Rev. Biochem., 1997, 66:751–83, give an overview of the subtraction methods known in the art. The basic idea of subtraction is that the nucleic acid from which one wants to isolate differentially expressed sequences (the tracer or tester) is hybridized to complementary nucleic acid that is believed to lack the sequences of interest (the driver), with the driver present at a much higher concentration than the tester. The tester and driver nucleic acid populations are allowed to hybridize, and only sequences common to the two populations form hybrids. After hybridization, driver-tester hybrids and unhybridized driver are removed, and the remaining nucleic acids can be used to prepare a library rich in tester-specific clones or to make probes that can be used to screen a library for tester-specific clones.
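The subtraction principle described above can be sketched as follows. This Python example is an illustration under a simplifying assumption, not a method taken from the cited review: because the driver is in large excess, each tester species is assumed to hybridize with pseudo-first-order kinetics, so that the fraction left unhybridized is exp(-k * D * t), where D is the concentration of the matching driver species. Perfectly specific hybridization is assumed. The rate constant k, the time t, the species names and the function subtract are hypothetical.

```python
import math

# Illustrative sketch: subtractive hybridization with driver in excess.
# Tester species that have a counterpart in the driver are captured into
# driver-tester hybrids and removed; tester-specific species remain.


def subtract(tester, driver, k, t):
    """Return the tester amounts left unhybridized after time t.

    tester and driver map a species name to a concentration (arbitrary
    units). Assumes pseudo-first-order kinetics with driver in excess:
    remaining = amount * exp(-k * driver_concentration * t).
    """
    remaining = {}
    for name, amount in tester.items():
        driver_concentration = driver.get(name, 0.0)
        remaining[name] = amount * math.exp(-k * driver_concentration * t)
    return remaining


if __name__ == "__main__":
    # Hypothetical populations: one species shared with the driver,
    # one species found only in the tester.
    tester = {"shared": 1.0, "tester_specific": 1.0}
    driver = {"shared": 50.0}
    for name, amount in subtract(tester, driver, k=1.0, t=0.2).items():
        print(f"{name:16s} {amount:.6f}")
```

In this model a species that is absent from the driver is left untouched, while a shared species is depleted at a rate set by the driver excess. Real hybridizations are not perfectly specific, so the sketch overstates how cleanly the two groups separate.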
However, subtraction methods also entail the same problems described for normalization with PCR- and solid matrix-based technologies. They are suited to the preparation of EST sequences, but cannot be used to prepare long-strand, full-coding/length cDNAs.
Bonaldo et al., 1996, Genome Research, 6:791–806, discloses a subtractive hybridization approach specifically applied to reducing the representation of pools of already sequenced clones in normalized libraries yet to be surveyed.
This normalization and subtraction technique (Bonaldo et al. 1996) is useful for large-scale gene discovery in EST research, but has the drawbacks already indicated in prior art (cDNA cloned in amplified plasmid as disclosed by Soares et al., 1994) and is not suited to long-strand and full-coding/length cDNA inserts.
In particular, as stated above, during library amplification prior to the normalization and subtraction steps, the amplification of cDNA clones varies with plasmid length, with long clones being underrepresented following bulk amplification of the library. That is, the relative representation of long-strand cDNA clones decreases, rendering such cloning difficult.
A further problem of the normalization and subtraction method disclosed by Bonaldo et al. is that both the normalization and subtraction steps require an incubation period, and this incubation causes the breakup of plasmids, in particular of bigger plasmids (those containing long-strand cDNAs). As a consequence of the normalization and subtraction steps, the number of resulting long clones is very limited or null.
This also confirms the unsuitability of this method for the preparation of normalized and subtracted long-strand, full-coding/length cDNAs.
A still further problem relating to normalization and/or subtraction methods is that non-specifically bound tester/driver hybrids form in these steps through the pairing of imperfectly complementary sequences. The removal of such hybrids eliminates from the tester targeted cDNAs that are erroneously treated as abundant and/or as already sequenced in other libraries, but that in reality are not abundant and have not been previously sequenced.
Accordingly, the purpose of the present invention is to solve the several problems of the prior art and to provide an efficient method for the preparation of normalized and/or subtracted long-strand and full-coding/length cDNA libraries.