The present invention provides a novel method for preparing cDNA libraries containing enhanced percentages of full-length cDNA inserts.
Technology aimed at the production of cDNA libraries, which are important tools in the discovery of biologically relevant genetic sequences, often produces cDNA libraries that are far from perfect. cDNA libraries may contain a high percentage of molecules where the cDNA insert within the library vector is not full-length as compared to the naturally occurring mRNA molecule from which the cDNA was derived. cDNA libraries, even those designed to be "directional", that is, having the cDNA insert present in a particular 5′→3′ orientation relative to the vector sequences, often contain a high percentage of "flipped" inserts where the cDNA insert is oriented in the opposite orientation from that which is most desirable for characterization and expression of the cDNA insert. In addition, some cDNA libraries demonstrate a high incidence of multiple inserts, where unrelated cDNA molecules are aberrantly ligated into the same vector molecule.
There exists a need for novel methods of cDNA library production, and it is to such methods that the present invention is directed.
Construction of high-quality cDNA libraries, with greater than 90% of the inserts being full-length copies of the corresponding mRNA molecules, is crucial to the success of our effort to clone all the human genes encoding secreted proteins. Several factors contribute to the poor quality of cDNA libraries constructed using the conventional method, i.e., cDNA synthesis followed by ligation into plasmid or phage vectors. First, mRNA molecules may be degraded during RNA isolation and in the process of first-strand cDNA synthesis. In addition, most mRNA samples are isolated from total cellular RNA using the oligo-dT capture protocol and are therefore contaminated with partially processed poly(A)-containing precursor RNA and with partially degraded 3′ portions of mRNA molecules. Second, during first-strand cDNA synthesis, reverse transcriptase tends to fall off the RNA templates prematurely because of RNA secondary structures or insufficient processivity of the enzyme itself. Third, the ligation step after ds cDNA synthesis may result in the following undesirable artifacts: A). Multiple cDNA inserts are ligated into the same vector because of the high insert/vector ratio used to increase the population of clones containing a cDNA insert. B). There is a high percentage (about 10%) of flipped cDNA inserts when a unidirectional library is constructed. C). Contaminating DNA can be incorporated into the library. For example, some of the early libraries constructed by Clontech were contaminated by yeast chromosomal DNA when yeast tRNA was used to precipitate the cDNA. Another example is that when full-length cDNA was selected (Carninci et al., 1996), ligation of contaminating partial cDNA into the vector compromised the quality of the library. D). There is a selection for smaller cDNA inserts, since they are ligated more efficiently than larger ones.
Numerous efforts have been made to increase the cloning efficiency from a given amount of mRNA and/or to increase the proportion of full-length inserts. Some of the most successful approaches include: A). A reverse transcriptase was engineered by GIBCO-BRL to inactivate its RNase H activity, which causes on-template RNA cleavage and premature termination of cDNA synthesis when the enzyme stalls before a secondary structure. Thus far, the SuperScript II reverse transcriptase (BRL) remains the most popular enzyme for first-strand cDNA synthesis. B). Oligo-dT tailed vectors were used for first-strand cDNA synthesis (Okayama and Berg, 1982; Alexander et al., 1984; Bellemare et al., 1991; Kato et al., 1994). This method dramatically increased the cloning efficiency and the proportion of insert-containing clones. C). Strategies for specific capture (Edery et al., 1995) or labeling of the 5′-end cap of mRNA molecules with oligonucleotides (Fromont-Racine et al., 1993; Liu and Gorovsky, 1993; Maruyama and Sugano, 1994; Kato et al., 1994) or biotin (Carninci et al., 1996, 1997) were used to select for full-length cDNA. Libraries constructed with a selection for the 5′-end cap, such as the Kato strategy (Kato et al., 1994, the Protagene protocol) and the biotin capture method (Carninci et al., 1996), have a high percentage of full-length cDNA inserts, ranging from 70% to 95%. However, none of the above-mentioned strategies completely satisfies the requirements for high efficiency, a high proportion of full-length cDNA inserts, and a low level of contaminating or aberrant DNA inserts arising from DNA ligation.