The present invention relates to the construction of cDNA libraries, and in particular to methods of constructing full-length cDNA libraries.
Methods for constructing complementary DNA (cDNA) libraries from mRNA are well known in the art. In a typical procedure, poly(A)+ mRNAs are isolated from cells, preferably a cell type in which the mRNA encoding the desired polypeptide is produced in large quantities. The mRNAs are then converted into cDNA in vitro using the enzyme reverse transcriptase to synthesize complementary cDNA strands from the mRNA template. General protocols are, for example, described in Chapter 5 of Ausubel et al., Current Protocols in Molecular Biology, Volume 1 (1991). Two commonly used methods of producing cDNA from mRNA are described in Okayama and Berg, Mol. Cell Biol. 2:161-170 (1982) and Gubler and Hoffman, Gene 25:263-269 (1983).
In the conventional process of converting mRNA into double stranded cDNA in vitro, a first cDNA strand is synthesized by the reverse transcriptase and separated from the mRNA by treatment with alkali or using a nuclease such as the enzyme RNase H. E. coli DNA polymerase then uses the first cDNA strand as a template for the synthesis of the second cDNA strand, thereby producing a population of double stranded cDNA molecules from the original poly(A)+ mRNA. After converting the 5′ and 3′ ends into blunt ends, the cDNA can be ligated to linkers/adaptors and subsequently ligated into suitable vectors and transformed or packaged into a cell line to form the library. The library can then be screened for cells transformed with nucleic acid encoding the desired polypeptide.
While the conventional methods have been used successfully to create cDNA libraries and to identify a large number of polypeptides, they have certain disadvantages. For example, an intrinsic problem in the construction of high-quality full-length cDNA libraries is that, under in vitro conditions, the reverse transcriptase very often does not extend the first-strand cDNA to the 5′ end of the mRNA. As a result, some mRNA sequences (often longer ones) are not represented in the library. This is thought to result in part from misincorporation of an incorrect base by the reverse transcriptase, which destabilizes the cDNA/mRNA duplex. The enzymes and proteins that normally repair nicks or correct mistakes during DNA synthesis in the cell are not present when the cDNA is synthesized in vitro.
In addition, hairpin formation in the mRNA can cause premature termination during conversion to cDNA. This is especially a problem when cloning genes that encode polypeptides with a signal sequence at the 5′ end, because such libraries are often screened by detecting polypeptide exported from the transformed cells. These screening methods therefore require full-length cDNA that includes the signal sequence.
A number of methods have been developed to address these problems. For example, another method of synthesizing cDNA in vitro selects full-length poly(A)+ mRNA by treatment with bacterial alkaline phosphatase and tobacco acid pyrophosphatase, and then ligates the 5′ end of the mRNA to a chimeric DNA-RNA linker containing a restriction site. See Kato et al., Gene 150:243-250 (1994). The poly(A) 3′ end of the mRNA is then hybridized to an oligo(dT) sequence, and the oligo(dT) is used to prime cDNA synthesis. This procedure is also limited, however, by the efficiency of the phosphatases and of the ligation. Moreover, the ligation can also occur with mRNA whose 5′ end has been degraded, since the method does not distinguish between full-length and partial mRNA.
In standard methods currently used for the preparation of cDNA libraries, mRNA is isolated from the cell by virtue of the polyadenylated tail at its 3′ end, which binds to a resin specific for this structure (oligo(dT) chromatography). The purified mRNA is then copied into cDNA using a reverse transcriptase, which starts at the 3′ end of the mRNA and proceeds towards the 5′ end. Second-strand synthesis is then performed. Linkers are added to the ends of the double-stranded cDNA to allow its packaging into virus or cloning into plasmids. At this stage, the cDNA is in a form that can be propagated.
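The direction of first-strand synthesis can be illustrated with a short Python sketch. It assumes an idealized reverse transcriptase that copies the template without errors. The function name `first_strand_cdna` and the `bases_copied` parameter, which models premature termination, are hypothetical illustrations and not part of the described method.

```python
from typing import Optional

# Base incorporated into cDNA opposite each RNA template base (Watson-Crick pairing).
RNA_TO_CDNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna: str, bases_copied: Optional[int] = None) -> str:
    """Return first-strand cDNA (5'->3') for an mRNA written 5'->3'.

    Synthesis starts at the mRNA 3' end and moves toward its 5' end. If
    bases_copied is given (hypothetical parameter), the enzyme is modelled as
    falling off after copying that many template bases, so the 5'-proximal
    part of the mRNA is never copied.
    """
    template = mrna.upper()
    n = len(template) if bases_copied is None else max(0, min(bases_copied, len(template)))
    copied = template[len(template) - n:]  # the 3'-most n template bases
    # cDNA is antiparallel to the template: reverse, then complement each base.
    return "".join(RNA_TO_CDNA[base] for base in reversed(copied))


print(first_strand_cdna("AUGGCCAAAA"))                  # complete copy
print(first_strand_cdna("AUGGCCAAAA", bases_copied=4))  # stopped early: only the 3' AAAA copied
```

In the truncated case, the cDNA lacks the sequence complementary to the AUG start codon at the mRNA 5′ end. This is the kind of loss described in the following paragraphs.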
One disadvantage of current cDNA library synthesis protocols is that they tend to produce libraries with a significant proportion of incomplete cDNAs, a consequence of inefficiencies in the reverse transcriptase used to generate the library. To compensate for these incomplete cDNAs, investigators must perform many rounds of isolation (screening) and assemble a “full-length” cDNA from the accumulated pieces. Such processes are resource-intensive and do not ensure that each initial mRNA is represented in the cDNA library.
In addition, there is significant under-representation of sequences close to the 5′ end of mRNAs in cDNA libraries produced by conventional methods. This under-representation results from the fact that the reverse transcriptase will usually “fall off” before reaching these sequences. In many instances, the information located at the 5′ end is of great interest.
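One simple way to see why 5′ sequences, and longer transcripts in particular, are under-represented is to use a hypothetical model in which the reverse transcriptase has the same independent probability p of terminating at each template base. This is an illustrative assumption, not a measured property of any enzyme. Under it, the chance of copying a template of length L all the way to its 5′ end is (1 - p)^L, which falls off exponentially with length.

```python
import math


def prob_reaches_5prime_end(template_length: int, p_stop_per_base: float) -> float:
    """Chance that first-strand synthesis copies an entire template.

    Hypothetical model: termination happens independently at each base with
    the same probability p_stop_per_base, so P(full length) = (1 - p) ** L.
    """
    if template_length < 0:
        raise ValueError("template_length must be non-negative")
    if not 0.0 <= p_stop_per_base < 1.0:
        raise ValueError("p_stop_per_base must be in [0, 1)")
    return (1.0 - p_stop_per_base) ** template_length


# Illustrative value of p only; it is not a measured enzyme property.
for length in (500, 1000, 2000, 4000):
    p_full = prob_reaches_5prime_end(length, 0.001)
    # For small p this is close to exp(-p * L).
    print(length, round(p_full, 3), round(math.exp(-0.001 * length), 3))
```

With p = 0.001, about 37% of 1,000-nt templates but only about 2% of 4,000-nt templates would be copied to the 5′ end under this model.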
Thus, there remains a need in the art for improved cDNA libraries, and in particular for cDNA libraries that are enriched for full-length cDNAs.
The present invention provides methods for identifying cDNAs comprising sequences corresponding to the 5′ end of a transcript, methods of producing libraries comprising cDNAs, methods of verifying the presence of a 5′ end in a cDNA, and methods of producing libraries particularly rich in 5′ ends and full-length cDNAs (40-70%) relative to libraries produced by conventional technologies (10-30%).
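The practical meaning of these proportions can be estimated with a simplifying assumption: each clone picked from a library is an independent draw with probability f of being full-length. Under that assumption, the number of clones screened until the first full-length one follows a geometric distribution with mean 1/f. A minimal Python sketch, in which the function name is hypothetical:

```python
def expected_clones_screened(full_length_fraction: float) -> float:
    """Mean number of clones picked until the first full-length clone.

    Assumes each pick is an independent trial with success probability
    full_length_fraction, i.e. a geometric distribution with mean 1 / f.
    """
    if not 0.0 < full_length_fraction <= 1.0:
        raise ValueError("full_length_fraction must be in (0, 1]")
    return 1.0 / full_length_fraction


# Endpoints of the conventional (10-30%) and improved (40-70%) ranges.
for f in (0.10, 0.30, 0.40, 0.70):
    print(f"{f:.0%} full-length: about {expected_clones_screened(f):.1f} clones per full-length clone")
```

Under this assumption, a library with 10% full-length clones needs about 10 picks per full-length clone on average, while one with 40% needs about 2.5.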
In one embodiment, the present invention provides an improved method for producing full-length cDNAs comprising 1) relaxing the mRNA secondary structure, e.g., by adding an agent such as dimethyl sulfoxide (DMSO) to the first-strand synthesis reaction mixture, and 2) utilizing a thermostable enzyme that exhibits 3′ to 5′ exonuclease activity for template-driven enzymatic deoxynucleotide synthesis during first-strand synthesis.
In another embodiment, the invention provides a method of isolating a full-length cDNA by: 1) contacting a ribonucleic acid molecule with a primer under conditions sufficient for template-driven enzymatic deoxyribonucleic acid synthesis to occur, to form a hybrid RNA:DNA molecule; 2) contacting the hybrid molecule with a detectably labeled oligo-dV primer under conditions sufficient for template-driven enzymatic deoxyribonucleic acid synthesis to occur, to produce hybrid primer:RNA sequences provided the deoxyribonucleic acid of the first hybrid molecule does not extend to the 5′ end of the ribonucleic acid; 3) isolating a hybrid molecule that does not contain label; and 4) converting the unlabeled hybrid molecule to a first double-stranded deoxyribonucleic acid molecule. In a particular embodiment, the ribonucleic acid molecule is an mRNA. Following cDNA production, the double-stranded cDNA molecule can be introduced into a vector.
The primer used for first-strand cDNA synthesis may be any primer that allows directed synthesis of a deoxyribonucleic acid from a ribonucleic acid (including a gene-specific primer and/or a random primer), but is preferably an oligo-dT primer. The detectably labeled oligo-dV primer can carry any label known in the art, including but not limited to biotin, digoxigenin, radioactivity, and the like.
The present invention also features a method of identifying a full-length first-strand cDNA, including the steps of contacting an RNA:cDNA hybrid with a detectably labeled primer under conditions sufficient for template-driven enzymatic deoxyribonucleic acid synthesis to occur, where the primer is composed of a plurality of deoxyadenosines, deoxycytidines, and/or deoxyguanosines. The primer mixture used in the reaction may contain a plurality of identical primers or a mixture of primers having varying sequences. The primer forms primer:RNA hybrid sequences if the cDNA in the first hybrid molecule does not extend to the 5′ end of the RNA, and the incorporated primers can then be detected as labeled sequences in the RNA:cDNA hybrid. Labeled sequences in a hybrid therefore indicate that the hybrid contains a non-full-length cDNA. This can be performed as a step in cDNA synthesis, or to verify the efficacy of a particular method and/or reagent (e.g., an enzyme).
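The detection logic can be sketched with a toy sequence model. The sketch rests on a hypothetical simplification: a labeled oligo-dV primer anneals only to the single-stranded RNA left uncovered at the 5′ end of the hybrid, and only where that region contains a run of at least `primer_len` consecutive bases lacking A. The reasoning is that dA, dC and dG pair with U, G and C respectively, and no dV base pairs with A. The function names and the default primer length of 8 are illustrative choices.

```python
def has_oligo_dv_site(exposed_rna: str, primer_len: int = 8) -> bool:
    """True if exposed RNA holds a run of primer_len consecutive non-A bases.

    dA, dC and dG pair with U, G and C, so an oligo-dV primer can anneal
    only where the RNA contains no A.
    """
    run = 0
    for base in exposed_rna.upper():
        run = 0 if base == "A" else run + 1
        if run >= primer_len:
            return True
    return False


def is_labeled(mrna: str, cdna_length: int, primer_len: int = 8) -> bool:
    """Toy model of the labeling step for one RNA:cDNA hybrid.

    First-strand cDNA covers the 3'-most cdna_length bases of the mRNA
    (written 5'->3'); whatever remains at the 5' end is single-stranded and
    available to the labeled oligo-dV primer.
    """
    uncovered = mrna[: max(len(mrna) - cdna_length, 0)]
    return has_oligo_dv_site(uncovered, primer_len)


mrna = "GCUGCUGCUGCU" + "A" * 20            # hypothetical 32-nt transcript
print(is_labeled(mrna, cdna_length=32))     # cDNA reaches the 5' end
print(is_labeled(mrna, cdna_length=20))     # 12 nt left uncovered
print(is_labeled(mrna, cdna_length=25))     # only 7 nt left uncovered
```

The first and third calls return False and the second returns True. The third case shows a limit of this toy model. A truncated cDNA whose uncovered 5′ region is very short or A-rich escapes labeling, so an unlabeled hybrid is a candidate full-length cDNA, not a confirmed one.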
Another embodiment provides a method for producing a 5′-enriched cDNA library from a sample of mRNA molecules by: 1) contacting the mRNA molecules with a first primer under conditions sufficient for template-driven enzymatic deoxyribonucleic acid synthesis to occur, creating a population of hybrid molecules in which mRNA molecules are hybridized to cDNA molecules; 2) isolating the population of hybrid molecules; 3) contacting the isolated hybrid molecules with a detectably labeled oligo-dV primer under conditions sufficient for template-driven enzymatic deoxyribonucleic acid synthesis to occur, so that additional hybrid sequences are produced wherever the cDNA in a hybrid molecule does not extend to the 5′ end of the ribonucleic acid; 4) isolating hybrid molecules that do not contain labeled primer sequences; and 5) converting the unlabeled hybrid molecules to double-stranded deoxyribonucleic acid molecules. These molecules can then be separated and introduced into vectors.
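Step 4 of this method filters the hybrid population on the label. The Python sketch below uses a hypothetical `Hybrid` record and a small made-up population to show how discarding labeled hybrids raises the full-length fraction. The numbers are illustrative only and are not experimental data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hybrid:
    """Hypothetical record for one RNA:cDNA hybrid after the labeling step."""
    mrna_length: int
    cdna_length: int
    labeled: bool  # True if the labeled oligo-dV primer was incorporated

    @property
    def full_length(self) -> bool:
        return self.cdna_length >= self.mrna_length


def select_unlabeled(hybrids: List[Hybrid]) -> List[Hybrid]:
    """Step 4: keep only hybrids that carry no label."""
    return [h for h in hybrids if not h.labeled]


def full_length_fraction(hybrids: List[Hybrid]) -> float:
    """Fraction of hybrids whose cDNA spans the whole mRNA."""
    if not hybrids:
        return 0.0
    return sum(h.full_length for h in hybrids) / len(hybrids)


# Made-up population for illustration only.
pool = [
    Hybrid(1000, 1000, labeled=False),
    Hybrid(1000, 600, labeled=True),
    Hybrid(2000, 2000, labeled=False),
    Hybrid(2000, 900, labeled=True),
    Hybrid(1500, 1490, labeled=False),  # truncated, but too little RNA exposed to be labeled
]
selected = select_unlabeled(pool)
print(full_length_fraction(pool), full_length_fraction(selected))
```

In this example, two of five hybrids are full-length before selection and two of the three unlabeled hybrids are full-length after it. The remaining unlabeled hybrid is a truncated molecule that escaped labeling.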
A feature of the present invention is a method for increasing the production of full-length cDNAs.
Another feature of the present invention is to provide methods for identifying cDNAs that comprise the 5′ end of an RNA, and in particular the 5′ end of an mRNA.
An advantage of the present invention is that it provides quick and effective methods for validating the presence of a 5′ end on a cDNA.
Another advantage of the present invention is that it provides for improved full-length cDNA libraries.
These and other objects, advantages, and features of the invention will become apparent to those persons skilled in the art upon reading the details of the invention as more fully described below.
Before describing the present methods, constructs and reagents, it is to be understood that this invention is not limited to the particular methods, constructs and reagents described, and as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting, since the scope of the present invention will be limited only by the appended claims.
Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limits of that range is also specifically disclosed. Each smaller range between any stated value and intervening value in a stated range and any other stated or intervening value in that stated range is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included or excluded in the range, and each range where either, neither or both limits are included in the smaller ranges is also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, the preferred methods and materials are now described. All publications mentioned herein are incorporated herein by reference to disclose and describe the methods and/or materials in connection with which the publications are cited.
It must be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise. Thus, for example, reference to “an enzyme” includes a plurality of such enzymes and reference to “the cDNA” includes reference to one or more cDNAs and equivalents thereof known to those skilled in the art, and so forth.
The publications discussed herein are provided solely for their disclosure prior to the filing date of the present application. Nothing herein is to be construed as an admission that the present invention is not entitled to antedate such publication by virtue of prior invention. Further, the dates of publication provided might be different from the actual publication dates, which may need to be independently confirmed.
The term “nucleic acid” as used herein refers to any polynucleotide, and is intended to encompass ribonucleic acids (“RNA”), including mRNA, and deoxyribonucleic acids (“DNA”), including genomic DNA and cDNA. The term is also intended to encompass RNA or DNA having analogs or substitutions in the structure of the nucleic acid, provided the analogs or substitutions do not impede the ability to isolate and/or characterize the sequence of the desired region of the nucleic acid.
The term “primer” as used herein refers to a polymer of nucleotides capable of acting as a point of initiation of DNA synthesis when annealed to a nucleic acid template under conditions in which synthesis of a primer extension product is initiated, i.e., in the presence of four different nucleotide triphosphates and a polymerase in an appropriate buffer (“buffer” includes pH, ionic strength, cofactors, etc.) and at a suitable temperature. Generally, a primer will be between 12 and 100 nucleotides, more preferably between 15 and 80 nucleotides, and even more preferably between 18 and 50 nucleotides. The primer may be composed of naturally occurring and/or modified nucleotides, and the modified nucleotides may have a base substitution (e.g., an analog with improved binding), a modified internucleoside linkage, or a substitution of the ribose group.
A primer that “hybridizes to a sequence” is one that is complementary to a strand of the nucleic acid and/or a strand of the adaptor. A primer that hybridizes to the coding region of a nucleic acid, or to the corresponding strand of the adaptor, has an “antisense” sequence, i.e., the primer forms Watson-Crick base pairs with the coding region. A primer that hybridizes to the sequence complementary to the coding region has a “sense” sequence, i.e., it has the same sequence as the coding region of the nucleic acid or the corresponding strand of the adaptor. For an amplification reaction, generally one primer hybridizes to the sense strand and a second primer hybridizes to the sequence complementary to the sense strand.
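The sense/antisense relationship can be made concrete with a reverse-complement helper. This Python sketch assumes DNA sequences written 5′ to 3′ on the coding strand. The helper names and the example sequence are hypothetical.

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse so the result reads 5'->3'."""
    return seq.upper().translate(DNA_COMPLEMENT)[::-1]


def sense_primer(coding_strand: str, start: int, length: int) -> str:
    """Same sequence as the coding strand; anneals to the complementary strand."""
    return coding_strand.upper()[start:start + length]


def antisense_primer(coding_strand: str, start: int, length: int) -> str:
    """Reverse complement of a coding-strand segment; anneals to the coding strand."""
    return reverse_complement(sense_primer(coding_strand, start, length))


coding = "ATGGCCTTACGATTGCAAGGCTAA"                        # hypothetical 24-nt coding strand
forward = sense_primer(coding, 0, 12)                      # primes on the complementary strand
reverse = antisense_primer(coding, len(coding) - 12, 12)   # primes on the coding strand
print(forward, reverse)
```

For an amplification reaction, this pairing places one primer on each strand, matching the last sentence of the definition above.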
The term “oligo-dV primer” as used herein refers to a primer composed of non-deoxythymidine nucleotides, i.e., deoxyadenosine, deoxycytidine, and/or deoxyguanosine. The oligo-dV primers of the invention can consist of only one type of nucleotide (e.g., oligo-dC) or may be any combination of dA, dC and dG. The oligo-dV primers of the invention are preferably between 6 and 15 nucleotides in length, and more preferably between 8 and 12 nucleotides in length.
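The following Python sketch checks whether a sequence qualifies as an oligo-dV primer under this definition. It also counts the distinct oligo-dV sequences of a given length, which is 3^n because each position has three choices. The function names are illustrative. The default 6-15 nt bounds come from the preferred range above.

```python
DV_BASES = frozenset("ACG")  # dA, dC, dG: anything except dT


def is_oligo_dv(seq: str, min_len: int = 6, max_len: int = 15) -> bool:
    """True if seq uses only A, C and G and its length is within [min_len, max_len]."""
    s = seq.upper()
    return min_len <= len(s) <= max_len and set(s) <= DV_BASES


def oligo_dv_sequence_count(length: int) -> int:
    """Number of distinct oligo-dV sequences of a given length (3 choices per position)."""
    return 3 ** length


print(is_oligo_dv("CCCCCCCC"), is_oligo_dv("ACGTACGT"), is_oligo_dv("ACG"))
print(oligo_dv_sequence_count(8), oligo_dv_sequence_count(12))
```

A mixed oligo-dV primer pool spanning the more preferred 8-12 nt range can therefore contain from 6,561 (3^8) up to 531,441 (3^12) distinct sequences per length.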
The term “hybridization” as used herein refers to the formation of a duplex structure by two single-stranded nucleic acids due to complementary base pairing. Hybridization can occur between complementary nucleic acid strands or between nucleic acid strands that contain minor regions of mismatch. The melting temperature, or “Tm”, measures the stability of a nucleic acid duplex. The Tm is the temperature (under defined ionic strength and pH) at which 50% of the base pairs have dissociated. Those skilled in the art of nucleic acid technology can determine duplex stability empirically, considering a number of variables including, for example, the length of the nucleic acids, base composition and sequence, ionic strength, and incidence of mismatched base pairs.
The term “stringent hybridization conditions” as used herein refers to conditions under which only fully complementary nucleic acid strands will hybridize. Stringent hybridization conditions are well known in the art (see, e.g., Sambrook et al., 1985, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.). Generally, stringent conditions are selected to be about 5 °C lower than the Tm for the specific sequence at a defined ionic strength and pH. Typically, stringent conditions will be those in which the salt concentration is at least about 0.2 M at pH 7 and the temperature is at least about 60 °C. Relaxing the stringency of the hybridizing conditions will allow sequence mismatches to be tolerated; the degree of mismatch tolerated can be controlled by suitable adjustment of the hybridization conditions.
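The Python sketch below applies the "about 5 °C below Tm" guideline to a short oligonucleotide. It estimates Tm with the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, a rule of thumb for short oligos (on the order of 20 nt or fewer) in high-salt buffer. The Wallace rule is not part of the source; it is swapped in only for illustration. Longer duplexes call for nearest-neighbor or salt-corrected methods. The function names and the probe sequence are hypothetical.

```python
def wallace_tm(oligo: str) -> float:
    """Rough Tm in degrees C for a short DNA oligo: 2*(A+T) + 4*(G+C)."""
    s = oligo.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2.0 * at + 4.0 * gc


def stringent_hybridization_temp(tm_c: float, offset_c: float = 5.0) -> float:
    """Temperature offset_c degrees below Tm, following the 'about 5 C below Tm' guideline."""
    return tm_c - offset_c


probe = "ACGTACGTACGT"   # hypothetical 12-mer
tm = wallace_tm(probe)
print(tm, stringent_hybridization_temp(tm))
```

The low estimate for this 12-mer shows why the typical figure of at least about 60 °C given in the text should not be applied to very short probes without recalculating Tm.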
The term “substantially complementary” as used herein refers to two single-stranded nucleic acids that are complementary except for minor regions of mismatch. Stable duplexes of substantially complementary sequences can be achieved under less stringent hybridization conditions. Those skilled in the art of nucleic acid technology can determine duplex stability empirically considering a number of variables including, for example, the length and base pair concentration of the nucleic acids, ionic strength, and incidence of mismatched base pairs.
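Counting mismatches between two antiparallel strands gives one concrete reading of "substantially complementary." The Python sketch assumes equal-length, gap-free strands written 5′ to 3′. The `max_mismatches` threshold is a hypothetical parameter, since the text does not define how many mismatches count as "minor."

```python
# Watson-Crick DNA base pairs.
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def count_mismatches(strand_a: str, strand_b: str) -> int:
    """Mismatched positions when strand_a (5'->3') is paired with strand_b (5'->3').

    The strands are antiparallel, so a[i] pairs with b[-1 - i].
    """
    a, b = strand_a.upper(), strand_b.upper()
    if len(a) != len(b):
        raise ValueError("strands must be the same length in this sketch")
    return sum((x, y) not in PAIRS for x, y in zip(a, reversed(b)))


def is_substantially_complementary(a: str, b: str, max_mismatches: int = 1) -> bool:
    """Hypothetical criterion: at most max_mismatches mispaired positions."""
    return count_mismatches(a, b) <= max_mismatches


print(count_mismatches("ATGGCC", "GGCCAT"))  # perfect reverse complement
print(count_mismatches("ATGGCC", "GGCAAT"))  # one G/A mispair
```

Setting `max_mismatches=0` corresponds to the fully complementary case required under stringent hybridization conditions.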
The term “full-length” mRNA as used herein refers to an mRNA that encodes the entire translation region of an mRNA, including promoter or enhancer regions 5′ of the translation start site. The term can also be used to encompass transcripts that comprise at least the start methionine codon of the coding region of an mRNA, i.e., a transcript that comprises the entire coding region of an mRNA.