cDNA and cDNA Libraries
In examining the structure and physiology of an organism, tissue or cell, it is often desirable to determine its genetic content. The genetic framework of an organism is encoded in the double-stranded sequence of nucleotide bases in the deoxyribonucleic acid (DNA) which is contained in the somatic and germ cells of the organism. The genetic content of a particular segment of DNA, or gene, is only manifested upon production of the protein which the gene encodes. In order to produce a protein, a complementary copy of one strand of the DNA double helix (the “coding” strand) is produced by polymerase enzymes, resulting in a specific sequence of ribonucleic acid (RNA). This particular type of RNA, since it contains the genetic message from the DNA for production of a protein, is called messenger RNA (mRNA).
Within a given cell, tissue or organism, there exist myriad mRNA species, each encoding a separate and specific protein. This fact provides a powerful tool to investigators interested in studying genetic expression in a tissue or cell—mRNA molecules may be isolated and further manipulated by various molecular biological techniques, thereby allowing the elucidation of the full functional genetic content of a cell, tissue or organism.
One common approach to the study of gene expression is the production of complementary DNA (cDNA) clones. In this technique, the mRNA molecules from an organism are isolated from an extract of the cells or tissues of the organism. This isolation often employs solid chromatography matrices, such as cellulose or agarose, to which oligomers of thymidine (T) have been complexed. Since the 3′ termini on most eukaryotic mRNA molecules contain a string of adenosine (A) bases, and since A binds to T, the mRNA molecules can be rapidly purified from other molecules and substances in the tissue or cell extract. From these purified mRNA molecules, cDNA copies may be made using the enzyme reverse transcriptase (RT), which results in the production of single-stranded cDNA molecules. The single-stranded cDNAs may then be converted into a complete double-stranded DNA copy (i.e., a double-stranded cDNA) of the original mRNA (and thus of the original double-stranded DNA sequence, encoding this mRNA, contained in the genome of the organism) by the action of a DNA polymerase. The protein-specific double-stranded cDNAs can then be inserted into a plasmid or viral vector, which is then introduced into a host bacterial, yeast, animal or plant cell. The host cells are then grown in culture media, resulting in a population of host cells containing (or in many cases, expressing) the gene of interest.
This entire process, from isolation of mRNA to insertion of the cDNA into a plasmid or vector to growth of host cell populations containing the isolated gene, is termed “cDNA cloning.” If cDNAs are prepared from a number of different mRNAs, the resulting set of cDNAs is called a “cDNA library,” an appropriate term since the set of cDNAs represents a “population” of genes comprising the functional genetic information present in the source cell, tissue or organism. Genotypic analysis of these cDNA libraries can yield much information on the structure and function of the organisms from which they were derived.
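The flow of sequence information through the steps described above (poly(A) selection, first-strand synthesis by RT, second-strand synthesis by a DNA polymerase, and pooling into a library) can be illustrated with a toy model. This is a minimal sketch that treats nucleic acids as plain strings, not a description of any laboratory protocol or software package. The function names, the `min_tail` threshold of 10 adenosines, and the use of exact string matching in place of oligo(dT) hybridization are all assumptions made for illustration.

```python
# Toy model of cDNA synthesis; all names and thresholds are hypothetical.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A"}
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def has_poly_a_tail(mrna: str, min_tail: int = 10) -> bool:
    """Stand-in for oligo(dT) selection: keep RNAs whose 3' end is a run of A.

    min_tail is an assumed cutoff, not a value taken from the text.
    """
    return mrna.endswith("A" * min_tail)


def first_strand_cdna(mrna: str) -> str:
    """Single-stranded cDNA (written 5'->3') complementary to an mRNA (5'->3')."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna))


def second_strand(cdna: str) -> str:
    """DNA strand (5'->3') complementary to the first-strand cDNA."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(cdna))


def build_library(rnas: list[str], min_tail: int = 10) -> set[str]:
    """Set of distinct sense-strand cDNA sequences from poly(A)-selected RNAs."""
    return {
        second_strand(first_strand_cdna(rna))
        for rna in rnas
        if has_poly_a_tail(rna, min_tail)
    }


if __name__ == "__main__":
    sample = ["AUGGCC" + "A" * 10, "AUGGCC"]  # second RNA has no poly(A) tail
    print(build_library(sample))
```

In this model the second strand reproduces the mRNA sequence with T in place of U, which is why a double-stranded cDNA can stand in for the expressed gene in later cloning steps.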
Retroviral Reverse Transcriptase Enzymes
Three prototypical forms of retroviral RT have been studied thoroughly. Moloney Murine Leukemia Virus (M-MLV) RT contains a single subunit of 78 kDa with RNA-dependent DNA polymerase and RNase H activity. This enzyme has been cloned and expressed in a fully active form in E. coli (reviewed in Prasad, V. R., Reverse Transcriptase, Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory Press, p. 135 (1993)). Human Immunodeficiency Virus (HIV) RT is a heterodimer of p66 and p51 subunits in which the smaller subunit is derived from the larger by proteolytic cleavage. The p66 subunit has both an RNA-dependent DNA polymerase and an RNase H domain, while the p51 subunit has only a DNA polymerase domain. Active HIV p66/p51 RT has been cloned and expressed successfully in a number of expression hosts, including E. coli (reviewed in Le Grice, S. F. J., Reverse Transcriptase, Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory Press, p. 163 (1993)). Within the HIV p66/p51 heterodimer, the 51-kDa subunit is catalytically inactive, and the 66-kDa subunit has both DNA polymerase and RNase H activity (Le Grice, S. F. J., et al., EMBO Journal 10:3905 (1991); Hostomsky, Z., et al., J. Virol. 66:3179 (1992)). Avian Sarcoma-Leukosis Virus (ASLV) RT, which includes but is not limited to Rous Sarcoma Virus (RSV) RT, Avian Myeloblastosis Virus (AMV) RT, Avian Erythroblastosis Virus (AEV) Helper Virus MCAV RT, Avian Myelocytomatosis Virus MC29 Helper Virus MCAV RT, Avian Reticuloendotheliosis Virus (REV-T) Helper Virus REV-A RT, Avian Sarcoma Virus UR2 Helper Virus UR2AV RT, Avian Sarcoma Virus Y73 Helper Virus YAV RT, Rous Associated Virus (RAV) RT, and Myeloblastosis Associated Virus (MAV) RT, is also a heterodimer of two subunits, α (approximately 62 kDa) and β (approximately 94 kDa), in which α is derived from β by proteolytic cleavage (reviewed in Prasad, V. R., Reverse Transcriptase, Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory Press (1993), p. 135).
ASLV RT can exist in two additional catalytically active structural forms, ββ and α (Hizi, A. and Joklik, W. K., J. Biol. Chem. 252: 2281 (1977)). Sedimentation analysis suggests αβ and ββ are dimers and that the α form exists in an equilibrium between monomeric and dimeric forms (Grandgenett, D. P., et al., Proc. Nat. Acad. Sci. USA 70: 230 (1973); Hizi, A. and Joklik, W. K., J. Biol. Chem. 252: 2281 (1977); and Soltis, D. A. and Skalka, A. M., Proc. Nat. Acad. Sci. USA 85: 3372 (1988)). The ASLV αβ and ββ RTs are the only known examples of retroviral RT that include three different activities in the same protein complex: DNA polymerase, RNase H, and DNA endonuclease (integrase) activities (reviewed in Skalka, A. M., Reverse Transcriptase, Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory Press (1993), p. 193). The α form lacks the integrase domain and activity.
Various forms of the individual subunits of ASLV RT have been cloned and expressed. These include a 98-kDa precursor polypeptide that is normally processed proteolytically to β and a 4-kDa polypeptide removed from the β carboxy end (Alexander, F., et al., J. Virol. 61:534 (1987) and Anderson, D., et al., Focus 17:53 (1995)), and the mature β subunit (Weis, J. H. and Salstrom, J. S., U.S. Pat. No. 4,663,290 (1987); and Soltis, D. A. and Skalka, A. M., Proc. Nat. Acad. Sci. USA 85:3372 (1988)). Heterodimeric RSV αβ RT has also been purified from E. coli cells expressing a cloned RSV β gene (Chernov, A. P., et al., Biomed. Sci. 2:49 (1991)). However, there have been no reports heretofore of the simultaneous expression of cloned ASLV RT α and β genes resulting in the formation of heterodimeric αβ RT.
Reverse Transcription Efficiency
As noted above, the conversion of mRNA into cDNA by RT-mediated reverse transcription is an essential step in the study of proteins expressed from cloned genes. However, the use of unmodified RT to catalyze reverse transcription is inefficient for at least two reasons. First, RT sometimes destroys an RNA template before reverse transcription is initiated, primarily due to the intrinsic RNase H activity present in RT. Second, RT often fails to complete reverse transcription after the process has been initiated (Berger, S. L., et al., Biochemistry 22:2365–2372 (1983); Krug, M. S., and Berger, S. L., Meth. Enzymol. 152:316 (1987)). Removal of the RNase H activity of RT can eliminate the first problem and improve the efficiency of reverse transcription (Gerard, G. F., et al., FOCUS 11(4):60 (1989); Gerard, G. F., et al., FOCUS 14(3):91 (1992)). However, RTs, including those forms lacking RNase H activity (“RNase H−” forms), still tend to terminate DNA synthesis prematurely at certain secondary structural (Gerard, G. F., et al., FOCUS 11(4):60 (1989); Myers, T. W., and Gelfand, D. H., Biochemistry 30:7661 (1991)) and sequence (Messer, L. I., et al., Virol. 146:146 (1985); Abbotts, J., et al., J. Biol. Chem. 268:10312–10323 (1993)) barriers in nucleic acid templates.
Even in the most efficient reverse transcription systems available today, which use RNase H− M-MLV RT, yields of total cDNA product generally do not exceed 50% of input mRNA, and the fraction of the product that is full-length does not exceed 50%. The secondary structural and sequence barriers in the mRNA template, which as described above can give rise to these limitations, occur frequently at homopolymer stretches (Messer, L. I., et al., Virol. 146:146 (1985); Huber, H. E., et al., J. Biol. Chem. 264:4669–4678 (1989); Myers, T. W., and Gelfand, D. H., Biochemistry 30:7661 (1991)), are more often sequence barriers than secondary structural barriers (Abbotts, J., et al., J. Biol. Chem. 268:10312–10323 (1993)), and are often distinct for different RTs (Abbotts, J., et al., J. Biol. Chem. 268:10312–10323 (1993)). If these barriers could be overcome, the yields of total and full-length cDNA product in reverse transcription reactions could be increased.
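The two limits stated above can be combined into a single figure. If total cDNA yield is at most 50% of input mRNA and at most 50% of that product is full-length, then full-length cDNA amounts to at most 25% of input mRNA. The short sketch below makes that arithmetic explicit. The function name is hypothetical, and the calculation assumes that both percentages are expressed on the same basis (for example, mass), which the text does not state.

```python
def full_length_yield(total_yield: float, full_length_fraction: float) -> float:
    """Full-length cDNA as a fraction of input mRNA.

    total_yield: total cDNA product as a fraction of input mRNA (0 to 1).
    full_length_fraction: fraction of that product that is full-length (0 to 1).
    Assumes both fractions are measured on the same basis.
    """
    for value in (total_yield, full_length_fraction):
        if not 0.0 <= value <= 1.0:
            raise ValueError("fractions must lie between 0 and 1")
    return total_yield * full_length_fraction


if __name__ == "__main__":
    # Upper bounds quoted in the text for RNase H- M-MLV RT systems.
    upper_bound = full_length_yield(0.5, 0.5)
    print(f"Full-length cDNA is at most {upper_bound:.0%} of input mRNA")
```

On these figures, at least three quarters of the input mRNA is not recovered as full-length cDNA, which is the loss that overcoming the template barriers would reduce.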