The present invention relates to methods and compositions for DNA synthesis, and, more particularly, for the synthesis of complementary DNA in vivo.
The present invention is a tool for molecular biology. An introduction to the nomenclature of molecular biology, the structure of DNA, RNA and proteins and the interrelationships between these molecules, is provided in Chapter 4, Synthesis of Proteins and Nucleic Acids of Darnell et al., Molecular Cell Biology, Scientific American Books (1989). A more detailed treatment of these issues is set forth in the full text of Darnell et al., (1989) and in Lewin, Genes IV, Oxford University Press (1990).
Hereditary information is encoded in the genes of an organism. Genes are made of polymers of nucleic acids, usually deoxyribonucleic acid (DNA). DNA is composed of a series of four nucleotide bases; the hereditary information carried by a gene is encoded by the specific sequence of nucleotide bases in the DNA molecule. The genetic information within structural genes encodes proteins; the sequence and structure (and therefore function) of a particular protein is determined by the order of the nucleotide bases within the gene that encodes that protein. Proteins determine an organism's identity, from cellular structures to the organism's response to its environment. Thus, the genes that encode these proteins determine an organism's identity.
The information encoded within a structural gene is "expressed" by a cell through the processes of transcription and translation. Transcription results in the production of an intermediate carrier of the genetic code, termed messenger RNA (mRNA). Messenger RNA is effectively a copy of the gene; it is a polymer of ribonucleic acid (hence "RNA") rather than of deoxyribonucleic acid.
In eukaryotic organisms (which are generally more complex organisms than bacteria), genes are made up of coding regions (termed "exons") and non-coding regions (termed "introns"). Exons directly encode the protein sequence of the gene. Introns may be very large and there may be a large number of intron sequences within a particular gene. The role of the non-coding intron sequences is unclear. However, there is evidence that these intervening sequences serve two critical purposes: first, they divide the exon coding regions into smaller protein coding units and so minimize the chances of errors during transcription and translation; second, they relegate discrete portions or cassettes of protein sequence to exon units which can be more easily shuffled during the course of evolution and therefore facilitate the development of new proteins which may ultimately enhance the survival of the species.
The transcription process involves the formation of an mRNA copy of the entire gene. That is, the mRNA produced by the transcription process contains a copy of both the non-coding intron sequences and the protein-encoding exon sequences. Thus the mRNA first produced by transcription is the same length as the gene from which it was copied. Subsequently, this immature mRNA undergoes a processing stage during which the non-coding intron sequences are spliced out. The resulting processed mRNA molecules thus contain only the information required to encode the protein (i.e., they contain copies of only the joined exon sequences). These processed mRNA molecules are thus considerably shorter in length than the "genomic sequence" (the gene exons and introns as they exist in the chromosome) from which the mRNA was initially copied. The processed mRNA is also modified at this stage to include a polyriboadenylic acid, poly(A), tail at one end of the molecule (the 3′ end) and a "cap" structure at the other end of the molecule (the 5′ end) (standard nomenclature assigns one end of DNA and RNA molecules as the 5′ end and the other as the 3′ end, according to the terminal chemical groupings of the molecule). An mRNA molecule that has been processed to remove introns and has a 5′ cap and a 3′ poly(A) tail is termed a "mature" mRNA molecule. A greatly simplified diagram of the transcription process, illustrating removal of the non-coding intron sequences, is shown in FIG. 1.
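By way of illustration only, the processing of an immature mRNA into a mature mRNA may be sketched in a few lines of Python. The sequences, the exon coordinates and the function names below are hypothetical and are chosen solely for the example; the 5′ cap is not modeled, and real splicing depends on sequence signals that this sketch ignores.

```python
# Illustrative sketch only: sequences and names are hypothetical.
EXON_1 = "AUGGCC"
INTRON = "GUAAGUCUAACAG"
EXON_2 = "GAAUAG"

# The immature transcript is a full-length copy of the gene (exons and introns).
PRE_MRNA = EXON_1 + INTRON + EXON_2


def splice(pre_mrna, exon_spans):
    """Join the exon spans (0-based, end-exclusive), discarding the introns."""
    return "".join(pre_mrna[start:end] for start, end in exon_spans)


def add_poly_a(mrna, tail_length=20):
    """Append a poly(A) tail to the 3' end (the 5' cap is not modeled)."""
    return mrna + "A" * tail_length


exon_spans = [
    (0, len(EXON_1)),
    (len(EXON_1) + len(INTRON), len(PRE_MRNA)),
]
spliced = splice(PRE_MRNA, exon_spans)
mature = add_poly_a(spliced, tail_length=10)

print(len(PRE_MRNA), len(spliced))  # the spliced message is shorter than the gene copy
print(mature)
```

The spliced message contains only the joined exon sequences, which is why the mature mRNA is shorter than the genomic sequence from which it was copied.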
The step of converting the information carried by the mature messenger RNA into a protein is termed translation. Translation is the final step of the means by which the information encoded by the nucleotide sequence within a structural gene is converted into a specific protein composed of a sequence of amino acids.
The cloning of genes became possible in the 1970s. In early experiments, small genes were cloned from bacteria. Since that time advances in molecular biology and genetic engineering have developed at an extraordinary rate, such that the sequence of the entire human genome is now being determined. Despite rapid advances in the technology of this field, a number of limitations are still apparent. One of these is the difficulty of cloning very large structural genes.
The size of a gene is measured in the number of nucleotide bases that it contains, usually expressed in terms of thousands of bases (kilobases or Kb). Although there are several examples of larger genes, the total coding sequence of most structural genes (the exons) typically totals 1-10 Kb. However, the presence of multiple large intron sequences between the exon segments means that at the genomic level these genes are spread out over a much larger area, frequently spanning tens or even hundreds of kilobases. Present gene cloning vectors such as YACs (Yeast Artificial Chromosomes) allow the cloning of very large (100-300 Kb) genomic segments; however, these genomic inserts include the noncoding intron sequences, which precludes the expression of protein in an artificial system. A partial genetic sequence, or a sequence containing introns, results in the expression of a nonfunctional, truncated protein or, when the sequence for the 5′ translation start site is missing, in the expression of an unrelated, garbled protein sequence. Even if a partial gene may be identified through a screening process, it is then necessary to recover the remaining portions of the gene. This can be an extremely complicated process. If the gene contains many intron sequences, and is thus large, years of effort can be expended in attempting to recover the remaining pieces of the gene. Additional effort may then be required to determine the relative order of the gene fragments and to distinguish exon from intron sequences. The ability to clone a gene as a contiguous protein coding cassette is particularly important where identification of the gene is achieved by means of a detection technique which relies on production of the protein in a recombinant bacterial or viral system and "screening" for the function or structure of the desired protein, a common technique of detecting cloned genes.
To clone structural genes, molecular biologists have taken advantage of the cellular mRNA processing function described above, whereby intron sequences are spliced out of the immature mRNA to produce a mature mRNA that is considerably smaller than the original gene. By converting the mature mRNA molecule back into a DNA molecule (hence the term "reverse transcription"), one can obtain the original coding sequence (the exons) without the extraneous intron sequences. Such a DNA molecule is termed a complementary DNA because it is complementary to the mRNA molecule from which it was derived. Complementary DNA (cDNA) synthesis is the preferred technique for gene cloning because it results in the recovery of the desired gene in a relatively small, contiguous protein coding cassette amenable to recombinant protein production.
An additional and important use of cDNA technology is to identify those genes that are being expressed by a cell at a particular time. Gene expression requires substantial energy expenditure on the part of the cell, and mRNA molecules are designed to be short-lived "protein requests"; therefore, with a few exceptions (notably in the egg during development), only those genes that code for proteins which are immediately needed are transcribed into mRNA. By making cDNA copies of the existing mRNA population in a cell, and cloning the cDNAs produced, researchers are able to produce a cDNA library from the genes which were being expressed at that time. Researchers can thus determine specifically which genes are expressed in a given tissue type, at a given stage of development, or in response to an applied stimulus.
Complementary DNA clones are extremely important in both research and industry. Research requires expression of the cloned gene in order to determine the protein's function and structure. In addition, large amounts of protein are required for the production of polyclonal or monoclonal antibodies, which are indispensable for following small amounts of the protein through research protocols and for determining the location of the protein in the cell. Bacteria are commonly used as hosts in which a cloned gene is expressed. The genes of prokaryotes, including bacteria, do not contain introns, and so these cells do not have the splicing machinery necessary to process immature mRNA into a mature mRNA that can be translated into a functional protein. Genomic clones of eukaryotic genes (i.e., containing introns and exons) cannot be expressed in a bacterial host, whereas a cDNA copy of the same gene can be expressed in either prokaryotes or eukaryotes. Thus, cDNA clones are routinely used for large scale protein production. This artificial protein expression is termed "recombinant protein" production and is an increasingly common way of producing many of the pharmaceuticals which for years were accessible in small amounts by tedious extraction from animal tissues.
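The complementarity relationship described above may be illustrated with a minimal Python sketch. It treats reverse transcription purely as base pairing, which is an assumption of the example and not a model of the enzyme; the mRNA sequence and the function names are hypothetical.

```python
# Illustrative sketch only: base pairing, not enzyme behavior.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna):
    """First-strand cDNA (written 5'->3') complementary to an mRNA (written 5'->3')."""
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna))


def second_strand(cdna):
    """Second DNA strand (written 5'->3') complementary to the first strand."""
    return "".join(DNA_TO_DNA[base] for base in reversed(cdna))


# Hypothetical mature mRNA: a short coding region followed by a short poly(A) tail.
mrna = "AUGGCCGAAUAG" + "AAAA"

cdna = first_strand_cdna(mrna)
coding_strand = second_strand(cdna)

print(cdna)
print(coding_strand)
```

The second strand carries the same sequence as the mRNA with T in place of U, that is, the joined exons without any intron sequence.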
Techniques presently used for cDNA synthesis are reviewed in Berger and Kimmel, Guide to Molecular Cloning Techniques in Methods in Enzymology Volume 152, Academic Press Inc. (1987), in Sambrook et al., Molecular Cloning, Cold Spring Harbor Laboratory Press (1989), and in Okayama, H., et al., Meth. Enzymol. 159:3-27 (1987). A review of mRNA isolation techniques is presented in Chapters 7 and 8 of Sambrook et al. (1989).
Isolation of mRNA is a long, tedious process with a number of technically difficult steps. In summary, a typical procedure for isolating mRNA from a cell requires (1) disruption of cells to release cellular contents, (2) isolation of total RNA from the cell, (3) selection of the mRNA population by running the extracted RNA through an oligo(dT) cellulose column and (4) size fractionation of the isolated mRNA. At all stages, great care is required to ensure that the preparation does not come into contact with active ribonuclease enzymes, which can destroy the RNA. Because the goal of the cDNA cloning procedure is to obtain "full length" cDNA clones that contain the entire coding sequence of the gene, it is extremely important to use procedures that maintain the integrity of the mRNA. Ribonuclease (RNAse) enzymes are very stable and so even a very small amount of the active enzyme in an mRNA preparation will cause problems. RNAse is present on virtually all surfaces, including human skin, and is thus very easily introduced into the RNA preparation. To avoid contamination problems, all solutions, glassware and plasticware must be specially treated. The cells from which the mRNA is to be isolated are disrupted in solutions which are extremely harsh and contain components which immediately inactivate the omnipresent ribonuclease enzymes; all subsequent solutions used in RNA preparation are treated with diethylpyrocarbonate (DEPC, a suspected carcinogen), which inactivates RNAse. Often a laboratory will set aside particular equipment and work space that is designated to be "ribonuclease free". The potential for RNA degradation starts at the first step of breaking open the cells (the cells themselves contain ribonucleases which, upon lysis of the cells, come into contact with the RNA), and continues throughout the procedure.
Total RNA extracted from a cell is made up of messenger RNA (mRNA), transfer RNA (tRNA) and ribosomal RNA (rRNA). The mRNA typically makes up only 1-3% of the total cellular RNA (approximately 1 × 10^-12 g mRNA per eukaryotic cell; Sargent, T. D., Methods Enzymol. 152:423-432 (1987)). Most cDNA synthesis reactions rely on the presence of the poly(A) tail present only in mature mRNA transcripts. The mature RNA transcripts are selectively extracted from the bulk of the cellular RNA, usually by affinity chromatography. This is an essential step for successful in vitro cDNA synthesis; failure to enrich for the mature mRNA will result in a low yield of poor quality cDNA.
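The quantities cited above can be put into perspective with a short worked calculation, sketched here in Python. The per-cell mRNA mass and the 1-3% fraction are taken from the text; the target mass of 1 microgram and the recovery parameter are assumptions introduced only for the example.

```python
# Illustrative arithmetic only. The target mass and recovery are hypothetical.
MRNA_PER_CELL_G = 1e-12  # approximate mRNA per eukaryotic cell, from the text


def total_rna_per_cell(mrna_fraction):
    """Total RNA per cell (g) implied by the mRNA mass and its fraction of total RNA."""
    return MRNA_PER_CELL_G / mrna_fraction


def cells_needed(target_mrna_g, recovery=1.0):
    """Cells required to obtain a target mRNA mass at a given fractional recovery."""
    return target_mrna_g / (MRNA_PER_CELL_G * recovery)


# mRNA at 1-3% of total RNA implies roughly 33-100 picograms of total RNA per cell.
print(total_rna_per_cell(0.03), total_rna_per_cell(0.01))

# Hypothetical target of 1 microgram of mRNA, with perfect and with 50% recovery.
print(cells_needed(1e-6), cells_needed(1e-6, recovery=0.5))
```

Under these assumptions about one million cells are needed per microgram of mRNA even with no losses, and any loss during isolation raises that number proportionally.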
As a final stage prior to cDNA synthesis, the mRNA preparation may be size selected. This is usually performed to remove the smaller size molecules (usually degraded forms of larger mRNAs) which would otherwise interfere with the cDNA cloning procedure. Size selection may also be performed to enrich for an mRNA species of known size. Size selection may be performed by electrophoresis through agarose gels, by column chromatography, or by sedimentation through sucrose gradients. These techniques result in lower yields, and may require the presence of methylmercuric hydroxide to disrupt secondary intramolecular structure. Methylmercuric hydroxide is extremely toxic and volatile, and requires great care in handling. Safer alternatives (such as the use of gels containing glyoxal/dimethyl sulfoxide or formaldehyde) are available, but these techniques also involve dangerous chemicals and have associated disadvantages.
Additional disadvantages arise from the necessity of extracting mRNA from cells prior to cDNA synthesis. For example, cDNA cloning is often used to assess which genes are expressed in a cell under particular conditions or at a particular stage in the development of the organism. The time and conditions required to extract the mRNA may themselves produce alterations in the gene expression pattern of the cell. Furthermore, mRNA molecules which are present in very low abundance (estimated at 20 copies per cell) or which are unstable may be lost during the RNA isolation procedure. There is currently a lower limit on the number of cells necessary to produce a cDNA library due to the inherent losses incurred in mRNA isolation procedures. This invention addresses this problem by completely circumventing the initial mRNA isolation requirement.
Following extraction and purification of the mRNA, cDNA synthesis is performed in vitro. All methodologies presently used for cDNA synthesis follow mRNA extraction and purification, or are performed on dead cells under in vitro conditions. These methodologies are reviewed in detail in Berger and Kimmel (1987); Okayama, H., et al. (1987); and Embleton, M. J., et al., Nucleic Acids Res. 20:3831-3837 (1992).
All of the presently available techniques utilize RNA-dependent DNA polymerase enzymes (more commonly termed reverse transcriptase enzymes) to synthesize the first strand of the cDNA from the mRNA template. Reverse transcriptase, like other DNA polymerases, cannot initiate nucleic acid synthesis de novo. Rather, it adds the first nucleotide of the nascent cDNA strand to the hydroxyl group at the 3′ end of a preexisting RNA or DNA strand that is annealed to the mRNA template. This preexisting strand to which the enzyme adds the first nucleotide is called a primer and appears to be absolutely required for reverse transcriptase activity. In their natural role, reverse transcriptase enzymes enter the cell within the infecting retrovirus. The enzyme is already associated with both a specific cellular transfer RNA (tRNA) molecule (present in all host cells), which the virus scavenged from a host cell during the previous infection, and a viral RNA chromosome. Transfer RNA molecules are short (70 to 80 nucleotides long) RNA molecules which are folded into complex three-dimensional structures; their usual cellular role is in the translation process (Darnell et al. (1989), chapter 4). After the virus enters a newly infected cell, the reverse transcriptase-tRNA complex acts as primer for an in vivo (in the cell) reverse transcriptase reaction, using the viral RNA molecule, to which this complex is already bound, as the RNA template. The cDNA made from this in vivo reaction is then converted to double-stranded DNA by the same reverse transcriptase enzyme complex, and integrated into a chromosome of the infected cell.
It is important to note that the reverse transcriptase-tRNA primer complex necessary for viral replication will not act as a primer for cDNA synthesis from cellular mRNA templates because the tRNA species (2 of about 40 types present in the cell) which have affinity for the reverse transcriptase enzyme are complementary to, and will therefore only prime from, sequences present on specific RNA molecules. It is interesting to note, however, that there are viruslike 30S (VL30) elements present in mouse cells which have regions of strong homology to retroviruses. These elements have properties of defective type C viruses, are reverse transcribed and are packaged into retroviral virions (Howk, R. S., et al., J. Virol. 25:115-123 (1978); Besmer, P. U., et al., J. Virol. 29:1168-1176 (1979)).
When reverse transcriptase enzymes are utilized for in vitro (outside the cellular environment) cDNA synthesis, the requirement of the enzyme for a primer molecule is usually satisfied by the inclusion of one or more oligodeoxynucleotide primers in the reaction mixture. Most commonly, the primer is an oligomer of deoxyribothymidylic acid, oligo(dT). This primer is complementary to the poly(A) tail located at the 3′ end of the mRNA molecule ("A" nucleotides are complementary to, and anneal to, "T" or "U" nucleotides). Thus, this oligo(dT) primer molecule anneals to the poly(A) tail region and serves as a primer for the reverse transcriptase enzyme. Alternatively, in vitro cDNA synthesis may utilize an oligonucleotide primer that is complementary to other sequences within the RNA molecule; however, because of the extensive stretch of complementary nucleotides necessary for annealing to occur, such a primer will be "sequence specific" for the mRNA molecule to which it is designed to anneal. Synthesis of such a sequence specific primer requires prior knowledge of the nucleotide sequence of part of the mRNA. The primer requirements of reverse transcriptase enzymes are discussed in Chapter 5 of Sambrook et al. (1989).
The product of the initial reverse transcriptase reaction in vitro is a single-stranded complementary DNA copy of the mRNA molecule. This reaction is often referred to as "first strand cDNA synthesis." Thereafter, various techniques are used to generate the second strand of the cDNA. The resultant double-stranded DNA (dsDNA) molecules are then modified at the ends, and inserted into a "vector" which allows growth, selection, and amplification of each copy. Most commonly used techniques (e.g., Okayama and Berg, Molecular and Cellular Biology 2:161-170 (1982)) may be summarized as follows: following extraction and purification of the mRNA and in vitro reverse transcription of the mRNA to produce single-stranded cDNA molecules, the mRNA template is eliminated to allow synthesis of the second strand of DNA and thereby form a double-stranded cDNA molecule; specific DNA linkers are then attached to the blunted end of the double-stranded cDNA, and the cDNA is ligated into a suitable cloning vector.
In all presently used techniques, the reverse transcriptase-catalyzed step of making a cDNA copy of the mRNA is always performed under in vitro conditions. The quality of the cDNA synthesis (that is, the ability to generate both accurate and full-length complementary DNA) depends upon the fidelity and the processivity of the enzyme chosen, and the conditions under which the reaction is performed. Clearly, less than full-length cDNA is not acceptable, and a high error rate will compromise the utility of the cDNA produced. The use of the reverse transcriptase in vitro, rather than under the in vivo conditions under which the enzyme has evolved to function, appears to adversely affect both the fidelity and processivity of the enzyme. The in vitro fidelity of MuLV reverse transcriptase has been estimated to be 10^-4 (i.e., one wrong nucleotide per 10,000 bases, or 10 errors per 100 kb), and recent studies have determined that the in vivo fidelity is approximately 2 × 10^-5 (1 error for every 50,000 bases copied, or 2 errors per 100 kb; Mont et al., J. Virol. 66:3683-3689 (1992)). In addition, it is difficult to obtain full length first strand synthesis in an artificial environment, whereas the processivity of the enzyme in the in vivo cDNA synthesis reactions is excellent, with cDNA incorporation extending well past the 10 kb range (see included data). While conditions have been developed to optimize the performance of reverse transcriptase enzymes in vitro, these conditions do lead to a certain frequency of errors, and to premature termination of first strand cDNA synthesis. It is clear that the in vitro conditions do not reflect the optimal conditions for the enzyme.
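The difference between an oligo(dT) primer and a sequence specific primer may be illustrated with a minimal Python sketch. It assumes that annealing requires perfect complementarity over the full length of the primer, which is a simplification; the mRNA sequence, the primers and the function name are hypothetical.

```python
# Illustrative sketch only: perfect-match annealing is a simplifying assumption.
DNA_TO_RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def annealing_site(mrna, primer):
    """Return the 0-based start (5' end) of the mRNA region to which a DNA primer
    (written 5'->3') is perfectly complementary, or -1 if there is none."""
    target = "".join(DNA_TO_RNA_COMPLEMENT[base] for base in reversed(primer))
    return mrna.find(target)


# Hypothetical mature mRNA with a 15-nucleotide poly(A) tail at its 3' end.
mrna = "AUGGCCGAAUAG" + "A" * 15

oligo_dt = "T" * 12          # anneals to any sufficiently long poly(A) tail
specific_primer = "TTCGGCC"  # complementary to the internal sequence GGCCGAA

print(annealing_site(mrna, oligo_dt))
print(annealing_site(mrna, specific_primer))
print(annealing_site("AUGGCCGAAUAG", oligo_dt))  # no tail, so no annealing site
```

The oligo(dT) primer finds a site on any polyadenylated message, while the sequence specific primer can only be designed when part of the mRNA sequence is already known.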
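The practical consequence of the two fidelity estimates may be illustrated with a short calculation, sketched here in Python. The error rates are those cited above; the assumption that errors occur independently at each base, and the 10 kb example length, are introduced only for the illustration.

```python
# Illustrative arithmetic only. Independent per-base errors are an assumption.
IN_VITRO_ERROR_RATE = 1e-4  # errors per base copied, as cited in the text
IN_VIVO_ERROR_RATE = 2e-5   # errors per base copied, as cited in the text


def expected_errors(error_rate, length_bases):
    """Expected number of misincorporated bases in a copy of the given length."""
    return error_rate * length_bases


def error_free_probability(error_rate, length_bases):
    """Probability that a copy of the given length contains no errors."""
    return (1.0 - error_rate) ** length_bases


# Errors per 100 kb copied: about 10 in vitro and about 2 in vivo.
print(expected_errors(IN_VITRO_ERROR_RATE, 100_000))
print(expected_errors(IN_VIVO_ERROR_RATE, 100_000))

# Chance that a hypothetical 10 kb cDNA is error free under each estimate.
p_in_vitro = error_free_probability(IN_VITRO_ERROR_RATE, 10_000)
p_in_vivo = error_free_probability(IN_VIVO_ERROR_RATE, 10_000)
print(round(p_in_vitro, 2), round(p_in_vivo, 2))
```

Under these assumptions, somewhat more than a third of 10 kb copies would be error free at the in vitro rate, compared with roughly four fifths at the in vivo rate.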
Thus, present techniques for cDNA synthesis are limited by (1) the requirement that the mRNA be extracted and purified from cells and (2) the performance of the reverse transcriptase enzyme under in vitro conditions. In combination, these factors limit: the ease of cDNA synthesis; the efficiency of cDNA synthesis; the size of cDNA molecules that can be produced (thereby the genes that are readily clonable by this technique); the accuracy of cDNA synthesis in determining which genes are expressed under particular conditions; and the fidelity of the cDNA produced.
It is an object of the present invention to provide a technique of cDNA synthesis that does not require the isolation of mRNA molecules from cells.
It is a further object of the present invention to provide a technique of cDNA synthesis that does not require in vitro activity of reverse transcriptase.
It is an additional object of the present invention to provide a technique of cDNA synthesis wherein the efficiency of the technique, the fidelity of the cDNA produced and the size of cDNA that the technique is capable of producing are superior to all presently used techniques.
The present invention relates to methods and compositions for the synthesis of complementary DNA copies of RNA templates in vivo.
In accordance with the present invention, a method for synthesizing a complementary DNA copy of an RNA template molecule is provided. The method comprises providing a polynucleotide molecule which is capable of annealing in vivo to an RNA template molecule, providing at least one reverse transcriptase enzyme which is capable of initiating DNA synthesis using the polynucleotide molecule as a primer, introducing the polynucleotide molecule into a viable target cell in the presence of the reverse transcriptase enzyme and incubating the target cell under conditions which permit the synthesis of a DNA molecule complementary to the RNA template molecule.
A further aspect of the invention provides a method for producing in vivo a complementary DNA copy of an RNA template molecule of which at least a partial sequence is known. This method comprises providing a DNA molecule comprising a sequence which encodes a promoter operatively linked to a sequence which encodes a polynucleotide molecule which is capable of annealing in vivo to an RNA template molecule and further capable of functioning as a primer for at least one reverse transcriptase enzyme; providing a first nucleotide primer wherein said primer is homologous to a sequence located 5′ to the end of the promoter sequence; and providing a second nucleotide primer wherein said primer comprises a 3′ sequence that is complementary to a 3′ region of the polynucleotide molecule joined to a 5′ sequence that is complementary to a portion of the known RNA template molecule sequence. Thereafter, the DNA molecule is contacted with the first and second primers, and the mixture is then treated under conditions and with reagents suitable for amplifying the cloned DNA molecule. At least one amplified DNA molecule produced thereby is then treated under conditions and with reagents suitable for production of an encoded RNA molecule, which is introduced into a viable target cell in the presence of at least one reverse transcriptase enzyme which is capable of initiating DNA synthesis using said RNA molecule as a primer. The target cell is then incubated under conditions such that a DNA molecule complementary to at least one RNA template molecule is produced.
Also provided in accordance with the present invention are compositions useful in the practice of the present method. Such compositions include polynucleotide molecules which are capable of annealing in vivo to an RNA template molecule and further capable of functioning as primer molecules for at least one reverse transcriptase enzyme, together with DNA molecules and recombinant DNA vectors encoding such polynucleotide molecules, and kits containing such compositions.