In examining the structure and physiology of an organism, tissue or cell, it is often desirable to determine its genetic content. The genetic framework of an organism is encoded in the sequence of nucleotide bases in the double-stranded deoxyribonucleic acid (DNA) contained in the somatic and germ cells of the organism. The genetic content of a particular segment of DNA, or gene, is manifested only upon production of the protein encoded by that gene. To produce a protein, RNA polymerase synthesizes a complementary copy of one strand of the DNA double helix (the “template” strand), resulting in a specific sequence of ribonucleic acid (RNA) that matches the other (“coding”) strand, with uracil in place of thymine. This particular type of RNA, since it carries the genetic message from the DNA for production of a protein, is called messenger RNA (mRNA).
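The strand relationships described above can be illustrated with simple string operations. The following is a minimal Python sketch, assuming all sequences are written 5′→3′ using A, C, G, T (and U for RNA); the function names and the example sequence are hypothetical and chosen only for illustration. The mRNA is the reverse complement of the template strand and therefore has the same sequence as the coding strand, with U in place of T.

```python
# Base pairing used below (5'->3' strings; U only appears in RNA).
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}
DNA_TO_DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def transcribe(template_strand: str) -> str:
    """Return the mRNA (5'->3') copied from a DNA template strand written 5'->3'.

    The polymerase reads the template 3'->5' and builds RNA 5'->3', so the
    product is the reverse complement of the template, with U in place of T.
    """
    return "".join(DNA_TO_RNA_PAIR[b] for b in reversed(template_strand.upper()))


def coding_strand(template_strand: str) -> str:
    """Return the other (coding) DNA strand of the duplex, 5'->3'."""
    return "".join(DNA_TO_DNA_PAIR[b] for b in reversed(template_strand.upper()))


template = "TTAAGCCAT"            # hypothetical template strand, 5'->3'
mrna = transcribe(template)       # "AUGGCUUAA"
coding = coding_strand(template)  # "ATGGCTTAA"
# The mRNA has the coding-strand sequence, with U replacing T.
assert mrna == coding.replace("T", "U")
```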
Within a given cell, tissue or organism, there exist many mRNA species, each encoding a separate and specific protein. This fact provides a powerful tool to investigators interested in studying genetic expression in a tissue or cell. mRNA molecules may be isolated and further manipulated by various molecular biological techniques, thereby allowing the elucidation of the full functional genetic content of a cell, tissue or organism. The identity and levels of specific mRNAs present in a particular sample provide clues to the biology of the particular tissue or sample being studied. Therefore, the detection, analysis, transcription, and amplification of RNAs are among the most important procedures in modern molecular biology.
A common approach to the study of gene expression is the production of complementary DNA (cDNA). In this technique, the mRNA molecules from an organism are isolated from an extract of the cells or tissues of the organism. From these purified mRNA molecules, cDNA copies may be made using the enzyme reverse transcriptase (RT), or DNA polymerases having RT activity, which produces single-stranded cDNA molecules. The term “reverse transcriptase” describes a class of polymerases characterized as RNA-dependent DNA polymerases. All known reverse transcriptases require a primer to synthesize a DNA transcript from an RNA template.
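The primer requirement and the direction of first-strand synthesis can be sketched as follows. This is a minimal Python model, not a description of any particular enzyme: it assumes perfect base pairing, ignores kinetics, mismatches and secondary structure, and uses the hypothetical function name `first_strand_cdna`. The primer anneals where the mRNA contains its complement and is extended toward the mRNA 5′ end; with no annealing site, no cDNA is made.

```python
from typing import Optional

RNA_TO_DNA_PAIR = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna: str, primer: str) -> Optional[str]:
    """Model first-strand cDNA synthesis on an mRNA (both strings written 5'->3').

    Returns None when the primer has no complementary site, reflecting the
    reverse transcriptase's need for a primer.
    """
    # RNA sequence the primer pairs with: its reverse complement in RNA letters.
    site = "".join(DNA_TO_RNA_PAIR[b] for b in reversed(primer))
    start = mrna.rfind(site)  # most 3' match, e.g. oligo dT at the poly(A) tail
    if start == -1:
        return None
    copied = mrna[: start + len(site)]  # primer site plus everything 5' of it
    # cDNA is the reverse complement of the copied region; its 5' end is the primer.
    return "".join(RNA_TO_DNA_PAIR[b] for b in reversed(copied))


mrna = "AUGGCUAAAAAA"                          # hypothetical mRNA with a short poly(A) tail
full = first_strand_cdna(mrna, "TTTTTT")       # oligo dT-style primer: "TTTTTTAGCCAT"
internal = first_strand_cdna(mrna, "AGCC")     # gene-specific primer: "AGCCAT"
missing = first_strand_cdna(mrna, "GGGG")      # no annealing site: None
```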
Avian myeloblastosis virus (AMV) reverse transcriptase was the first widely used RNA-dependent DNA polymerase (Verma, Biochim. Biophys. Acta 473:1 (1977)). The enzyme has 5′-3′ RNA-directed DNA polymerase activity, 5′-3′ DNA-directed DNA polymerase activity, and RNase H activity. RNase H is a processive 5′ and 3′ ribonuclease specific for the RNA strand of RNA-DNA hybrids (Perbal, A Practical Guide to Molecular Cloning, New York: Wiley & Sons (1984)). Errors in transcription cannot be corrected by reverse transcriptase, because known viral reverse transcriptases lack the 3′-5′ exonuclease activity necessary for proofreading (Saunders and Saunders, Microbial Genetics Applied to Biotechnology, London: Croom Helm (1987)). A detailed study of the activity of AMV reverse transcriptase and its associated RNase H activity has been presented by Berger et al., Biochemistry 22:2365-2372 (1983).
Another reverse transcriptase used extensively in molecular biology is that of Moloney murine leukemia virus (M-MLV). See, e.g., Gerard, G. R., DNA 5:271-279 (1986) and Kotewicz, M. L., et al., Gene 35:249-258 (1985). M-MLV reverse transcriptase substantially lacking RNase H activity has also been described. See, e.g., U.S. Pat. No. 5,244,797.
Historically, reverse transcriptase has been used primarily to transcribe mRNA into cDNA, which can then be cloned into a vector for further manipulation. The single-stranded cDNAs may be converted by the action of a DNA polymerase into a complete double-stranded DNA copy (i.e., a double-stranded cDNA) of the original mRNA (and thus of the original double-stranded DNA sequence, encoding this mRNA, contained in the genome of the organism). The double-stranded cDNAs can then be inserted into a vector, transformed into an appropriate bacterial, yeast, animal or plant cell host, and propagated as a population of host cells containing a collection of cDNA clones, or cDNA library, that represents the genes, or portions of genes, present in the original mRNA sample.
Alternatively, cDNA can be labeled with an appropriate reporter moiety and used as a hybridization probe to query defined target sequences immobilized on glass slides, filters, or other suitable solid supports. The identity and relative abundance of a given mRNA in a sample can be inferred from the signal intensity for a specific target sequence on the solid support.
One of the most widely used techniques to study gene expression uses first-strand cDNA copies of mRNA sequences as the template for amplification by the polymerase chain reaction (PCR). This method, often referred to as RNA PCR or reverse transcriptase PCR (RT-PCR), exploits the high sensitivity and specificity of the PCR process and is widely used for detection and quantification of RNA. Recently, the ability to measure the kinetics of a PCR reaction by on-line detection, in combination with these RT-PCR techniques, has enabled accurate and precise measurement of RNA sequences with high sensitivity. This has become possible by monitoring the fluorescence of the RT-PCR product during the amplification process, using fluorescent dual-labeled hybridization probe technologies such as the “TaqMan” 5′ fluorogenic nuclease assay described by Holland et al. (Proc. Natl. Acad. Sci. U.S.A. 88, 7276 (1991)), Gibson et al. (Genome Res. 6, 99 (1996)), and Heid et al. (Genome Res. 6, 986 (1996)), or “Molecular Beacons” (Tyagi, S. and Kramer, F. R., Nature Biotechnology 14, 303 (1996)). Nazarenko et al. (Nucleic Acids Res. 25, 2516 (1997)) have described the use of dual-labeled hairpin primers, as well as recent modifications utilizing primers labeled with only a single fluorophore (Nazarenko et al., Nucleic Acids Res. (2002)). One of the more widely used methods is the addition to the reaction of double-strand DNA-specific fluorescent dyes such as ethidium bromide (Higuchi et al., Biotechnology 10, 413 (1992) and Higuchi et al., Biotechnology 11, 1026 (1993)), YO-PRO-1 (Ishiguro et al., Anal. Biochem. 229, 207 (1995)), or SYBR Green I (Wittwer et al., Biotechniques 22, 130 (1997)). These improvements in the PCR method have enabled simultaneous amplification and homogeneous detection of the amplified nucleic acid without purification of the PCR product or separation by gel electrophoresis.
This combined approach decreases sample handling, saves time, and greatly reduces the risk of product contamination for subsequent reactions, as there is no need to remove the samples from their closed containers for further analysis. The concept of combining amplification with product analysis has become known as “real time” PCR, also referred to as quantitative PCR, or qPCR.
The general principles of template quantification by real-time PCR were first disclosed by Higuchi R, G Dollinger, P S Walsh and R Griffith, “Simultaneous amplification and detection of specific DNA sequences”, Bio/Technology 10:413-417, 1992; and Higuchi R, C Fockler, G Dollinger and R Watson, “Kinetic PCR analysis: real time monitoring of DNA amplification reactions”, Bio/Technology 11:1026-1030, 1993. This approach to quantitative PCR uses a double-strand-specific fluorescent dye, ethidium bromide, added to the amplification reaction. The fluorescent signal generated at each cycle of PCR is proportional to the amount of PCR product. A plot of fluorescence versus cycle number describes the kinetics of amplification, and a fluorescence threshold level is used to define a fractional cycle number related to the initial template concentration. This fractional cycle number, the threshold cycle (Ct), is defined as the intersection of the fluorescence-versus-cycle-number curve with the fluorescence threshold; during exponential amplification, Ct decreases linearly as the log of the initial template concentration increases. Higher amounts of starting template result in detection at a lower Ct value, whereas lower amounts require a greater number of PCR cycles to reach the same fluorescence threshold and are detected at higher Ct values. Typically, the fluorescence threshold is set at a level that represents a statistically significant increase over background fluorescence. Because this level is reached at an early stage of the PCR process, when critical substrates are not yet limiting, starting template can be quantified over a broad dynamic range with high accuracy, precision, and sensitivity. A major obstacle to understanding gene expression patterns, for gene discovery and for identification of metabolic pathways, is the limited accuracy of current quantification methods. Real-time PCR methods provide a significant improvement toward this goal.
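The threshold-cycle relationship can be made concrete. If each cycle multiplies product by (1 + E), where E is the amplification efficiency, then product after c cycles is N0(1 + E)^c, and the threshold is reached when N0(1 + E)^Ct equals a fixed amount N_T, so Ct = [log N_T − log N0] / log(1 + E). A standard curve of Ct against log10(N0) therefore has slope −1/log10(1 + E), giving E = 10^(−1/slope) − 1. The Python sketch below assumes idealized curves (fluorescence strictly proportional to product, no background, no plateau) and interpolates linearly in log fluorescence between cycles, which is only one possible convention; instrument software may compute Ct differently. All function names and dilution values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def simulate_curve(n0: float, efficiency: float = 1.0, cycles: int = 40,
                   k: float = 1e-9) -> List[float]:
    """Idealized readings: fluorescence = k * n0 * (1 + E)^c for cycles 1..cycles."""
    return [k * n0 * (1 + efficiency) ** c for c in range(1, cycles + 1)]


def threshold_cycle(fluorescence: Sequence[float], threshold: float) -> float:
    """Fractional cycle (Ct) at which the curve first crosses `threshold`.

    fluorescence[i] is the reading after cycle i + 1. Interpolation is linear
    in log10(fluorescence), which is exact for purely exponential growth.
    """
    for i in range(1, len(fluorescence)):
        lo, hi = fluorescence[i - 1], fluorescence[i]
        if lo < threshold <= hi:
            frac = (math.log10(threshold) - math.log10(lo)) / (math.log10(hi) - math.log10(lo))
            return i + frac  # reading `lo` belongs to cycle i
    raise ValueError("threshold not crossed within the recorded cycles")


def standard_curve(n0_values: Sequence[float],
                   cts: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of Ct = slope * log10(N0) + intercept, plus implied efficiency."""
    xs = [math.log10(n) for n in n0_values]
    mx, my = sum(xs) / len(xs), sum(cts) / len(cts)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, cts))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    efficiency = 10 ** (-1 / slope) - 1  # E = 1.0 means perfect doubling per cycle
    return slope, intercept, efficiency


dilutions = [1e6, 1e5, 1e4, 1e3]  # hypothetical starting copy numbers
cts = [threshold_cycle(simulate_curve(n), threshold=0.1) for n in dilutions]
slope, intercept, eff = standard_curve(dilutions, cts)
# With E = 1, each tenfold dilution delays Ct by log2(10), about 3.32 cycles.
```

In real data the plateau and background make the choice of threshold matter; the threshold is placed in the exponential region precisely so that this simple relationship holds.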
However, real-time PCR quantification of mRNA is still bounded by limitations of the process of reverse transcription.
The RT-PCR procedure, carried out as either an end-point or a real-time assay, involves two separate molecular syntheses: (i) the synthesis of cDNA from an RNA template; and (ii) the replication of the newly synthesized cDNA through PCR amplification. To address the technical problems often associated with RT-PCR, a number of protocols have been developed that take into account the three basic steps of the procedure: (a) denaturation of the RNA and hybridization of the reverse primer; (b) synthesis of cDNA; and (c) PCR amplification. In the so-called “uncoupled” RT-PCR procedure (e.g., two-step RT-PCR), reverse transcription is performed as an independent step using buffer conditions optimal for reverse transcriptase activity. Following cDNA synthesis, the reaction is diluted to decrease the MgCl2 and deoxyribonucleoside triphosphate (dNTP) concentrations to levels optimal for Taq DNA Polymerase activity, and PCR is carried out according to standard conditions (see U.S. Pat. Nos. 4,683,195 and 4,683,202). By contrast, “coupled” RT-PCR methods use a common or compromise buffer for reverse transcriptase and Taq DNA Polymerase activities. In one version, annealing of the reverse primer is a separate step preceding the addition of enzymes, which are then added to the single reaction vessel. In another version, the reverse transcriptase activity is provided by the thermostable Tth DNA polymerase: annealing and cDNA synthesis are performed in the presence of Mn++, and PCR is then carried out in the presence of Mg++ after the Mn++ has been removed by a chelating agent. Finally, the “continuous” method (e.g., one-step RT-PCR) integrates the three RT-PCR steps into a single continuous reaction that avoids opening the reaction tube for component or enzyme addition.
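The dilution step of the uncoupled procedure is simple C1V1 = C2V2 arithmetic: an aliquot of the finished reverse transcription reaction carries its MgCl2 and dNTPs into the PCR, where they are diluted by the ratio of volumes. The Python sketch below shows that calculation; every volume and concentration in it is a hypothetical placeholder, not a value taken from the text or from any particular kit, and `carryover_mM` is an illustrative name.

```python
def carryover_mM(stock_mM: float, aliquot_uL: float, final_uL: float) -> float:
    """Concentration an aliquot contributes after dilution into the final volume (C1*V1 = C2*V2)."""
    if not 0 < aliquot_uL <= final_uL:
        raise ValueError("aliquot must be positive and no larger than the final volume")
    return stock_mM * aliquot_uL / final_uL


# Hypothetical values, for illustration only: 2 uL of a finished RT reaction
# containing 3 mM MgCl2 and 0.5 mM of each dNTP, transferred into a 20 uL PCR.
mg_carried = carryover_mM(3.0, 2.0, 20.0)    # 0.3 mM MgCl2
dntp_carried = carryover_mM(0.5, 2.0, 20.0)  # 0.05 mM of each dNTP
target_mg = 1.5                              # hypothetical MgCl2 target for the PCR
mg_to_supply = target_mg - mg_carried        # 1.2 mM must come from the PCR mix
```

The same arithmetic shows why the continuous (one-step) format needs a single compromise buffer: without a dilution step, the reverse transcription conditions are also the PCR conditions.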
Continuous RT-PCR has been described both as a single-enzyme system, using the reverse transcriptase activity of thermostable Taq DNA Polymerase or Tth polymerase, and as a two-enzyme system using AMV RT and Taq DNA Polymerase, in which the initial 65° C. RNA denaturation step was omitted.
One-step RT-PCR provides several advantages over uncoupled RT-PCR. One-step RT-PCR requires less handling of the reaction mixture reagents and nucleic acid products than uncoupled RT-PCR (e.g., opening of the reaction tube for component or enzyme addition between the two reaction steps), and is therefore less labor intensive, reducing the required number of person-hours. One-step RT-PCR also requires less sample and reduces the risk of contamination (Sellner and Turbett, 1998). The sensitivity and specificity of one-step RT-PCR have proven well suited for studying the expression levels of one to several genes in a given sample or for the detection of pathogen RNA. Typically, this procedure has been limited to the use of gene-specific primers to initiate cDNA synthesis.
In contrast, the use of non-specific primers in the “uncoupled” RT-PCR procedure provides the opportunity to capture all RNA sequences in a sample into first-strand cDNA, thus enabling the profiling and quantitative measurement of many different sequences in a sample, each by a separate PCR. The ability to increase the total amount of cDNA produced, and more particularly to produce cDNA that truly represents the mRNA population of the sample, would provide a significant advance in the study of gene expression. Specifically, such advances would greatly improve the probability of identifying genes that are responsible for disease in various tissues.
Ideally, synthesis of a cDNA molecule initiates at or near the 3′ terminus of the mRNA molecule and terminates at the mRNA 5′ end, thereby generating “full-length” cDNA. Priming cDNA synthesis at the 3′-terminal poly A tail with an oligo dT primer ensures that the 3′ end of each message will be represented in the cDNA molecules produced. It would be very desirable if cDNA synthesis initiated at the 3′ end and continued to the 5′ end of every mRNA, regardless of the length of the mRNA and of the reverse transcriptase used. However, owing to factors such as the length, nucleotide composition and secondary structure of the mRNA, and also the limited processivity of reverse transcriptases, cDNA synthesis often terminates prematurely, resulting in non-quantitative representation of different regions of the mRNA (i.e., 3′-end sequences versus 5′-end sequences). It has been demonstrated that mutant reverse transcriptases lacking RNase H activity produce longer cDNAs, better representation, and higher sensitivity of detection. Nevertheless, priming with oligo dT is generally believed to bias the cDNA toward sequences from the 3′-end region of the mRNA.
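Why premature termination skews representation toward the 3′ end can be illustrated with a deliberately simple model. The Python sketch below assumes that, after priming at the 3′ end, each nucleotide addition succeeds with the same independent probability p, so the fraction of cDNAs that reach a point d nucleotides upstream is p^d and the typical cDNA length is on the order of 1/(1 − p). Real termination depends on sequence and secondary structure, as noted above; the probabilities used here are hypothetical and do not describe any particular enzyme.

```python
def fraction_reaching(distance_nt: int, p_continue: float) -> float:
    """Fraction of 3'-primed cDNAs still being extended distance_nt upstream of the primer.

    Assumes each nucleotide addition succeeds with the same independent
    probability p_continue, so premature termination is geometric.
    """
    if not 0.0 <= p_continue <= 1.0:
        raise ValueError("p_continue must be a probability")
    return p_continue ** distance_nt


# Hypothetical per-nucleotide continuation probabilities for a less and a more processive enzyme.
for p in (0.999, 0.9998):
    profile = {d: round(fraction_reaching(d, p), 3) for d in (100, 1000, 3000)}
    print(p, profile)
```

Under this model, an assay region near the 5′ end of a long mRNA is represented in far fewer cDNA molecules than one near the poly A tail, and a more processive enzyme narrows, but does not remove, that gap.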
In studies involving quantitative analysis of gene expression, sequence bias in the cDNA and non-quantitative representation of different parts of the mRNA can yield inaccurate expression data. Because of these problems, random primers have been used as an alternative means of priming cDNA synthesis. Owing to their random sequence, these primers are believed to prime cDNA synthesis non-specifically at arbitrary sites along the mRNA, resulting in shorter cDNA fragments that collectively represent all parts of the mRNA in the cDNA population. Gerard and D'Alessio (Methods in Molecular Biology 16:73-93 (1993)) have reported that the ratio of random primer to mRNA is critical for efficient cDNA synthesis by M-MLV RT or its RNase H-deficient derivatives. Increasing concentrations of random hexamer increased cDNA yields, but the average length of the cDNA decreased accordingly. At equal hexamer concentrations, RNase H− RT gave cDNA yields approximately 4-fold higher than those obtained with M-MLV RT. Hexamer-to-mRNA ratios of 10:1 for M-MLV H− RT and 40:1 for M-MLV RT were reported to produce reasonable yields of cDNA without sacrificing length. This indicates that primer concentration must be optimized for different amounts of starting RNA template to achieve efficient cDNA synthesis. Because random primers can omit sequence close to the mRNA poly A tail, some protocols use mixtures of oligo dT and random primers to combine both priming methods.
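The trade-off between primer density and cDNA length, and the gap left near the poly A tail, can be illustrated with a toy simulation. The Python sketch below assumes each primer anneals at a distinct random position on one template and is extended toward the 5′ end until it reaches the next primed site upstream (no strand displacement, no premature termination). It models only fragment length and 3′ coverage, not yield; the function name, template length and primer counts are hypothetical.

```python
import random
from statistics import mean


def random_primed_fragments(template_len: int, n_primers: int, seed: int = 0):
    """Return (fragment lengths, length of the uncopied 3' region) for one template.

    Positions run from the mRNA 5' end (0) to its 3' end (template_len). Each
    primed site s yields a fragment copying the template from the previous
    primed site (or the 5' end) up to s. Sequence 3' of the last site is never copied.
    """
    if not 1 <= n_primers <= template_len:
        raise ValueError("need between 1 and template_len primers")
    rng = random.Random(seed)
    sites = sorted(rng.sample(range(1, template_len + 1), n_primers))
    lengths = [b - a for a, b in zip([0] + sites[:-1], sites)]
    uncopied_3prime = template_len - sites[-1]
    return lengths, uncopied_3prime


for n in (5, 20, 80):  # hypothetical primer loads per 2,000-nt template
    lengths, gap = random_primed_fragments(2000, n, seed=1)
    print(n, "fragments, mean length", round(mean(lengths)), "nt; uncopied 3' end", gap, "nt")
```

Because fragment lengths partition the copied region, the mean length cannot exceed the template length divided by the number of primed sites, which is the length penalty of heavy random priming described above.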
The choice and concentration of primer can have a profound impact on the quantitative representation of different mRNA transcripts in first-strand cDNA. It is therefore apparent that improved compositions and methods for increasing the yield of cDNA produced by reverse transcription are greatly to be desired. It is also apparent that new methods for making collections or libraries of cDNA from cells or tissue that more accurately represent the relative amounts of mRNAs present in those cells or tissue are greatly to be desired, as are more convenient compositions and kits for use in such methods.