Messenger RNA (mRNA) has been the focus of RNA research since its discovery in the late 1950s. mRNA encodes proteins that are responsible for cell organization, structure, and function, yet it constitutes only 1% to 5% of the total RNA within a cell. It was generally assumed that only the small percentage of human genomic deoxyribonucleic acid (DNA) that was transcribed into mRNA, ribosomal RNA, transfer RNA, and small nucleolar RNA was biologically significant; the vast, nontranscribed remainder was considered “junk” DNA. In keeping with this view, the “central dogma” of molecular biology held that DNA is transcribed into RNA, which, in the case of mRNA, is in turn translated into protein.
Contrary to earlier assumptions underlying this central dogma of molecular biology, it is now known that most of the eukaryotic genome is transcribed, resulting in many RNAs that do not encode proteins. While the majority of this noncoding RNA (ncRNA) remains uncharacterized, it is becoming evident that most ncRNAs have significant biological function. Thus, RNA research has seen a major shift in focus in the past few years, from mRNA to ncRNA.
ncRNAs can be divided into two groups: long ncRNAs (greater than 200 nucleotides [nt]) and small ncRNAs (200 nt or smaller). Long ncRNAs are perhaps the least understood type of ncRNA, and there is great interest in studying their roles in transcriptional regulation, epigenetic gene regulation, and disease pathways (Ponting, 2009). Small ncRNAs, by contrast, have already been the subject of intensive study. One subclass of small ncRNAs, 20-30 nt in length, has distinct regulatory functions and includes microRNA (miRNA), endogenous small interfering RNA (siRNA), and piwi-interacting RNAs (piRNAs). These three types of small ncRNAs associate with distinct sets of effector proteins to exert their regulatory functions (reviewed in Carthew and Sontheimer, 2009).
The study of both coding and noncoding RNA has benefited greatly from analytical technologies such as quantitative reverse-transcription polymerase chain reaction (qRT-PCR), microarrays, and next-generation sequencing (RNA-Seq). These technologies typically require that the user: i) isolate the particular class of RNA molecules of interest from a total RNA preparation or directly from a biological sample; ii) take advantage of endogenous sequence “tags” when present, or introduce such tags when absent, at the 5′ end, 3′ end, or both; and iii) prepare complementary DNA (cDNA) from the tagged RNA.
The nature of the 5′ ends of different classes of RNA molecules plays an important role in their biological structure and function. The chemical moieties on the 5′ ends of RNA molecules influence their structure, stability, biochemical processing, transport, biological function and fate in a cell or organism. The chemical moieties commonly found at the 5′ ends of different RNA classes include triphosphates, monophosphates, hydroxyls, and cap nucleotides. The particular chemical moiety on the 5′ end provides important clues to the origin, processing, maturation, and stability of the RNA. Tagging the 5′ end of RNA can be accomplished by selecting the RNA based on the type of 5′ end, modifying the 5′ end if necessary, and then ligating the desired tagging sequence contained in a DNA or RNA oligonucleotide (for example, see World Patent Application WO 2009/026148 A1; Maruyama, 1994).
The 3′ ends of RNAs also undergo processing related to their origin, structure, and functional role within the cell. Almost all eukaryotic mRNAs have a polymeric stretch of adenine nucleotides at the 3′ end, i.e., a “poly(A) tail”; this tail stabilizes the mRNA, enhances its translation, and contributes to transport of the processed mRNA from the nucleus to the cytoplasm. In prokaryotes, by contrast, only a minority of mRNA transcripts are polyadenylated, and the poly(A) tail is thought to be involved largely in degradation of these transcripts (Steege, 2000).
The presence of a poly(A) tail on a desired class of mRNAs facilitates their study by 3′-end-tagging, in which a tagging oligonucleotide containing a poly(dT) sequence is annealed to the poly(A) tail of the RNA. Using a reverse transcriptase with the tagging oligonucleotide as a primer, first-strand cDNA can then be synthesized. Including additional functional sequences at the 5′ end of the tagging oligonucleotide enables further analysis by a variety of methods, including expression microarrays, qRT-PCR, and RNA-Seq.
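The tagging step can be illustrated with a minimal Python sketch. It assumes exact Watson-Crick pairing, treats sequences as plain strings, and uses a hypothetical adapter sequence and an 18-nt poly(dT) segment. The helper names (`make_tagging_primer`, `first_strand_cdna`) are illustrative only and do not correspond to any kit or library.

```python
# Minimal sketch of 3'-end-tagging with an oligo(dT)-containing primer.
# Assumptions (hypothetical, for illustration only): sequences are plain
# strings, annealing is exact Watson-Crick pairing, and the primer's poly(dT)
# segment pairs with the 3'-terminal A residues of the RNA.

RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def dna_complement_of_rna(rna: str) -> str:
    """DNA strand (written 5'->3') that pairs antiparallel with an RNA stretch."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(rna))


def make_tagging_primer(adapter: str, n_dt: int = 18) -> str:
    """Tagging oligo, 5'->3': extra functional sequence, then poly(dT)."""
    return adapter + "T" * n_dt


def primer_anneals(rna: str, primer: str, n_dt: int) -> bool:
    """True if the primer's 3' poly(dT) segment pairs with the RNA's 3' end."""
    if len(rna) < n_dt:
        return False
    return dna_complement_of_rna(rna[-n_dt:]) == primer[-n_dt:]


def first_strand_cdna(rna: str, primer: str, n_dt: int) -> str:
    """Idealized first-strand cDNA (5'->3'): the primer, extended by reverse
    transcriptase along the rest of the RNA template."""
    if not primer_anneals(rna, primer, n_dt):
        raise ValueError("no poly(A) tail long enough for the primer to anneal")
    template_upstream = rna[:-n_dt]  # RNA lying 5' of the annealed segment
    return primer + dna_complement_of_rna(template_upstream)


# Usage: a short hypothetical RNA with a 20-nt poly(A) tail, and the same RNA
# without a tail.
adapter = "ACGTACGT"  # hypothetical functional sequence
primer = make_tagging_primer(adapter, n_dt=18)
tailed = "GGCUAGCU" + "A" * 20
untailed = "GGCUAGCU"
cdna = first_strand_cdna(tailed, primer, n_dt=18)
print(cdna)
print(primer_anneals(untailed, primer, n_dt=18))
```

The untailed RNA offers nothing for the poly(dT) segment to pair with, so `first_strand_cdna` raises an error for it. This mirrors why nonpolyadenylated RNAs cannot be tagged this way without first adding a tail.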
However, many types of RNA (including ncRNA) lack a poly(A) tail at the 3′ end, which makes them difficult to characterize using poly(dT)-containing tags. Methods have therefore been developed in the art to add a 3′-poly(A) tail to RNA molecules that lack one. Originally, crude nuclear extracts were used for in vitro polyadenylation of RNA, albeit with relatively low efficiency (Cooper and Marzluff, 1977). Subsequently, cloned and purified enzymes were used; for example, poly(A) polymerase from yeast has been used to specifically polyadenylate E. coli polysomal mRNA (Amara and Vijaya, 1997). However, enzymatic polyadenylation has several deficiencies. With E. coli poly(A) polymerase, for instance, different RNAs may be polyadenylated to different extents, depending on the structure of their 5′ and 3′ ends (Feng and Cohen, 2000). Under standard conditions known in the art, the tailing reaction is not quantitative, so some RNA in a population is lost to further analysis (U.S. Pat. No. 7,361,465 B2). Further, with small amounts of RNA (<100 ng), the added poly(A) tail can be extremely long (up to 9 kb) (U.S. Pat. No. 7,361,465 B2).
U.S. Pat. No. 7,361,465 B2 provides methods and compositions for tailing nonpolyadenylated RNA molecules, including miRNA, siRNA, tRNA, rRNA, synthetic RNA, and nonpolyadenylated mRNA. The inventors describe improvements both in the efficiency of polyadenylation and in limiting the length of the poly(A) tail; their optimized reaction conditions restrict the tail to no more than 500 nt.
However, homopolymeric tails can be problematic for applications such as next-generation sequencing, because the tail sequence introduces additional nucleotides between the sequencing primer and the sequence of the desired RNA. It is also desirable to control the length of the poly(A) tail precisely, so that the same number of A nucleotides is added each time, i.e., so that the reaction is reproducible.
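A minimal Python sketch can make the read-layout problem concrete. It assumes, hypothetically, that each read begins just 3′ of the sequencing primer site, runs through the poly(T) copy of the poly(A) tail, and then enters the cDNA of the desired RNA. The sequences, read length, and function names are invented for illustration.

```python
# Minimal sketch: a homopolymer tail copy sits between the sequencing primer
# site and the insert. Assumed (hypothetical) read layout: each read starts
# just 3' of the primer site, runs through the poly(T) copy of the poly(A)
# tail, then into the cDNA of the RNA of interest.


def simulate_read(insert: str, tail_len: int, read_len: int) -> str:
    """Read = poly(T) tail copy, then insert, truncated to the read length."""
    return ("T" * tail_len + insert)[:read_len]


def trim_leading_t(read: str) -> tuple[int, str]:
    """Remove the leading T run; return (bases removed, remaining sequence)."""
    remaining = read.lstrip("T")
    return len(read) - len(remaining), remaining


insert = "AGCTAGCC"  # hypothetical cDNA of the RNA of interest
for tail_len in (10, 20, 30):
    removed, kept = trim_leading_t(simulate_read(insert, tail_len, read_len=36))
    print(tail_len, removed, kept)  # longer tails leave fewer insert bases

# If the RNA itself ends in A residues, its cDNA begins with T, and the tail
# boundary can no longer be located from the sequence alone.
removed, kept = trim_leading_t(simulate_read("TTAGCTAGCC", 18, read_len=36))
print(removed, kept)  # 20 AGCTAGCC: two genuine insert bases were trimmed
```

When the tail length varies, the number of read positions spent on the tail varies with it. If the RNA also ends in A residues, trimming by homopolymer run alone removes genuine insert bases. A reproducible, known tail length would instead allow a fixed-length trim.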
Further, it is known that poly(A) polymerase is inefficient in polyadenylating RNA molecules that have a 2′-O-methyl group on the 3′-terminal ribose (Ebhardt, 2005; Yang, 2005). Such modified RNAs include plant miRNAs and siRNAs (Yu, 2005; Yang, 2005) and some higher eukaryotic piRNAs (Saito, 2007). To address these shortcomings of polyadenylation, methods have been developed to tag the 3′ end of RNA molecules using modified oligonucleotides, taking advantage of the properties of different types of RNA ligase enzymes.
RNA Ligase 1 from bacteriophage T4-infected E. coli (T4 RNA Ligase 1) catalyzes the adenosine triphosphate (ATP)-dependent formation of a 3′ to 5′ phosphodiester bond between an RNA molecule with a 3′-hydroxyl group (the acceptor molecule) and another molecule bearing a 5′-phosphoryl group (the donor molecule). The reaction occurs in three steps, involving covalent intermediates (Silverman, 2004; England, 1977):
i) T4 RNA Ligase 1 reacts with ATP to form a covalent enzyme-AMP intermediate (“adenylated enzyme”), with the release of pyrophosphate.
ii) The adenylyl group is transferred from the adenylated enzyme to the 5′-phosphoryl end of an RNA molecule (the donor), forming a 5′,5′-phosphoanhydride bond (5′-App-RNA) and regenerating the free enzyme.
iii) The 5′-App-RNA donor reacts with the 3′-hydroxyl group of an acceptor RNA molecule, forming a standard 3′ to 5′ phosphodiester bond between the acceptor and donor RNA molecules with the release of adenosine monophosphate (AMP). This step does not require ATP.
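The three steps can be modeled as simple state changes on an enzyme and two strands. The Python sketch below is a deliberately simplified model (one enzyme molecule, no side reactions), and its class and function names are hypothetical. It tracks where ATP is consumed and where pyrophosphate and AMP are released.

```python
# Minimal state model of the three-step T4 RNA Ligase 1 reaction.
# Simplifications (assumed for illustration): one enzyme molecule, strands
# represented only by sequence and end chemistry, and no side reactions.
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Strand:
    seq: str
    five_prime: str   # "p" = monophosphate, "App" = adenylated, "OH" = hydroxyl
    three_prime: str  # "OH" = free hydroxyl, "X" = blocked


@dataclass
class Ligase:
    adenylated: bool = False


def step1_adenylate_enzyme(enzyme: Ligase, atp: int) -> tuple[int, int]:
    """Ligase + ATP -> ligase-AMP + PPi. Returns (ATP left, PPi released)."""
    if enzyme.adenylated or atp < 1:
        return atp, 0
    enzyme.adenylated = True
    return atp - 1, 1


def step2_adenylate_donor(enzyme: Ligase, donor: Strand) -> Strand:
    """AMP moves from the enzyme to the donor's 5'-phosphate (5'-App-RNA),
    regenerating free enzyme; no AMP is released in this step."""
    if not enzyme.adenylated or donor.five_prime != "p":
        raise ValueError("needs adenylated enzyme and a 5'-monophosphate donor")
    enzyme.adenylated = False
    return replace(donor, five_prime="App")


def step3_join(acceptor: Strand, donor: Strand) -> tuple[Strand, int]:
    """Acceptor 3'-OH attacks the activated donor 5' end, forming a 3'-5'
    phosphodiester bond and releasing AMP. No ATP is consumed."""
    if acceptor.three_prime != "OH" or donor.five_prime != "App":
        raise ValueError("needs a 3'-OH acceptor and a 5'-App donor")
    joined = Strand(acceptor.seq + donor.seq, acceptor.five_prime, donor.three_prime)
    return joined, 1  # one AMP released


# Usage with hypothetical sequences.
enzyme = Ligase()
atp, ppi = step1_adenylate_enzyme(enzyme, atp=1)
donor = step2_adenylate_donor(enzyme, Strand("CUGUAG", "p", "OH"))
product, amp = step3_join(Strand("UAGCUU", "p", "OH"), donor)
print(product, atp, ppi, amp)
```

Because step 3 checks only for a 5′-App donor, a donor that arrives already adenylated can be joined without any ATP in the model. This is the property that the pre-adenylated-donor methods exploit.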
A 5′-adenylated DNA molecule may also serve as the donor molecule in step (iii) above, and this approach has proven useful in 3′-ligation-tagging of RNA molecules. In one example, a 5′-adenylated donor oligonucleotide (5′-App-DNA) is ligated to the 3′ end of a miRNA acceptor using T4 RNA Ligase 1 in the absence of ATP (Ebhardt, 2005). In another example, the 5′-adenylated donor oligonucleotide additionally carries a blocking group at its 3′ end (5′-App-DNA-X), which prevents self-ligation of the donor oligonucleotide; here the reaction is catalyzed by T4 RNA Ligase 2 (Hafner, 2008).
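The effect of the 3′ blocking group can be illustrated by enumerating which joins are chemically possible in an ATP-free reaction. The Python sketch below rests on a simplifying assumption: only strands that already carry a 5′-App end can act as donors, and any strand with a free 3′-OH can act as an acceptor. The species names are hypothetical.

```python
# Minimal sketch: which ligations are possible in an ATP-free reaction with a
# pre-adenylated donor? Simplifying assumption (hypothetical model): only
# strands already carrying a 5'-App end can act as donors, and any strand
# with a free 3'-OH can act as an acceptor.
import itertools
from typing import NamedTuple


class Species(NamedTuple):
    name: str
    five_prime: str   # "p", "App", or "OH"
    three_prime: str  # "OH" or "X" (blocked)


def possible_joins(species: list[Species]) -> set[tuple[str, str]]:
    """(acceptor, donor) pairs that could be joined 3'-OH -> 5'-App.
    A pair with the same name means self-ligation of that species
    (intermolecular concatemers or intramolecular circles)."""
    return {
        (acc.name, don.name)
        for acc, don in itertools.product(species, repeat=2)
        if acc.three_prime == "OH" and don.five_prime == "App"
    }


mirna = Species("miRNA", "p", "OH")
unblocked = Species("adapter", "App", "OH")  # 5'-App-DNA
blocked = Species("adapter", "App", "X")     # 5'-App-DNA-X

print(sorted(possible_joins([mirna, unblocked])))
# [('adapter', 'adapter'), ('miRNA', 'adapter')]
print(sorted(possible_joins([mirna, blocked])))
# [('miRNA', 'adapter')]
```

With the unblocked donor, the donor can act as its own acceptor, which is the self-ligation that the 3′ block is intended to prevent. With the blocked donor, the only remaining join is RNA to donor.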
Such 5′-adenylated, 3′-blocked oligonucleotides are commercially available (U.S. Patent Application 2009/0011422 A1; Vigneault, 2008). However, commercial synthesis of such 5′- and 3′-modified oligonucleotides can be inefficient and expensive compared with that of standard (unmodified) oligonucleotides. This is especially so when many oligonucleotides are required, as in preparing barcoded libraries for RNA-Seq (Vigneault, 2008).
Enzymatic methods for 5′-adenylating oligonucleotides are known in the art (e.g., see Vigneault, 2008). Because these methods still rely on chemical addition of a 3′-blocking group to the DNA oligonucleotide during its synthesis, they also cost more than synthesizing an unmodified oligonucleotide. In addition, they require gel purification of the adenylated oligonucleotide. This is time-consuming and tedious, especially when a large number of adenylated oligonucleotides must be prepared, or when the reaction must be scaled up or performed in a high-throughput format (i.e., simultaneously on a large number of samples). Needed in the art is a simple, single-tube enzymatic method for 5′-adenylating and 3′-blocking standard oligonucleotides. Such a method could be readily incorporated into a high-throughput workflow, for example, 3′ tagging of RNA with barcoded oligonucleotides for RNA-Seq.
In methods for 3′-end-tagging of RNA using 5′-adenylated DNA oligonucleotides, a large molar excess of the donor oligonucleotide is often used to increase ligation efficiency. After the ligation reaction is terminated, this excess donor oligonucleotide must be removed before the 3′-end-tagged RNA undergoes further enzymatic manipulation. If not removed, the excess donor (adaptor) can form adaptor-dimer products that contribute to high background during sequencing. Methods known in the art use gel purification to remove excess donor oligonucleotide. This technique involves four steps (e.g., see Vigneault, 2008; Hafner, 2008):

1. Separate the reaction products by polyacrylamide gel electrophoresis.
2. Excise a gel slice containing the desired product.
3. Elute the product from the gel slice by mechanical agitation in a suitable buffered solution.
4. Concentrate and purify the product by ethanol precipitation.

Gel purification can sometimes be skipped before 5′-adaptor ligation; however, it is still required before applications such as cloning or next-generation sequencing (Vigneault, 2008). A modified purification method described in U.S. Patent Application 2009/0011422 A1 also relies on gel purification.
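The scale of the molar excess, and the reason most of the donor remains free after ligation, can be estimated with simple arithmetic. The Python sketch below uses the average-mass approximations for single-stranded RNA and DNA found in common oligonucleotide calculators; these are an assumption and are not taken from the cited references. The input amounts are hypothetical.

```python
# Minimal sketch of the molar-excess arithmetic. The average-mass formulas are
# the approximations used by common oligonucleotide calculators (an assumption
# here, not from the cited references); the input amounts are hypothetical.


def pmol_from_ng(ng: float, length_nt: int, kind: str) -> float:
    """Convert a mass of single-stranded nucleic acid to picomoles."""
    if kind == "rna":
        g_per_mol = length_nt * 320.5 + 159.0
    elif kind == "dna":
        g_per_mol = length_nt * 303.7 + 79.0
    else:
        raise ValueError("kind must be 'rna' or 'dna'")
    return ng * 1000.0 / g_per_mol  # ng / (g/mol) gives nmol; x1000 gives pmol


rna_pmol = pmol_from_ng(10, 22, "rna")  # hypothetical: 10 ng of a 22-nt RNA
donor_pmol = 20.0                       # hypothetical donor input
excess = donor_pmol / rna_pmol
# Assuming a 3'-blocked donor, each acceptor RNA can take up at most one donor,
# so at least this much donor is left unligated and must be removed.
min_leftover_pmol = donor_pmol - min(donor_pmol, rna_pmol)
print(f"{rna_pmol:.2f} pmol RNA, {excess:.1f}-fold excess, "
      f">= {min_leftover_pmol:.1f} pmol donor left over")
# 1.39 pmol RNA, 14.4-fold excess, >= 18.6 pmol donor left over
```

Even at complete ligation, most of the donor in this hypothetical reaction remains free, which is the material that can go on to form adaptor dimers.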
Recently, a method has been described for eliminating adaptor-dimer products using a locked nucleic acid (LNA) that is complementary to them (Kawano, 2010, BioTechniques 49:751-755). The LNA binds to adaptor dimers and prevents their reverse transcription. When small-RNA libraries prepared by this method are sequenced, background due to non-insert sequence reads is therefore reduced. However, this method does not reduce the formation of adaptor dimers, which arise from the excess donor oligonucleotide. It only prevents dimers that have already formed from being reverse transcribed.
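Non-insert reads of this kind can be recognized computationally. The minimal Python sketch below assumes a particular read layout: each read begins at the 5′ end of the insert, so the 3′ adapter starts at a position equal to the insert length, and an adaptor dimer, having no insert, begins with the adapter itself. The adapter sequence is hypothetical, and only exact matches are considered; practical adapter-trimming tools also tolerate mismatches and partial adapters.

```python
# Minimal sketch: flag non-insert reads (adaptor dimers) in a small-RNA
# library. Assumed read layout (hypothetical): reads start at the 5' end of
# the insert, so the 3' adapter begins at position len(insert); an adaptor
# dimer has no insert, so the adapter begins at position 0. Exact match only.
from collections import Counter
from typing import Optional

ADAPTER_3 = "GTCCTAGCATTC"  # hypothetical 3' adapter sequence


def insert_length(read: str, adapter: str = ADAPTER_3, probe_len: int = 8) -> Optional[int]:
    """Position where the adapter's first probe_len bases start, or None."""
    pos = read.find(adapter[:probe_len])
    return None if pos == -1 else pos


def classify(read: str) -> str:
    """Label a read as an insert read, an adaptor dimer, or adapter-free."""
    n = insert_length(read)
    if n is None:
        return "no_adapter"
    return "adaptor_dimer" if n == 0 else "insert"


reads = [
    "TAGCTTATCAGACTGATGTTGA" + ADAPTER_3,  # 22-nt insert, then adapter
    ADAPTER_3 + "AAAAAAAAAA",              # adaptor dimer: no insert
    "ACGT" * 5,                            # adapter not found
]
counts = Counter(classify(r) for r in reads)
print(counts)
print(f"non-insert fraction: {counts['adaptor_dimer'] / len(reads):.2f}")
```

The non-insert fraction computed this way is one way to quantify the sequencing background attributed to adaptor dimers.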
An alternative method for preparing dual-tagged RNA libraries has been described that uses sequential addition of adaptors to RNA, followed by reverse transcription and PCR, to generate a double-stranded (ds) cDNA library for sequencing (World Patent Application WO 2009/091719 A1). Some embodiments of that application do not specifically mention a requirement for gel purification. Nevertheless, its exemplary embodiments show that substantial amounts of undesired byproducts are generated in the reaction. These byproducts must be removed by either gel purification or high-performance liquid chromatography before the cDNA library is sequenced (for example, see FIG. 3 in World Patent Application WO 2009/091719 A1).
Gel purification is a time-consuming process, and the efficiency of recovery can vary greatly (Sambrook, 1989). Further, the procedure subjects the RNA-DNA ligation product to increased risk of degradation. Thus, needed in the art are methods and kits for rapid and highly efficient removal of excess donor oligonucleotide from a ligation reaction, e.g., prior to subsequent enzymatic manipulation of the ligation reaction products, thereby preventing formation of adaptor dimers that can cause high background in RNA-Seq. Also needed in the art are methods and kits that provide rapid and efficient 3′ tagging of nonpolyadenylated RNA for applications such as cloning, microarray analysis and next-generation sequencing, without incurring potential loss of sample through multiple gel-purification steps.