1. Field of the Invention
The invention generally relates to molecular biology, specifically to methods and compositions for maximizing capture of affinity-labeled RNA complexes onto magnetic beads coupled to affinity capture moieties. The invention is especially useful for removing ribosomal RNA and other unwanted RNAs from RNA samples prior to analysis of the samples, particularly RNA analysis using massively parallel sequencing.
2. Description of the Relevant Art
“Sequencing” is the term used to describe the process of determining the order of nucleotides in polynucleotide molecules, typically genomic DNA and RNA. Sequencing technology has evolved over the several decades since it was first invented. Initially, sequencing required clonal amplification of individual target molecules in plasmid or phage vectors; the resulting templates were then sequenced in individual reactions and analyzed in separate lanes of high-resolution polyacrylamide gels or, after the invention of automated sequencing, in separate channels or capillaries. More recently, newer sequencing technologies have been invented that rely on simultaneous amplification of complex populations of DNA or RNA targets using the polymerase chain reaction (PCR). The complex populations may comprise fragments of DNA derived from whole genomes of cells or tissues, or the entire populations of RNAs (“transcriptomes”) present in cells or tissues. The amplified populations are then sequenced in parallel, enabling much higher throughput in the acquisition of sequencing data at a much reduced cost. The newer methods are often referred to as “massively parallel sequencing” or “next generation sequencing” (NGS).
The amplified populations of complex DNA or RNA molecules are often referred to as “libraries”, and are produced by using the primary genetic material (as may be obtained, for example, by extraction of DNA or RNA from malignant tumor cells or from healthy normal cells) as input for a series of modifications catalyzed by enzymes commonly used for molecular biology applications. Examples of such enzymes are RNA and DNA polymerases, RNA and DNA ligases, reverse transcriptase, thermostable DNA polymerase, etc. The enzymatic steps serve to introduce specific synthetic oligonucleotide sequences into the primary target material, said sequences being necessary for exponentially increasing the number of target molecules by PCR (known as “amplifying the library”) to levels required for sequencing, and for adding sequences required for associating the library with the NGS instrument. The new sequencing technologies have provided an unprecedented ability to acquire genomic data, for example to determine sequences of entire genomes, and to determine the entire RNA output (known as “transcriptome profiling” or “global expression profiling”) of particular cells and tissues. RNA output can refer to traditional mRNAs that reflect protein-coding sequences, or to non-coding RNAs including microRNAs and other small RNAs, as well as long non-coding RNAs.
A challenge for NGS is that target molecules to be analyzed are not clonally amplified, and the complex populations that comprise the sample to be analyzed contain a preponderance of contaminating sequences that are not of interest. For example, RNA samples are highly contaminated with ribosomal RNA (rRNA), which comprises ~85% of the mass of total RNA extracted from biological samples such as human tissues and cell lines. It is desirable to remove rRNA prior to using the RNA sample for NGS, to avoid the cost in materials and data analysis associated with sequencing rRNA, and also to allow more sensitive detection of sequences of interest.
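The effect of the ~85% figure on sequencing output can be illustrated with a short calculation. The following Python sketch is illustrative only and is not part of any described method; it assumes, as a simplification, that the share of sequencing reads derived from rRNA equals the rRNA share of RNA mass, and the function name and the depletion efficiencies used are hypothetical.

```python
def rrna_fraction_after_depletion(rrna_mass_fraction, depletion_efficiency):
    """Return the rRNA share of the sample after depletion.

    rrna_mass_fraction: rRNA share of total RNA before depletion (0 to 1).
    depletion_efficiency: fraction of the rRNA that is removed (0 to 1).
    Assumption: non-rRNA species are left untouched by the depletion step.
    """
    remaining_rrna = rrna_mass_fraction * (1.0 - depletion_efficiency)
    other_rna = 1.0 - rrna_mass_fraction
    return remaining_rrna / (remaining_rrna + other_rna)


if __name__ == "__main__":
    # 0.85 is the approximate rRNA mass fraction cited in the text;
    # the efficiencies below are hypothetical values chosen for illustration.
    for efficiency in (0.0, 0.90, 0.99):
        rrna = rrna_fraction_after_depletion(0.85, efficiency)
        print(f"efficiency {efficiency:.2f}: rRNA {rrna:.1%}, other RNA {1 - rrna:.1%}")
```

Under these assumptions, an undepleted sample devotes only about 15% of its reads to sequences of interest, whereas removal of 99% of the rRNA leaves roughly 5% rRNA in the sample.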
Several methods exist for removal or depletion of rRNA from total RNA samples. One method is to positively select polyadenylated RNAs using oligo-dT affinity reagents, thereby eliminating the rRNA from the recovered material. The polythymidine tracts in the oligo-dT affinity reagents hybridize to the poly A regions that occur at the 3′ ends of most messenger RNAs (mRNAs), allowing these mRNAs of interest to be physically separated from the total RNA population. A disadvantage of poly A selection is that some RNAs of interest, such as bacterial mRNAs, long non-coding RNAs and some eukaryotic mRNAs, are not polyadenylated and are therefore lost during the selection. Additionally, differences in the length of the poly A tract between different mRNAs may lead to inefficient capture of mRNAs with short poly A regions, resulting in bias in the population of selected mRNAs. Another major drawback of poly A selection is that when it is applied to fragmented RNA, only the fragments from the 3′ ends of the mRNAs, which contain the poly A capture regions, are recovered. The RNA extracted from formalin fixed paraffin embedded (FFPE) tissues is recovered in highly fragmented form, and so poly A selection results in loss of sequencing information from all except the 3′ regions of mRNA. Information from most regions of the target genes, for example the presence of mutations and splicing isoforms, is lost when using poly A selection to recover rRNA-depleted mRNA from FFPE samples. This is a significant shortcoming, since there is high interest in using archived FFPE tissues from pathological samples such as tumors to carry out RNA analysis using NGS. The goal of such analysis is to discover biomarkers based on differences in mRNA or non-coding RNA patterns that are associated with cancer and other pathologies.
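The loss of information when poly A selection is applied to fragmented RNA can be illustrated with a simple model. The following Python sketch is a deliberately idealized illustration: it assumes a transcript cut into consecutive fixed-length fragments and assumes that only a fragment reaching the 3′ end (where the poly A tract resides) is captured. The function names, transcript length and fragment lengths are hypothetical.

```python
def fragment(transcript_length, fragment_length):
    """Cut the interval [0, transcript_length) into consecutive (start, end)
    pieces, starting at the 5' end. Idealized: real fragmentation is random."""
    return [
        (start, min(start + fragment_length, transcript_length))
        for start in range(0, transcript_length, fragment_length)
    ]


def poly_a_selected(fragments, transcript_length):
    """Keep only fragments that extend to the 3' end of the transcript,
    i.e. the fragments still attached to the poly A tract."""
    return [frag for frag in fragments if frag[1] == transcript_length]


def recovered_fraction(transcript_length, fragment_length):
    """Fraction of transcript bases present in the poly A selected material."""
    kept = poly_a_selected(
        fragment(transcript_length, fragment_length), transcript_length
    )
    return sum(end - start for start, end in kept) / transcript_length


if __name__ == "__main__":
    # Hypothetical 2000-nucleotide mRNA, intact versus cut into 200-nucleotide pieces.
    print(recovered_fraction(2000, 2000))  # intact transcript: fully recovered
    print(recovered_fraction(2000, 200))   # fragmented: only the 3'-most piece
```

In this model the intact 2000-nucleotide transcript is recovered in full, whereas after fragmentation into 200-nucleotide pieces only the 3′-most tenth of the transcript is retained.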
Other methods used to eliminate rRNA include subtractive hybridization with DNA probes, where capture DNA oligonucleotides complementary to short regions (less than ~100 nucleotides) of rRNA are hybridized to the sample, followed by removal of the capture oligo/rRNA complexes. The capture oligos may be synthesized with a biotin modification at one end, to allow removal of the complexes by streptavidin affinity reagents such as streptavidin magnetic beads. Biotin-streptavidin approaches for affinity purification are widely used in various molecular and cell biology applications including purification of proteins, antibodies, and nucleic acids. Several commercial products are available for subtractive hybridization-based removal of rRNA, which use proprietary methods that are likely based on rRNA-complementary capture oligonucleotides comprising DNA. An example of such a kit is the RiboMinus kit sold by Life Technologies.
A related method, which is the subject of U.S. Patent Application No. 2011/0040081, uses rRNA-complementary, affinity-labeled RNA probes for depletion of rRNA by subtractive hybridization. In a preferred embodiment, the probes are labeled with biotin and the rRNA/probe complexes are removed by capture onto a streptavidin solid support, such as magnetic beads. The technology described in the application may be that which is used in the commercially available Ribo-Zero kit from Epicentre.
Commercially available kits for rRNA depletion are expensive, and at least one of them, the RiboMinus kit, has been reported to be unreliable for thoroughly depleting rRNA (for example, see recent posts related to rRNA depletion in SEQAnswers, an on-line discussion forum and information source for next generation sequencing, at seqanswers.com). Methods based on the use of short rRNA-complementary DNA oligos are expected to be inefficient for removing fragmented rRNA such as that recovered from FFPE samples, since not all of the fragments will include the sequences complementary to the capture oligonucleotides.
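The expected inefficiency of short capture oligos on fragmented rRNA can be illustrated with a simple model. The following Python sketch is hypothetical and idealized: the rRNA length, probe positions, fragment lengths and minimum overlap required for capture are invented for illustration and do not describe any commercial kit.

```python
def captured_fraction(rrna_length, fragment_length, probe_sites, min_overlap):
    """Fraction of rRNA fragments that carry enough probe-complementary
    sequence to be captured.

    probe_sites: list of (start, end) intervals on the rRNA that are
        complementary to the capture oligos (hypothetical positions).
    min_overlap: minimum number of nucleotides a fragment must share with
        a probe site to be counted as captured (an assumed threshold).
    """
    # Idealized fragmentation: consecutive fixed-length pieces from the 5' end.
    fragments = [
        (start, min(start + fragment_length, rrna_length))
        for start in range(0, rrna_length, fragment_length)
    ]

    def is_captured(frag):
        # Overlap of two intervals is min(ends) - max(starts), if positive.
        return any(
            min(frag[1], site_end) - max(frag[0], site_start) >= min_overlap
            for site_start, site_end in probe_sites
        )

    return sum(1 for frag in fragments if is_captured(frag)) / len(fragments)


if __name__ == "__main__":
    # Three hypothetical 50-nucleotide probe sites on a 5000-nucleotide rRNA.
    sites = [(100, 150), (2500, 2550), (4800, 4850)]
    print(captured_fraction(5000, 5000, sites, min_overlap=25))  # intact rRNA
    print(captured_fraction(5000, 200, sites, min_overlap=25))   # fragmented rRNA
```

In this model the intact rRNA is captured through any one of its probe sites, whereas after fragmentation into 200-nucleotide pieces only 3 of the 25 fragments contain a probe site and the remainder escape capture.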
Another approach for depletion of rRNA uses double-strand-specific nucleases (enzymes that digest DNA and/or RNA) to digest rRNA that is hybridized to complementary DNA strands. Examples of nucleases used for this application include RNase H, which degrades the RNA strand in RNA/DNA heteroduplexes, and DSN (duplex-specific nuclease), which can digest RNA/DNA and DNA/DNA duplexes. Disadvantages of nuclease methods include high cost, complicated protocols that require fragmentation of the RNA and conversion to cDNA prior to nuclease treatment, and the possibility that the nucleases will digest target RNAs of interest.