Second-generation sequencing (SGS) has revolutionized whole-genome sequencing and transcriptome analysis of many organisms. In particular, sequencing of cDNA made from cellular RNA (RNA-Seq) enables RNA expression profiling with high dynamic range and genome coverage. RNA-Seq has expanded our knowledge of non-coding transcripts and led to discoveries of novel alternative splicing of RNA in various eukaryotic cell types. The primary component of eukaryotic total RNA is ribosomal RNA (rRNA), with all other coding, non-coding, and small RNAs representing less than 15% of the total RNA population. The high abundance of rRNA-derived sequences in cDNA libraries diminishes the utility of SGS RNA-Seq for functional genomics studies, because only a small fraction of reads come from sequences of interest. In this context, cDNA library preparation techniques that efficiently remove highly abundant rRNA-derived sequence populations prior to sequencing are highly desirable.
A common method for excluding rRNA is to select for mRNAs that contain long polyadenylated tails, for example by using polythymidine primers in cDNA construction protocols. This approach, while highly effective in removing rRNA, also depletes all non-polyadenylated host transcripts, such as non-coding RNAs that regulate eukaryotic cellular function, as well as the viral and prokaryotic microbial sequences present in many complex sample types. Alternative ribosomal RNA removal protocols such as ribosomal oligonucleotide-mediated capture techniques (e.g., Ribominus™ and RiboZero™) are species-specific and require extensive ribosomal sequence data for probe design. Both techniques are multi-step procedures, during which RNA can degrade, and both require large amounts of starting material (1-10 μg of total RNA), limiting the experimental designs and sample types for which they are practical.
Hydroxyapatite Chromatography-Based Normalization
An alternative to excluding rRNA sequences from cDNA preparation is to apply techniques, variously called cDNA normalization or Cot filtration, that remove highly abundant sequences from DNA libraries. FIG. 1 illustrates the basic process of DNA normalization. The process begins with a starting DNA population 10 (or “sample”). In normalization, double-stranded DNA populations are first denatured at an elevated temperature 12, and then allowed to re-anneal (or “rehybridize”) 14, typically at a reduced temperature. Highly abundant sequences hybridize at higher rates (proportional to the square of their concentration) and, if the re-annealing reaction is stopped at a suitable time (e.g., 4-24 hours), the highly abundant sequences make up the majority of double-stranded species. The next step in the process is to separate double-stranded DNA (dsDNA) 16 from single-stranded DNA (ssDNA) 18. If the two can be separated, the representation of the highest abundance species in the resulting ssDNA fraction can be significantly reduced.
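The effect of concentration-dependent re-annealing on the ssDNA fraction can be illustrated with a minimal sketch. It assumes ideal second-order reassociation kinetics, in which the fraction of a species still single-stranded after time t is 1 / (1 + k·C0·t), and it treats each species independently with no cross-hybridization. The rate constant, time units, species names, and starting abundances below are hypothetical values chosen only for illustration; they are not taken from any measured sample.

```python
def fraction_single_stranded(c0, k, t):
    """Ideal second-order reassociation: C/C0 = 1 / (1 + k * C0 * t).

    c0: starting concentration of one species (arbitrary units)
    k:  rate constant (assumed identical for all species)
    t:  re-annealing time (arbitrary units)
    """
    return 1.0 / (1.0 + k * c0 * t)


def normalize(abundances, k, t):
    """Relative abundance of each species within the ssDNA fraction."""
    remaining = {
        name: c0 * fraction_single_stranded(c0, k, t)
        for name, c0 in abundances.items()
    }
    total = sum(remaining.values())
    return {name: amount / total for name, amount in remaining.items()}


# Hypothetical starting library: mostly rRNA-derived cDNA.
start = {"rRNA_cDNA": 0.85, "abundant_mRNA": 0.10, "rare_transcript": 0.05}

# Hypothetical rate constant and stop time, in arbitrary units.
after = normalize(start, k=100.0, t=1.0)

for name in start:
    print(f"{name}: {start[name]:.2f} -> {after[name]:.2f}")
```

Under these assumptions the abundant species loses a much larger share of its molecules to the double-stranded fraction than the rare species does, so the relative abundances in the ssDNA fraction are compressed toward one another. Stopping the reaction earlier or later (a smaller or larger t) changes how strongly the library is flattened.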
Hydroxyapatite chromatography (HAC) is one way of performing this separation. However, HAC has not yet gained widespread use for normalization in RNA-Seq applications compared to the duplex-specific thermostable nuclease (DSN) approach, possibly as a result of perceived disadvantages. These disadvantages include labor intensiveness, unacceptably high starting material requirements, and poor reproducibility.
Capture Based Nucleic Acid Enrichment/Suppression
Another approach to nucleic acid signal enrichment/suppression is based on the selective capture of unwanted sequences. In capture mode, the sample is also denatured and then allowed to anneal in the presence of excess biotinylated probes generated against the highly abundant species. The probes and any nucleic acids bound to them are then captured in a streptavidin chromatography column, so that the sample flowing through the column is depleted of the high abundance species. Commercially available kits enable capture-based enrichment of libraries for the human exome. However, these kits are not versatile; they have very limited utility outside of human SNP detection.
These two techniques, HAC and capture-based enrichment, have been shown to be highly effective, but both are generally very time- and labor-intensive, require unacceptably large amounts of starting material (sample), and have poor reproducibility. Therefore, it would be desirable to have new systems and methods for enriching nucleic acid samples. Ideally, such systems and methods would be less time- and labor-intensive, would work with smaller sample sizes than required by current methods, and would be highly reproducible.