Small RNAs (smRNAs) encompass several different classes of non-coding RNAs, including microRNAs (miRNAs), short interfering RNAs (siRNAs), small nucleolar RNAs (snoRNAs), and small nuclear RNAs (snRNAs). Among the endogenous smRNAs, miRNAs are the best studied with regard to both biogenesis and functional mechanism. miRNAs are short RNA molecules that act as post-transcriptional regulators by binding to mRNA and preventing it from being translated. The first miRNA, lin-4, was identified in C. elegans in 1993 (Lee 1993; Wightman 1993). In 2000, a second miRNA, let-7, was identified and found to be conserved across many species (Pasquinelli 2000; Reinhart 2000). In 2001, it was reported that miRNAs probably exist in all species (Lee 2001; Lau 2001; Lagos-Quintana 2001). Since 2001, miRNA research has extended to almost all corners of biological science, with in-depth investigations into miRNA biogenesis and biological function and into the use of miRNAs as therapeutic tools, diagnostic and prognostic markers, and predictors of treatment response (Galasso 2010; Nagpal 2010; Kim 2009; Bartel 2009). This progress in miRNA research has coincided with the identification and profiling of novel smRNAs in many organisms.
smRNA profiling has traditionally relied on cloning and sequencing of individual RNAs using standard molecular methods. In the most common approach, adaptor oligonucleotides are joined to the 3′ and 5′ termini of smRNAs, and the ligation products are reverse transcribed and PCR amplified to generate a cDNA library. This procedure represents a significant technical challenge because it requires three gel purification steps. In addition, thousands of clones have to be individually sequenced to identify the smRNA population. This standard protocol is labor intensive and painstaking and requires large amounts of starting material, and it is therefore not practical for many research or clinical settings (Pfeffer 2005). Although cost can be reduced several fold by concatenating smRNA fragments that carry both adaptors so that several clones are sequenced together, expense is still a major obstacle to thoroughly surveying smRNA populations.
Next generation sequencing (NGS) technology was first applied to smRNA discovery with the use of massively parallel signature sequencing to survey the smRNA library of Arabidopsis thaliana (Lu 2005). Since then, many modified smRNA profiling procedures based on NGS have been developed and tested on various platforms (Hafner 2008; Lu 2007; Tang 2010). To date, human miRNAs alone account for 1,048 unique sequence entries in miRBase 16 (Griffiths-Jones 2010). NGS technology has also helped in the discovery of other smRNAs (Lu 2005).
Illumina's NGS technology (Solexa) has been rated one of the two leading protocols for RNA sequencing (Levin 2010). The Solexa platform has gained some advantage over other smRNA profiling protocols with its smRNA cloning protocol v1.5. This protocol, which is summarized in FIG. 1, drastically reduces the difficulties of smRNA cDNA library construction by eliminating the need for gel purification of smRNA/smRNA-adaptors (the most challenging and critical steps in smRNA discovery) and by requiring only about 1 μg of total RNA. However, it has been found that this protocol introduces a major artifact in smRNA sequencing experiments. Therefore, there is a need in the art for improved methods of smRNA sequencing.