Analysis of genome expression patterns provides valuable insight into the role of differential expression in a wide variety of biological processes, including, but not limited to, various disease states. Such analysis, whether of mRNA-based gene expression or of small non-coding RNA expression, is a rapidly expanding avenue of investigation in many disciplines of the biological sciences. Small non-coding RNA discovery is also an area of great scientific and medical interest. It is believed that knowing which parts of the genome are transcribed, when, and why may provide a better understanding of many complex and interrelated biological processes.
Small non-coding RNAs are rapidly emerging as significant effectors of gene regulation in a multitude of organisms spanning the evolutionary spectrum. Animals, plants, and fungi contain several distinct classes of small RNAs, including, without limitation, miRNAs, siRNAs, piRNAs, and rasiRNAs. These small gene expression modulators are typically about 18-40 nt in length; however, their effect on cellular processes is profound. They have been shown to play critical roles in developmental timing and cell fate mechanisms, tumor progression, neurogenesis, transposon silencing, viral defense, and many other processes. They function in gene regulation by binding to their targets and negatively affecting gene expression through a variety of mechanisms, including heterochromatin modification, translational inhibition, mRNA decay, and even nascent peptide turnover. Therefore, identification of small RNAs in a given sample can greatly facilitate gene expression analysis.
Some small RNAs are produced from defined locations within the genome. MicroRNAs are one such class: they are typically transcribed by RNA polymerase II from polycistronic gene clusters, or they can be generated from pre-mRNA introns. Thus far, several thousand unique miRNA sequences are known. Other classes of small RNAs, such as piRNAs or endogenous siRNAs, are not typically transcribed from a defined locus in the genome. Instead, they are generated in response to events such as viral infection or retrotransposon expression, and they serve to silence these "foreign" sequences, which would otherwise be seriously detrimental to the cell. Descriptions of non-coding RNA (ncRNA) can be found in, among other places, Eddy, Nat. Rev. Genet. 2:919-29, 2001; Mattick and Makunin, Human Mol. Genet. 15:R17-29, 2006; and Hannon et al., Cold Spring Harbor Symp. Quant. Biol. LXXI:551-64, 2006. Sequencing the entire population of small RNAs in a sample provides a direct method to identify, and even profile, all classes of these RNAs at one time.
Sequencing of nucleic acid molecules, including small RNA molecules, generally involves producing libraries of nucleic acid segments to each end of which adapter sequences are added. Sequencing of RNA typically involves a reverse transcription step to produce a cDNA molecule for sequencing. A major by-product of adding adapters to nucleic acids when preparing RNA sequencing libraries is an adapter:adapter product. This undesired by-product is generated when a 5′ adapter is ligated directly to a 3′ adapter with no RNA insert in between. It can occur whether single- or double-stranded adapters are used, and whether the 5′ and 3′ adapters are ligated simultaneously or sequentially. The by-product may form at a higher rate with library preparation methods that include a denaturation step directly after ligation and that use an exogenous reverse transcription primer added prior to the reverse transcription reaction. As a result, the cDNA generated by the reverse transcription reaction contains both the intended library inserts and the undesired by-product. A gel purification step may then be needed to remove the adapter:adapter by-product prior to subsequent PCR amplification. If the by-product is not minimized or removed, much of the PCR amplification capacity may be consumed preferentially amplifying the by-product instead of the library inserts of interest. Sequencing capacity and reagents are then spent on sequencing this by-product, limiting the yield of reads from the RNA segments of interest. Purifying the ligation reaction to remove by-products is a cumbersome step for users and precludes fully automated library preparation procedures.
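In sequencing data, the adapter:adapter by-product described above appears as a read in which the 3′ adapter begins immediately, with no insert before it. The following Python sketch is a hypothetical illustration, not a method from the source. It locates the 3′ adapter in each read and sorts reads into adapter:adapter products, inserts within the ~18-40 nt small RNA range, and other inserts. It assumes that reads begin at the 5′ adapter junction, uses a made-up placeholder adapter sequence and minimum-overlap value (`ADAPTER_3P`, `MIN_MATCH`), and performs exact matching with no mismatch tolerance. Dedicated adapter-trimming tools handle sequencing errors and base quality far more robustly.

```python
from collections import Counter
from typing import Optional

# Hypothetical placeholder 3' adapter sequence (assumption, not from the source).
ADAPTER_3P = "TGGAATTCTCGG"
# Hypothetical minimum adapter-prefix length accepted at the read end (assumption).
MIN_MATCH = 8
# Small RNA size range (nt) cited in the text.
SMALL_RNA_RANGE = (18, 40)


def insert_length(read: str, adapter: str = ADAPTER_3P,
                  min_match: int = MIN_MATCH) -> Optional[int]:
    """Return the number of bases preceding the 3' adapter, or None if absent.

    Assumes each read starts right after the 5' adapter, so everything before
    the 3' adapter is insert sequence. Matching is exact (no mismatches).
    """
    # Leftmost full-length adapter occurrence.
    pos = read.find(adapter)
    if pos != -1:
        return pos
    # The adapter may run off the end of the read: test read suffixes against
    # adapter prefixes, longest first, so the leftmost start position wins.
    for k in range(min(len(adapter) - 1, len(read)), min_match - 1, -1):
        if read.endswith(adapter[:k]):
            return len(read) - k
    return None


def classify(read: str) -> str:
    """Bucket a read by its insert length."""
    n = insert_length(read)
    if n is None:
        return "no_adapter"
    if n == 0:
        # 5' adapter ligated directly to 3' adapter: the adapter:adapter product.
        return "adapter_dimer"
    lo, hi = SMALL_RNA_RANGE
    if lo <= n <= hi:
        return "small_rna_insert"
    return "other_insert"


if __name__ == "__main__":
    reads = [
        "TGGAATTCTCGGGTGCC",                    # adapter only -> adapter_dimer
        "ACGT" * 5 + "AC" + "TGGAATTCTCGG",     # 22-nt insert -> small_rna_insert
        "ACGT" + "TGGAATTC",                    # 4-nt insert, partial adapter
        "ACGTACGTACGT",                         # no adapter found
    ]
    counts = Counter(classify(r) for r in reads)
    for label, count in sorted(counts.items()):
        print(f"{label}\t{count}")
```

Counting the `adapter_dimer` bucket this way gives a rough estimate of how much sequencing capacity the by-product consumed. Because the sketch assumes exact matches, reads with sequencing errors in the adapter would fall into `no_adapter` rather than being detected.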