A. Field of the Disclosure
The present disclosure generally relates to the field of molecular biology. In particular, the present disclosure pertains to generating a sequencing library of micro RNA (miRNA). More specifically, the present disclosure pertains to reducing the frequency of specific miRNAs in a sequencing library.
B. Background
Micro RNAs are naturally occurring, small non-coding RNAs that are about 17-25 nucleotide bases in length in their biologically active form. miRNAs post-transcriptionally regulate gene expression by repressing target mRNA translation and by targeting transcripts for destruction. It is thought that miRNAs function as negative regulators, such that greater amounts of a specific miRNA will correlate with lower levels of target gene expression.
Given their important role in gene regulation, and therefore human health, large scale sequencing of miRNA has become a very valuable scientific tool in the study of human disease. There are various methods known in the art of creating a miRNA library to be sequenced.
Small RNAs can be measured with a variety of technologies, including qPCR, microarrays and solution-based hybridization, amongst others. Next-generation DNA sequencing (NGS) is also a powerful method for the discovery and quantification of small RNAs due to its technical performance, low expense, ultra-high throughput and its ability to agnostically detect and measure new species.
For example, as generally shown in FIG. 1, in a protocol utilized by Illumina, Inc. and other commercial companies that make Illumina-compatible kits, to generate a miRNA sequencing library, an adenylated DNA 3′ adapter 40 with a blocked 3′ end is ligated to an RNA molecule's 10 3′ end 20 using a truncated T4 RNA ligase 2 50. This truncated T4 RNA ligase 2 50 requires the 3′ adapter 40 substrate to be adenylated. The result is that fragments of other RNA species in the total RNA sample are not ligated together in this reaction; only the pre-adenylated oligonucleotide can be ligated to free 3′ RNA ends 20, resulting in a miRNA molecule with a 3′ adapter ligated thereto 60. Moreover, since the 3′ adapter 40 is 3′ blocked, it cannot serve as a substrate for self-ligation. In the next step, a 5′ adapter 70 is added along with RNA ligase 1 80. Only RNA molecules 10 whose 5′ ends 30 are phosphorylated will be effective substrates for this second ligation reaction. After this second ligation, a miRNA with both 3′ and 5′ adapters ligated thereto 90 is formed. Next, reverse transcription polymerase chain reaction (RT-PCR) amplification 100 is performed. After RT-PCR amplification 100, the library may be sequenced and analyzed 110. This library preparation method results in an oriented library such that the sequencing always reads from the 5′ end 30 to the 3′ end 20 of the original RNA molecule 10.
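The substrate requirements of the two ligation steps described above can be illustrated with a minimal sketch. It is a toy model only, not part of any kit protocol: the class `RnaMolecule`, the functions `ligate_3p`, `ligate_5p` and `build_library`, and the adapter placeholder strings are all hypothetical names introduced for illustration, and the adapter strings are labels rather than real adapter sequences.

```python
from dataclasses import dataclass
from typing import List, Optional

# Placeholder labels only; these are NOT real adapter sequences (assumption).
FIVE_PRIME_ADAPTER = "<5ADAPTER>"
THREE_PRIME_ADAPTER = "<3ADAPTER>"


@dataclass(frozen=True)
class RnaMolecule:
    """Hypothetical record describing the ends of one RNA molecule."""
    sequence: str            # written 5' to 3'
    has_5p_phosphate: bool   # required for the second (5') ligation
    has_free_3p_end: bool    # required for the first (3') ligation


def ligate_3p(rna: RnaMolecule, adapter_is_adenylated: bool = True) -> Optional[str]:
    """First ligation: only a pre-adenylated adapter joins a free 3' end."""
    if not adapter_is_adenylated or not rna.has_free_3p_end:
        return None
    return rna.sequence + THREE_PRIME_ADAPTER


def ligate_5p(rna: RnaMolecule, product: str) -> Optional[str]:
    """Second ligation: only 5'-phosphorylated molecules are substrates."""
    if not rna.has_5p_phosphate:
        return None
    return FIVE_PRIME_ADAPTER + product


def build_library(sample: List[RnaMolecule]) -> List[str]:
    """Return the doubly ligated products, each oriented 5' to 3'."""
    library = []
    for rna in sample:
        with_3p = ligate_3p(rna)
        if with_3p is None:
            continue  # no free 3' end, so no 3' adapter was attached
        with_both = ligate_5p(rna, with_3p)
        if with_both is not None:
            library.append(with_both)
    return library


sample = [
    RnaMolecule("UGAGGUAGUAGGUUGUAUAGUU", True, True),   # both ends competent
    RnaMolecule("ACGUACGUACGUACGUACGU", False, True),    # lacks 5' phosphate
    RnaMolecule("GGCUAGCUAGCUAGCUAGCU", True, False),    # lacks a free 3' end
]
library = build_library(sample)
```

In this sketch only the first molecule reaches the library, because it alone satisfies both end requirements; the fixed adapter order mirrors the oriented, 5′-to-3′ nature of the resulting library.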
However, NGS of small RNAs has several technical challenges. Among these is the well-reported biased behavior of the modified forms of T4 RNA Ligase 2 commonly used in sequencing library generation protocols. This bias manifests in small RNA libraries as differential ligation, creating an over-representation of certain species and an under-representation of others. When small RNA libraries are constructed from many sample types, these biases in ligation efficiency, combined with inherent abundance differences, can yield inaccurate results. Highly abundant small RNA species may be preferentially ligated such that their representation in the library becomes inordinately high, diminishing the ability to measure other less abundant species. The precise detection of these underrepresented species would thus require very high sequencing depths and proportionally higher costs. Additionally, highly abundant species interfere with many normalization techniques, limiting the utility of the collected reads.
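The relationship between over-represented species and required sequencing depth can be made concrete with a rough back-of-the-envelope sketch. It assumes, purely for illustration, that each read is an independent draw from the library, that a fixed fraction of reads is consumed by highly abundant species, and that the species of interest makes up a fixed fraction of the remaining reads. The function name `required_depth`, its parameters, and the example numbers are hypothetical and are not taken from any measured data.

```python
def required_depth(target_reads: float,
                   abundant_fraction: float,
                   target_fraction_of_remainder: float) -> float:
    """Expected total reads needed to collect `target_reads` of one species.

    abundant_fraction: share of all reads taken by highly abundant species.
    target_fraction_of_remainder: share of the remaining reads that belong
        to the low-abundance species of interest.
    Both fractions are illustrative assumptions, not measured values.
    """
    if not 0.0 <= abundant_fraction < 1.0:
        raise ValueError("abundant_fraction must be in [0, 1)")
    if not 0.0 < target_fraction_of_remainder <= 1.0:
        raise ValueError("target_fraction_of_remainder must be in (0, 1]")
    # Probability that any single read comes from the species of interest.
    per_read_probability = (1.0 - abundant_fraction) * target_fraction_of_remainder
    return target_reads / per_read_probability


# Hypothetical scenario: 100 reads wanted for a species that makes up
# 1 in 100,000 of the non-abundant reads.
depth_if_90_percent_abundant = required_depth(100, 0.90, 1e-5)
depth_if_50_percent_abundant = required_depth(100, 0.50, 1e-5)
fold_saving = depth_if_90_percent_abundant / depth_if_50_percent_abundant
```

Under these assumed numbers, lowering the share of reads taken by abundant species from 90% to 50% cuts the expected depth from roughly 100 million reads to roughly 20 million, about a fivefold reduction for the same number of reads of the species of interest.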
In small RNA libraries made from human plasma and serum, many of the most highly abundant species are probably derived from blood cell populations. While these may be of interest in some applications, miRNAs and other small RNAs that act as biomarkers for many diseases, such as cancer and neurodegenerative disease, may be of low abundance in the blood of afflicted patients. Accordingly, the problem facing researchers interested in blood-based miRNA biomarkers is how to precisely measure low-abundance species against a background of highly abundant, less informative species that make up most of the reads in a sequencing library.
Accordingly, there is a need for an effective method for reducing the frequency of overrepresented or abundant miRNAs 10 in miRNA sequencing libraries.