The sets of all mRNA transcripts present in living cells, termed “transcriptomes,” are fundamental units for regulating life processes. The direct and comprehensive determination of their sequence content is essential for improving our understanding of proteome constitution and flexibility, thereby providing the knowledge and targets to intervene in such diverse processes as cancer, tissue specificity, (auto)immune responses, genetic diseases, and environmental adaptation, to name but a few.
However, transcriptome analysis has proved a more difficult experimental task than the determination of whole genomes because, unlike DNA, mRNA transcripts are present in cells at highly uneven abundances and vary in a context- and environment-sensitive manner. RNA sequence information has conventionally been obtained by labor-intensive sequencing of expressed sequence tags (ESTs) and complementary DNA (cDNA) libraries, so few transcriptomes have been extensively characterized.
Sequencing of other RNAs likewise conventionally relies on their conversion to cDNAs followed by sequencing of the cDNAs. Inclusion of the conversion step is undesirable because errors can be introduced during reverse transcription and because information on base modifications and secondary structure that is important in RNA function is not preserved during conversion. Methods, systems, and reagents for convenient and accurate direct determination of RNA sequences, as well as RNA secondary structure and base modifications, are therefore desirable. The invention described herein fulfills these and other needs, as will be apparent upon review of the following.