Various methods exist for determining a comprehensive expression profile of a cell or cell population, or the presence of nucleic acid molecules in a sample, one of which is the established method of sequencing. Sequencing of nucleic acid molecules has become a very important analytical technique in modern molecular biology in recent years. The development of reliable methods for DNA sequencing has been crucial for understanding the function and control of genes and for applying many of the basic techniques of molecular biology. These methods have also become increasingly important as tools in genomic analysis and in many non-research applications, such as genetic identification, forensic analysis, genetic counseling, medical diagnostics and many others.
The determination of the RNA content of a cell or a tissue via sequencing provides a method for functional analysis. In existing methodologies, prior to sequencing, the expressed and isolated mRNA is reverse transcribed in vitro into cDNA molecules, which are then randomly sheared into cDNA fragments. These fragments are tagged with linker sequences that are used to specifically amplify the fragments in a PCR step. The resulting library of PCR amplicons can be sequenced via various sequencing processes, e.g. deep sequencing or next generation deep sequencing methods (see e.g. Ronaghi, M. (2001), Genome Research 11:3-11; Rothberg J M, et al. Nature 475(7356):348-52; Mardis E R, Trends Genet. (2008), Vol. 24(3):133-41; Liu L et al., J Biomed Biotechnol (2012): 251364; Henson J. et al., Pharmacogenomics (2012), 13(8):901-15; Ruan X et al., Methods Mol Biol. (2012), 809:535-62; Fullwood M J et al., Genome Res. (2009), 19(4):521-32).
A drawback of these PCR-based methods is the unreliable quantification of rare DNA and mRNA/cDNA molecules, respectively, because PCR amplification can introduce unevenness in the coverage of individual sequences. The addition of random molecular identifiers to the generated DNA fragments at the stage of DNA synthesis, or of reverse transcription when mRNA is the starting material, has been shown to allow the uneven coverage to be eliminated bioinformatically by counting each individual DNA molecule only once. However, this method relies on high sequencing coverage, which is costly and time-consuming.
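The bioinformatic correction mentioned above, in which each original molecule is counted only once, can be illustrated with a minimal sketch. It assumes that every sequencing read has already been assigned to a gene and that its random molecular identifier has been read without error; the function names, gene labels and identifier sequences are hypothetical and serve only as an illustration, not as the procedure of any cited reference.

```python
from collections import Counter, defaultdict


def count_raw_reads(reads):
    """Count every read per gene, including PCR duplicates."""
    return dict(Counter(gene for gene, _identifier in reads))


def count_unique_molecules(reads):
    """Count each (gene, identifier) combination only once.

    Reads sharing the same gene and the same random identifier are
    treated as PCR copies of a single original molecule (assumption:
    identifiers are error-free and do not collide).
    """
    identifiers_per_gene = defaultdict(set)
    for gene, identifier in reads:
        identifiers_per_gene[gene].add(identifier)
    return {gene: len(ids) for gene, ids in identifiers_per_gene.items()}


# Hypothetical reads as (gene, random identifier) pairs.
# One GENE_A molecule was amplified three times by PCR.
example_reads = [
    ("GENE_A", "ACGT"),
    ("GENE_A", "ACGT"),
    ("GENE_A", "ACGT"),
    ("GENE_A", "TTGA"),
    ("GENE_B", "CCAT"),
]

raw_counts = count_raw_reads(example_reads)
molecule_counts = count_unique_molecules(example_reads)
```

In this example the raw read count overstates GENE_A relative to GENE_B, whereas the identifier-based count reflects two original GENE_A molecules and one GENE_B molecule. The sketch also shows why high coverage is needed: a molecule contributes to the count only if at least one of its copies is actually sequenced.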
Generally, after the DNA polymerase reaction, unincorporated primers have to be removed from the sample before sequencing, since the primers would otherwise dominate the sequencing reads and thereby reduce the effective sequencing coverage of cDNA molecules. Typically, this is achieved by size-dependent separation of the molecules, e.g. via polyacrylamide gel electrophoresis (PAGE), which suffers from poor quantitative yield and poor discrimination between molecules of similar sizes.
Hence, there is a need for methods that allow improved gene expression analysis and overcome at least some of the above-mentioned drawbacks of existing technologies.