Most populations of cells from higher eukaryotes are heterogeneous in ways that cannot be fully elucidated by bulk analysis. The causes of this heterogeneity include differentiation in subtly different ways, varying stages of the cell cycle, cellular senescence, and non-uniform RNA processing and degradation. Such cellular heterogeneity could be studied by robust techniques for single-cell transcriptome analysis, particularly if the techniques analyze full-length transcripts. Reliable methods for transcriptome analysis are also required for situations where only low quantities (LQ) of cells are available, and where the RNA may be partly degraded.
Advances in high-throughput sequencing and innovations in biochemical techniques have revealed a complex picture of the mammalian transcriptome (Wang, et al., Nat Rev Genet. 10(1):57-63 (2009)). Most genes that contain three or more exons give rise to alternatively spliced products that may vary with the cell type or state of differentiation (Wang, et al., Nature 456(7220):470-476 (2008)), and these alternative splice forms often have different, even antagonistic, functions. In an extreme case, the Drosophila Dscam gene has >30,000 alternative transcripts hypothesized to provide distinct identities to individual neuronal dendrites and to prevent self-interaction between the processes of a single neuron (Hattori, et al., Nature 461(7264):644-648 (2009)). Thousands of long, polyadenylated, intergenic “non-coding” RNAs (LINCs) have been discovered (Guttman, et al., Nature 458(7235):223-227 (2009), Carninci, DNA Res. 17(2):51-59 (2010)) that may have diverse regulatory functions, including serving as scaffolds for proteins that interact with chromatin (Khalil, et al., Proc Natl Acad Sci USA 106(28):11667-11672 (2009)). A fraction of these LINC RNAs may be translated and encode short peptides (Ingolia, et al., Science 324(5924):218-223 (2009)). Cytoplasmic recapping of RNAs has been demonstrated enzymatically (Schoenberg, et al., Trends Biochem Sci. 34(9):435-442 (2009), Otsuka, et al., Mol Cell Biol. 29(8):2155-2167 (2009)). A number of genes use multiple promoters, and the position of the 5′ transcription start sites of RNAs may shift under different physiologic conditions. Finally, the mRNA 5′ “untranslated” regions (UTRs) are now known to be translated frequently (Brar, et al., Science 335(6068):552-557 (2012), Oyama, et al., Mol Cell Proteomics 6(6):1000-1006 (2007), Oyama, et al., Genome Res. 14(10B):2048-2052 (2004)), and may produce biologically active peptides. More than half of the translation initiation sites used by a cell are not predicted from annotated genes.
These new sites include many that occur in the 5′ leader sequences of mRNAs, and may use near-canonical UUG, CUG, or GUG start codons. Hundreds of genes also show internal translation starts (Ingolia, et al., Cell. 147(4):789-802 (2011)). These could generate proteins with altered functions (Wethmar, et al., Bioessays. 32(10):885-893 (2010)). These complications, as well as issues such as RNA editing and allele-specific levels of expression (Pastinen, Nat Rev Genet. 11(8):533-538 (2010)), all indicate the value of deep sequencing of full-length transcripts.
Several approaches have been proposed for obtaining transcriptome data from single cells. A pioneering approach used reverse transcriptase and oligo-dT primers with a T7 phage RNA polymerase promoter sequence attached to the 5′ end of the oligo-dT run. The resulting cDNA was transcribed into multiple copies of RNA, which were then converted back to cDNA (Phillips, et al., Methods 10(3):283-288 (1996)). This approach often truncates the cDNA molecule, losing 5′ sequences of the original mRNA, especially for relatively long transcripts, and requires multiple rounds of processing when starting with LQ cells, further exacerbating cDNA truncation. A recent modification (Hashimshony, et al., Cell Rep. 2(3):666-673 (2012)) enables multiplex analyses, but it is still biased toward 3′ end sequences. Other methods are based on PCR amplification of cDNA (Liu, et al., Methods Enzymol. 303:45-55 (1999), Ozsolak, et al., Genome Res. 20(4):519-525 (2010), Gonzalez, et al., PLoS ONE. 5(12):e14418 (2010), Kanamori, et al., Genome Res. 21(7):1150-1159 (2011), Islam, et al., Genome Res. 21(7):1160-1167 (2011), Tang, et al., Nat. Methods. 6(5):377-382 (2009), Kurimoto, et al., Nucleic Acids Res. 34(5):e42 (2006), Qiu, et al., Front Genet. 3:124 (2012)).
However, these approaches may yield biased representations of sequences along the mRNA and fail to give complete sequences for long mRNAs, because long DNA templates are discriminated against even when a long PCR reaction is used. The Smart-Seq method (Ramsköld, et al., Nat Biotechnol. 30(8):777-782 (2012)) has been reported to use a long PCR method that provided sequences for a substantial portion of even very long cDNAs, although the distribution of sequences was uneven and the sequences of the 5′ regions of many mRNAs were depleted.
In view of these shortcomings, there remains a need for improved ways of obtaining transcriptome data from single cells.
Therefore, it is an object of the invention to provide methods of amplifying cDNA from RNA isolated from low quantities of cells and single cells.
It is a further object of the invention to provide methods for full-length RNA (cDNA) sequencing for low quantities of cells and single cells.
It is another object of the invention to employ the methods of full-length RNA sequencing in diagnostic assays.
It is another object of the invention to employ the methods of full-length RNA sequencing in assays designed to test drug or other treatment efficacies.