Mammalian genomes are more pervasively transcribed than previously expected (Bertone et al. (2004) Science 306:2242-2246; Carninci et al. (2005) Science 309:1559-1563; Calin et al. (2007) Cancer Cell 12:215-229; and Carninci (2008) Nat. Cell Biol. 10:1023-1024). In addition to the protein-coding regions of genes, much of the genome is transcribed as non-coding RNAs (ncRNAs). These non-coding genomic transcripts include many different types of small regulatory ncRNAs and long ncRNAs (lncRNAs).
Included among the small non-coding RNAs are small interfering RNAs (siRNAs), microRNAs (miRNAs) and Piwi-associated RNAs (piRNAs), which function in genome defense and post-transcriptional regulation (Mattick et al. (2005) Hum. Mol. Genet. 14 Spec No 1, R121-R132; He et al. (2004) Nat. Rev. Genet. 5:522-531; and Hutvagner et al. (2008) Nat. Rev. Mol. Cell. Biol. 9:22-32). In addition, divergent transcription by RNA polymerase near transcriptional start sites (TSSs) can generate small ncRNAs ranging from 20 to 200 nucleotides in length. These ncRNAs have been variously named promoter-associated small RNAs (PASRs), transcription-initiation RNAs (tiRNAs) and TSS-associated RNAs (TSSa-RNAs) (Kapranov et al. (2007) Science 316:1484-1488; Seila et al. (2008) Science 322:1849-1851; Taft et al. (2009) Nat. Genet. 41:572-578; and Core et al. (2008) Science 322:1845-1848). It remains uncertain, however, whether these ncRNAs are functional or merely represent byproducts of RNA polymerase infidelity (Ponjavic et al. (2007) Genome Res. 17:556-565; Struhl (2007) Nat. Struct. Mol. Biol. 14:103-105).
Long ncRNAs vary in length from several hundred bases to tens of kilobases and may be located separately from protein-coding genes (long intergenic ncRNAs or lincRNAs), or reside near or within protein-coding genes (Guttman et al. (2009) Nature 458:223-227; Katayama et al. (2005) Science 309:1564-1566). Recent evidence indicates that active enhancer elements may also be transcribed as lncRNAs (Kim et al. (2010) Nature 465:182-187; De Santa et al. (2010) PLoS Biol. 8:e1000384).
Several lncRNAs have been implicated in transcriptional regulation. For example, in the CCND1 (encoding cyclin D1) promoter, an ncRNA transcribed 2 kb upstream of CCND1 is induced by ionizing radiation and regulates transcription of CCND1 in cis by forming a ribonucleoprotein repressor complex (Wang et al. (2008) Nature 454:126-130). This ncRNA binds to and allosterically activates the RNA-binding protein TLS (translated in liposarcoma), which inhibits histone acetyltransferases, resulting in repression of CCND1 transcription. Another example is the antisense ncRNA CDKN2B-AS1 (also known as p15AS or ANRIL), which overlaps the p15 coding sequence. Expression of CDKN2B-AS1 is increased in human leukemias and inversely correlated with p15 expression (Pasmant et al. (2007) Cancer Res. 67:3963-3969; Yu et al. (2008) Nature 451:202-206). CDKN2B-AS1 can transcriptionally silence p15 directly as well as through induction of heterochromatin formation. Many well-studied lncRNAs, such as those involved in dosage compensation and imprinting, regulate gene expression in cis (Lee (2009) Genes Dev. 23:1831-1842). Other lincRNAs, such as HOTAIR and linc-p21, regulate the activity of distantly located genes in trans (Rinn et al. (2007) Cell 129:1311-1323; Gupta et al. (2010) Nature 464:1071-1076; and Huarte et al. (2010) Cell 142:409-419).
A number of the identified lncRNAs are differentially expressed in association with cell proliferation, differentiation, or apoptosis and could have important roles in regulating cell function (Huarte et al. (2010) Cell 142:409-419; Loewer et al. (2010) Nat. Genet. 42:1113-1117; Ponjavic et al. (2009) PLoS Genet. 5:e1000617; Gupta et al. (2010) Nature 464:1071-1076; and Mazar et al. (2010) Mol. Genet. Genomics 284:1-9). Such lncRNAs may be useful diagnostically or therapeutically; however, the functions of only a few of these lncRNAs have been studied in detail, and many more functional lncRNAs have yet to be discovered. Thus, there remains a need in the art for identifying and characterizing lncRNAs that can be used in developing diagnostics and therapeutics.