The past few years have seen a significant increase in our understanding of the complexity of mammalian transcription, and many novel ribonucleic acid (RNA) transcripts have been detected. This has come partly as a surprise, since the total number of conventional (protein-coding) genes in the human genome (around 20,000-25,000) is much lower than was anticipated a few years ago and is of the same magnitude as the number of genes in simpler organisms such as Drosophila melanogaster or Caenorhabditis elegans (Finishing the euchromatic sequence of the human genome. Nature 431 (7011):931-945 (2004)).
Two major transcriptomics efforts have led the way, in a complementary manner, in establishing an emerging, modified view of mammalian transcription. First, the international FANTOM (Functional Annotation of the Transcriptome of Mammals) consortium has for several years produced and analyzed massive amounts of complementary DNA (cDNA) sequencing data, primarily from mouse but also from human cells and tissues (Carninci, P. et al. (2005) Science 309 (5740), 1559-1563; Katayama, S. et al. (2005) Science 309 (5740), 1564-1566). Second, and independently, high-density (“tiling”) microarray experiments have provided complementary evidence that transcription occurs extensively throughout the human genome and that many unannotated transcripts of unknown function exist (Cheng, J. et al. (2005) Science 308 (5725), 1149-1154; Kapranov, P. et al. (2005) Genome Res 15 (7), 987-997).
RNAs can be classified into (1) messenger RNAs (mRNAs), which are translated into proteins, and (2) non-protein-coding RNAs (ncRNAs). Until recently, it was thought that only a small number of ncRNAs existed (e.g., tRNAs, rRNAs and spliceosomal RNAs), all of them related to protein synthesis or function. Moreover, until a few years ago there were no systematic efforts to identify novel ncRNA transcripts and elucidate their functions.