As the number of predicted human genes has decreased, estimates of the extent of alternative pre-mRNA splicing have increased dramatically. Ninety-six percent of multi-exon human genes are thought to be alternatively spliced, generating a diversity of proteins far larger than the number of human genes. Pan et al., “Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing” Nature Genetics 40:1413-1415 (2008); and Wang et al., “Alternative isoform regulation in human tissue transcriptomes” Nature 456:470-476 (2008). Large-scale sequencing of fragmented mRNA (RNA-Seq) confirms this view: 114,742 different exon-exon junctions have been detected in human brain alone. However, RNA-Seq does not preserve the connectivity between exon-exon junction fragments, so the potential influence of one splicing event on subsequent splicing events in the same transcript cannot be detected. Calarco et al., “Technologies for the global discovery and analysis of alternative splicing” Advances in Experimental Medicine and Biology 623:64-84 (2007). In the mouse central nervous system, the splicing of some pairs of exons appears to be coordinated. Fagnani et al., “Functional coordination of alternative splicing in the mammalian central nervous system” Genome Biology 8:R108 (2007). Currently, the scope of such interdependence between distant splicing events is unknown.
Complex organisms increase the effective diversity and coding potential of their genomes through alternative splicing (AS). Based on high-throughput sequencing (HTS) techniques, it is estimated that 86% of multi-exon human genes undergo AS. A unique product of an AS event is called an isoform. The sheer number of isoforms detected by these studies, many of which are expressed in a tissue-specific manner, suggests that AS may have biological significance.
While it has been estimated that approximately 25% of human genes contain multiple regions of AS, the coordination of different regions in the same mRNA molecule has been suggested for fewer than 40 genes and confirmed in even fewer. Types of AS include, for example, alternative transcriptional start sites, alternative polyadenylation sites, and/or alternative first and last exons. Undoubtedly, inherent restrictions of the methods used for the large-scale study of isoforms contribute to difficulties in identifying and studying distal coordinated AS events.
Most methods used for the large-scale study of isoforms involve, at some point, microarrays and/or sequencing. One common limitation is the piecemeal examination of a potentially long molecule. Isoforms can be many tens of thousands of nucleotides (nt) long, yet microarrays and sequencing can analyze only between 25 and 1000 nt of that sequence at one time. This limitation forces the original sequence to be reconstructed from fragments, during which the sequence connectivity within a given molecule is lost, severely limiting the detection of splicing regulation that may occur over a distance. While it is possible to investigate coordinated AS in a single gene through traditional cloning and RT-PCR analysis, using these approaches in a large-scale study is very labor-intensive.
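The loss of connectivity described above can be illustrated with a minimal sketch. It assumes a hypothetical five-exon gene (exons A to E, with B and D as alternative exons) and assumes that each short read spans at most one exon-exon junction. The exon names, isoform mixtures, copy numbers, and the function `junction_counts` are illustrative assumptions and are not taken from any cited study. Under these assumptions, a sample in which the two alternative exons are always included or skipped together yields exactly the same junction counts as a sample in which they are mutually exclusive, so junction-level data alone cannot distinguish the two.

```python
from collections import Counter


def junction_counts(mixture):
    """Tally exon-exon junctions across a mixture of isoforms.

    `mixture` maps an isoform (a tuple of exon names, 5' to 3') to its
    copy number. Each adjacent exon pair is counted once per copy, which
    models reads that each cover at most one junction (an assumption).
    """
    counts = Counter()
    for isoform, copies in mixture.items():
        for left, right in zip(isoform, isoform[1:]):
            counts[(left, right)] += copies
    return counts


# Hypothetical sample 1: exons B and D are included or skipped together.
coordinated = {
    ("A", "B", "C", "D", "E"): 50,
    ("A", "C", "E"): 50,
}

# Hypothetical sample 2: exons B and D are never found in the same molecule.
mutually_exclusive = {
    ("A", "B", "C", "E"): 50,
    ("A", "C", "D", "E"): 50,
}

# The isoform content differs, yet the junction-level view is identical.
indistinguishable = (
    junction_counts(coordinated) == junction_counts(mutually_exclusive)
)
print(indistinguishable)
```

In this toy case, only a measurement that reports which junctions occur in the same molecule could separate the two samples.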
However, a high-throughput, single-molecule technique designed to directly assay distal regions of AS may provide evidence for a general phenomenon of coordinated intramolecular splicing choices. It is clear that a more informative method to assess alternative splicing across the genome is needed. For example, a method that establishes exon sequence connectivity for each mRNA isoform in a cell, retains abundance information, and uses existing HTS technology would be advantageous to the molecular biology research community.