Alternative splicing (AS), the process by which different pairs of splice sites are selected in precursor mRNA to generate multiple mRNA and protein products, is responsible for greatly expanding the functional and regulatory capacity of metazoan genomes (Braunschweig et al., 2013; Chen and Manley, 2009; Kalsotra and Cooper, 2011). For example, transcripts from over 95% of human multi-exon genes undergo AS, and most of the resulting mRNA splice variants are variably expressed between different cell and tissue types (Pan et al., 2008; Wang et al., 2008). However, the functions of the vast majority of AS events detected to date are not known, and new landscapes of AS regulation remain to be discovered and characterized (Braunschweig et al., 2014; Eom et al., 2013). Moreover, since the misregulation of AS frequently causes or contributes to human disease, there is a pressing need to systematically define the functions of splice variants in disease contexts.
AS generates transcriptomic complexity through differential selection of cassette alternative exons, alternative 5′ and 3′ splice sites, mutually exclusive exons, and alternative intron retention. These events are regulated by the interplay of cis-acting motifs and trans-acting factors that control the assembly of spliceosomes (Chen and Manley, 2009; Wahl et al., 2009). The assembly of spliceosomes at 5′ and 3′ splice sites is typically regulated by RNA binding proteins (RBPs) that recognize proximal cis-elements, referred to as exonic/intronic splicing enhancers and silencers (Chen and Manley, 2009). An important advance that is facilitating a more general understanding of the role of individual AS events is the observation that many cell/tissue type- and developmentally-regulated AS events are coordinately controlled by individual RBPs, and that these events are significantly enriched in genes that operate in common biological processes and pathways (Calarco et al., 2011; Irimia and Blencowe, 2012; Licatalosi and Darnell, 2010).
AS can have dramatic consequences for protein function, and/or affect the expression, localization and stability of spliced mRNAs (Irimia and Blencowe, 2012). While AS events that are differentially regulated between cell and tissue types are significantly under-represented in functionally defined, folded domains in proteins, they are enriched in regions of protein disorder that typically are surface accessible and embed short linear interaction motifs (Buljan et al., 2012; Ellis et al., 2012; Romero et al., 2006). AS events located in these regions are predicted to participate in interactions with proteins and other ligands (Buljan et al., 2012; Weatheritt et al., 2012). Indeed, among a set of analyzed neural-specific exons enriched in disordered regions, approximately one third promoted or disrupted interactions with partner proteins (Ellis et al., 2012). These observations suggested that a widespread role for regulated exons is to specify cell and tissue type-specific protein interaction networks.
Human disease and disorder mutations often disrupt cis-elements that control splicing and result in aberrant AS patterns (Cartegni et al., 2002). Other disease changes affect the activity or expression of RBPs, causing entire programs of AS to be misregulated. For example, amyotrophic lateral sclerosis-causing mutations in the RBPs TLS/FUS and TDP43 affect AS and other aspects of post-transcriptional regulation (Polymenidou et al., 2012). It is also widely established that misregulation of AS plays important roles in altering the growth and invasiveness of various cancers (David and Manley, 2010). As is the case with assessing the normal functions of AS, it is generally not known which misregulated AS events cause or contribute to disease or disorder phenotypes.
Central to addressing the above questions is the comprehensive definition of AS programs associated with normal and disease biology. However, gene prediction algorithms, high-throughput RNA sequencing (RNA-Seq) analysis methods, and RNA-Seq datasets generally lack the sensitivity and/or depth required to detect specific types of AS. In particular, microexons (Beachy et al., 1985; Coleman et al., 1987), defined here as exons 3–27 nucleotides (nt) long, have been largely missed by genome annotations and transcriptome profiling studies (Volfovsky et al., 2003; Wu et al., 2013; Wu and Watanabe, 2005). This is especially true for microexons shorter than 15 nt. Furthermore, where alignment tools have been developed to capture microexons (Wu et al., 2013), they have not been applied to the analysis of different cell and tissue types, or disease states.
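To make the length-based definition concrete, the following minimal Python sketch classifies exons by size using the 3 to 27 nt bounds and the 15 nt threshold mentioned above. It is an illustration only, not a method from this study: the `Exon` record, the 1-based inclusive coordinate convention, and the class labels are all assumptions introduced here.

```python
from typing import NamedTuple

# Length bounds from the definition in the text (3-27 nt, inclusive).
MICROEXON_MIN_NT = 3
MICROEXON_MAX_NT = 27
# Threshold below which the text notes microexons are missed most often.
VERY_SHORT_NT = 15


class Exon(NamedTuple):
    """Hypothetical exon record; coordinates are assumed 1-based, inclusive."""
    chrom: str
    start: int
    end: int

    @property
    def length(self) -> int:
        # With inclusive coordinates, length is end - start + 1.
        return self.end - self.start + 1


def is_microexon(exon: Exon) -> bool:
    """Return True if the exon length falls within the 3-27 nt range."""
    return MICROEXON_MIN_NT <= exon.length <= MICROEXON_MAX_NT


def size_class(exon: Exon) -> str:
    """Assign an illustrative size label (labels are assumptions, not standard terms)."""
    if not is_microexon(exon):
        return "not_microexon"
    if exon.length < VERY_SHORT_NT:
        return "microexon_lt15"
    return "microexon_15_27"
```

For instance, under these assumptions `size_class(Exon("chr1", 100, 113))` describes a 14-nt exon and falls in the shortest class, the one most likely to be missed by standard annotation and alignment approaches.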