The sequences of mammalian exons perform at least two overlapping roles in gene expression. First, exons encode the primary sequence determinants of proteins. This information is decoded by the ribosome and translated into functional polypeptides. Second, exonic sequences contribute to pre-mRNA splicing through both sequence and local structure. This second role is not surprising given the organization of mammalian genes, which typically contain small exons (~140 bp) flanked by thousands of base pairs of intronic DNA sequence. The large size of many mammalian genes, and the apparent degeneracy of the mammalian splice sites marking the 5′ and 3′ termini of introns, also suggest a requirement for auxiliary cis-acting elements in facilitating exon recognition.
Exonic sequences contain a staggering array of cis-acting elements that direct the activation or repression of splicing. These functional elements are classified as either exonic splicing enhancers (ESEs) or exonic splicing silencers (ESSs) based on their ability to stimulate or inhibit splicing, respectively. ESE and ESS elements, acting in concert with their cognate trans-acting RNA-binding proteins, represent important components of a splicing code that specifies how, where and when mRNAs are assembled from their precursors. Two of the major players in establishing exon identity are the serine- and arginine-rich proteins (SR proteins) and the heterogeneous nuclear ribonucleoproteins (hnRNPs). SR proteins promote the initial stages of spliceosome assembly by binding to ESEs and either recruiting basal splicing factors to adjacent splice sites or antagonizing the effects of ESS elements. By contrast, hnRNPs mediate the repressive effects of silencers and can alter recruitment of the core splicing machinery. The interactions between silencers, enhancers and their cognate binding proteins play a critical role in the fidelity and regulation of pre-mRNA splicing.
At least 10% of all mutations identified as causing human inherited disease are known to alter consensus 5′ or 3′ splice sites, thereby inducing aberrant pre-mRNA splicing. Nonetheless, the roles played by pre-mRNA splicing in human genetic disease remain enigmatic. While the mechanistic consequences of mutations in splice sites are fairly easy to interpret, evaluating precisely how inherited disease-causing mutations influence the loss or gain of ESE/ESS motifs is more challenging. This is due in part to the considerable functional overlap between protein-coding sequences and the cis-acting elements involved in splicing regulation. Hence, many missense and nonsense mutations that alter mRNA splicing may be incorrectly assumed to affect only protein structure-function relationships, as a consequence of amino acid substitution or protein truncation, rather than splicing per se. It is also possible that the impact of a disease allele is due to the combination of an aberrant splicing event and the presence of a normal-length, mutation-bearing transcript. Such multifunctional sites within coding regions have been identified by the intragenic mapping of common single nucleotide polymorphisms (SNPs). As a consequence of purifying selection, SNPs appear somewhat depleted and synonymous codon choice appears restricted (e.g. GAA vs. GAG), revealing a silhouette of the “splicing code” that appears position-restricted relative to the edges of exons.