The present invention relates to methods of identifying gene expression products which result from alternative splicing and more particularly to methods of diagnosing and treating disorders associated with the expression of such gene products, such as cancer.
Alternative splicing, the process by which multiple messenger RNA (mRNA) isoforms are generated from a single pre-mRNA species, is an important means of regulating gene expression. Alternative splicing plays a central role in numerous biological processes, such as sexual differentiation in Drosophila and apoptosis in mammals [Lopez (1998) Ann. Rev. Genet. 32:279-305]. Aberrant splicing generates abnormal mRNAs which are either unstable or code for defective or deleterious protein isoforms, and such isoforms are frequently implicated in the development of human disease [Lopez (1998) Ann. Rev. Genet. 32:279-305; Charlet (2002) Mol. Cell 10:45-53].
The significance of alternative splicing is further extended in the post-genomic era. On the basis of the initial drafts of the human genome sequence, it was estimated that the human genome comprises 30,000-40,000 genes [International Human Genome Sequencing Consortium. Nature 2001 409:860-921; Venter (2001) Science 291:1304-1351]. Although final gene counts may be higher, there is a disparity between the relatively small number of human genes and the complexity of the human proteome, suggesting that alternative splicing is important in the generation of protein diversity. The most striking known example of alternative splicing complexity is the single pre-mRNA of a Drosophila axon guidance receptor gene, Down Syndrome cell-adhesion molecule (Dscam), which can potentially be processed to generate 38,016 different mature transcripts [Schmucker (2000) Cell 101:671-684].
The accuracy and efficiency of the gene splicing reaction are attributed mainly to a number of cis sequence elements and trans-acting factors which are required for splicing.
Cis Sequence Elements
Any constitutive or alternative splicing event requires the assembly of the basal splicing machinery in spliceosome complexes on consensus sequences present at all boundaries between introns and exons, hereinafter the 5′ splice site (5′SS) and 3′ splice site (3′SS). The spliceosome has two functions: to recognize and select splice sites, and to catalyze the two sequential transesterification reactions which remove the introns and join the two exons together. The efficiency with which the spliceosome acts on an exon is determined by a balance of several features, including the strength of a splice site (essentially its conformity to consensus splice site sequences), exon size and the presence of auxiliary cis elements. Exons of ideal size (i.e., 50-300 nucleotides) with well conserved splice site sequences are recognized efficiently by the splicing machinery and are constitutively included in the transcript, whereas suboptimal exons require auxiliary elements for recognition. Typically, auxiliary elements which regulate the usage of alternative splice sites share several common features: they are small, variable in sequence and mostly present in multiple copies. Although most of these elements are single stranded, secondary structures have been implicated in the function of a few elements. Despite a high level of conservation, auxiliary cis elements are degenerate, rendering identification thereof difficult. Interestingly, auxiliary cis elements can be both exonic and intronic. Intronic cis elements can lie upstream of, downstream of or on both sides of the regulated exon and can be positioned proximal or distal to the regulated exon; in most cases, however, they are located close to the exon. Notably, such cis elements can enhance or repress splice site selection.
Thus, depending on the location of the auxiliary cis elements and the effect thereof on the recognition of alternative splice sites, the elements are referred to as exonic splicing enhancers or silencers or intronic splicing enhancers or silencers. Ladd and Cooper list intronic splicing enhancers and silencers identified to date [(2002) Gen. Biol. 3(11):1-16]. Exonic splicing enhancers and silencers are described in Fairbrother (2002) Science 297:1007-1013.
It is suggested that many alternative splice sites are associated with both enhancers and silencers and that regulation thereof is often the result of a dynamic antagonism between proteins binding such elements (i.e., trans-acting factors).
Trans-Acting Splicing Factors
The SR family of proteins: The SR proteins, a group of highly conserved proteins in metazoans, are required for constitutive splicing and also affect alternative splicing regulation. They have a modular structure consisting of one or two copies of an RNA-recognition motif (RRM) and a C-terminal domain rich in alternating serine and arginine residues (the RS domain). The RRMs determine RNA binding specificity, whereas the RS domain mediates protein-protein interactions that are thought to be essential for the recruitment of the splicing apparatus and for splice-site pairing [Fu (1995) RNA 1:663-680; Graveley (2000) RNA 6:1197-1211; Tacke (1999) Curr. Opin. Cell. Biol. 11:358-362; Wu (1993) Cell 75:1061-1070].
Another class of RS domain-containing proteins involved in splicing is the SR-related proteins (SRrps). These proteins, which often contain RRMs, include the U1-70K protein, both subunits of U2AF, SRm160/300 (two SR-related nuclear matrix proteins of 160 and 300 kDa), as well as alternative splicing regulators such as Tra and Tra2 [Fu (1995) RNA 1:663-680; Graveley (2000) RNA 6:1197-1211]. SR family and SR-related proteins function in the recognition of exonic splicing enhancers (ESEs), leading to the activation of suboptimal adjacent 3′ splice sites [Blencowe (2000) Trends Biochem. Sci. 25:106-110].
Polypyrimidine tract binding proteins (PTB): These RNA binding proteins, also termed hnRNP I, recognize the polypyrimidine tracts preceding 3′ splice sites and act as negative regulators of splicing. PTB represses several neuron-specific exons in non-neuronal cells, as well as the smooth muscle-specific inclusion of alternatively spliced exons in the α-tropomyosin and α-actinin pre-mRNAs. PTB and U2AF bind competitively to the polypyrimidine tract, and this competitive binding has been proposed as the basis for the negative regulatory effects of PTB. However, more complex mechanisms of regulation by PTB are also likely to operate, since binding of PTB to sites on both sides of the neuron-specific N1 exon in the mouse c-src pre-mRNA is required for repression [Valcarcel (1997) Curr. Biol. 7:R705-R708].
The CELF protein family: This family of proteins is involved in cell-specific and developmentally regulated alternative splicing [Ladd (2001) Mol. Cell. Biol. 21:1285-1296]. These RNA binding proteins contain three RRMs and a divergent linker domain of unknown function. Several members of the family exhibit tissue-specific expression, while others are more broadly expressed. CELF proteins bind to muscle-specific enhancers (MSEs) in the cardiac Troponin-T gene (cTNT) and promote inclusion of the developmentally regulated exon 5.
Despite significant progress in splicing research, accurate prediction of splicing patterns remains difficult, especially in light of the observation that functional splice sites do not always match the consensus sequences well, while many cryptic sites in the genome match the consensus but are not normally recognized by the splicing machinery. For example, a key step in pre-mRNA splicing involves the recognition and selection of a consensus sequence at the 5′ splice site (5′SS). Frequently, however, sequences which comply with the consensus are not selected for splicing [Green (1991) Annu. Rev. Biochem. 65:367-409]. These findings suggest that the sequence surrounding a splice site, as well as the match thereof to the consensus, strongly affects the recognition of the splice site.
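The limitation of consensus matching described above can be illustrated with a minimal Python sketch. It is not taken from the cited references. It assumes the commonly quoted mammalian 5′SS consensus MAG|GTRAGT (three exonic and six intronic nucleotides, with M = A or C and R = A or G). The function names and the threshold of 7 matching positions are hypothetical and chosen only for illustration.

```python
# Hypothetical illustration: score candidate 5' splice sites by consensus match.
# IUPAC codes needed for the assumed consensus (M = A/C, R = A/G).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "AC", "R": "AG"}

# Assumed consensus: last 3 exonic nt followed by first 6 intronic nt.
CONSENSUS_5SS = "MAGGTRAGT"


def consensus_matches(site: str, consensus: str = CONSENSUS_5SS) -> int:
    """Count positions of `site` that agree with the consensus."""
    site = site.upper().replace("U", "T")
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(base in IUPAC[code] for base, code in zip(site, consensus))


def candidate_5ss(seq: str, min_matches: int = 7):
    """List (GT position, 9-mer window, match count) for every GT dinucleotide
    whose surrounding 9-mer matches the consensus at >= min_matches positions."""
    seq = seq.upper().replace("U", "T")
    hits = []
    # i is the index of the G of a GT; the window spans i-3 .. i+5.
    for i in range(3, len(seq) - 5):
        if seq[i:i + 2] == "GT":
            window = seq[i - 3:i + 6]
            score = consensus_matches(window)
            if score >= min_matches:
                hits.append((i, window, score))
    return hits
```

A scan of this kind reports every well-matching GT, whether or not the splicing machinery actually uses it, so cryptic sites cannot be told apart from functional ones by match count alone.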
While reducing the present invention to practice, the present inventors have uncovered that intronic in-frame stop codons located upstream of a 5′SS sequence inactivate splicing from such splice sites. These findings enable accurate and efficient in-silico prediction of gene expression products.
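One way such a check might be carried out in silico is sketched below in Python. This is an illustrative reading of the finding, not the inventors' implementation. It assumes that the reading frame is known from the upstream coding sequence and that the intronic region to be scanned ends at the candidate 5′SS. The function name, its parameters and the example sequence are hypothetical.

```python
# Hypothetical sketch: find in-frame stop codons upstream of a candidate 5'SS.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_stops(seq: str, region_start: int, ss_pos: int, frame_start: int):
    """Return start indices of stop codons lying entirely within
    seq[region_start:ss_pos] and in frame with the codon at `frame_start`.

    region_start: first intronic position to scan (0-based, assumed known)
    ss_pos:       index of the G of the candidate 5'SS GT dinucleotide
    frame_start:  index of any codon's first base in the upstream reading frame
    """
    seq = seq.upper().replace("U", "T")
    # Advance to the first in-frame codon start at or after region_start.
    offset = (region_start - frame_start) % 3
    first = region_start + (3 - offset) % 3
    return [i for i in range(first, ss_pos - 2, 3) if seq[i:i + 3] in STOP_CODONS]


# Made-up example: a 9-nt exon, then 12 intronic nt, then a distal candidate 5'SS.
example = "ATGGCCAAG" + "GTAAGTTGACCG" + "GTAAGT"
stops = in_frame_stops(example, region_start=9, ss_pos=21, frame_start=0)
# Under the finding stated above, a non-empty result would flag the candidate
# 5'SS at position 21 as one predicted not to be used.
```

In this example the TAA beginning at index 10 is ignored because it is out of frame, whereas the TGA beginning at index 15 is reported.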