1. Field of the Invention
This invention relates to ribonucleoproteins and to proteins which contain amino acid sequences that recognize specific RNA sequences.
2. Discussion of the Background
General features of primary sequence that characterize RNA- and DNA-binding proteins have only recently begun to become apparent. The helix-turn-helix (Pabo et al, Annu. Rev. Biochem. (1984) 53: 293-321) and zinc-binding finger (Evans et al, Cell (1988) 52: 1-3) arrangements have both been observed as structural features of sequence-specific DNA-binding proteins. However, the helix-turn-helix arrangement found in prokaryotes has not revealed a primary amino acid sequence motif with a recognizable pattern (Matthews, Nature (1988) 335: 294-295).
In eukaryotes, the homeobox domain seems to represent a widespread primary sequence motif for specific DNA-binding (Levine et al, Cell (1988) 55: 537-540; Robertson, Nature (1988) 336: 522-524, and references therein), and the members of the steroid hormone receptor superfamily of DNA-binding proteins utilize a common motif which forms zinc-binding fingers (Evans, Science (1988) 240: 889-895).
RNA-binding proteins have been less well studied than DNA-binding proteins. General features of RNA-binding proteins were not evident until the recognition of an amino acid octamer present in four proteins associated with mammalian nuclear RNAs (Adam et al, Mol. Cell Biol. (1986) 6: 2932-2943). Despite the finding of the octamer in additional RNA-associated proteins (Swanson et al, Mol. Cell Biol. (1987) 7: 1731-1739; Dreyfuss et al, Trends Biochem. Sci. (1988) 13: 86-91), there has been no evidence to date that these sequences are involved in binding to specific RNA sequences.
The specific recognition of RNA by proteins involves a variety of amino acid sequences that differ widely among the known RNA-binding proteins (reviewed in Schimmel, Ann. Rev. Biochem. (1987) 56:125-128; Ollis et al, Chem. Rev. (1987) 87:981-985; see also Wilson et al, Proc. Nat. Acad. Sci. (USA) (1986) 83:7251-7255; Strub et al, Mol. Cell Biol. (1990) 10:777-784). One family of proteins involved in RNA processing has been identified that shares a primary sequence motif of approximately 80 amino acids, which the inventors have termed an RNA recognition motif (RRM) (for reviews see Mattaj, Cell (1989) 57:1-3; Bandziulis et al, Genes Dev. (1989) 3:431-437; Keene et al, "Nuclear RNA binding proteins," in Progress in Nucleic Acid Research and Molecular Biology, K. Moldave and W. Cohn, eds., Academic Press, Inc., Orlando, Fla., 1990). This motif contains the strongly conserved RNP octamer consensus sequence (Adam et al, Mol. Cell Biol. (1986) 6:2932-2943) and is present as single or multiple copies in a given protein. Specific RNA-binding domains have been defined for the U1 snRNP-70K and A proteins, and in each protein the domain corresponds closely to the RRM (Query et al, Cell (1989) 57:89-101; Scherly et al, EMBO J. (1989) 8:4163-4170). The role of specific sequence elements within the RRM in determining the RNA recognition properties of this family of proteins has not been determined, however.
The U1 and U2 snRNPs are components of the spliceosome, which removes introns from pre-mRNAs (reviewed in Sharp, Science (1987) 235:766-771; Steitz et al, "Functions of the abundant U-snRNPs," in Structure and Function of Major and Minor Small Nuclear Ribonucleoprotein Particles, M. L. Birnstiel, ed. (Springer-Verlag: Berlin), pp. 115-154 (1988)). The U1 snRNP recognizes pre-mRNAs in part through base pairing of U1 RNA with the 5' splice site (Zhuang et al, Cell (1986) 46:827-835), while the U2 snRNP recognizes the intron branch point in part through base pairing with the conserved branch point sequence (Parker et al., Cell (1987) 49:229-239; Wu et al, Genes Dev. (1989) 3:1553-1561; Zhuang et al, Genes Dev. (1989) 3:1545-1552). In addition to the U snRNP-common Sm proteins, the U1 snRNP contains 3 unique proteins (70K, A, and C), while the U2 snRNP contains only 2 unique proteins (B" and A'). The U1 snRNP-A (A) and U2 snRNP-A' (A') proteins are unrelated in sequence but are so named because they migrate at similar positions in SDS-polyacrylamide gels (Pettersson et al., J. Biol. Chem. (1984) 259:5907-5914; Bringmann et al., EMBO J. (1986) 5:3509-3516). For clarity the designation A.sup.prime for A' protein will be used in this text.
The A (Sillekens et al., EMBO J. (1987) 6:3841-3848) and U2 snRNP-B" (B") (Habets et al., Proc. Nat. Acad. Sci. (USA) (1987) 84:2421-2425) proteins each contain two RRMs. The sequences within the corresponding motifs of these proteins are highly conserved (Sillekens et al., EMBO J. (1987) 6:3841-3848). Their amino-terminal RRMs are 75% identical and their carboxy-terminal RRMs are 86% identical. Despite the high degree of primary amino acid sequence similarity, these two proteins associate with different RNAs in vivo (reviewed in Zieve, "Cell Biology of the snRNP Particles," Critical Reviews in Biochemistry and Molecular Biology 25:1-46 (1990)).
The recognition of RNA by proteins has appeared to the inventors to be a key reaction in the regulation of expression of the genetic material of all cells. Prior to the present invention, however, it was not known how these proteins could recognize a specific sequence of RNA. If the domains of these proteins responsible for such recognition were found, they would have many important applications, e.g., in the fields of the regulation of gene expression, RNA-protein interactions, autoimmune and neoplastic diseases, and developmental biology.