1. Field of the Invention
This invention relates to proteins which contain amino acid sequences that bind to 3'-untranslated regions of mRNAs, particularly mRNA sequences containing "instability sequences" and sequences affecting the translation and localization of mRNA.
2. Discussion of the Background
General features of primary sequence that characterize RNA- and DNA-binding proteins have begun to emerge. The helix-turn-helix (Pabo et al, Annu. Rev. Biochem. (1984) 53: 293-321) and zinc-binding finger (Evans et al, Cell (1988) 52: 1-3) arrangements have both been observed as structural features of sequence-specific DNA-binding proteins. In eukaryotes, the homeobox domain appears to represent a widespread primary sequence motif for specific DNA binding (Levine et al, Cell (1988) 55: 537-540; Robertson, Nature (1988) 336: 522-524, and references therein), and the members of the steroid hormone receptor superfamily of DNA-binding proteins share a common motif that forms zinc-binding fingers (Evans, Science (1988) 240: 889-895).
Early on, RNA-binding proteins were less well studied than DNA-binding proteins; general features of RNA-binding proteins were not evident until the recognition of an amino acid octamer present in four proteins associated with mammalian nuclear RNAs (Adam et al, Mol. Cell. Biol. (1986) 6: 2932-2943). The recognition of RNA by proteins appeared to the inventors to be a key reaction in the regulation of expression of the genetic material of all cells.
One of the present inventors has studied RNA binding proteins of this group for many years and in 1983 isolated the first recombinant cDNA encoding a eukaryotic member of this protein family, the human La RNA binding protein (Chambers et al, Proc. Natl. Acad. Sci. (USA) (1985) 82: 2115-2119; Chambers et al, J. Biol. Chem. (1988) 263: 18043-18051).
Subsequently, the observation by Dreyfuss and coworkers (Adam et al, Mol. Cell. Biol. (1986) 6: 2932-2943; Swanson et al, Mol. Cell. Biol. (1987) 7: 1731-1739) of an "RNP consensus" octamer in several eukaryotic proteins associated with RNA provided an early indication that some RNA-associated proteins might share a common amino acid sequence.
Other publications by the Dreyfuss group (Dreyfuss et al, TIBS (1988) 13: 86-91) and from many other laboratories (Amrein et al, Cell (1988) 55: 1025-1035; Bell et al, Cell (1988) 55: 1037-1046; Bugler et al, J. Biol. Chem. (1987) 262: 10922-10925; Chambers et al (1988), ibid; Deutscher et al, Proc. Natl. Acad. Sci. (USA) (1988) 85: 9479-9483; Goralski et al, Cell (1989) 56: 1101-1108; Keene, J. D., J. Autoimmunity (1989) 2: 329-337; Merrill et al, J. Biol. Chem. (1988) 263: 3307-3313; Sachs et al, Mol. Cell. Biol. (1987) 7: 3268-3276) noted the presence of related sequences surrounding the octamer and speculated that these regions might participate in RNA binding. It was not known at that time, however, whether these sequences might confer specific, as opposed to nonspecific, recognition of RNA, or whether discontinuous regions involving long-range interactions within these proteins might be required for RNA binding.
Some authors speculated that the octamer alone (Dreyfuss et al, TIBS (1988) 13: 86-91), or the octamer together with its surrounding residues, constituted an RNA binding domain, and Dreyfuss and coauthors (ibid) chose an arbitrary domain size of 100 amino acids. This theory was based on the occurrence of similar sequences in a set of proteins that were all thought to be associated with RNA. Evidence for direct binding of such regions to specific RNA sequences was not available, and no protein domains with binding activity had been defined experimentally.
This theory included the speculation that the 70K U1 snRNP protein contained an RNA binding domain of 93 amino acids spanning positions 94 to 186. Other investigators (Theissen et al, EMBO J. (1986) 5: 3209-3217) had speculated that a different region of the 70K U1 snRNP protein, encompassing amino acid residues 241 to 437, the region proposed by Dreyfuss, or both were involved in RNA binding. These speculations were based on the resemblance of the highly basic (positively charged) region at amino acids 241 to 437 of the 70K protein to regions of other proteins (e.g., protamines and histones) known to bind nucleic acid. No experimental evidence was available to support these suggestions.
Although the 70K protein is one of ten proteins known to be associated with the U1 snRNP complex (Pettersson et al, J. Biol. Chem. (1984) 259: 5907-5914), there was no evidence of specific RNA-protein contact between the 70K protein and any RNA species until the discovery of a specific and direct binding of the 70K protein to U1 RNA. Furthermore, of the other members of this group of proteins studied in the inventors' laboratory and in many other laboratories, none had been shown to bind directly to a specific RNA sequence until one of the present inventors discovered the sequence-specific interaction between the 70K U1 snRNP protein and U1 RNA (C. Query et al, Cell 57: 89 (1989)).
The region of the protein responsible for this specific binding has an amino acid sequence different from the regions proposed by Theissen et al or by Dreyfuss et al. In fact, one of the sequences proposed by Theissen as being responsible for RNA binding actually interferes with the detection of specific binding activity (C. Query et al, Cell 57: 89 (1989); Query et al, Mol. Cell. Biol. 9: 4872 (1989)).
In addition, it was discovered that the precise RNA binding domain of the 70K protein includes additional important amino acid sequences not previously recognized by the theory of Dreyfuss et al, by the published work of the other workers mentioned above, or by some of the inventors themselves in their earlier studies of the La (Chambers et al, ibid) and 60 kD Ro (Deutscher et al, ibid) protein members of the group.
RNA binding proteins are now known to be involved in the control of a variety of cellular regulatory and developmental processes, such as RNA processing and compartmentalization, mRNA translation and viral gene expression. Some proteins that recognize and bind RNA can be classified into families based upon primary sequence homology, as well as higher order structure.
The family of RNA binding proteins containing an RNP consensus octamer and an 80-amino-acid motif implicated in RNA recognition (the RNA recognition motif, RRM) has been the subject of intense investigation. Query et al, Cell (1989) 57: 89-101; Kenan et al, Trends Biochem. Sci. (1991) 16: 214-220. Based on crystallographic and NMR spectroscopic studies of the U1 RNA binding domain of the U1 snRNP-A protein, a model of the tertiary structure has been derived. This tertiary structural model, together with RNA binding studies, has led to the suggestion that the RNA binding surface resides on a monomeric unit with four antiparallel β-strands containing solvent-exposed aromatic and basic residues. Kenan et al (1991) supra. Additional biochemical data have demonstrated that a determinant of RNA binding specificity resides in a loop connecting two β-strands. Bentley et al, Mol. Cell. Biol. (1991) 11: 1829-1839.
More than forty members of the RRM superfamily have been reported to date, most of which are expressed in all tissues and are widely conserved across phylogeny. Kenan et al (1991) supra. Tissue-specific members of the RRM family are less common; they include X16, which is expressed in pre-B cells; Bj6, a puff-specific Drosophila protein; and elav (embryonic lethal abnormal vision), which is neuron-specific in Drosophila. For some RRM proteins the natural RNA ligands have been identified or surmised, but in most cases the RNA-binding sequences are not known.
The RNA ligands for the tissue-specific RRM proteins have not been reported and may prove difficult to determine because of their specialized roles in certain developmental processes. However, in order to understand their functions in cellular RNA metabolism and development, it will be essential to identify the RNA sequences to which they bind.
Oncogenes encode growth factors that affect the rate of cell proliferation by influencing cell cycle events such as mitosis, intracellular signaling pathways and gene expression. Well-known oncogenes include c-src, c-myc and c-fos. Lymphokines, which affect the growth properties of immunoregulatory cells, also function as growth factors similar to oncogene products. Although oncogene products (oncoproteins) are central components in the origin of the neoplastic state, they act through a variety of complex and largely unknown pathways. Consequently, methods to specifically control the functions of oncoproteins are generally lacking.
The more recent discovery of suppressor oncogenes (anti-oncogenes) has held promise for countering the effects of oncogenes. Examples of anti-oncogenes include the retinoblastoma (Rb) gene and p53. It is hoped that these factors can be used to counter the effects of oncoproteins and thus provide new treatments for cancer. For example, breast tumors show a consistent defect in the p53 gene, which prevents p53 from countering the oncogenes that cause uncontrolled proliferation of the breast tumors. Unfortunately, there are likely to be dozens of anti-oncogenes, some specific to a given type of cancer and others functioning in combination in various cancers.
Given the potential for cellular growth proteins to generate defects in cellular proliferation and differentiation, multiple mechanisms must exist to guard against their overproduction. The complex molecular circuitry involving receptor- and nonreceptor-mediated tyrosine phosphorylation, as well as the action of GTPases and transcription factors, requires multiple control points (J. M. Bishop, Cell 64: 235 (1991); L. C. Cantley et al, Cell 64: 281 (1991)). For example, the transcription of growth regulatory genes is tightly regulated at the level of the DNA promoter (B. Lewin, Cell 64: 303 (1991)). Likewise, one would expect similar control mechanisms to exist in the cytoplasm to prevent inappropriate translation of growth factor mRNAs. Direct evidence for control of growth factor production in the cytoplasm is lacking. However, it is clearly documented that the mRNAs encoding oncoproteins are themselves tightly regulated (G. Shaw et al, Cell 46: 659 (1986); D. Caput et al, Proc. Natl. Acad. Sci. USA 83: 1670 (1986); G. Brewer, Mol. Cell. Biol. 11: 2460 (1991); T. R. Jones et al, Mol. Cell. Biol. 7: 4513 (1987); A. B. Shyu et al, Genes & Devel. 3: 60 (1989); P. L. Bernstein et al, Genes & Devel. 6: 642 (1992); A. B. Shyu et al, Genes & Devel. 5: 221 (1991); S. Savant-Bhonsale et al, Genes & Devel. 6: 1927 (1992); D. W. Cleveland et al, The New Biologist 1: 121 (1989); J. Malter, Science 246: 664 (1989); E. Vakalopoulou et al, Mol. Cell. Biol. 11: 3355 (1991); P. R. Bohjanen et al, Mol. Cell. Biol. 11: 3288 (1991)). The mRNAs encoding c-fos, c-myc and cytokines are unstable and short-lived, and they have been reported to undergo translational regulation (V. Kruys et al, Proc. Natl. Acad. Sci. USA 84: 6030 (1987); V. Kruys et al, Science 245: 852 (1989); R. Wisdom et al, Genes & Devel. 5: 232 (1991)). Moreover, alterations in the stabilities of these mRNAs have been reported to result in cellular transformation (F. Meijlink et al, Cell 36: 51 (1985); Ch. Dani et al, Proc. Natl. Acad. Sci. USA 81: 7046 (1984); G. D. Schuler et al, Cell 55: 1115 (1988)). More recent evidence from various cellular systems has implicated the 3' untranslated regions (3' UTRs) of mRNAs in the regulation of growth and differentiation (F. Rastinejad et al, Cell 72: 903 (1993)).
Accordingly, there is a strongly felt need for the discovery of materials generally useful in the recognition, binding and/or expression of ribonucleic acids involved in growth, neoplasia and immunoregulation. Such materials would have many uses, including regulation of cell proliferation in vitro and in vivo, regulation of immune cell expression, stimulation of cell growth and tissue regeneration, production of transgenic animals and cell lines for pharmaceutical testing in cancer, immune function and neurological diseases, diagnostic reagents for detecting autoantibodies associated with cancers, in vivo targeting systems, diagnosis of pathology specimens of neuronal origin, and genetic or neurogenetic markers for diseases involving malformations of the central nervous system.