Among the most significant aspects of mammalian cell physiology yet to be elucidated is the precise manner in which growth factors (e.g., hormones, neurotransmitters and various developmental and differentiation factors) operate to effect the regulation of cell growth. The interaction of certain growth factors with surface receptors of resting cells appears to rapidly induce a cascade of biochemical events thought to result in nuclear activation of specific growth-related genes, followed by ordered expression of other genes. Analysis of sequential activation and expression of genes during the transition from a resting state ("G₀") to the initial growing state ("G₁") has been the subject of substantial research [Lau et al. (1987); Sukhatme et al. (1987)].
Much of this research has involved analysis of the expression of known genes encoding suspected regulatory proteins (such as the protooncogenes, c-fos and c-myc) following mitogen stimulation. An alternative approach has involved attempts to identify genes activated by mitogenic stimuli through differential screening of cDNA libraries prepared from resting cells following exposure to serum and specific growth factors [Lau et al. (1985); Cochran et al. (1983)].
Of interest to the background of the invention is the continuously expanding body of knowledge regarding structural components involved in the binding of regulatory proteins to DNA. Illustratively, the so-called receptor proteins are believed to bind to DNA by means of zinc ion stabilized secondary structural fingers premised on the folding of continuous amino acid sequences showing high degrees of conservation of cysteines and histidines and hydrophobic residues [Gehring (1987)]. For example, a "zinc finger" domain or motif, present in Xenopus transcription factor IIIA (TF IIIA), as well as the Drosophila Kruppel gene product and various yeast proteins, involves "repeats" of about 30 amino acid residues wherein pairs of cysteine and histidine residues are coordinated around a central zinc ion and are thought to form finger-like structures which make contact with DNA. The cysteine-histidine (or "CC--HH") zinc finger motif, as opposed to a cysteine-cysteine ("CC--CC") motif of steroid receptors, is reducible to a consensus sequence represented as Cys Xaa₂₋₄ Cys Xaa₃ Phe Xaa₅ Leu Xaa₂ His Xaa₃ His (SEQ. ID. NO: 67), wherein Cys (C) represents cysteine, His (H) represents histidine, Phe (F) represents phenylalanine, Leu (L) represents leucine and Xaa (X) represents any amino acid [Klug et al. (1987); Blumberg et al. (1987); and Schuh et al. (1986)].
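By way of illustration only, the CC--HH consensus described above can be written as a regular expression and used to scan a protein sequence given in one-letter code. The following is a minimal sketch, assuming the one-letter legend stated above (C, F, L, H, with X as any residue); the function name and the test sequence are hypothetical and do not correspond to any protein or method disclosed herein.

```python
import re

# Consensus in one-letter code: C X(2-4) C X3 F X5 L X2 H X3 H,
# where "." stands for X (any amino acid).
C2H2_CONSENSUS = re.compile(r"C.{2,4}C.{3}F.{5}L.{2}H.{3}H")


def find_zinc_fingers(protein):
    """Return (start, matched_text) for each non-overlapping consensus match.

    `protein` is a one-letter amino acid string; start is a 0-based index.
    """
    return [(m.start(), m.group())
            for m in C2H2_CONSENSUS.finditer(protein.upper())]


# Hypothetical sequence constructed to satisfy the pattern; not a real protein.
example = "MK" + "C" + "AA" + "C" + "GGG" + "F" + "SSSSS" + "L" + "TT" + "H" + "QQQ" + "H" + "PE"

if __name__ == "__main__":
    for start, text in find_zinc_fingers(example):
        print(start, text)
```

A match indicates only that the spacing of the conserved residues fits the consensus; it is not evidence that the stretch folds into a functional finger or binds DNA.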
Primary response or immediate early genes are those genes induced by mitogenic or other stimuli even in the absence of de novo protein synthesis and thus constitute the first step in the biochemical cascade resulting in gene activation or polypeptide expression. One such primary response gene, Egr-1 [Sukhatme, et al. 1987; Sukhatme, et al. 1988], also known as NGFI-A [Milbrandt 1988], Krox24 [Lemaire, et al. 1988], zif268 [Christy, et al. 1988], and TIS8 [Lim, et al. 1987], is induced transiently and ubiquitously by mitogenic stimuli and is also regulated in response to signals that initiate differentiation.
A transcription factor is a regulatory protein that binds to a specific DNA sequence (e.g., promoters and enhancers) and regulates transcription of an encoding DNA region. Typically, a transcription factor comprises a binding domain that binds to DNA (a DNA binding domain) and a regulatory domain that controls transcription. Where a regulatory domain activates transcription, that regulatory domain is designated an activation domain. Where that regulatory domain inhibits transcription, that regulatory domain is designated a repression domain.
Egr-1 encodes a nuclear phosphoprotein with three zinc finger motifs of the Cys₂His₂ class, suggesting that Egr-1 may mediate growth response by regulating distal gene expression [Cao, et al. 1990]. In this respect Egr-1 is like other immediate early transcription factors of the fos [Greenberg, et al. 1984; Kruijer, et al. 1984] and jun [Ryseck, et al. 1988] families. The Egr-1 protein is known to be localized to the nucleus [Cao, et al. 1990; Day, et al. 1990; Waters, et al. 1990], to bind to DNA at a site comprising the polynucleotide sequence CGCCCCCGC [Christy, et al. 1989; Cao, et al. 1990; Lemaire, et al. 1990], and to activate transcription through this specific sequence [Lemaire, et al. 1990; Patwardhan, et al. 1991]. The evolutionary conservation of this gene [Sukhatme, et al. 1988], as well as the broad spectrum of induction--by TPA and growth factors [Lim, et al. 1987; Milbrandt 1988; Lemaire, et al. 1988; Christy, et al. 1988; and Sukhatme, et al. 1988], by neuronal stimuli [Sukhatme, et al. 1988; Milbrandt 1988; and Cole, et al. 1989], by ischemic injury [Oullette, et al. 1990; Gilman, et al. 1986], and in some contexts in response to differentiation signals [Sukhatme, et al. 1988]--implicates Egr-1 as an important nuclear intermediary in signal transduction.
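For illustration, candidate occurrences of the CGCCCCCGC site noted above may be located in a DNA sequence by exact string matching on both strands. The following is a minimal sketch under stated assumptions: only exact matches are reported, the function names and the test sequence are hypothetical, and no claim is made that a matched position is bound by Egr-1 in vivo.

```python
EGR1_SITE = "CGCCCCCGC"  # binding site sequence quoted in the text

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(dna, site=EGR1_SITE):
    """Return sorted (position, strand) pairs for exact matches to `site`.

    Positions are 0-based indices into `dna` as given. "+" means the site
    reads 5'->3' on the given strand; "-" means it lies on the opposite
    strand, i.e. the given strand carries the reverse complement.
    """
    dna = dna.upper()
    hits = []
    for strand, pattern in (("+", site), ("-", reverse_complement(site))):
        start = dna.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = dna.find(pattern, start + 1)  # allow overlapping matches
    return sorted(hits)


# Hypothetical sequence with one site on each strand; not a real promoter.
example_dna = "AT" + "CGCCCCCGC" + "TT" + "GCGGGGGCG" + "A"

if __name__ == "__main__":
    print(find_sites(example_dna))
```

Exact matching is the simplest choice; a scan tolerant of mismatches would need a scoring scheme that the text above does not supply.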
DNA binding domains of transcription factors are well known in the art. Exemplary transcription factors known to contain a DNA binding domain are the GAL4, c-fos, c-Jun, lacI, trpR, CAP, TFIID, CTF, Sp1, HSTF and NF-κB proteins. Preferably, a DNA binding domain is derived from the GAL4 protein.
The GAL4 protein is a transcription factor of yeast comprising 881 amino acid residues. The yeast protein GAL4 activates transcription of genes required for catabolism of galactose and melibiose. GAL4 comprises numerous discrete domains including a DNA binding domain [Marmorstein et al. 1992]. The DNA sequences recognized by GAL4 are 17 base pairs (bp) in length, and each site binds a dimer of the protein. Four such sites, similar but not identical in sequence, are found in the upstream activating sequence (UASG) that mediates GAL4 activation of the GAL1 and GAL10 genes, for example [Marmorstein et al. 1992].
Of particular interest to the background of the invention is a recent report [Chowdhury et al. 1987] relating to an asserted "family" of genes encoding proteins having histidine/cysteine finger structures. These genes, designated "mkr1" and "mkr2", appear to be the first such isolated from mammalian tissue and are not correlated to any early growth regulatory events.
There continues to exist in the art a need for information concerning the primary structural conformation of early growth regulatory proteins, especially DNA binding proteins, such as might be provided by knowledge of human and other mammalian polynucleotide sequences encoding the same. A body of work suggests the modular nature of transcription factors, in which functional domains are structurally independent and able to confer activity on heterologous proteins [Ptashne 1988]. To date, the domains responsible for these functions have not been identified in Egr-1 proteins. Activation domains, and more recently repression domains, have been demonstrated to function as independent, modular components of transcription factors. Activation domains are not typified by a single consensus sequence but instead fall into several discrete classes: for example, acidic domains in GAL4 [Ma, et al. 1987], GCN4 [Hope, et al. 1986], VP16 [Sadowski, et al. 1988], and GATA-1 [Martin, et al. 1990]; glutamine-rich stretches in Sp1 [Courey, et al. 1988] and Oct-2/OTF2 [Muller-Immergluck, et al. 1990; Gerster, et al. 1990]; proline-rich sequences in CTF/NF-1 [Mermod, et al. 1989]; and serine/threonine-rich regions in Pit-1/GHF-1 [Theill, et al. 1989] all function to activate transcription. The activation domains of fos and jun are rich in both acidic and proline residues [Abate, et al. 1991; Bohmann, et al. 1989]; for other activators, like the CCAAT/enhancer-binding protein C/EBP [Friedman, et al. 1990], no evident sequence motif has emerged.
To date, the only well characterized repression domain is the alanine-rich sequence in the Drosophila gap protein Kruppel [Licht, et al. 1990; Zuo, et al. 1991]. Other Drosophila proteins such as Even-skipped [Han, et al. 1989; Biggin, et al. 1992] and Engrailed [Han, et al. 1989; Jaynes, et al. 1991], and mammalian DNA-binding proteins such as Tst-1/SCIP [Moniku, et al. 1990], WT1 [Madden, et al. 1991], and YY1/NF-E1/δ [Shi, et al. 1991; Harihan, et al. 1991; Park, et al. 1991] have been shown to act as repressors. Of these, Kruppel, Engrailed, WT1, and YY1/NF-E1/δ have been shown to confer their repression function on a heterologous DNA-binding domain. However, except in the case of Kruppel, the sequences responsible have not been precisely delineated.
Nuclear localization signals (NLS) are generally short stretches of 8-10 amino acids characterized by basic residues as well as proline. NLS sequences are retained in the mature protein, may be found at any position as long as they are exposed on the protein surface, and can be present in multiple copies. Proteins enter the nucleus through nuclear pores by a two-step process: the first step is a rapid, signal-dependent binding to the nuclear pore periphery, while the second step is a slower, ATP- and temperature-dependent translocation across the pore [Garcia-Bustos, et al. 1991; Silver 1991].
Precedents for the incorporation of nuclear targeting signals within a DNA-binding domain include fos [Tratner, et al. 1991]; the progesterone receptor, in which the second finger but not the first functions as an NLS [Guiochon-Mantel, et al. 1991]; GAL4 [Silver, et al. 1984]; and the homeodomain proteins α2 and Pit-1/GHF-1 [Hall, et al. 1990; Theill, et al. 1989]. If nuclear localization signals and Cys₂His₂ finger domains--both typified by basic residues--have co-evolved, NLS sequences may generally be found adjacent to or integrated within zinc finger domains.
Other bipartite nuclear localization signals have been characterized in the polymerase basic protein 1 of influenza virus (PB1) [Nath, et al. 1990]; Xenopus protein N1 [Kleinschmidt, et al. 1988]; adenovirus DNA-binding protein (DBP) [Morin, et al. 1989]; and the yeast repressor α2, which has two nonhomologous signals, a basic NLS found at the N-terminus, as well as a signal located in the homeodomain [Hall, et al. 1984, 1990]. Because each α2 signal gives a different phenotype individually, Hall et al. suggest that these nonhomologous signals mediate separate steps in nuclear accumulation.
Availability of polynucleotide sequences associated with specific regulatory functions of proteins, such as those discussed above, would make possible the application of recombinant methods to the large scale production of the proteins in procaryotic and eukaryotic host cells, as well as DNA-DNA and DNA-RNA hybridization procedures for the detection, quantification and/or isolation of nucleic acids associated with these and related proteins. Possession of such DNA-binding proteins, and/or knowledge of the amino acid sequences of the same, would allow, in turn, the development of monoclonal and polyclonal antibodies thereto (including antibodies to protein fragments or synthetic peptides modeled thereon) for use in immunological methods for the detection and quantification of early growth regulatory proteins in fluid and tissue samples as well as for tissue specific delivery of substances such as labels and therapeutic agents to cells expressing the proteins. DNA probes based on the polynucleotide sequences for these mammalian early growth regulatory proteins may be of use in detecting gene markers used for the diagnosis of those clinical disorders which are linked to the marker genes.