Gene expression in eukaryotic cells is regulated by DNA sequences found primarily (although not exclusively) in the region upstream (5'-ward) of the transcription start site. These DNA sequences fall into two broad categories: promoters and enhancers (Maniatis et al., 1987). Promoter sequences generally, but not exclusively, fall within about 100 base pairs (bp) of the transcription start site. The promoter determines the direction of transcription and the position at which transcription begins. Promoter sequences may or may not be active, in and of themselves, in directing expression (transcription) of genes or DNA sequences found downstream. Enhancer elements, on the other hand, are DNA sequences generally found more than 100 bp from the transcription start site. Enhancers increase the rate of initiation of transcription from promoter sequences. They stimulate transcription by serving as binding sites for nuclear proteins. These enhancer-binding proteins increase the rate of initiation by interacting with the proteins of the transcription initiation complex assembled on the promoter, although the precise mechanism of stimulation is as yet unknown. Some of the properties of enhancers are known, however. These include:
1. Enhancers activate transcription from a distance. The distance may vary, depending on the particular enhancer, from less than 250 bp from the transcription start site to many thousands of bp (kbp) away (Serfling, et al., 1985; Banerji, et al., 1981; Grosschedl and Birnstiel, 1980).
2. Enhancer activity is independent of distance from the transcription start site. Individual enhancers may be moved, with respect to the transcription start site, without materially affecting their activity. Indeed, enhancers may be moved from the 5'-flanking DNA upstream of the transcription start site (Serfling, et al., 1981) to the 3'-flanking region downstream of transcribed sequences (Banerji, et al., 1981), or even to a position within the transcription unit (Gillies, et al., 1983; Banerji, et al., 1983), and still function in stimulating transcription.
3. Enhancer activity is non-directional. Enhancer elements may be inverted without affecting their activity (Grosschedl and Birnstiel, 1980; Banerji, et al., 1981).
These properties distinguish enhancers from promoter elements (which may also stimulate transcription) in that promoter elements usually function only at a small distance from the transcription start site and only in a set orientation (Maniatis, et al., 1987). Such elements lose activity when moved or inverted. Transcription, having begun within the proximal promoter sequences, proceeds through the structural gene (the DNA sequences which encode a protein, also referred to as the transcription unit) and terminates within a DNA sequence (terminator) that specifies cessation of transcription and proper processing of the 3' terminus of the mRNA.
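The operational distinction drawn above, between elements that keep their activity when moved or inverted and elements that do not, can be sketched as a simple decision rule. The following Python fragment is an illustration only: the names ElementAssay and classify, and the three boolean fields, are hypothetical and are not taken from any cited work. It assumes that each element has been scored as active or inactive in a reporter assay under three conditions.

```python
from dataclasses import dataclass


@dataclass
class ElementAssay:
    """Hypothetical summary of reporter assays for one DNA element."""
    active_near_start_site: bool  # stimulates transcription in its proximal position
    active_when_moved: bool       # still stimulates after relocation (far upstream, 3', or intragenic)
    active_when_inverted: bool    # still stimulates after its orientation is reversed


def classify(assay: ElementAssay) -> str:
    """Apply the distance and orientation criteria described in the text."""
    if not (assay.active_near_start_site
            or assay.active_when_moved
            or assay.active_when_inverted):
        return "no activity detected"
    # Enhancers retain activity both when moved and when inverted.
    if assay.active_when_moved and assay.active_when_inverted:
        return "enhancer-like"
    # Promoter elements act only close to the start site and in one orientation.
    if (assay.active_near_start_site
            and not assay.active_when_moved
            and not assay.active_when_inverted):
        return "promoter-like"
    # Mixed results do not fit either definition cleanly.
    return "inconclusive"
```

An element that has been tested only by deletion, and never moved or inverted, cannot be scored under this rule, so deletion mapping by itself does not establish that a region contains an enhancer.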
These various elements of a eukaryotic gene--enhancer, promoter, structural gene, and terminator--are modular and may be interchanged with DNA sequences of similar functions from other genes (Kaufman, 1990). If the structural gene encodes a protein whose production is desired either for sale or for research purposes, the assemblage is referred to as an expression cassette. The expression cassette's function is to direct transcription of whatever structural gene it contains, when introduced into a cell (whether grown in culture or contained within an animal). The particular cell must, of course, produce the nuclear proteins necessary to the functioning of the various parts of the expression cassette such as the enhancer-binding proteins.
Enhancers and their cognate nuclear proteins have been described from many sources. For example, the yeast (S. cerevisiae) GAL4 protein binds DNA sequences upstream of the GAL1 and GAL10 promoters and stimulates transcription (Johnston, 1987). The adenovirus E1A (Hearing & Shenk, 1986) and the Herpes simplex virus VP16 (Campbell, et al., 1984) proteins serve similar functions by binding DNA sequences within their respective viral genomes. VP16 acts as a transactivating protein in conjunction with a cellular transactivator called Oct-1, while the E1A protein acts alone. Similarly, proteins such as the glucocorticoid receptor, found in animal cells, stimulate transcription by binding their cognate enhancers within the genomes of animal cells.
The major milk protein genes of mammals are among the most highly expressed genes in any eukaryotic cell. For example, α-casein and β-casein mRNAs together account for nearly half of all the mRNA in lactating mammary epithelial cells (Mercier, et al., 1985).
The enhancer and promoter elements responsible for this high level of expression will be useful in designing eukaryotic cell expression vectors based on strong cellular enhancers and promoters. Current expression vectors, utilizing enhancers and promoters from viral or cellular genes (Kaufman, 1990; Kriegler, 1990), produce less mRNA than major milk protein promoters (Mercier, et al., 1985).
The search for the expression control elements (enhancers and promoters) of major milk protein genes began about 10 years ago with the isolation of cDNAs (Richards, et al., 1981; Willis, et al., 1982) and genes (Yu-Lee, et al., 1983; Campbell, et al., 1984; Jones, et al., 1985) encoding rat β- and α-casein and whey acidic protein (WAP). The genes for many major milk proteins of various species have now been cloned (reviewed in Mercier, et al., 1991), including the bovine β-casein gene (Gorodetsky, et al., 1990; Schmidhauser, et al., 1990). Comparison of the primary nucleotide sequences found immediately upstream of the mRNA start site showed extensive sequence homology among the related α- and β-casein genes, between -150 (150 bp upstream of the mRNA start site) and +40 (Yu-Lee, et al., 1986). In particular, the region between -150 and -100 (termed the "milk box"; Rosen, 1987) was also identified in the α-lactalbumin and WAP promoters.
Functional analysis of the expression control elements of the major milk protein genes began with the introduction of intact genes from other species into the genomes of transgenic mice. These included the rat β-casein gene (Lee, et al., 1988), the rat and mouse whey acidic protein (WAP) genes (Bayna & Rosen, 1990; Burdon, et al., 1991), the sheep β-lactoglobulin gene (Simons, et al., 1987), and the bovine and guinea pig α-lactalbumin genes (Maschio, et al., 1991; Vilotte, et al., 1989). The transgenes included the structural genes (introns and exons) for each protein and varying amounts of 3'- and 5'-flanking DNA. Each xenogeneic transgene was expressed predominantly in the mammary gland of female animals during lactation. These results indicated that each gene contained the expression control elements necessary for tissue-specific, developmentally-appropriate regulation of expression.
Various groups have reported expression of heterologous proteins in the mammary glands of transgenic animals using expression control elements from major milk protein genes. The expressed genes and proteins include the oncogenes Ha-ras (Andres, et al., 1987) and c-myc (Schoenenberger et al., 1988), human tissue plasminogen activator (Pittius, et al., 1988; Gordon, et al., 1987; Ebert, et al., 1991), the E. coli CAT gene (Lee, et al., 1989 a,b), human interleukin-2 (Buhler, et al., 1990), clotting factor IX (Clark et al., 1989; Simons et al., 1988), and human alpha-1-antitrypsin (Wright, et al., 1991; Meade et al., 1990).
Collectively, these studies showed that the control elements necessary for appropriate temporal and tissue-specific regulation generally reside in the 5'-flanking regions of most major milk protein genes, while some elements responsible for high-level expression may reside elsewhere in some genes. Because of the relative difficulty and expense of generating transgenic mice, it has not been possible to map individual expression control elements in this way.
Several major milk protein promoters have been studied in less complex systems that allow more complete analysis. Transfection of major milk protein (MMP) promoter-reporter gene constructs into functional mammary epithelial cell lines such as HC11 (Ball, et al., 1988) and C1D9 (Schmidhauser et al., 1990) allows much more rapid, less expensive analysis of transcriptional activity. Such studies also allow analysis of hormonal and substratum effects on gene expression. Elements of the mouse WAP gene necessary for prolactin and glucocorticoid induction have been localized (by functional analysis of promoter deletions in HC11 cells) to between -1500 and -450, while cell-type-specific elements were localized to between -2500 and -1500 and between -450 and the transcript initiation site (Doppler et al., 1991).
Similar deletion analysis of the rat β-casein promoter (Doppler et al., 1989) localized control elements important for hormonal regulation (prolactin and glucocorticoid) to between -2300 and -330, between -285 and -265, and between -190 and -170. These elements were not tested by inversion or by moving them with respect to the transcription start site, so no conclusions could be drawn regarding the presence or absence of transcriptional enhancers in these regions. Yoshimura and Oka (1990) showed that deletion of the region between -5300 and -545 of the mouse β-casein gene had little effect on expression in transfected rabbit mammary epithelial cells.
The promoter regions of several major milk protein genes have been tested for nuclear protein binding by biochemical techniques such as electrophoretic mobility shift and DNase I protection. The bovine β-casein promoter contains a site (between -264 and -239) that binds the purified transcription factor NF1 (Kabishev, et al., 1990; Ivanov et al., 1990). The functional significance of this potential NF1 binding site is not known. Nuclear extracts from mammary tissue contain several proteins that interact with the region between -354 and -88 of the rat WAP promoter (Lubon and Hennighausen, 1987). Similarly, the region from -125 to -85 of the rat α-lactalbumin promoter is protected from nuclease digestion by proteins in mammary nuclear extracts (Lubon and Hennighausen, 1988). Functional tests of the importance of these regions in expression of the WAP and α-lactalbumin genes have not yet been carried out.
Detection of nuclear protein binding to major milk protein promoters helps to locate DNA sequences important to regulation of gene expression. However, functional analysis of the role of putative binding sites in gene expression must be carried out before definite conclusions can be drawn. Such analyses have been carried out in two cases. In the first, Lee and Oka (1992) detected mammary-specific protein binding to two regions of the mouse β-casein promoter. Methylation interference experiments indicated that the guanosine residues at -8 and -350 were involved in the binding. Analysis of transcriptional activity of wild-type promoters and of promoters with mutations in the binding regions indicated that these sites were involved in progesterone-mediated repression of transcription.
Schmitt-Ney, et al. (1991) identified binding sites for five nuclear proteins within the rat β-casein promoter. Four were found in HC11 cell nuclear extracts. Two of the HC11 activities (C and D) increased and two (A and B) decreased following hormonal induction of casein expression. The A and B activities are thought to mediate repression because mutations affecting A binding caused an increase in basal (uninduced) promoter activity. The fifth binding activity, termed MGF, was found only in pregnant and lactating mammary gland and not in HC11 cells. MGF binds to two sites, one between -80 and -100 and the other between -130 and -150. These sequences are conserved in other casein genes and in casein genes of other species. Mutations in the MGF binding sites that decrease protein binding also decrease transcriptional activity (Schmitt-Ney, et al., 1991, 1992).