There is a growing understanding of the DNA sequence elements in eukaryotic genes which direct the initiation of transcription and which regulate or modulate gene expression. The following discussion applies to genes which are transcribed by RNA polymerase II. There are sequence elements which direct the initiation of mRNA synthesis, those which control transcription in response to environmental stimuli, and those which determine the overall level of transcription.
Promoters are the portions of DNA sequence, at the beginnings of genes, which contain the signals for RNA polymerase to begin transcription of mRNA, which in turn is used as a template for protein synthesis. Eukaryotic promoters are complex, and are composed of components which include a TATA box consensus sequence in the vicinity of position -30, and often a CAAT box consensus sequence at about -75 bp 5' relative to the transcription start site, which is defined as +1 (R. Breathnach and P. Chambon (1981), Ann. Rev. Biochem. 50:349; J. Messing et al. (1983), in Genetic Engineering of Plants, eds. T. Kosuge, C. Meredith, and A. Hollaender, p. 211). In plants the CAAT box may be replaced by a consensus sequence which Messing et al. (1983) have termed the AGGA box, positioned a similar distance from the cap site. Other promoter-associated sequences in the 5'-untranscribed region are known which modulate or regulate the expression of downstream genes. There are sequences which respond to environmental stimuli, such as illumination or nutrient availability, or to adverse conditions including heat shock, anaerobiosis, or the presence of heavy metals. There are also signals which control gene expression during development, or in a tissue-specific fashion. Other sequences serve to elevate the overall level of expression of the downstream genes; such sequences have been termed "enhancers" in animal systems. In yeast, similar stimulatory sequences called "upstream activating sequences" are known, which often also appear to carry regulatory information. Promoters are usually positioned 5' to, or upstream of, the start of the coding region of the corresponding gene, and the DNA tract containing all the ancillary elements affecting regulation or absolute levels of transcription may comprise fewer than 100 bp or as many as 1000 bp.
As defined by G. Khoury and P. Gruss (1983), Cell 33:313, an enhancer is one of a set of eukaryotic promoter-associated elements that appears to increase transcriptional efficiency in a manner relatively independent of position and orientation with respect to the nearby gene. The prototype enhancer is the 72 bp repeat of SV40. It is located more than 100 bp upstream from the transcription start site, and has a consensus core sequence shown in the original as a sequence diagram (##STR1##, not reproduced here). As a rule the animal or animal virus enhancers can function over a distance of as much as 1 kb 5', in either orientation, and can act either 5' or 3' to the gene. The sequence motif is generally reiterated several times. In animal systems enhancers have been associated with tissue-specific regulation of expression.
Homology to the SV40 animal enhancer consensus core sequence has been noted in the nontranscribed regions of plant genes. In the 5'-flanking region of the pea legumin gene, the sequence 5'-CCACCTCC-3', which is about 80% homologous to the complement of the SV40 animal sequence, appears at about -180 relative to the start of transcription (G. Lycett et al. (1984) Nucleic Acids Res. 12:4493). Similar sequence motifs have been noted in the 5'-regulatory regions of light-regulated genes, namely the chalcone synthase gene (H. Kaulen et al. (1986) EMBO J. 5:1) and several rbcS genes including those of tobacco, soybean and pea (R. Fluhr et al. (1986) Science 232:1106).
SV40 enhancer-homologous sequences have also been identified in the 5'-flanking regions of the maize Adh1 and Adh2 genes. In both cases the sequence of note is 5'-CACCTCC-3', and appears at about -170 in Adh2 and at about -200 in Adh1 (E. Dennis et al. (1985) Nucleic Acids Res. 13:727; D. Llewellyn et al. (1985) in Molecular Form and Function in the Plant Genome, eds. van Vloten-Doting, DeGroot, and T. Hall, New York, Plenum Press). A functional role for these SV40-homologous plant sequences as enhancers, however, has not been demonstrated.
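The positional statements above (a motif "at about -170" relative to the start of transcription, possibly matching the complement of a reference sequence) can be illustrated with a minimal sketch. The function names, the toy test sequence, and the convention that a hit is reported by its 5'-most base on the given strand are assumptions made for illustration; they are not taken from the cited reports.

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def to_promoter_coordinate(index, tss_index):
    """Convert a 0-based string index to a position relative to +1.

    The base at tss_index is +1 and the base before it is -1;
    there is no position 0 in this numbering.
    """
    offset = index - tss_index
    return offset + 1 if offset >= 0 else offset


def find_motif(sequence, motif, tss_index):
    """List (position, strand) for each motif occurrence on either strand.

    The position is that of the leftmost matching base in the given
    sequence. A "-" strand hit means the reverse complement of the
    motif was found in the given sequence.
    """
    seq = sequence.upper()
    hits = []
    for strand, pattern in (("+", motif.upper()), ("-", reverse_complement(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((to_promoter_coordinate(start, tss_index), strand))
            start = seq.find(pattern, start + 1)
    return sorted(hits)


# Hypothetical 210-nt toy sequence: 200 nt upstream, then +1 onward.
toy = "T" * 30 + "CACCTCC" + "T" * 163 + "G" + "T" * 9
print(find_motif(toy, "CACCTCC", tss_index=200))
```

Note that the 7-nt maize motif 5'-CACCTCC-3' is contained within the 8-nt pea legumin sequence 5'-CCACCTCC-3', so a search of this kind would report both.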
Upstream sequence motifs, termed heat shock elements (HSEs), have been found to direct the induction of the heat shock genes in response to the stress of elevated temperature in organisms as diverse as bacteria, yeast, man and plants. In Drosophila the minimal consensus sequence for the motif is 5'-CnnGAAnnTTCnnG-3', where n denotes an unspecified nucleotide (H. Pelham (1985) Trends Genet., January, pp. 31-35). The HSEs of Drosophila also exhibit some properties of enhancer elements (M. Bienz and H. Pelham (1986) Cell 45:753). W. Gurley et al. (1986) Mol. Cell. Biol. 6:559, have found sequence elements with partial homology to the Drosophila HSE consensus sequence at the 5'-end of the soybean Gmhsp17.5-E gene. A study of heat shock expression of this gene in transformed sunflower tumor tissue revealed that sequence information between -95 and the cap site was sufficient to direct thermoinducible transcription, but that sequences further upstream (between -95 and -1175) dramatically increased both induced and basal levels of transcription, suggestive of enhancer activity.
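As an illustration of what matching, or partially matching, the Drosophila HSE consensus means, the following sketch treats the consensus as a 14-nt pattern with eight specified positions (C, GAA, TTC, G) separated by pairs of unspecified nucleotides. The function names and the example sequences are hypothetical and are not sequences from the cited work.

```python
import re

# Minimal HSE consensus: C, two unspecified bases, GAA, two unspecified
# bases, TTC, two unspecified bases, G. The lookahead lets overlapping
# matches be reported as well.
HSE_PATTERN = re.compile(r"(?=(C[ACGT]{2}GAA[ACGT]{2}TTC[ACGT]{2}G))")

# Same consensus written position by position; "N" marks an unspecified base.
HSE_CONSENSUS = "CNNGAANNTTCNNG"


def find_hse_matches(sequence):
    """Return (0-based offset, matched text) for every exact consensus match."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in HSE_PATTERN.finditer(seq)]


def consensus_mismatches(candidate):
    """Count mismatches at the eight specified positions of a 14-nt candidate."""
    if len(candidate) != len(HSE_CONSENSUS):
        raise ValueError("candidate must be 14 nucleotides long")
    return sum(
        1
        for base, expected in zip(candidate.upper(), HSE_CONSENSUS)
        if expected != "N" and base != expected
    )


# Hypothetical example sequence containing one exact match.
example = "AAA" + "CTAGAAGCTTCTAG" + "AAA"
print(find_hse_matches(example))
print(consensus_mismatches("CTAGATGCTTCTAG"))
```

Counting mismatches at the specified positions is only one simple way to express "partial homology"; the cited studies may have used other criteria.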
Enhancer-like activities have also been associated with plant regulatory sequences that are believed to be involved in the control of tissue-specific expression and expression in response to light (M. Timko et al. (1985) Nature 318:579; H. Kaulen et al. (1986) EMBO J. 5:1; J. Simpson et al. (1985) EMBO J. 4:2723; J. Simpson et al. (1986) Nature 323:551; R. Fluhr et al. (1986) Science 232:1106). Although in some cases sequences homologous to the SV40 enhancer or the Ty yeast enhancer, as well as repeated sequence elements, were noted in the upstream regions displaying enhancer activity, these motifs have not been correlated with the enhancer activity.
The presence of enhancer-like sequences 5' to certain genes which are highly expressed in plants has been postulated. One such report (J. Odell et al. (1985), Nature 313:810) described a stretch of the 5'-nontranscribed region of the 35S gene of Cauliflower Mosaic Virus (CaMV) which is necessary for increasing the expression of a reporter gene. Analysis of the sequence in the -105 to -46 region revealed a CAAT box-like sequence, inverted repeats, and a sequence resembling the SV40 core consensus sequence for enhancers. Ow et al. (1987) Proc. Natl. Acad. Sci. USA 84:4870-4873 report that the CaMV upstream region between -168 and -89 functions in transcriptional activation of the 35S RNA gene as well as of certain heterologous plant-expressible genes. The -148/-89 upstream fragment is reported to function in either orientation 5' of reporter genes, but not when positioned 3' to the gene. Multiple duplication of portions of the CaMV upstream region (the -148/-89 fragment or the -343/-90 fragment) yielded significantly higher levels of expression than that induced by a single copy of the region (D. Ow et al. (1987); R. Kay et al. (1987) Science 236:1299). It is known that although the host range of CaMV is limited to members of the family Cruciferae, the entire 35S promoter does function in tobacco (J. Odell et al. (1985) supra; M. Bevan et al. (1985) EMBO J. 4:1921).
The upstream activating sequences (UASs) of yeast have somewhat different properties from those of animal enhancer sequence elements. Like the animal enhancers, the yeast UASs generally function when inserted in either orientation, but they do not appear able to activate transcription when placed 3' to the transcription start site (L. Guarente and E. Hoar (1984) Proc. Natl. Acad. Sci. USA 81:7860; K. Struhl (1984) Proc. Natl. Acad. Sci. USA 81:7865). Sequences of the activating regions of some yeast promoter elements are known, and in at least two cases, homology to the SV40 enhancer consensus core sequence was reported (B. Errede et al. (1985) Proc. Natl. Acad. Sci. USA 82:5423; G. Roeder et al. (1985) Proc. Natl. Acad. Sci. USA 82:5428). Also associated with these sequences is information allowing the cell to respond to mating type or to stimuli such as nutritional status, depending on the particular UAS.
The tumor-inducing (Ti) plasmids carried by strains of Agrobacterium contain T-DNA regions that are transferred to and integrated into plant genomes. A number of genes encoded on T-DNA are expressed in plants, including, for example, those genes responsible for production of opines in T-DNA-containing plant tumors. The ocs gene encoding octopine synthase is carried within the T-DNA of octopine-type Ti plasmids such as pTiAch5 and pTi15955. The gene for nopaline synthase (nos) resides within the T-DNA of nopaline-type Ti plasmids, such as pTiC58 and pTiT37. Expression of ocs and nos genes in transformed plant tissue is constitutive and is apparently not tissue-specific (L. Otten et al. (1981) Mol. Gen. Genet. 183:209). It has been proposed by W. Bruce and W. Gurley (1987) Mol. Cell. Biol. 7:59 that T-DNA sequences which regulate gene expression in transformed plants would possess maximum conservation of function in plants because the host range of Agrobacterium is very broad (M. DeCleene and J. DeLey (1976) Bot. Rev. 42:89; G. Hooykaas-van Slogteren et al. (1984) Nature 311:763). The regulatory regions of the plant-expressible genes of T-DNA are of interest as model systems for studying the mechanism of constitutive gene expression in plants.
The upstream regions of both the nos and ocs genes have been subjected to detailed analysis. Both ocs and nos and the 5'-flanking regions of these genes have been sequenced (H. DeGreve et al. (1982) J. Mol. Appl. Genet. 1:499; M. Bevan et al. (1983) Nucleic Acids Res. 11:369; A. Depicker et al. (1982) J. Mol. Appl. Genet. 1:561).
There are conflicting data in the literature regarding the extent of 5'-sequence required for maximal expression of the nos gene. C. Koncz et al. (1983) EMBO J. 2:1597-1603 reported that all signals required for maximal expression of the nos gene were within the 261 bp of sequence preceding the transcriptional start site. In contrast, C. Shaw et al. (1984) Nucleic Acids Res. 12:7831, reported that sequences farther upstream than -88 were not essential for expression in a Kalanchoe leaf-and-stem test system. More recently, G. An et al. (1986) Mol. Gen. Genet. 203:245 reported that regions of upstream DNA including the TATA box (-26 to -19), perhaps the CCAAT box (-78 to -70), and a sequence between -130 and -101 are required for efficient transcription of nos. The presence of direct sequence repeats (-171 to -161 and -137 to -127) and indirect repeats (-148 to -141 and -114 to -106) in the nos upstream region was noted, and deletion analysis suggested that these repeats were involved in the regulation of the level of downstream gene expression.
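The relationship between the reported deletion endpoints and the mapped nos elements can be made concrete with a small sketch. The coordinates are those quoted above from An et al. (1986); the element labels, the function name, and the simplifying assumption that a 5' deletion to a given position retains every element lying wholly at or downstream of that position are illustrative only.

```python
# Coordinates (5' end, 3' end) relative to the transcription start site,
# as quoted in the text. The labels are illustrative names only.
NOS_ELEMENTS = {
    "TATA box": (-26, -19),
    "CCAAT box": (-78, -70),
    "-130/-101 region": (-130, -101),
    "direct repeat 1": (-171, -161),
    "direct repeat 2": (-137, -127),
    "indirect repeat 1": (-148, -141),
    "indirect repeat 2": (-114, -106),
}


def retained_elements(elements, deletion_endpoint):
    """Names of elements lying wholly at or downstream of a 5' deletion endpoint.

    Assumes the construct keeps all sequence from deletion_endpoint
    through the transcription start site.
    """
    return [
        name
        for name, (five_prime, _three_prime) in elements.items()
        if five_prime >= deletion_endpoint
    ]


print(retained_elements(NOS_ELEMENTS, -88))
print(len(retained_elements(NOS_ELEMENTS, -261)))
```

Under this assumption, a construct ending at -88 keeps only the TATA and CCAAT boxes, whereas 261 bp of upstream sequence keeps all seven listed elements, which is one way to see why the reports differ.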
When the sequence of the ocs gene was published (H. DeGreve et al. (1982) supra), a TATA box-like sequence at the 5'-side of the gene and a polyadenylation signal at the 3'-side of the gene were noted, but no other sequence of potential regulatory significance was pointed out. It was suggested that perhaps because the ocs promoter is positioned close to the edge of the T-DNA, flanking plant sequences could influence the levels of ocs transcription.
C. Koncz et al. (1983), supra, showed that sequence information in the region between -295 and -170 was essential for full expression of ocs, but specific sequences responsible for maximal gene expression were not identified. The upstream region of ocs has recently been reexamined and it was found that there is a regulatory sequence element contained within the region between -292 and -116 that acts to enhance or activate ocs gene expression (J. Ellis et al. (1987) EMBO J. 6:11; U.S. patent application Ser. No. 011,614). The element, termed a plant upstream activating sequence, is a 16 base pair palindromic sequence (5'-ACGTAAGCGCTTACGT-3') which activates the expression of a downstream gene driven by a plant-expressible promoter. A synthetic oligonucleotide comprising the aforementioned sequence or the appropriate fragment of the ocs upstream region was placed 5' to the maize anaerobically-regulated alcohol dehydrogenase (Adh1) promoter with a bacterial chloramphenicol acetyl transferase (CAT) reporter gene; in both instances anaerobic induction of CAT enzyme activity was obtained in stably transformed tobacco plants. Analogous constructions without the transcriptional activating element did not give detectable expression in tobacco when either CAT or Adh1 served as the reporter gene. The functionality of the ocs gene transcription activating element was also determined using transient expression assays in cultured maize cells. Thus, the ability of the ocs transcription activating element to function in both monocotyledonous and dicotyledonous plants was established (J. Ellis et al. (1987) EMBO J. 6:3203-3208; U.S. patent application Ser. No. 011,614).
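In the DNA sense, a palindromic sequence is one that is identical to its own reverse complement. The following minimal sketch checks that property for the 16 base pair ocs element quoted above; the function and constant names are chosen for illustration.

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_palindromic(seq):
    """True if the sequence reads the same 5'-to-3' on both strands."""
    s = seq.upper()
    return s == reverse_complement(s)


# The 16 bp ocs element as given in the text.
OCS_ELEMENT = "ACGTAAGCGCTTACGT"

print(len(OCS_ELEMENT), is_palindromic(OCS_ELEMENT))
```

By contrast, the maize motif 5'-CACCTCC-3' discussed earlier is not palindromic under this definition, since its reverse complement is 5'-GGAGGTG-3'.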
The presence of a transcription activating element in the upstream region of another T-DNA gene, the mannopine synthase gene (mas), has been suggested by deletion analysis (V. DiRita and S. Gelvin (1987) Mol. Gen. Genet. 207:233). No specific sequence motifs were linked to transcriptional activation.
The entire T-DNA region of an octopine-type Ti plasmid, pTi15955, has been sequenced and the sequence has been analyzed for the location of open reading frames (ORFs), putative eukaryotic promoters, ribosome binding sites, and regions with potential secondary structure which might possess regulatory significance (R. Barker et al. (1983) Plant Mol. Biol. 2:335). Among the octopine T-DNA ORFs identified by sequence analysis is the 780 gene, which corresponds to ORF 18 in T-right of Barker et al. This ORF was found to be transcribed in plants and is named for the size of its approximately 780 base transcript. The 780 gene product, which is nonessential for virulence, has not been identified, and its function is unknown (J. Winter et al. (1984) Nucleic Acids Res. 12:239; S. Karcher et al. (1984) Mol. Gen. Genet. 194:159). The upstream region of the 780 gene was noted by Barker et al. to have TATA- and CAAT-homologous regions, but no other sequences of any potential functional significance were noted.
The present invention is based on a detailed analysis of the upstream regulatory region of the 780 gene, which has in part been described by W. Bruce and W. Gurley (1987) Mol. Cell. Biol. 7:59.