In eukaryotic genes there is a growing understanding of the DNA sequence elements which direct the initiation of transcription and which regulate or modulate gene expression. The following discussion applies to genes transcribed by RNA polymerase II. Promoters are the portions of DNA sequence at the beginnings of genes which contain the signals for RNA polymerase to begin transcription so that protein synthesis can then proceed. Eukaryotic promoters are complex, and are comprised of components which include a TATA box consensus sequence in the vicinity of about -30, and often a CAAT box consensus sequence at about -75 bp 5' relative to the transcription start site, or cap site, which is defined as +1 (R. Breathnach and P. Chambon (1981) Ann. Rev. Biochem. 50:349-383; J. Messing et al. (1983) in Genetic Engineering of Plants, T. Kosuge et al. (eds.), pp. 211-227). In plants there may be substituted for the CAAT box a consensus sequence which Messing et al. (1983) have termed the AGGA box, positioned a similar distance from the cap site. Additional DNA sequences in the 5' untranscribed region are believed to be involved in the modulation of gene expression. There are DNA sequences which affect gene expression in response to environmental stimuli, such as illumination or nutrient availability or adverse conditions including heat shock, anaerobiosis, or the presence of heavy metals. There are also DNA sequences which control gene expression during development, or in a tissue-specific fashion. Other DNA sequences have been found to elevate the overall level of expression of the nearby genes; such sequences have been termed "enhancers" in animal systems. In yeast, similar stimulatory sequences are known which are called "upstream activating sequences," which also often appear to carry regulatory information. 
Promoters are usually positioned 5', or upstream, relative to the start of the coding region of the corresponding gene, and the tract containing all the ancillary elements affecting regulation or absolute levels of transcription may be comprised of less than 100 bp or as much as 1 kbp.
As defined by Khoury and Gruss (1983) Cell 33:313-314, an enhancer is one of a set of eukaryotic promoter elements that appears to increase transcriptional efficiency in a manner relatively independent of position and orientation with respect to the nearby gene. The prototype enhancer is found within the 72 bp repeat of SV40. It is located more than 100 bp upstream from the transcription start site, and has a consensus core sequence of GTGGAAA(or TTT)G. As a rule the animal or animal virus enhancers can act over a distance of as much as 1 kbp 5', in either orientation, and can act either 5' or 3' to the gene. The sequence motif is generally reiterated several times. Enhancers have been used in animal virus systems to study genes with weak promoters (F. Lee et al. (1981) Nature 294:228-232; A. Huang et al. (1981) Cell 27:245-255). Sequences with homology to the animal enhancer consensus core sequence have been described in plant genes, but a functional role for these sequences has not been established. One example in which such homology has been found is that of the pea legumin gene 5' region, in which the sequence 5'-CCACCTCC-3' appears at about -180 relative to the transcription start site. This sequence shows about 80% homology to the complement of the animal sequence (G. Lycett et al. (1984) Nucleic Acids Res. 12:4493-4506). Two other examples where a similar sequence appears are in the 5'-flanking regions of the maize adh1 and adh2 genes. In those cases the sequence of note is CACCTCC, and appears at about -170 for adh2 and -200 for adh1 (E. Dennis et al. (1985) Nucleic Acids Res. 13:727-743; and D. Llewellyn et al. (1985) in Molecular Form and Function of the Plant Genome, van Vloten-Doting et al. (eds.), Plenum Press, New York).
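The kind of sequence comparison described above, in which a flanking region is searched on either strand for similarity to the enhancer core GTGGAAA(or TTT)G, can be sketched as follows. This is a minimal illustration only: the function names, the ungapped window comparison, and the mismatch threshold are assumptions made for the example, and it is not the method used in the cited reports, nor does it attempt to reproduce their reported homology figures.

```python
# Illustrative sketch; names and scoring rule are hypothetical, not from the cited papers.
# The two variants spell out the consensus core GTGGAAA(or TTT)G.
CORE_VARIANTS = ("GTGGAAAG", "GTGGTTTG")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def mismatches(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def scan_for_core(seq, max_mismatches=1):
    """List (offset, strand, window, n_mismatches) for ungapped near-matches.

    Each 8-nt window is compared to both core variants as written ("+")
    and after reverse complementation ("-").
    """
    seq = seq.upper()
    width = len(CORE_VARIANTS[0])
    hits = []
    for offset in range(len(seq) - width + 1):
        window = seq[offset:offset + width]
        for strand, probe in (("+", window), ("-", reverse_complement(window))):
            best = min(mismatches(probe, variant) for variant in CORE_VARIANTS)
            if best <= max_mismatches:
                hits.append((offset, strand, window, best))
    return hits
```

For instance, `scan_for_core(upstream_region, max_mismatches=2)` would list candidate core-like windows in a hypothetical `upstream_region` string; as the text notes, such similarity alone does not establish a functional role.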
The yeast upstream activating sequences (UASs) have somewhat different properties from those of the animal enhancers. Like the animal enhancers, the yeast UASs function when inserted in either orientation; unlike them, however, they do not appear able to activate transcription when placed 3' to the transcription start site (L. Guarente and E. Hoar (1984) Proc. Natl. Acad. Sci. USA 81:7860-7864; and K. Struhl (1984) Proc. Natl. Acad. Sci. USA 81:7865-7869). Sequences of the activating regions of some yeast promoter elements are known, and in at least two cases, homology to the SV40 enhancer consensus core sequence has been shown (B. Errede et al. (1985) Proc. Natl. Acad. Sci. USA 82:5423-5427, and G. Roeder et al. (1985) Proc. Natl. Acad. Sci. USA 82:5428-5432). Also associated with these sequences is information allowing the cell to respond to stimuli such as nutritional status or mating type, depending on the particular UAS.
Another case where upstream sequence motifs regulate downstream transcriptional activity is that of the heat shock element. It is involved with the control of genes expressed in response to the stress of elevated temperature in organisms from yeast to man and plants. In Drosophila the consensus sequence for the motif is 5'-CTGGAAT-TTCTAGA-3' (H. Pelham and M. Bienz (1982) in Heat Shock From Bacteria to Man, Cold Spring Harbor Laboratory, pages 43-48). D. Rochester et al. (1986) EMBO J. 5:451-458, have identified two sequences 5' to the maize hsp70 heat shock gene which are partially homologous to the consensus sequence: 5'-CCAGAGCCTTCCAGAA-3' and 5'-CCCGAATCTTCTGGA-3'.
Recently there has been a surge of interest in plant control elements; sequences have been proposed to be involved in tissue specificity and in responses to light and anaerobic conditions, and enhancer-like sequences have been postulated 5' to some highly expressed genes. One report of an enhancer-like sequence is that of J. Odell et al. (1985) Nature 313:810-812, who have described the stretch of 5' nontranscribed region of the 35S gene of Cauliflower Mosaic Virus (CaMV) which is necessary for promoting the expression of a reporter gene. Examination of the sequence in the -105 to -46 region revealed a CAAT box-like sequence, an inverted repeat region, and a sequence resembling the animal core sequence for enhancers. It has been demonstrated (Ellis et al. (1987) EMBO J. 6:11-16) that a 309 bp fragment (-395 to -86) from the region 5' of the TATA box of the 35S promoter is responsible for enhancing transcriptional activity. It is known that although the host range of the CaMV is limited to members of the family Cruciferae, the entire 35S promoter does function in tobacco (J. Odell et al. (1985) supra, M. Bevan et al. (1985) EMBO J. 4:1921-1926).
Literature concerning cross expression studies, wherein a gene from one plant species is examined for expression in a different species, is growing. An early report of cross expression is that of N. Murai et al. (1983) Science 222:476-482. They reported the expression of phaseolin protein from Phaseolus vulgaris L. in sunflower (Helianthus) tissue as both a fusion protein behind a T-DNA promoter and under the control of its own promoter. Sengupta-Gopalan et al. subsequently reported that the phaseolin promoter and structural gene were functional in tobacco, and that the tissue-specific expression in the heterologous host was similar to that in the native bean host (C. Sengupta-Gopalan et al. (1985) Proc. Natl. Acad. Sci. USA 82:3320-3324).
W. Gurley et al. (1986) Mol. Cell Biol. 6:559-565 described the expression of a soybean heat shock gene in sunflower tumor tissue; the gene was strongly transcribed, and with the correct thermal induction response. Because the gene carried 3.2 kb of upstream DNA, it was presumably transcribed in response to signals carried by its own promoter.
Another example is that of J. Jones et al. (1985) EMBO J. 4:2411-2418. The promoter from a petunia chlorophyll a/b binding protein gene was fused to the octopine synthase gene (ocs), which provided unique sequence for detection in Northern and solution hybridization experiments. These workers found that transcription occurred in both regenerated transformed homologous (petunia) and heterologous (tobacco) plants. Ocs reporter gene activity was not detected, perhaps because the construction yielded a (potential) translational fusion with three amino acid substitutions at the amino terminus of the Ocs polypeptide.
A report of expression across the monocot-dicot boundary is that of G. Lamppa et al. (1985) Nature 316:750-752. The wheat gene whAB1.6 encoding the major chlorophyll a/b binding protein was cloned into a T-DNA-containing vector, and transferred to both petunia and tobacco. Expression, at the level of transcription, was determined to be light-inducible and tissue-specific in the dicotyledonous hosts, as it was in the wheat. No data concerning the synthesis of the actual foreign protein were given.
D. Rochester et al. (1986) EMBO J. 5:451-458, have also detected the expression of a maize promoter in a dicotyledonous plant. The maize promoter used was that of a hybrid hsp70 gene. Hsp70 is one of a set of proteins induced in maize, as in organisms from bacteria to man, in response to heat shock. In the transgenic petunia the maize hsp70 mRNA was synthesized only in response to thermal stress.
An early study of actual plant regulatory sequences is that of M. Timko et al. (1985) Nature 318:579-582. A stretch of DNA from -973 to -90 5' to the transcriptional start site of the pea rbcS ss3.6 (ribulose 1,5 bis-phosphate carboxylase small subunit) was found to increase the level of induction of a reporter gene after illumination of transgenic tobacco plants. The stimulatory effect was observed when the -973 to -90 segment was inserted in both orientations; it did not promote high levels of gene expression when inserted 3' to the reporter gene. J. Simpson et al. (1985) EMBO J. 4:2723-2729, studied the effect of upstream sequences from the pea chlorophyll a/b binding protein AB80 gene using an enzymatic reporter. They found that 400 bp of upstream sequence carried the necessary information for both light-induction and tissue specificity, and that sequences further upstream were involved in determining the absolute level of gene expression. In a figure showing sequence data, there is a 6 bp motif highlighted as being somewhat homologous to the animal enhancer core consensus sequence, TGGATA, which occurs at about -230 relative to the start of transcription. In neither report is there definitive data associating a specific nucleotide sequence with functional activity.
In H. Kaulen et al. (1986) EMBO J. 5:1-8, the light induction of chalcone synthase was studied using fusions of the nontranscribed region 5' to the gene with a reporter gene. A 1.2 kbp stretch of 5' DNA gave light inducibility and maximal expression. Deletion of the -1200 to -357 region gave lower expression, but the light induction response of that deletion was not reported. These authors examined the sequence and found 47 bp repeats in the region between -661 and -564; that region includes a good match to the animal enhancer consensus core sequence, 5'-GTGGTTAG-3'.
In recent studies published after the priority date hereof, by Schulze-Lefert et al. (1989) EMBO J. 8:652-656, three regions of sequence within the parsley chalcone synthase promoter were highlighted: (1) region I at around -140 is centered on the twice reiterated sequence 5'-AACCT-3'; (2) region II contains an octamer with perfect dyad symmetry (5'-CCACGTGG-3') at position -165; while (3) region III is a degenerate repeat of region II at position -230. It was shown that a chalcone synthase promoter fragment spanning from -100 to -226, containing footprint regions I and II, is light responsive in conjunction with the cognate promoter up to -100. However, sequences further 5', which include region III, clearly increase both induced and uninduced levels of glucuronidase (GUS) expression. Significantly, deletion of all three footprinted regions, leaving 100 bp of promoter, results in uninducible basal levels of GUS activity. Mutations within either region I or region II abolish inducibility, indicating that both regions I and II are necessary for a light response in the context of the minimal parsley chalcone synthase promoter.
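"Perfect dyad symmetry" in a double-stranded DNA element means that the sequence reads the same 5' to 3' on both strands, that is, it equals its own reverse complement. A minimal sketch of that check is given below; the function names are hypothetical and chosen only for illustration.

```python
# Illustrative sketch; function names are hypothetical.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def has_dyad_symmetry(seq: str) -> bool:
    """True when a sequence is identical to its own reverse complement."""
    seq = seq.upper()
    return seq == reverse_complement(seq)
```

Applied to the sequences quoted above, `has_dyad_symmetry("CCACGTGG")` is true for the region II octamer, whereas the region I repeat unit `"AACCT"` is not symmetric in this sense (its reverse complement is `"AGGTT"`).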
There was a relatively thorough discussion of cis-active sequence involvement in light induction and tissue specificity in R. Fluhr et al. (1986) Science 232:1106-1112. They showed that the -1059 to -2 region 5' to the pea rbcS-E9 gene gave both light inducible and tissue specific expression, and that the -352 to -2 region conferred normal expression in transgenic petunias but significantly lower levels of expression in calli. The light response was elicited only when the -37 to -2 region of 5' DNA was present. The 5' -410 to +15 region from the related rbcS-3A gene gave tissue specificity and light induction. In an attempt to further dissect sequence functions, they fused the -327 to -48 fragment to an enhancerless CaMV 35S promoter-reporter gene system; that fragment gave light induction and tissue specificity when inserted in both orientations. The -317 to -82 fragment from the rbcS-E9 gave similar results. Again, sequence analysis revealed regions similar to SV40 enhancers. The authors claim that these upstream stretches of DNA have the properties of light-inducible transcription enhancers; specific DNA sequences within those regions were not identified. The authors went on to discuss the analysis of seven sequenced rbcS upstream regions in which sequences similar to the SV40 enhancer core consensus and to the yeast Ty enhancer were found. These sequenced genes included representatives from Nicotiana and soybean as well as the pea. G. Morelli et al. (1985) Nature 315:200-204, reported a control sequence for dicot light-regulated genes, which is 5'-CATTATATATAGC(or A)-3'. It is thought (Green et al. (1987) EMBO J. 6:2543-2549) that cis-acting elements function by binding to trans-acting protein factors present in plant cell nuclei. Such factors would then interact directly or via other proteins with RNA polymerase II to modulate transcription. In later studies Green et al. (1988) EMBO J. 7:4035-4044 defined a core of six residues (GGTTAA) within the region II sequence (GTGTGGTTAATATG) that are critical for binding.
Two Agrobacterium tumefaciens T-DNA genes have been well characterized. The ocs gene encodes octopine synthase, and is carried on octopine-type Ti plasmids such as pTiAch5 and pTi15955. The gene for nopaline synthase is nos, and it resides on the nopaline-type Ti plasmids. Both ocs and nos and their 5'-flanking regions have been sequenced (H. DeGreve et al. (1982) J. Mol. Appl. Genet. 1:499-511; M. Bevan et al. (1983) Nucleic Acids Res. 11:369-385; A. Depicker et al. (1982) J. Mol. Appl. Genet. 1:561-573). Expression of both of these genes in plant tissue was reported to be constitutive, with no apparent tissue specificity (L. Otten et al. (1981) Mol. Gen. Genet. 183:209-213). However, it has more recently been observed that the activity of the nos promoter is organ specific and developmentally regulated (An et al. (1988) Plant Physiol. 88:547-552).
There were no published data for enhancer-like activity in T-DNA 5' untranscribed regions before 1987. C. Koncz et al. (1983) EMBO J. 2:1597-1603, did show that the region between -294 and -170 was required for full expression of ocs. The sequence for ocs was published by H. DeGreve et al. (1982) supra, after animal and animal virus enhancers were known. The authors noted the presence of a TATA box-like sequence and a polyadenylation signal at the 3' side of the gene, but did not note any sequence of potential regulatory significance. They suggested that because the ocs promoter is close to the edge of the T-DNA, there might be flanking plant sequences that influence the levels of ocs transcription. Ellis et al. (1987) EMBO J. 6:3203-3208 and Leisner et al. (1988) Proc. Natl. Acad. Sci. USA 85:2553-2557 reported a 176 bp element isolated from the ocs gene which exhibited enhancer-like properties in transgenic tobacco plants. The ocs gene is transferred with the T-DNA of Agrobacterium and integrated into the genome of plant cells during initiation of the crown gall tumor. The gene is not expressed in Agrobacterium but is expressed in the plant (Otten et al. (1981) Mol. Gen. Genet. 183:209-213). Although the infectivity of Agrobacterium is limited generally to dicotyledonous plants, the transcriptional enhancer of the ocs promoter functions in both monocots and dicots and does not require any factors supplied by other genes of the bacterium (Hooykaas-van Slogteren et al. 1984.
There are several techniques available for introducing recombinant DNA into plant tissue for either stable integration into the plant genome or for measuring engineered gene activity in transient expression systems where incorporation into the genome is not required. Representative bacteria-to-plant T-DNA dependent cloning vector systems are described in G. An (1986) Plant Physiol. 81:86-91; G. An et al. (1985) EMBO J. 4:277-284; L. Herrera-Estrella et al. (1983) EMBO J. 2:987-995; L. Herrera-Estrella et al. (1983) Nature 303:209-213; and L. Herrera-Estrella et al. (1985) in Plant Genetic Engineering, J. H. Dodds (ed.), New York: Cambridge University Press, pp. 63-93. The T-DNA vectors rely on mobilization from bacteria to plant using functions supplied in trans by Agrobacterium tumefaciens and its resident Ti plasmid. T-DNA mediated transfer generally is effected in such a way that stable integration into the genome results. The most widely used plant host models for recombinant T-DNA work are the dicots sunflower, petunia, and tobacco. The technique of agroinfection has extended the range of monocots into which T-DNA-containing vectors can be introduced (N. Grimsley et al. (1986) Proc. Natl. Acad. Sci. USA 83:3282-3286).
Alternatives to the Agrobacterium-mediated DNA transfer systems are known, and include electroporation of both monocot and dicot plant protoplasts to incorporate DNA (M. Fromm et al. (1985) Proc. Natl. Acad. Sci. USA 82:5824-5828) and direct transformation of protoplasts with DNA molecules mediated by polyethylene glycol (J. Paszkowski et al. (1984) EMBO J. 3:2717-2722) or calcium ions. Another T-DNA independent means for introducing recombinant DNA is microinjection of DNA into plant cell nuclei (A. Crossway et al. (1986) Mol. Gen. Genet. 202:179-185). These techniques use plant cell protoplasts (wall-less forms) as the initial DNA recipients; known manipulations of protoplasts can result in cell or tissue culture, or ultimately in regenerated transformed plants. Use of such alternatives significantly expands the range of plants into which heterologous genes can be introduced. Paszkowski et al. (supra) have shown that integration into the genome is possible without the presence of T-DNA sequences.