The present invention relates to genetic engineering and more particularly to plant transformation in which a plant is transformed to express a heterologous gene.
Although great progress has been made in recent years with respect to transgenic plants which express foreign proteins such as herbicide resistant enzymes and viral coat proteins, very little is known about the major factors affecting expression of foreign genes in plants. Several potential factors could be responsible in varying degrees for the level of protein expression from a particular coding sequence. The level of a particular mRNA in the cell is certainly a critical factor.
The potential causes of low steady state levels of mRNA due to the nature of the coding sequence are many. First, full length RNA synthesis might not occur at a high frequency. This could, for example, be caused by premature termination of RNA during transcription or by unexpected mRNA processing during transcription. Second, full length RNA could be produced but then processed (splicing, polyA addition) in the nucleus in a fashion that creates a nonfunctional mRNA. If the RNA is properly synthesized, terminated and polyadenylated, it then can move to the cytoplasm for translation. In the cytoplasm, mRNAs have distinct half lives that are determined by their sequences and by the cell type in which they are expressed. Some RNAs are very short-lived and some are much more long-lived. In addition, there is an effect, whose magnitude is uncertain, of translational efficiency on mRNA half-life. Furthermore, every RNA molecule folds into a particular structure, or perhaps family of structures, which is determined by its sequence. The particular structure of any RNA might lead to greater or lesser stability in the cytoplasm. Structure per se is probably also a determinant of mRNA processing in the nucleus. Unfortunately, it is impossible to predict, and nearly impossible to determine, the structure of any RNA (except for tRNA) in vitro or in vivo. However, it is likely that dramatically changing the sequence of an RNA will have a large effect on its folded structure. It is likely that structure per se or particular structural features also have a role in determining RNA stability.
Some particular sequences and signals have been identified in RNAs that have the potential for having a specific effect on RNA stability. This section summarizes what is known about these sequences and signals. These identified sequences often are A+T rich, and thus are more likely to occur in an A+T rich coding sequence such as a B.t. gene. The sequence motif ATTTA (or AUUUA as it appears in RNA) has been implicated as a destabilizing sequence in mammalian cell mRNA (Shaw and Kamen, 1986). No analysis of the function of this sequence in plants has been done. Many short lived mRNAs have A+T rich 3' untranslated regions, and these regions often have the ATTTA sequence, sometimes present in multiple copies or as multimers (e.g., ATTTATTTA . . . ). Shaw and Kamen showed that the transfer of the 3' end of an unstable mRNA to a stable RNA (globin or VA1) decreased the stable RNA's half life dramatically. They further showed that a pentamer of ATTTA had a profound destabilizing effect on a stable message, and that this signal could exert its effect whether it was located at the 3' end or within the coding sequence. However, the number of ATTTA sequences and/or the sequence context in which they occur also appear to be important in determining whether they function as destabilizing sequences. Shaw and Kamen showed that a trimer of ATTTA had much less effect than a pentamer on mRNA stability and a dimer or a monomer had no effect on stability (Shaw and Kamen, 1987). Note that multimers of ATTTA such as a pentamer automatically create an A+T rich region. This was shown to be a cytoplasmic effect, not nuclear. In other unstable mRNAs, the ATTTA sequence may be present in only a single copy, but it is often contained in an A+T rich region.
From the animal cell data collected to date, it appears that ATTTA at least in some contexts is important in stability, but it is not yet possible to predict which occurrences of ATTTA are destabilizing elements or whether any of these effects are likely to be seen in plants.
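The kind of sequence inspection discussed above, locating ATTTA motifs and noting whether they occur singly or as multimers, can be illustrated with a minimal Python sketch. The function name `find_attta_runs` is hypothetical, and the sketch assumes that a multimer means overlapping copies that share the A between them, as in the ATTTATTTA example. It only locates candidate motifs; as noted above, it cannot indicate whether any occurrence actually acts as a destabilizing element.

```python
import re


def find_attta_runs(seq):
    """Return (start, copies) for each maximal run of overlapping ATTTA motifs.

    Assumption for illustration: a multimer is a series of ATTTA copies
    that share the A between neighbours, so ATTTATTTA counts as 2 copies.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA (AUUUA) or DNA spelling
    runs = []
    # One ATTTA followed by zero or more TTTA extensions is one run.
    for match in re.finditer(r"ATTTA(?:TTTA)*", seq):
        copies = 1 + (len(match.group()) - 5) // 4
        runs.append((match.start(), copies))
    return runs


# Made-up example sequence: one monomer and one trimer.
example = "GGCATTTACCGGATTTATTTATTTAGG"
print(find_attta_runs(example))  # prints [(3, 1), (12, 3)]
```

Reporting the copy number per run reflects the observation that a pentamer, a trimer and a monomer were found to differ in their effect on stability.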
Some studies on mRNA degradation in animal cells also indicate that RNA degradation may begin in some cases with nucleolytic attack in A+T rich regions. It is not clear if these cleavages occur at ATTTA sequences. There are also examples of mRNAs that have differential stability depending on the cell type in which they are expressed or on the stage within the cell cycle at which they are expressed. For example, histone mRNAs are stable during DNA synthesis but unstable if DNA synthesis is disrupted. The 3' end of some histone mRNAs seems to be responsible for this effect (Pandey and Marzluff, 1987). It does not appear to be mediated by ATTTA, nor is it clear what controls the differential stability of this mRNA. Another example is the differential stability of IgG mRNA in B lymphocytes during B cell maturation (Genovese and Milcarek, 1988). A final example is the instability of a mutant beta-thalassemic globin mRNA. In bone marrow cells, where this gene is normally expressed, the mutant mRNA is unstable, while the wild-type mRNA is stable. When the mutant gene is expressed in HeLa or L cells in vitro, the mutant mRNA shows no instability (Lim et al., 1988). These examples all provide evidence that mRNA stability can be mediated by cell type or cell cycle specific factors. Furthermore, this type of instability is not yet associated with specific sequences. Given these uncertainties, it is not possible to predict which RNAs are likely to be unstable in a given cell. In addition, even the ATTTA motif may act differentially depending on the nature of the cell in which the RNA is present. Shaw and Kamen (1987) have reported that activation of protein kinase C can block degradation mediated by ATTTA.
The addition of a polyadenylate string to the 3' end is common to most eucaryotic mRNAs, both plant and animal. The currently accepted view of polyA addition is that the nascent transcript extends beyond the mature 3' terminus. Contained within this transcript are signals for polyadenylation and proper 3' end formation. This processing at the 3' end involves cleavage of the mRNA and addition of polyA to the mature 3' end. By searching for consensus sequences near the polyA tract in both plant and animal mRNAs, it has been possible to identify consensus sequences that apparently are involved in polyA addition and 3' end cleavage. The same consensus sequences seem to be important to both of these processes. These signals are typically a variation on the sequence AATAAA. In animal cells, some variants of this sequence that are functional have been identified; in plant cells there seems to be an extended range of functional sequences (Wickens and Stephenson, 1984; Dean et al., 1986). Because all of these consensus sequences are variations on AATAAA, they all are A+T rich sequences. This sequence is typically found 15 to 20 bp before the polyA tract in a mature mRNA. Experiments in animal cells indicate that this sequence is involved in both polyA addition and 3' maturation. Site directed mutations in this sequence can disrupt these functions (Conway and Wickens, 1988; Wickens et al., 1987). However, it has also been observed that sequences up to 50 to 100 bp 3' to the putative polyA signal are also required; i.e., a gene that has a normal AATAAA but whose downstream sequence has been replaced or disrupted does not get properly polyadenylated (Gil and Proudfoot, 1984; Sadofsky and Alwine, 1984; McDevitt et al., 1984). That is, the polyA signal itself is not sufficient for complete and proper processing. It is not yet known what specific downstream sequences are required in addition to the polyA signal, or if there is a specific sequence that has this function.
Therefore, sequence analysis can only identify potential polyA signals.
In naturally occurring mRNAs that are normally polyadenylated, it has been observed that disruption of this process, either by altering the polyA signal or other sequences in the mRNA, can have profound effects on the level of functional mRNA. This has been observed in several naturally occurring mRNAs, with results that are gene specific so far. There are no general rules that can be derived yet from the study of mutants of these natural genes, and no rules that can be applied to heterologous genes. Below are four examples:
1. In a globin gene, absence of a proper polyA site leads to improper termination of transcription. It is likely, but not proven, that the improperly terminated RNA is nonfunctional and unstable (Proudfoot et al., 1987).
2. In a globin gene, absence of a functional polyA signal can lead to a 100-fold decrease in the level of mRNA accumulation (Proudfoot et al., 1987).
3. A globin gene polyA site was placed into the 3' ends of two different histone genes. The histone genes contain a secondary structure (stem-loop) near their 3' ends. The amount of properly polyadenylated histone mRNA produced from these chimeras decreased as the distance between the stem-loop and the polyA site increased. Also, the two histone genes produced greatly different levels of properly polyadenylated mRNA. This suggests an interaction between the polyA site and other sequences on the mRNA that can modulate mRNA accumulation (Pandey and Marzluff, 1987).
4. The soybean leghemoglobin gene has been cloned into HeLa cells, and it has been determined that this plant gene contains a "cryptic" polyadenylation signal that is active in animal cells, but is not utilized in plant cells. This leads to the production of a new polyadenylated mRNA that is nonfunctional. This again shows that analysis of a gene in one cell type cannot predict its behavior in alternative cell types (Wiebauer et al., 1988).
From these examples, it is clear that in natural mRNAs proper polyadenylation is important in mRNA accumulation, and that disruption of this process can affect mRNA levels significantly. However, insufficient knowledge exists to predict the effect of changes in a normal gene. In a heterologous gene, where we do not know if the putative polyA sites (consensus sequences) are functional, it is even harder to predict the consequences. However, it is possible that the putative sites identified are dysfunctional. That is, these sites may not act as proper polyA sites, but instead function as aberrant sites that give rise to unstable mRNAs.
In animal cell systems, AATAAA is by far the most common signal identified in mRNAs upstream of the polyA, but at least four variants have also been found (Wickens and Stephenson, 1984). In plants, not nearly so much analysis has been done, but it is clear that multiple sequences similar to AATAAA can be used. The plant sites below called major or minor refer only to the study of Dean et al. (1986) which analyzed only three types of plant gene. The designation of polyadenylation sites as major or minor refers only to the frequency of their occurrence as functional sites in naturally occurring genes that have been analyzed. In the case of plants this is a very limited database. It is hard to predict with any certainty that a site designated major or minor is more or less likely to function partially or completely when found in a heterologous gene such as B.t.
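______________________________________
        AATAAA    Major consensus site
P1A     AATAAT    Major plant site
P2A     AACCAA    Minor plant site
P3A     ATATAA    Minor plant site
P4A     AATCAA    Minor plant site
P5A     ATACTA    Minor plant site
P6A     ATAAAA    Minor plant site
P7A     ATGAAA    Minor plant site
P8A     AAGCAT    Minor plant site
P9A     ATTAAT    Minor plant site
P10A    ATACAT    Minor plant site
P11A    AAAATA    Minor plant site
P12A    ATTAAA    Minor animal site
P13A    AATTAA    Minor animal site
P14A    AATACA    Minor animal site
P15A    CATAAA    Minor animal site
______________________________________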
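As an illustration of how the sequences listed above might be used in sequence analysis, the following minimal Python sketch scans a coding sequence for each listed hexamer. The names `PUTATIVE_POLYA_SIGNALS` and `find_putative_polya_signals` are hypothetical and chosen only for this example. Consistent with the discussion above, a match identifies only a potential polyA signal; it says nothing about whether the site is functional in a given cell.

```python
# Hexamers and designations taken from the table above.
PUTATIVE_POLYA_SIGNALS = {
    "AATAAA": "major consensus site",
    "AATAAT": "major plant site",
    "AACCAA": "minor plant site",
    "ATATAA": "minor plant site",
    "AATCAA": "minor plant site",
    "ATACTA": "minor plant site",
    "ATAAAA": "minor plant site",
    "ATGAAA": "minor plant site",
    "AAGCAT": "minor plant site",
    "ATTAAT": "minor plant site",
    "ATACAT": "minor plant site",
    "AAAATA": "minor plant site",
    "ATTAAA": "minor animal site",
    "AATTAA": "minor animal site",
    "AATACA": "minor animal site",
    "CATAAA": "minor animal site",
}


def find_putative_polya_signals(seq):
    """Return (position, hexamer, designation) for every listed hexamer.

    Every 6-base window is checked, so overlapping hits are all reported.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA spelling
    hits = []
    for i in range(len(seq) - 5):
        hexamer = seq[i:i + 6]
        if hexamer in PUTATIVE_POLYA_SIGNALS:
            hits.append((i, hexamer, PUTATIVE_POLYA_SIGNALS[hexamer]))
    return hits


# Made-up example sequence containing AATAAA and ATTAAA.
for hit in find_putative_polya_signals("GGAATAAACCATTAAAGG"):
    print(hit)
```

A dictionary lookup per window is used here rather than one regular expression per hexamer so that overlapping candidate sites are not silently skipped.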
Another type of RNA processing that occurs in the nucleus is intron splicing. Nearly all of the work on intron processing has been done in animal cells, but some data is emerging from plants. Intron processing depends on proper 5' and 3' splice junction sequences. Consensus sequences for these junctions have been derived for both animal and plant mRNAs, but only a few nucleotides are known to be invariant. Therefore, it is hard to predict with any certainty whether a putative splice junction is functional or partially functional based solely on sequence analysis. In particular, the only invariant nucleotides are GT at the 5' end of the intron and AG at the 3' end of the intron. In plants, at every nearby position, either within the intron or in the exon flanking the intron, all four nucleotides can be found, although some positions show some nucleotide preference (Brown, 1986; Hanley and Schuler, 1988).
A plant intron has been moved from a patatin gene into a GUS gene. To do this, site directed mutagenesis was performed to introduce new restriction sites, and this mutagenesis changed several nucleotides in the intron and exon sequences flanking the GT and AG. This intron still functioned properly, indicating the importance of the GT and AG and the flexibility at other nucleotide positions. There are of course many occurrences of GT and AG in all genes that do not function as intron splice junctions, so there must be some other sequence or structural features that identify splice junctions. In plants, one such feature appears to be base composition per se. Wiebauer et al. (1988) and Goodall et al. (1988) have analyzed plant introns and exons and found that exons have about 50% A+T while introns have about 70% A+T. Goodall et al. (1988) also created an artificial plant intron that has consensus 5' and 3' splice junctions and a random A+T rich internal sequence. This intron was spliced correctly in plants. When the internal segment was replaced by a G+C rich sequence, splicing efficiency was drastically reduced. These two examples demonstrate that intron recognition in plants may depend on very general features: splice junctions that have a great deal of sequence diversity, and A+T richness of the intron itself. This, of course, makes it difficult to predict from sequence alone whether any particular sequence is likely to function as an active or partially active intron for RNA processing.
B.t. genes, being A+T rich, contain numerous stretches of various lengths that have 70% or greater A+T. The number of such stretches identified by sequence analysis depends on the length of sequence scanned.
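The dependence on scan length can be illustrated with a minimal sliding-window sketch in Python. The function name `at_rich_windows` and its parameters are hypothetical; the 70% default is taken from the figure given above, and the example sequence is made up for illustration only.

```python
def at_rich_windows(seq, window, min_at_percent=70):
    """Return start positions of windows whose A+T content meets the cutoff.

    Integer arithmetic is used for the comparison to avoid rounding
    problems exactly at the cutoff.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA spelling
    if window <= 0 or window > len(seq):
        return []
    is_at = [1 if base in "AT" else 0 for base in seq]
    count = sum(is_at[:window])  # A+T count in the first window
    starts = []
    for start in range(len(seq) - window + 1):
        if count * 100 >= min_at_percent * window:
            starts.append(start)
        # Slide one base: drop the leftmost base, add the next one.
        if start + window < len(seq):
            count += is_at[start + window] - is_at[start]
    return starts


# Made-up sequence: an A+T-only half followed by a G+C-only half.
example = "ATATATATGCGCGCGC"
for window in (4, 8, 16):
    print(window, at_rich_windows(example, window))
```

With this example, short windows flag the A+T rich half, while a window spanning the whole sequence averages it out to 50% A+T and flags nothing.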
As for polyadenylation described above, there are complications in predicting what sequences might be utilized as splice sites in any given gene. First, many naturally occurring genes have alternative splicing pathways that create alternative combinations of exons in the final mRNA (Gallega and Nadal-Ginard, 1988; Helfman and Ricci, 1988; Tsurushita and Korn, 1989). That is, some splice junctions are apparently recognized under some circumstances or in certain cell types, but not in others. The rules governing this are not understood. In addition, there can be an interaction between processing paths such that utilization of a particular polyadenylation site can interfere with splicing at a nearby splice site and vice versa (Adami and Nevins, 1988; Brady and Wold, 1988; Marzluff and Pandey, 1988). Again no predictive rules are available. Also, sequence changes in a gene can drastically alter the utilization of particular splice junctions. For example, in a bovine growth hormone gene, small deletions in an exon a few hundred bases downstream of an intron cause the splicing efficiency of the intron to drop from greater than 95% to less than 2% (essentially nonfunctional). Other deletions however have essentially no effect (Hampson and Rottman, 1988). Finally, a variety of in vitro and in vivo experiments indicate that mutations that disrupt normal splicing lead to rapid degradation of the RNA in the nucleus. Splicing is a multistep process in the nucleus and mutations in normal splicing can lead to blockades in the process at a variety of steps. Any of these blockades can then lead to an abnormal and unstable RNA. Studies of mutants of normally processed (polyadenylation and splicing) genes are relevant to the study of heterologous genes such as B.t. B.t. genes might contain functional signals that lead to the production of aberrant nonfunctional mRNAs, and these mRNAs are likely to be unstable. But the B.t. genes are perhaps even more likely to contain signals that are analogous to mutant signals in a natural gene. As shown above these mutant signals are very likely to cause defects in the processing pathways whose consequence is to produce unstable mRNAs.
It is not known with any certainty what signals RNA transcription termination in plant or animal cells. Some studies on animal genes indicate that stretches of sequence rich in T cause termination by calf thymus RNA polymerase II in vitro. These studies have shown that the 3' ends of in vitro terminated transcripts often lie within runs of T such as T5, T6 or T7. Other identified sites have not been composed solely of T, but have had one or more other nucleotides as well. Termination has been found to occur within the sequences TATTTTTT, ATTCTC, TTCTT (Dedrick et al., 1987; Reines et al., 1987). In the case of these latter two, the context in which the sequence is found has been C+T rich as well. It is not known if this is essential. Other studies have implicated stretches of A as potential transcriptional terminators. An interesting example from SV40 illustrates the uncertainty in defining terminators based on sequence alone. One potential terminator in SV40 was identified as being A rich and having a region of dyad symmetry (potential stem-loop) 5' to the A rich stretch. However, a second terminator identified experimentally downstream in the same gene was not A rich and included no potential secondary structure (Kessler et al., 1988). Of course, due to the A+T content of B.t. genes, they are rich in runs of A or T that could act as terminators. The importance of termination to stability of the mRNA is shown by the globin gene example described above. Absence of a normal polyA site leads to a failure in proper termination with a consequent decrease in mRNA.
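Runs of A or T of the kind mentioned above can be located with a short Python sketch. The function name `find_homopolymer_runs` is hypothetical, and the default minimum length of five is an assumption taken from the shortest run (T5) mentioned above. As the SV40 example shows, such a scan identifies only candidate sites and cannot establish that a run acts as a terminator.

```python
import re


def find_homopolymer_runs(seq, bases="AT", min_len=5):
    """Return (start, base, length) for each run of a single base.

    Only runs of at least min_len identical bases from `bases` are
    reported; each run is reported once at its full length.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA spelling
    # Builds a pattern such as "A{5,}|T{5,}".
    pattern = "|".join(f"{base}{{{min_len},}}" for base in bases)
    return [
        (match.start(), match.group()[0], len(match.group()))
        for match in re.finditer(pattern, seq)
    ]


# Made-up example: a T6 run, an A5 run, and a T4 run that is too short.
print(find_homopolymer_runs("GGTTTTTTCCAAAAAGGTTTT"))
# prints [(2, 'T', 6), (10, 'A', 5)]
```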
There is also an effect on mRNA stability due to the translation of the mRNA. Premature translational termination in human triose phosphate isomerase leads to instability of the mRNA (Daar et al., 1988). Another example is the beta-thalassemic globin mRNA described above that is specifically unstable in bone marrow cells (Lim et al., 1988). The defect in this mutant gene is a single base pair deletion at codon 44 that leads to translational termination (a nonsense codon) at codon 60. Compared to properly translated normal globin mRNA, this mutant RNA is very unstable. These results indicate that an improperly translated mRNA is unstable. Other work in yeast indicates that proper but poor translation can have an effect on mRNA levels. A heterologous gene was modified to convert certain codons to more yeast preferred codons. An overall 10-fold increase in protein production was achieved, but there was also about a 3-fold increase in mRNA (Hoekema et al., 1987). This indicates that more efficient translation can lead to greater mRNA stability, and that the effect of codon usage can be at the RNA level as well as the translational level. It is not clear from codon usage studies which codons lead to poor translation, or how this is coupled to mRNA stability.
Therefore, it is an object of the present invention to provide a method for preparing synthetic plant genes which express their respective proteins at relatively high levels when compared to wild-type genes. It is yet another object of the present invention to provide synthetic plant genes which express the crystal protein toxin of Bacillus thuringiensis at relatively high levels.