This invention relates to increasing the expression of heterologous or chimeric proteins in plants, and more specifically to increasing protein expression in Arabidopsis species and other dicotyledons.
One of the primary goals of plant genetic research and development is the production of transgenic plants that express a heterologous gene (i.e., produce a "foreign" protein) in an amount sufficient to confer a desired phenotype to the plant. While significant advances have been made in pursuit of this goal, the expression of certain heterologous genes in transgenic plants remains problematic. It is thought that numerous factors are involved in determining the ultimate level of expression of a heterologous gene in a plant. The amount of protein that is synthesized from a gene is a function of several complex and interrelated events, including transcription, RNA maturation, translation, and post-translational modification. Each of these processes is comprised of a large number of events, all of which are potentially regulated either independently or in concert.
The genetic code is considered "degenerate" in that more than one nucleic acid triplet (i.e., a "codon") encodes the same amino acid. Many amino acids may be coded for by several different codons. In general, genes within a taxonomic group exhibit similarities in codon choice, regardless of the function of these genes. Thus, an estimate of the overall use of the genetic code by a taxonomic group can be obtained by summing codon frequencies of all its sequenced genes. Variation between degenerate base frequencies is not a neutral phenomenon, since systematic codon preferences have been reported for bacterial, yeast, plant and mammalian genes. Bias in codon choice within genes in a single species appears related to the level of expression of protein encoded by any particular gene.
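The pooling of codon frequencies described above can be sketched as follows. This is a minimal illustration only: the function names and the two toy sequences are assumptions made for the example and are not taken from any cited reference or real gene.

```python
from collections import Counter


def codon_counts(cds):
    """Count in-frame codons in one coding sequence (DNA, 5' to 3')."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def pooled_codon_frequencies(sequences):
    """Sum codon counts over many genes and normalize to frequencies."""
    total = Counter()
    for cds in sequences:
        total.update(codon_counts(cds))
    n = sum(total.values())
    return {codon: count / n for codon, count in total.items()} if n else {}


# Hypothetical toy sequences, not real genes.
toy_genes = ["ATGGCTGCTTAA", "ATGGCCGCTTGA"]
freqs = pooled_codon_frequencies(toy_genes)
```

In practice the input would be the full set of sequenced coding regions for the taxonomic group, and frequencies are often reported per amino acid rather than over all codons.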
Codon bias is most extreme in highly expressed proteins of bacteria (e.g., E. coli) and yeast. In unicellular organisms, highly expressed genes use a smaller subset of codons than do weakly expressed genes, although the codons preferred are distinct in some cases. Sharp and Li, Nucl. Acids Res. 14, 7734-7749 (1986), report that codon usage in 165 E. coli genes reveals a positive correlation between high expression and increased codon bias. In these organisms, a strong positive correlation has been reported between the abundance of an isoaccepting tRNA species and the favored synonymous codon. For example, in one group of highly expressed proteins in yeast, over 96% of the amino acids are encoded by only 25 of the 61 available codons. See Bennetzen and Hall, J. Biol. Chem. 257, 3026-3031 (1982). These 25 codons are preferred in all sequenced yeast genes, but the degree of preference varies with the level of expression of the genes. Biased codon choice in highly expressed genes appears to enhance translation, and is required for maintaining mRNA stability in yeast. It has been proposed that the good fit of abundant yeast and E. coli mRNA codon usage to isoacceptor tRNA abundance promotes high translation levels and high steady state levels of these proteins. These results strongly suggest that the potential for high levels of expression of plant genes in yeast or E. coli is limited by their codon usage, and conversely that high-level expression of E. coli or yeast genes in plant cells is similarly limited by the preferred codon usage of the different organisms.
Although plant codon usage patterns are distinct from those reported for bacteria, yeast, and animals, in general the plant codon usage pattern more closely resembles that of higher eukaryotes than that of unicellular organisms, due to the overall preference for G+C content in codon position III. Moreover, analysis of a large group of plant gene sequences indicates that synonymous codons are used differently by monocots and dicots. Wilbur et al., Plant Physiol. 92: 1-11 (1990), describe the difference in codon usage between bacteria and higher plants such as dicotyledonous and monocotyledonous plants. For example, the codon usages for codons XCG and XUA are 1.8% and 3.2% in dicotyledonous plants and 6.3% and 1.4% in monocotyledonous plants. The combined codon usage for codons XXC and XXG (hereinafter referred to as the codon XXC/G usage, wherein each of the two Xs is independently selected from the group consisting of A, G, C and T) is 45% in dicotyledons and 73.5% in monocotyledons. It is well established that the GC content of translated genes is higher in monocotyledons such as gramineous plants, e.g., rice plants, than in dicotyledons. As to bacteria, the codon usage apparently varies by strain.
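The codon XXC/G usage defined above is simply the fraction of codons whose third base is C or G. A minimal sketch of that calculation follows; the function name and the toy sequence are assumptions for illustration and do not reproduce the published dicot or monocot figures.

```python
def xxc_g_usage(cds):
    """Fraction of in-frame codons ending in C or G (codon position III)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        return 0.0
    return sum(1 for codon in codons if codon[2] in "CG") / len(codons)


# Hypothetical toy sequence: ATG GCT GCC TAA, two of four codons end in C or G.
usage = xxc_g_usage("ATGGCTGCCTAA")
```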
In this regard, investigators have determined that typical plant structural coding sequences preferentially utilize certain codons to encode certain amino acids in a different frequency than the frequency of usage appearing in bacterial or other non-plant coding sequences. Thus, it has been suggested that the difference between the typical codon usage present in plant coding sequences and the typical codon usage present in non-plant coding sequences is a factor contributing to the low levels of non-plant mRNA and non-plant protein produced in transgenic plants. These differences in codon usage may contribute to the low levels of mRNA or protein expressed by the non-plant coding sequence in a transgenic plant by affecting the transcription or translation of the coding sequence or proper mRNA processing.
Recently, attempts have been made to alter the structural coding sequence of a desired polypeptide or protein in an effort to enhance its expression in the plant. In particular, investigators have altered the codon usage of heterologous, structural coding sequences (i.e., heterologous genes) in an attempt to enhance their expression in plants. Most notably, the sequence encoding insecticidal crystal proteins of Bacillus thuringiensis (Bt) has been modified in various ways to enhance its expression in a plant, particularly monocotyledonous plants, to produce commercially viable insect-tolerant plants.
U.S. Pat. No. 5,380,831 to Adang et al. describes synthetic Bt genes designed to be expressed at a level higher than naturally-occurring Bt genes. The genes utilize codons preferred in highly expressed monocot or dicot proteins. Specifically, the synthetic genes, while about 85% homologous to the native bacterial sequence, are chemically modified to contain codons that are preferred by highly expressed plant genes, and to eliminate undesirable sequences that cause destabilization, termination of RNA, secondary structures and RNA splice sites.
U.S. Pat. No. 5,436,391 to Fujimoto et al. describes a synthetic gene encoding an insecticidal Bt protein. The gene has a base sequence that has been modified to bring its codon usage into conformity with the genes of graminaceous plants, particularly rice plants (e.g., Oryza).
U.S. Pat. No. 5,689,052 to Brown et al. describes a method for modifying a foreign nucleotide sequence for enhanced accumulation of its protein product in a monocotyledonous plant, and/or increasing the frequency of obtaining transgenic monocotyledonous plants which accumulate useful amounts of a transgenic protein, by reducing the frequency of the rare and semi-rare monocotyledonous codons in the foreign gene and replacing them with more preferred monocotyledonous codons.
Another approach to altering the codon usage of a Bt toxin gene to enhance its expression in plants is described in U.S. Pat. No. 5,500,365 to Fischhoff et al. Here, the synthetic plant gene was prepared by modifying the coding sequence to remove all ATTTA sequences and certain identified putative polyadenylation signals. Moreover, the gene sequence was scanned to identify regions with greater than four consecutive adenine or thymine nucleotides. If more than one of the minor polyadenylation signals was identified within ten nucleotides of another, then the nucleotide sequence of this region was altered to remove these signals while maintaining the original encoded amino acid sequence. The overall G+C content was also adjusted to provide a final sequence having a G+C ratio of about 50%. Similarly, U.S. Pat. No. 5,877,306 to Cornelissen et al. discloses a method of modifying a DNA sequence encoding a Bt crystal protein toxin wherein the gene was modified by reducing the A+T content. This was accomplished by changing the adenine and thymine bases to cytosine and guanine, while maintaining a coding sequence for the original protein toxin.
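The sequence scans described above (ATTTA motifs, runs of more than four consecutive A or T nucleotides, and overall G+C content) can be sketched as below. This is an illustrative sketch only: the function names and the toy sequence are assumptions, and the code does not attempt to reproduce the polyadenylation-signal analysis of the cited patent.

```python
import re


def find_attta(seq):
    """Start positions (0-based) of every ATTTA motif, including overlaps."""
    return [m.start() for m in re.finditer(r"(?=ATTTA)", seq.upper())]


def find_at_runs(seq, min_len=5):
    """(start, end) spans of runs of at least min_len consecutive A/T bases.

    min_len=5 corresponds to "greater than four consecutive" A or T.
    """
    pattern = "[AT]{" + str(min_len) + ",}"
    return [(m.start(), m.end()) for m in re.finditer(pattern, seq.upper())]


def gc_fraction(seq):
    """Overall G+C fraction of the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


# Hypothetical toy sequence, not a real gene.
toy = "GCATTTAGCGCAATTATGC"
motifs = find_attta(toy)
runs = find_at_runs(toy)
gc = gc_fraction(toy)
```

A scan of this kind only flags candidate regions; choosing synonymous replacements that remove them while preserving the encoded amino acid sequence is a separate step.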
While the foregoing examples emphasize the modification or optimization of codon usage in heterologous structural genes (i.e., the genes encoding a desired protein product, such as Bt toxin), the modification of regulatory elements that control transcription by optimizing codon usage in the host plant has not been emphasized. It is widely recognized that the upstream regulatory elements that control transcription and translation have very significant roles in determining the quantity, timing, and tissue specificity of gene expression. Various nucleotide sequences other than the heterologous structural coding sequence affect the expression levels of a foreign DNA sequence introduced into a plant, including promoter sequences, intron sequences, 3′ untranslated sequences, polyadenylation sites, and other regulatory sequences.
In view of the foregoing, activation and control of transcription are processes that may desirably be manipulated in order to achieve altered (i.e., increased or decreased) expression of a heterologous structural gene in a plant cell. Transcription can be activated through the use of two functional domains of a transcription activation moiety: a domain (i.e., sequence of amino acids) that recognizes and binds to a specific site or sequence of nucleotides on a target DNA (the DNA-binding domain); and a domain that is capable of activating transcription of the DNA when physically associated with the DNA-binding domain and which may be necessary for activation of the target gene (the activation domain). See Keegan et al., Science 231, 669-704 (1986); Ma and Ptashne, Cell 48, 847-853 (1987). The two functional domains may be derived from a single transcription activation protein. Alternatively, it has been shown that these two functions can also reside on separate proteins. See McKnight et al., Proc. Natl. Acad. Sci. USA 89, 7061-7065 (1987); Curran et al. 55, 395-397 (1988). The transcription activation domains may also be derived from synthetic DNA-binding and transcription activation proteins.
Additional flexibility in controlling heterologous gene expression in plants may be obtained by using DNA binding domains and response elements from heterologous sources (i.e., DNA binding domains from non-plant sources). Some examples of such heterologous DNA binding domains include the LexA and GAL4 DNA binding domains. The LexA DNA-binding domain is part of the repressor protein LexA from Escherichia coli (E. coli) (Brent and Ptashne, Cell 43:729-736 (1985)).
Although the LexA DNA binding domain functions as an efficient DNA binding domain in its natural bacterial host, when transferred by recombinant DNA technology into higher eukaryotes such as plants, the domain is not efficiently expressed. Accordingly, it would be desirable to alter the LexA DNA binding domain to increase its expression in higher eukaryotes such as plants.
Certain objects, advantages and novel features of the invention will be set forth in the description that follows, and will become apparent to those skilled in the art upon examination of the following, or may be learned with the practice of the invention.
It is an object of the invention to provide a synthetic LexA DNA binding domain optimized for codon usage in plants, more specifically in dicots, and most specifically in Arabidopsis thaliana. 
The invention relates to adapting the codons of the DNA binding domain of the LexA gene from E. coli to the codon usage of Arabidopsis thaliana. This method is advantageous in that it allows for the increased expression of heterologous and chimeric proteins containing this artificial DNA binding domain.
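By way of illustration only, adapting a coding sequence to a host's codon usage can be pictured as replacing each codon with a host-preferred synonymous codon while leaving the encoded amino acid sequence unchanged. The sketch below assumes a simple one-preferred-codon-per-amino-acid scheme; the function names, the `PLACEHOLDER_PREFERRED` table, and the toy sequence are hypothetical, are not measured Arabidopsis thaliana codon frequencies, and are not the LexA sequence or the procedure of the invention.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate an in-frame DNA coding sequence ('*' marks a stop codon)."""
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def recode(cds, preferred):
    """Swap each codon for the preferred synonymous codon, if one is given.

    `preferred` maps a one-letter amino acid to a codon. Amino acids that
    are absent from the mapping keep their original codon.
    """
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        amino_acid = CODON_TABLE[codon]
        new_codon = preferred.get(amino_acid, codon)
        if CODON_TABLE[new_codon] != amino_acid:
            raise ValueError("preferred codon is not synonymous: " + new_codon)
        out.append(new_codon)
    return "".join(out)


# Placeholder preferences for illustration only (hypothetical values).
PLACEHOLDER_PREFERRED = {"A": "GCT", "L": "CTT", "R": "AGA"}

original = "ATGGCGCTGCGTTAA"  # hypothetical toy sequence: M A L R stop
recoded = recode(original, PLACEHOLDER_PREFERRED)
```

Checking that `translate(recoded)` equals `translate(original)` confirms that only synonymous substitutions were made.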
Additional aspects of this invention include constructs (i.e., vectors, DNA fusions and polynucleotides), comprising the synthetic DNA sequence of the present invention. These constructs are useful for increasing heterologous protein expression in plant cells. Further aspects of the invention are cells, plant lines, and transgenic plants transformed with the described constructs. Methods of increasing expression of heterologous proteins in a cell or a transgenic plant are an additional aspect of the present invention.
The foregoing and other aspects of the present invention are explained in detail in the specification set forth below.