The present invention relates to the recoding of DNA sequences which encode proteins which contain regions having a high content of codons which are poorly translated by yeasts, in particular which encode proteins of plant origin, such as the P450 cytochromes of plant origin, and to their expression in yeasts.
It is known that certain sequences encoding proteins of interest, in particular proteins of plant origin, are not readily translated in yeasts. This applies, in particular, to proteins which possess regions having a high content of codons which are poorly suited to yeasts, in particular leucine codons, such as some P450 cytochromes of plant origin. Some systems which have been developed for improving the expression of P450 cytochromes of animal or plant origin in yeasts, such as those described by Pompon et al. (Methods Enzymol., 272, 1996, 51-64; WO 97/10344), have turned out to be unsuitable for large numbers of P450 cytochromes which encompass regions having a high content of codons which are poorly suited to yeasts.
The P450 cytochromes constitute a superfamily of membrane enzymes of the monooxygenase type which are able to oxidize a large family of generally hydrophobic substrates. The reactions are most frequently characterized by the oxidation of C-H or C=C bonds, and of heteroatoms, and, more rarely, by the reduction of nitro groups or by dehalogenation. More specifically, these enzymes are involved in the metabolism of xenobiotic substances and drugs and in the biosynthesis of secondary metabolites in plants, some of which have organoleptic or pharmacodynamic properties.
As a consequence, the P450 cytochromes are used, in particular, in:
the in vitro diagnosis of the formation of toxic or mutagenic metabolites (molecules of natural origin, pollutants, drugs, pesticides, etc.), making it possible, in particular, to develop novel active molecules (pharmaceuticals, agrochemicals),
the identification and destruction of molecules which are toxic for, or pollute, the environment,
the enzymic synthesis of novel molecules.
The search for heterologous expression of P450 cytochromes by host cells, more specifically yeasts, is therefore important for obtaining controlled production of this enzyme in large quantity, either for isolating it and using it in the above-listed processes, or for using the transformed cells directly for the said processes without previously isolating the enzyme.
The present invention provides a solution to the abovementioned problem, enabling proteins which contain regions having a high content of codons which are poorly suited to yeasts, in particular P450 cytochromes of plant origin, to be expressed in yeasts.
The present invention therefore relates to a DNA sequence, in particular a cDNA sequence, which encodes a protein of interest which contains regions having a high content of codons which are poorly suited to yeasts, characterized in that a sufficient number of codons which are poorly suited to yeasts is replaced with corresponding codons which are well-suited to yeasts in the said regions having a high content of codons which are poorly suited to yeasts.
Within the meaning of the present invention, "codons which are poorly suited to yeasts" are understood as being codons whose frequency of use by yeasts is less than or equal to approximately 13 per 1000, preferably less than or equal to approximately 12 per 1000, more preferably less than or equal to approximately 10 per 1000. The frequency at which codons are used by yeasts, more specifically by S. cerevisiae, is described, in particular, in "Codon usage database" by Yasukazu Nakamura (available on the Kazusa world wide web server). This applies, in particular, to codons CTC, CTG and CTT, which encode leucine, to codons CGG, CGC, CGA, CGT and AGG, which encode arginine, to codons GCG and GCC, which encode alanine, to codons GGG, GGC and GGA, which encode glycine, and to codons CCG and CCC, which encode proline. The codons which are poorly suited to yeasts in accordance with the invention are, more specifically, codons CTC and CTG, which encode leucine, CGG, CGC, CGA, CGT and AGG, which encode arginine, codons GCG and GCC, which encode alanine, GGG and GGC, which encode glycine, and codons CCG and CCC, which encode proline.
Within the meaning of the present invention, "corresponding codons which are well-suited to yeasts" are understood as being the codons which correspond to the codons which are poorly suited to yeasts and which encode the same amino acids, and whose frequency of use by yeasts is greater than 15 per 1000, preferably greater than or equal to 18 per 1000, more preferably greater than or equal to 20 per 1000. This applies, in particular, to codons TTG and TTA, preferably TTG, which encode leucine, to codon AGA, which encodes arginine, to codons GCT and GCA, preferably GCT, which encode alanine, to codon GGT, which encodes glycine, and to codon CCA, which encodes proline.
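By way of illustration only, the correspondence between poorly suited and well-suited codons described above can be sketched as a lookup table. The sketch below uses the narrower list of poorly suited codons and the preferred replacement for each amino acid; the names PREFERRED, split_codons and recode are hypothetical and do not appear in the original text, and the sketch assumes an in-frame coding sequence.

```python
# Poorly suited codons (narrower list given in the text) mapped to the
# preferred well-suited codon encoding the same amino acid.
PREFERRED = {
    "CTC": "TTG", "CTG": "TTG",                 # leucine
    "CGG": "AGA", "CGC": "AGA", "CGA": "AGA",   # arginine
    "CGT": "AGA", "AGG": "AGA",                 # arginine
    "GCG": "GCT", "GCC": "GCT",                 # alanine
    "GGG": "GGT", "GGC": "GGT",                 # glycine
    "CCG": "CCA", "CCC": "CCA",                 # proline
}


def split_codons(seq):
    """Split an in-frame coding sequence into a list of codons."""
    seq = seq.upper()
    if len(seq) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def recode(seq):
    """Replace every poorly suited codon with its preferred counterpart."""
    return "".join(PREFERRED.get(codon, codon) for codon in split_codons(seq))


print(recode("ATGCTCCTGCGGGCGGGGCCG"))
```

Because each replacement codon encodes the same amino acid as the codon it replaces, the encoded protein is unchanged.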
Within the meaning of the present invention, "region having a high content of codons which are poorly suited to yeasts" is understood as being any region of the DNA sequence which contains at least 2 poorly suited codons among 10 consecutive codons, with it being possible for the two codons to be adjacent or separated by up to 8 other codons. According to one preferred embodiment of the invention, the regions having a high content of poorly suited codons contain 2, 3, 4, 5 or 6 poorly suited codons per 10 consecutive codons, or contain at least 2 or 3 adjacent poorly suited codons.
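The definition above lends itself to a sliding-window scan. The following is a minimal sketch, assuming the sequence has already been split into codons; the names POOR and high_content_windows are hypothetical, and the window of 10 codons and threshold of 2 are taken from the definition.

```python
# Poorly suited codons (narrower list given in the text).
POOR = {
    "CTC", "CTG",
    "CGG", "CGC", "CGA", "CGT", "AGG",
    "GCG", "GCC",
    "GGG", "GGC",
    "CCG", "CCC",
}


def high_content_windows(codons, window=10, min_poor=2):
    """Return the start index of every window of `window` consecutive
    codons that contains at least `min_poor` poorly suited codons."""
    hits = []
    last_start = max(len(codons) - window, 0)
    for start in range(last_start + 1):
        count = sum(c in POOR for c in codons[start:start + window])
        if count >= min_poor:
            hits.append(start)
    return hits


example = ["ATG", "CTC", "AAA", "AAA", "CTG"] + ["AAA"] * 10
print(high_content_windows(example))
```

Overlapping windows that are reported consecutively can be merged into a single region if a contiguous span is wanted.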
Within the meaning of the present invention, "sufficient number of codons" is understood as being the number of codons which it is necessary and sufficient to replace in order to observe a substantial improvement in the expression of the protein of interest in yeasts. Advantageously, at least 50% of the codons which are poorly suited to yeasts in the high-content region under consideration are replaced with well-suited codons. Preferably, at least 75% of the poorly suited codons of the said region are replaced, with 100% of the poorly suited codons more preferably being replaced.
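The replacement thresholds above (at least 50%, 75% or 100% of the poorly suited codons of a region) can be illustrated as follows. This is a sketch under stated assumptions: the text does not say which codons are chosen when fewer than 100% are replaced, so the choice made here (the poorly suited codons nearest the 5' end of the region) is an arbitrary assumption, and the names PREFERRED and recode_region are hypothetical.

```python
import math

# Poorly suited codon -> preferred well-suited codon (same amino acid).
PREFERRED = {
    "CTC": "TTG", "CTG": "TTG",
    "CGG": "AGA", "CGC": "AGA", "CGA": "AGA", "CGT": "AGA", "AGG": "AGA",
    "GCG": "GCT", "GCC": "GCT",
    "GGG": "GGT", "GGC": "GGT",
    "CCG": "CCA", "CCC": "CCA",
}


def recode_region(codons, start, end, fraction=1.0):
    """Replace at least `fraction` of the poorly suited codons found in
    codons[start:end]. Assumption: codons closest to the 5' end of the
    region are replaced first."""
    stop = min(end, len(codons))
    poor_positions = [i for i in range(start, stop) if codons[i] in PREFERRED]
    n_replace = math.ceil(fraction * len(poor_positions))
    out = list(codons)
    for i in poor_positions[:n_replace]:
        out[i] = PREFERRED[out[i]]
    return out


region = ["ATG", "CTC", "CTG", "GCC", "AAA", "CCG"]
print(recode_region(region, 0, 6, fraction=0.5))
print(recode_region(region, 0, 6, fraction=1.0))
```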
Within the meaning of the present invention, "substantial improvement" is understood as being either a detectable expression when no expression of the reference sequence is observed, or an increase in expression as compared with the level at which the reference sequence is expressed.
Within the meaning of the present invention, "reference sequence" designates any sequence which encodes a protein of interest and which is modified in accordance with the invention in order to promote its expression in yeasts.
The present invention is particularly well suited to DNA sequences, in particular cDNA sequences, which encode proteins of interest which contain regions having a high content of leucine and in which a sufficient number of CTC codons encoding leucine in the said region having a high content of leucine is replaced with TTG and/or TTA codons, or in which a sufficient number of CTC and CTG codons encoding leucine in the said region having a high content of leucine is replaced with TTG and/or TTA codons, preferably with a TTG codon.
Within the meaning of the present invention, "region having a high content of leucine" is understood as being a region which contains at least 2 leucines among 10 consecutive amino acids in the protein of interest, with it being possible for the two leucines to be adjacent or separated by up to 8 other amino acids. According to one preferred embodiment of the invention, the regions having a high content of leucine contain 2, 3, 4, 5 or 6 leucines per 10 consecutive amino acids, or contain at least 2 or 3 adjacent leucines.
According to a preferred embodiment of the invention, at least 50% of the CTC or CTC and CTG codons of the region having a high content of leucine are replaced with TTG or TTA codons, with at least 75% of the CTC or CTC and CTG codons of the said region preferably being replaced, and 100% of the CTC or CTC and CTG codons more preferably being replaced.
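As a sketch of the leucine-specific embodiment, the function below locates windows of 10 consecutive codons that encode at least 2 leucines and replaces every CTC and CTG codon inside those windows with TTG (the 100% case). The six leucine codons are those of the standard genetic code; the names LEUCINE_CODONS, TARGETS and recode_leucine_rich are hypothetical.

```python
# The six leucine codons of the standard genetic code.
LEUCINE_CODONS = {"TTA", "TTG", "CTT", "CTC", "CTA", "CTG"}
# Leucine codons to be replaced according to the text.
TARGETS = {"CTC", "CTG"}


def recode_leucine_rich(codons, window=10, min_leu=2):
    """Replace CTC/CTG with TTG, but only inside windows of `window`
    consecutive codons that encode at least `min_leu` leucines."""
    out = list(codons)
    last_start = max(len(codons) - window, 0)
    for start in range(last_start + 1):
        span = range(start, min(start + window, len(codons)))
        if sum(codons[i] in LEUCINE_CODONS for i in span) >= min_leu:
            for i in span:
                if codons[i] in TARGETS:
                    out[i] = "TTG"
    return out


example = ["ATG", "CTC", "CTG", "AAA"] + ["GAA"] * 10 + ["CTC"]
print(recode_leucine_rich(example))
```

In this example the two adjacent leucine codons near the 5' end are recoded, while the isolated CTC at the 3' end is left untouched because it does not fall in a leucine-rich window.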
Advantageously, the present invention is particularly suitable for DNA sequences whose general content of poorly suited codons is at least 20%, more preferably at least 30%, as compared with the total number of codons in the reference sequence.
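The overall content of poorly suited codons referred to above is a simple ratio, which can be computed as in the following sketch. The names POOR and poor_codon_fraction are hypothetical, and any incomplete trailing codon is ignored as an assumption of the sketch.

```python
# Poorly suited codons (narrower list given in the text).
POOR = {
    "CTC", "CTG",
    "CGG", "CGC", "CGA", "CGT", "AGG",
    "GCG", "GCC",
    "GGG", "GGC",
    "CCG", "CCC",
}


def poor_codon_fraction(seq):
    """Fraction of codons in `seq` that are poorly suited to yeasts.
    An incomplete trailing codon, if any, is ignored."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        return 0.0
    return sum(c in POOR for c in codons) / len(codons)


fraction = poor_codon_fraction("ATGCTCCTGAAAGCC")
print(f"{fraction:.0%} poorly suited codons; at least 20%: {fraction >= 0.20}")
```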
Advantageously, when the reference sequence contains at least one 5' region having a high content of poorly suited codons, the recoding of this 5' region alone makes it possible to obtain a substantial improvement in the expression of the protein of interest in yeasts. The length of the 5' region to be recoded in accordance with the invention will vary depending on the length of the region having a high content of poorly suited codons. This length will advantageously be at least four codons, in particular when this region contains at least two adjacent poor codons, up to approximately 40 codons or more.
However, it is not necessary, according to the invention, to recode all the reference sequence, but only the regions having a high content of poor codons, in particular the 5' region on its own, in order to obtain a substantial improvement in the expression of the protein of interest in yeasts.
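Recoding of the 5' region alone can be sketched as below: only the first n_codons codons of the coding sequence are rewritten and the remainder is left as in the reference sequence. The default of 40 codons is merely illustrative of the upper figure mentioned above, and the names PREFERRED and recode_five_prime are hypothetical.

```python
# Poorly suited codon -> preferred well-suited codon (same amino acid).
PREFERRED = {
    "CTC": "TTG", "CTG": "TTG",
    "CGG": "AGA", "CGC": "AGA", "CGA": "AGA", "CGT": "AGA", "AGG": "AGA",
    "GCG": "GCT", "GCC": "GCT",
    "GGG": "GGT", "GGC": "GGT",
    "CCG": "CCA", "CCC": "CCA",
}


def recode_five_prime(seq, n_codons=40):
    """Recode poorly suited codons in the first `n_codons` codons only;
    the rest of the sequence is returned unchanged (upper-cased)."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    head = [PREFERRED.get(c, c) for c in codons[:n_codons]]
    return "".join(head + codons[n_codons:])


print(recode_five_prime("ATGCTCCTCAAACTC", n_codons=3))
```

In the example, the CTC codon downstream of the recoded 5' region is deliberately left unmodified.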
Advantageously, the DNA sequence encoding a protein of interest is an isolated DNA sequence of natural origin, in particular of plant origin. The invention is particularly advantageous for sequences which originate from monocotyledonous or dicotyledonous plants, preferably monocotyledonous plants, in particular of the Gramineae family, such as wheat, barley, oats, rice, maize, sorghum, sugar cane, etc.
According to a preferred embodiment of the invention, the DNA sequence encodes an enzyme, in particular a cytochrome P450, which is preferably of plant origin. These P450 cytochromes exhibit a high content of poorly suited codons, in particular encoding leucine, in their N-terminal region; it is in the 5' terminal coding region that the poorly suited codons are replaced.
The present invention also relates to a chimeric gene which comprises a DNA sequence which has been modified as above and heterologous 5' and 3' regulatory elements which are able to function in a yeast, that is to say which are able to control the expression of the protein of interest in the yeast. Such regulatory elements are well known to the skilled person and are described, in particular, by Rozman et al. (Genomics, 38, 1996, 371-381) and by Nacken et al. (Gene, 175, 1996, 253-260, Probing the limits of expression levels by varying promoter strength and plasmid copy number in Saccharomyces cerevisiae).
The present invention also relates to a vector for transforming yeasts which contains at least one chimeric gene as described above. It also relates to a process for transforming yeasts with the said vector and to the transformed yeasts which are obtained. It finally relates to a process for producing a heterologous protein of interest in a transformed yeast, with the sequence which encodes the said protein of interest being such as defined above.
The process for producing a heterologous protein of interest in a transformed yeast comprises the steps of:
a) transforming a yeast with a vector which is able to replicate in yeasts and which contains a modified DNA sequence as defined above and heterologous 5' and 3' regulatory elements which are able to function in a yeast,
b) culturing the transformed yeast, and
c) extracting the protein of interest from the yeast culture.
When the protein of interest is an enzyme which is suitable for transforming a substrate, such as a cytochrome P450, the enzyme which has been extracted from the yeast culture is then used for catalysing the transformation of the said substrate.
However, the catalysis can be carried out, without requiring the enzyme to be extracted from the yeast, by culturing the transformed yeast in the presence of the said substrate.
The present invention also relates, therefore, to a process for transforming a substrate by enzymic catalysis using an enzyme which is expressed in a yeast, which process comprises the steps of
a) culturing the yeast which has been transformed in accordance with the invention in the presence of the substrate to be transformed, then
b) recovering the transformed substrate from the yeast culture.
When the yeast has been transformed for expressing a cytochrome P450, the reaction which is catalysed by the enzyme is an oxidation reaction, more specifically a reaction in which C-H or C=C bonds are oxidized.
The techniques for transforming and culturing yeasts are known to the skilled person, and are described, for example, in Methods in Enzymology (Vol. 194, 1991).
Yeasts which are of use in accordance with the invention are selected, in particular, from the genera Saccharomyces, Kluyveromyces, Hansenula, Pichia and Yarrowia. Advantageously, the yeast belongs to the Saccharomyces genus, and is in particular S. cerevisiae.