The present invention relates to improved methods for producing polypeptides. Numerous approaches have been applied in generating strains for protein over-expression and/or production. These include, but are not limited to, making strains with multiple copies of the gene encoding the protein of interest (POI) and applying strong promoter sequences.
Each specific amino acid is encoded by a minimum of one codon and a maximum of six codons. Prior research has shown that codon usage in genes encoding the cell's polypeptides is biased among species (Kanaya, S, Y. Yamada, Y. Kudo and T. Ikemura (1999) Studies of codon usage and tRNA genes of 18 unicellular organisms and quantification of Bacillus subtilis tRNAs: gene expression level and species-specific diversity of codon usage based on multivariate analysis. Gene 238:143-155). Prior publications disclose optimization of codon use in a given host cell to improve polypeptide production (see, for example, WO 97/11086). More specifically, WO 03/70957 describes optimized codon use in filamentous fungi for producing plant polypeptides. In all these cases of ‘classic’ codon optimization, a native codon has been substituted by the most frequent codon from a reference set of genes, so that the rate of codon translation for each amino acid is designed to be high (optimized).
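The ‘classic’ substitution described above can be illustrated with a minimal sketch. It is not taken from any of the cited publications: the codon table is a small excerpt of the standard genetic code, and the reference genes, the input sequence and the function names (split_codons, most_frequent_codons, classic_optimize) are hypothetical and chosen for illustration only.

```python
from collections import Counter, defaultdict

# Excerpt of the standard genetic code (assumption: only these four amino acids occur).
CODON_TO_AA = {
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
    "AAA": "K", "AAG": "K",
    "GAA": "E", "GAG": "E",
    "ATG": "M",
}

def split_codons(seq):
    """Split a coding sequence into consecutive, non-overlapping codons."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]

def most_frequent_codons(reference_genes):
    """For each amino acid, find the codon used most often in the reference set."""
    counts = defaultdict(Counter)
    for gene in reference_genes:
        for codon in split_codons(gene):
            counts[CODON_TO_AA[codon]][codon] += 1
    return {aa: c.most_common(1)[0][0] for aa, c in counts.items()}

def classic_optimize(seq, best):
    """Replace every codon by the most frequent synonymous codon."""
    return "".join(best[CODON_TO_AA[codon]] for codon in split_codons(seq))

# Hypothetical reference set of (highly expressed) genes and a hypothetical native sequence.
reference = ["ATGGCCAAGGAGGCC", "ATGGCCAAGGAGGCT"]
best = most_frequent_codons(reference)
optimized = classic_optimize("ATGGCAAAAGAAGCG", best)
```

Each codon is treated as an independent entity in this sketch: the encoded amino acid sequence is unchanged and the neighbouring codons play no role in the choice.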
More recently, WO 03/85114 described a harmonization of codon use which takes into account the distribution of all codons in genes of the host organism, on the assumption that these affect protein folding.
The availability of fully sequenced genomes of many organisms in recent years, e.g. Bacillus subtilis (Kunst et al. 1997), Bacillus amyloliquefaciens, Aspergillus niger (Pel et al., 2007, Nat Biotech. 25: 221-231), Kluyveromyces lactis, Saccharomyces cerevisiae, various plant genomes, mouse, rat and human, has offered the possibility of analyzing different aspects of the gene sequences themselves in relation to their natural expression level (mRNA or protein level). A good example is codon usage (bias) analysis, and subsequent single-codon optimization. Note that single-codon optimization is herein understood to refer to codon optimization or codon harmonization techniques that focus on the optimization of codons as single independent entities, in contrast to codon-pair optimization, which is the topic of the current invention.
Whereas single-codon usage (bias) has been studied extensively before (for an overview, see Gustafsson et al., 2004, Trends Biotechnol. 22:346-353), there are only a few reports on codon-pair usage and on the optimization of codon pairs.
The effect of a few specific codon pairs on ribosomal frameshifts in E. coli has been investigated, e.g. for the AGG-AGG codon pair (Spanjaard and van Duin, 1988, Proc. Natl. Acad. Sci. USA 85:7967-7971; Gurvich et al., 2005, J. Bacteriol. 187:4023-4032) and for UUU-YNN sites (Schwarz and Curran, 1997, Nucleic Acids Res. 25:2005-2011).
Gutman and Hatfield (1989, Proc. Natl. Acad. Sci. USA 86:3699-3703) analyzed a larger set of sequences for all possible codon pairs for E. coli and found that codon pairs are directionally biased. In addition, they observed that highly underrepresented pairs are used almost twice as frequently as overrepresented ones in highly expressed genes, whereas in poorly expressed genes overrepresented pairs are used more frequently. U.S. Pat. No. 5,082,767 (Hatfield and Gutman, 1992) discloses a method for determining relative native codon pairing preferences in an organism and altering codon pairing of a gene of interest in accordance with said codon pairing preferences to change the translational kinetics of said gene in a predetermined manner, with examples for E. coli and S. cerevisiae. However, in their method, Hatfield and Gutman only optimize individual pairs of adjacent codons. Moreover, their patent (U.S. Pat. No. 5,082,767) claims increasing the translational kinetics of at least a portion of a gene by a modified sequence in which codon pairing is altered to increase the number of codon pairs that, in comparison to random codon pair usage, are the more abundant and yet more under-represented codon pairs in an organism. In contrast, the present invention discloses a method to increase translation by a modified sequence in which codon pairing is altered to increase the number of codon pairs that, in comparison to random codon pair usage, are the more over-represented codon pairs in an organism.
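The notion of a codon pair being over- or under-represented in comparison to random codon pair usage can be illustrated with a simplified observed/expected ratio. The sketch below is an assumption made for illustration and is not the statistic used by Gutman and Hatfield or in U.S. Pat. No. 5,082,767: the expected count of a pair is taken as the product of the single-codon frequencies, and the function names and the toy gene are hypothetical.

```python
from collections import Counter

def split_codons(seq):
    """Split a coding sequence into consecutive, non-overlapping codons."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]

def codon_pair_ratios(genes):
    """Return observed/expected ratios for every ordered pair of adjacent codons.

    Expected counts assume that adjacent codons are chosen independently,
    according to their single-codon frequencies (simplifying assumption).
    """
    codon_counts = Counter()
    pair_counts = Counter()
    for gene in genes:
        codons = split_codons(gene)
        codon_counts.update(codons)
        pair_counts.update(zip(codons, codons[1:]))
    n_codons = sum(codon_counts.values())
    n_pairs = sum(pair_counts.values())
    ratios = {}
    for (first, second), observed in pair_counts.items():
        expected = (n_pairs
                    * codon_counts[first] / n_codons
                    * codon_counts[second] / n_codons)
        ratios[(first, second)] = observed / expected
    return ratios

# Hypothetical toy gene: GCC-AAG occurs twice, AAG-GCC only once.
ratios = codon_pair_ratios(["GCCAAGGCCAAG"])
```

A ratio above 1 marks a pair that occurs more often than expected under random pairing, and a ratio below 1 marks an under-represented pair. Because the pairs are ordered, GCC-AAG and AAG-GCC receive different ratios, which corresponds to the directional bias mentioned above.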
Moura et al. (2005, Genome Biology, 6:R28) analyzed the entire S. cerevisiae ORFeome but did not find a statistically significant bias for about 47% of the codon pairs. The respective values differed from one species to another, resulting in “codon context maps” that can be regarded as “species-specific fingerprints” of the codon pair usage.
Boycheva et al. (2003, Bioinformatics 19(8):987-998) identified two sets of codon pairs in E. coli, referred to as hypothetically attenuating and hypothetically non-attenuating, by looking for over- and under-represented codon pairs among genes with high and poor expression. However, they did not propose a method to apply this finding, nor did they give any experimental proof of their hypothesis. Note that these groups are defined completely opposite to the ones defined by Gutman and Hatfield (1989, 1992, supra), who proposed a non-attenuating effect for highly underrepresented pairs in highly expressed genes.
Buchan, Aucott and Stansfield (2006, Nucleic Acids Research 34(3):1015-1027) analyzed tRNA properties with respect to codon pair bias.
As for the implications of biases in codon pair utilization, Irwin et al. (1995, J. Biol. Chem. 270:22801-22806) demonstrated in E. coli that the rate of synthesis actually decreased substantially when replacing a highly underrepresented codon pair by a highly overrepresented one, and increased when exchanging a slightly underrepresented codon pair for a more highly underrepresented one. This is quite remarkable, as it is rather the opposite of what one would expect given the influence of single-codon bias on protein levels.
However, none of the above-cited art discloses how to optimize the codon-pair usage of a full-length coding sequence while taking account of the fact that, by definition, codon pairs overlap and that optimization of each individual codon pair therefore affects the bias of the overlapping upstream and downstream codon pairs. Moreover, none of the cited art discloses a method that combines optimization of both single codons and codon pairs. Codon-pair optimization that takes said overlap into account, optionally combined with single-codon optimization, would greatly improve expression of the nucleotide sequence encoding the polypeptide of interest and/or improve production of said polypeptide.
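The overlap of codon pairs can be made concrete with a small sketch. The sequences and the helper name codon_pairs are hypothetical and serve only to illustrate that a single synonymous substitution changes two adjacent codon pairs at once; the sketch is not an implementation of the method of the invention.

```python
def codon_pairs(seq):
    """List the ordered pairs of adjacent codons in a coding sequence."""
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    return list(zip(codons, codons[1:]))

# Hypothetical example: the second codon GCA (Ala) is replaced by the synonymous GCC.
before = codon_pairs("ATGGCAAAAGAA")
after = codon_pairs("ATGGCCAAAGAA")

# Indices of the codon pairs affected by this single substitution.
changed = [i for i, (old, new) in enumerate(zip(before, after)) if old != new]
```

Here both the pair formed with the upstream codon (ATG) and the pair formed with the downstream codon (AAA) change, so the pairs of a full-length sequence cannot be optimized one at a time without affecting their neighbours.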
There is thus still a need in the art for novel methods for optimization of coding sequences for improving the production of a polypeptide in a host cell.