The expression of heterologous genes in transformed cells is now commonplace. A large number of mammalian genes, including, for example, murine and human genes, have been successfully expressed in various host cells, including bacterial, yeast, insect, plant and mammalian host cells. Nevertheless, despite the burgeoning knowledge of expression systems and recombinant DNA technology, significant obstacles remain when one attempts to express a foreign or synthetic gene in a selected host cell. For example, translation of a synthetic gene, even when coupled with a strong promoter, often proceeds much more slowly than would be expected. The same is frequently true of exogenous genes. This lower-than-expected translation efficiency is often due to the protein coding regions of the gene having a codon usage pattern that does not resemble that of highly expressed genes in the host cell. It is known in this regard that codon utilization is highly biased and varies considerably among organisms, and that biases in codon usage can alter peptide elongation rates. It is also known that codon usage patterns are related to the relative abundance of tRNA isoacceptors, and that genes encoding proteins of high versus low abundance show differences in their codon preferences.
The implications of codon preference phenomena for gene expression are manifest in that these phenomena can affect the translational efficiency of messenger RNA (mRNA). It is widely known in this regard that translation of “rare codons”, for which the corresponding iso-tRNA is in low abundance relative to other iso-tRNAs, may cause a ribosome to pause during translation, which can lead to a failure to complete a nascent polypeptide chain and an uncoupling of transcription and translation. Thus, the expression of an exogenous gene may be impeded severely if a particular host cell of an organism or the organism itself has a low abundance of iso-tRNAs corresponding to one or more codons of the exogenous gene. Accordingly, a major aim of investigators in this field is first to ascertain the codon preference for particular cells in which an exogenous gene is to be expressed, and subsequently to alter the codon composition of that gene for optimized expression in those cells.
Codon-optimization techniques are known for improving the translational kinetics of translationally inefficient protein coding regions. Traditionally, these techniques have been based on the replacement of codons that are rarely or infrequently used in the host cell with those that are host-preferred. Codon frequencies can be derived from literature sources for the highly expressed genes of many organisms (see, for example, Nakamura et al., 1996, Nucleic Acids Res 24: 214-215). These frequencies are generally expressed on an ‘organism-wide average basis’ as the percentage of occasions that a synonymous codon is used to encode a corresponding amino acid across a collection of protein-encoding genes of that organism, which are preferably highly expressed.
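By way of illustration only, the percentage usage of each synonymous codon described above could be tabulated from a collection of coding sequences roughly as follows. This is a minimal sketch, not the procedure of any cited reference: the function name `synonymous_codon_usage` is hypothetical, the standard genetic code is assumed, and the input is assumed to be a list of in-frame coding sequences.

```python
from collections import Counter
from itertools import product

# Standard genetic code (assumed), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_codon_usage(coding_sequences):
    """Return {codon: percent of occasions it encodes its amino acid}.

    Hypothetical helper: codons are pooled across all supplied sequences,
    and amino acids that never occur are left out of the result.
    """
    counts = Counter()
    for seq in coding_sequences:
        seq = seq.upper().replace("U", "T")
        # Walk the sequence in frame, ignoring any trailing partial codon.
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1

    # Total occurrences of each amino acid across its synonymous codons.
    totals = Counter()
    for codon, n in counts.items():
        totals[CODON_TABLE[codon]] += n

    usage = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*" and totals[aa]:
            usage[codon] = 100.0 * counts[codon] / totals[aa]
    return usage
```

For example, a single toy sequence `ATGCTGCTGCTGCTATAA` contains four leucine codons, three of them CTG, so CTG would be reported at 75% and CTA at 25%. A real table would pool many genes, preferably highly expressed ones, as the text notes.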
Typically, codons are classified as: (a) “common” codons (or “preferred” codons) if their frequency of usage is above about 4/3× the frequency of usage that would be expected in the absence of any bias in codon usage; (b) “rare” codons (or “non-preferred” codons) if their frequency of usage is below about 2/3× the frequency of usage that would be expected in the absence of any bias in codon usage; and (c) “intermediate” codons (or “less preferred” codons) if their frequency of usage is in-between the frequency of usage of “common” codons and of “rare” codons. Since an amino acid can be encoded by 2, 3, 4 or 6 codons, the frequency of usage of any selected codon, which would be expected in the absence of any bias in codon usage, will be dependent upon the number of synonymous codons which code for the same amino acid as the selected codon. Accordingly, for a particular amino acid, the frequency thresholds for classifying codons in the “common”, “intermediate” and “rare” categories will be dependent upon the number of synonymous codons for that amino acid. Consequently, for amino acids having 6 choices of synonymous codon, the frequency of codon usage that would be expected in the absence of any bias in codon usage is 16% and thus the “common”, “intermediate” and “rare” codons are defined as those codons that have a frequency of usage above 20%, between 10 and 20% and below 10%, respectively. For amino acids having 4 choices of synonymous codon, the frequency of codon usage that would be expected in the absence of codon usage bias is 25% and thus the “common”, “intermediate” and “rare” codons are defined as those codons that have a frequency of usage above 33%, between 16 and 33% and below 16%, respectively. 
For isoleucine, which is the only amino acid having 3 choices of synonymous codon, the frequency of codon usage that would be expected in the absence of any bias in codon usage is 33% and thus the “common”, “intermediate” and “rare” codons for isoleucine are defined as those codons that have a frequency of usage above 45%, between 20 and 45% and below 20%, respectively. For amino acids having 2 choices of synonymous codon, the frequency of codon usage that would be expected in the absence of codon usage bias is 50% and thus the “common”, “intermediate” and “rare” codons are defined as those codons that have a frequency of usage above 60%, between 30 and 60% and below 30%, respectively. Thus, the categorization of codons into the “common”, “intermediate” and “rare” classes (or “preferred”, “less preferred” or “non-preferred”, respectively) has conventionally been based on a compilation of codon usage for an organism in general (e.g., ‘human-wide’) or for a class of organisms in general (e.g., ‘mammal-wide’). For example, reference may be made to Seed (see U.S. Pat. Nos. 5,786,464 and 5,795,737), who discloses preferred, less preferred and non-preferred codons for mammalian cells in general. However, the present inventor revealed in WO 99/02694 and in WO 00/42190 that there are substantial differences in the relative abundance of particular iso-tRNAs in different cells or tissues of a single multicellular organism (e.g., a mammal or a plant) and that this plays a pivotal role in protein translation from a coding sequence with a given codon usage or composition.
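The conventional three-way classification set out above can be sketched as follows. This is an illustrative sketch only: the names `QUOTED_CUTOFFS`, `factor_cutoffs` and `classify_codon` are hypothetical, and the default cut-offs are simply the percentages quoted in the text. Because those percentages are stated as approximate (“about 4/3×” and “about 2/3×”), the exact multiples are provided separately for comparison.

```python
# Cut-offs in percent as quoted in the text, keyed by the number of synonymous
# codons for the amino acid: (common if above, rare if below).
QUOTED_CUTOFFS = {
    6: (20.0, 10.0),
    4: (33.0, 16.0),
    3: (45.0, 20.0),
    2: (60.0, 30.0),
}


def factor_cutoffs(n_synonymous):
    """Exact 4/3x and 2/3x multiples of the unbiased expectation (percent)."""
    expected = 100.0 / n_synonymous
    return (4.0 / 3.0 * expected, 2.0 / 3.0 * expected)


def classify_codon(usage_percent, n_synonymous, cutoffs=None):
    """Classify a codon as 'common', 'intermediate' or 'rare'.

    By default the quoted cut-offs are used; pass factor_cutoffs(n) to apply
    the exact multiples instead (an assumption, not the source's figures).
    """
    if cutoffs is None:
        cutoffs = QUOTED_CUTOFFS[n_synonymous]
    common_above, rare_below = cutoffs
    if usage_percent > common_above:
        return "common"
    if usage_percent < rare_below:
        return "rare"
    return "intermediate"
```

The two sets of cut-offs can disagree near the boundaries. For a two-codon amino acid, a codon used 62% of the time is “common” under the quoted 60% cut-off but “intermediate” under the exact 4/3× multiple of about 66.7%.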
Thus, in contrast to the art-recognized presumption that different cells of a multicellular organism have the same bias in codon usage, it was revealed for the first time that one cell type of a multicellular organism uses codons in a manner distinct from another cell type of the same organism. In other words, it was discovered that different cells of an organism can exhibit different translational efficiencies for the same codon and that it was not possible to predict which codons would be preferred, less preferred or non-preferred in a selected cell type. Accordingly, it was proposed that differences in codon translational efficiency between cell types could be exploited, together with the codon composition of a gene, to regulate the production of a protein in, or to direct that production to, a chosen cell type.
Therefore, in order to optimize the expression of a protein-encoding polynucleotide in a particular cell type, WO 99/02694 and WO 00/42190 teach that it is necessary first to determine the translational efficiency for each codon in that cell type, rather than to rely on codon frequencies calculated on an organism-wide average basis, and then to codon-modify the polynucleotide based on that determination.
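The final codon-modification step can be pictured with a short sketch. It is not the method of WO 99/02694 or WO 00/42190: the function `recode` and the score table are hypothetical, the standard genetic code is assumed, and the per-codon scores stand in for whatever measure has been determined for the chosen cell type (or, in the traditional approach, for organism-wide codon frequencies).

```python
from itertools import product

# Standard genetic code (assumed), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def recode(coding_sequence, codon_score):
    """Swap each codon for its best-scoring synonym, keeping the protein.

    codon_score maps codons to a hypothetical preference value (higher is
    better). Codons with no scored synonym are left unchanged.
    """
    seq = coding_sequence.upper().replace("U", "T")
    in_frame = len(seq) - len(seq) % 3
    out = []
    for i in range(0, in_frame, 3):
        codon = seq[i:i + 3]
        aa = CODON_TABLE.get(codon)
        # Synonymous codons for this amino acid that have a score available.
        synonyms = [c for c, a in CODON_TABLE.items()
                    if a == aa and c in codon_score]
        out.append(max(synonyms, key=codon_score.get) if synonyms else codon)
    out.append(seq[in_frame:])  # keep any trailing partial codon untouched
    return "".join(out)
```

With illustrative scores such as `{"CTG": 0.9, "CTA": 0.1, "TTA": 0.2}`, the leucine codons CTA and TTA in `ATGCTATTATAA` would both be replaced by CTG, while ATG and the stop codon, which have no scored synonyms, are retained. Selecting the lowest-scoring synonym instead would correspondingly reduce expression in that cell type.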
The present inventor further disclosed in WO 2004/042059 a strategy for enhancing or reducing the quality of a selected phenotype that is displayed, or proposed to be displayed, by an organism of interest. The strategy involves codon modification of a polynucleotide that encodes a phenotype-associated polypeptide that either by itself, or in association with other molecules, in the organism of interest imparts or confers the selected phenotype upon the organism. Unlike previous methods, however, this strategy does not rely on data that provide a ranking of synonymous codons according to their preference of usage in an organism or class of organisms. Nor does it rely on data that provide a ranking of synonymous codons according to their translational efficiencies in one or more cells of the organism or class of organisms. Instead, it relies on ranking individual synonymous codons that code for an amino acid in the phenotype-associated polypeptide according to their preference of usage by the organism or class of organisms, or by a part thereof, for producing the selected phenotype.