Protein synthesis in living cells is governed by the genetic code, in which 61 codons encode 20 amino acids, with mRNA transcribed from DNA (the gene) serving as the template. However, it is known that the frequencies of codon usage differ among organisms even though the amino acids are the same. Thus, not all codons are used equally.
The frequencies of codon usage are strongly biased among organisms. Consequently, when a recombinant protein is expressed using a gene derived from another organism, attempts have been made to choose the codons preferred by the host cells for each amino acid, chemically synthesize the target gene and thereby optimize the expression level of the protein. For example, a so-called humanized gene has been synthesized by taking into account the codon composition used in human cells in order to express a heterologous protein in a mammalian cell culture system, and its expression has been studied. The frequency of codon usage has been analyzed in detail (see Codon Usage Database at the Kazusa DNA Res. Inst. (KDRI) website), and the data on codon usage frequency in human cells are publicly available (Table 1). Table 1 shows the frequency of codon usage for each amino acid and reveals that codon bias exists for each amino acid. In general, humanized genes have been synthesized by a method in which the target gene is designed to have a GC content of 40% to 50% and codons with a low frequency of use are avoided, so as to closely match the distribution of amino acids and codons shown in Table 1.
TABLE 1 Codon frequency in human cells

1st                           2nd base                            3rd
base  T               C               A               G               base
T     TTT 0.43 Phe    TCT 0.18 Ser    TAT 0.42 Tyr    TGT 0.42 Cys    T
      TTC 0.57 Phe    TCC 0.20 Ser    TAC 0.58 Tyr    TGC 0.58 Cys    C
      TTA 0.06 Leu    TCA 0.15 Ser    TAA 0.22 Stop   TGA 0.61 Stop   A
      TTG 0.12 Leu    TCG 0.06 Ser    TAG 0.17 Stop   TGG 1.00 Trp    G
C     CTT 0.12 Leu    CCT 0.29 Pro    CAT 0.41 His    CGT 0.09 Arg    T
      CTC 0.20 Leu    CCC 0.33 Pro    CAC 0.59 His    CGC 0.19 Arg    C
      CTA 0.07 Leu    CCA 0.27 Pro    CAA 0.27 Gln    CGA 0.10 Arg    A
      CTG 0.43 Leu    CCG 0.11 Pro    CAG 0.73 Gln    CGG 0.19 Arg    G
A     ATT 0.35 Ile    ACT 0.23 Thr    AAT 0.44 Asn    AGT 0.14 Ser    T
      ATC 0.52 Ile    ACC 0.38 Thr    AAC 0.56 Asn    AGC 0.25 Ser    C
      ATA 0.14 Ile    ACA 0.27 Thr    AAA 0.40 Lys    AGA 0.21 Arg    A
      ATG 1.00 Met    ACG 0.12 Thr    AAG 0.60 Lys    AGG 0.22 Arg    G
G     GTT 0.17 Val    GCT 0.26 Ala    GAT 0.44 Asp    GGT 0.18 Gly    T
      GTC 0.25 Val    GCC 0.40 Ala    GAC 0.56 Asp    GGC 0.33 Gly    C
      GTA 0.10 Val    GCA 0.22 Ala    GAA 0.41 Glu    GGA 0.26 Gly    A
      GTG 0.48 Val    GCG 0.10 Ala    GAG 0.59 Glu    GGG 0.23 Gly    G
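The design criteria described above (avoiding codons with a low frequency of use and keeping the GC content within a target range) can be checked against the values of Table 1 with a short script. The following Python code is only a minimal sketch: the function names, the example sequence and the rare-codon threshold of 0.15 are assumptions introduced for illustration and are not taken from any published design tool.

```python
# Values of Table 1: relative codon frequency within each amino acid (human cells).
TABLE_1 = """
TTT 0.43 Phe  TCT 0.18 Ser  TAT 0.42 Tyr  TGT 0.42 Cys
TTC 0.57 Phe  TCC 0.20 Ser  TAC 0.58 Tyr  TGC 0.58 Cys
TTA 0.06 Leu  TCA 0.15 Ser  TAA 0.22 Stop TGA 0.61 Stop
TTG 0.12 Leu  TCG 0.06 Ser  TAG 0.17 Stop TGG 1.00 Trp
CTT 0.12 Leu  CCT 0.29 Pro  CAT 0.41 His  CGT 0.09 Arg
CTC 0.20 Leu  CCC 0.33 Pro  CAC 0.59 His  CGC 0.19 Arg
CTA 0.07 Leu  CCA 0.27 Pro  CAA 0.27 Gln  CGA 0.10 Arg
CTG 0.43 Leu  CCG 0.11 Pro  CAG 0.73 Gln  CGG 0.19 Arg
ATT 0.35 Ile  ACT 0.23 Thr  AAT 0.44 Asn  AGT 0.14 Ser
ATC 0.52 Ile  ACC 0.38 Thr  AAC 0.56 Asn  AGC 0.25 Ser
ATA 0.14 Ile  ACA 0.27 Thr  AAA 0.40 Lys  AGA 0.21 Arg
ATG 1.00 Met  ACG 0.12 Thr  AAG 0.60 Lys  AGG 0.22 Arg
GTT 0.17 Val  GCT 0.26 Ala  GAT 0.44 Asp  GGT 0.18 Gly
GTC 0.25 Val  GCC 0.40 Ala  GAC 0.56 Asp  GGC 0.33 Gly
GTA 0.10 Val  GCA 0.22 Ala  GAA 0.41 Glu  GGA 0.26 Gly
GTG 0.48 Val  GCG 0.10 Ala  GAG 0.59 Glu  GGG 0.23 Gly
"""


def parse_table(text):
    """Map each codon to (relative frequency, amino acid label)."""
    tokens = text.split()
    return {
        tokens[i]: (float(tokens[i + 1]), tokens[i + 2])
        for i in range(0, len(tokens), 3)
    }


CODON_TABLE = parse_table(TABLE_1)


def gc_content(dna):
    """Fraction of G and C bases in a DNA sequence."""
    dna = dna.upper()
    return sum(base in "GC" for base in dna) / len(dna)


def rare_codons(dna, threshold=0.15):
    """List (codon index, codon, amino acid, frequency) for codons below threshold.

    The default threshold of 0.15 is an illustrative assumption.
    """
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    hits = []
    for pos in range(0, len(dna), 3):
        codon = dna[pos:pos + 3]
        freq, amino_acid = CODON_TABLE[codon]
        if freq < threshold:
            hits.append((pos // 3, codon, amino_acid, freq))
    return hits


# Hypothetical coding sequence: ATG TTA CGT GAG TAA
example = "ATGTTACGTGAGTAA"
print(rare_codons(example))
print(round(gc_content(example), 3))
```

For the hypothetical sequence above, the codons TTA (Leu, 0.06) and CGT (Arg, 0.09) fall below the assumed threshold, and the GC content is 5 of 15 bases, which is below the 40% to 50% range mentioned above.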
In addition, humanized genes are designed using various software tools, which take into account not only codon proportions but also the deletion of recognition sites for transcription factors, the avoidance of palindromic structures and the like in the nucleic acid sequence, the deletion of unnecessary restriction enzyme sites, and so on. However, a synthetic humanized gene can have many different nucleotide sequences, depending on the combination of codons, even though the gene encodes the same amino acid sequence. Consequently, such design does not assure an improved method for producing a gene product, and a humanized gene is not always expressed at a high level in mammalian cells. This is the current state of humanized gene synthesis.
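The number of distinct nucleotide sequences that encode one and the same amino acid sequence is the product of the numbers of synonymous codons for each residue. The following Python code is a minimal sketch of this arithmetic under the standard genetic code; the function name and the five-residue peptide are hypothetical and serve only as an illustration.

```python
import math

# Number of synonymous codons per amino acid in the standard genetic code
# (one-letter amino acid codes; the values add up to the 61 sense codons).
DEGENERACY = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2,
    "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4,
    "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_synonymous_sequences(protein):
    """Count the coding sequences that encode the given amino acid sequence."""
    return math.prod(DEGENERACY[residue] for residue in protein.upper())


# Hypothetical peptide Met-Lys-Leu-Ser-Arg: 1 * 2 * 6 * 6 * 6 = 432
print(count_synonymous_sequences("MKLSR"))
```

Even this five-residue example admits 432 coding sequences, and the count grows multiplicatively with each additional residue, so a full-length gene has far more candidate sequences than could be tested individually.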
On the other hand, the genes for aequorin (189 amino acid residues: Patent Document 1) and clytin II (189 amino acid residues: Patent Document 2), which are heterologous proteins derived from coelenterates and are low molecular weight photoproteins with a molecular weight of about 20,000, were synthesized using codons with a high frequency of use in human cells, and their expression was examined in a cultured animal cell system derived from mammals. As a result, these genes showed higher expression activity than the wild-type genes, as described in Patent Documents 1 and 2. However, “a preferred human codon-optimized gene method,” which involves synthesizing genes by selecting only preferential codons with a high frequency of use, has not yet been recognized as a general rule. The reason is considered to be that, judging from the amounts of the tRNA species for each amino acid in cells and the differences among biological species, an extreme codon bias for the amino acids in a gene sequence will affect the efficiency of intracellular protein synthesis. Furthermore, the efficiency of protein expression using synthetic genes prepared by selecting and using only preferential codons with a high frequency of use has not been verified for proteins with a normal molecular weight of 30,000 to 60,000 either. The GC content of genes in eukaryotes, including humans, is approximately 40%. It is unclear whether, in general, synthetic genes with a GC content of not less than 60% and with codon usage preferentially biased toward high-frequency codons are efficiently expressed in eukaryotes.
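Selecting only the preferential codons can be illustrated by a simple back-translation. The following Python code is a minimal sketch, not the procedure of Patent Documents 1 and 2: the codon map lists, for each amino acid, the codon with the highest frequency in Table 1, while the function names and the example peptide are hypothetical.

```python
# Most frequent human codon for each amino acid, read from Table 1
# (one-letter amino acid codes; "*" denotes the stop codon).
PREFERRED_CODON = {
    "A": "GCC", "R": "AGG", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
    "*": "TGA",
}


def back_translate_preferred(protein):
    """Encode an amino acid sequence using only the most frequent codons."""
    return "".join(PREFERRED_CODON[residue] for residue in protein.upper())


def gc_content(dna):
    """Fraction of G and C bases in a DNA sequence."""
    return sum(base in "GC" for base in dna.upper()) / len(dna)


# Hypothetical peptide Met-Lys-Leu-Ser-Arg followed by a stop codon.
gene = back_translate_preferred("MKLSR*")
print(gene)              # ATGAAGCTGAGCAGGTGA
print(gc_content(gene))  # 0.5
```

In this map every preferred sense codon ends in G or C, which is one reason why genes built only from preferential codons tend toward a higher GC content than genes using the full codon distribution.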