The impact of several successive rare codons, such as the arginine codons (AGG/AGA; CGA), the leucine codon (CTA), the isoleucine codon (ATA) and the proline codon (CCC), on the level of translation, and consequently on the decrease in the amount and quality of the protein expressed in E. coli, is described in Kane J F, Current Opinion in Biotechnology, 6:494-500 (1995). Individual rare codons have a similar impact if they occur in different parts of the gene.
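As an illustration of the rare-codon problem described above, the following minimal Python sketch locates rare codons and runs of successive rare codons in a coding sequence. The codon set is taken from the codons listed above; the function names and the `min_length` threshold are hypothetical choices made for this illustration and do not come from the cited literature.

```python
# Rare E. coli codons named in the text above (DNA alphabet).
RARE_CODONS = {"AGG", "AGA", "CGA", "CTA", "ATA", "CCC"}


def split_codons(cds):
    """Split a coding sequence into codons, ignoring an incomplete trailing codon."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def rare_codon_positions(cds):
    """Return the 0-based codon indices at which a rare codon occurs."""
    return [i for i, codon in enumerate(split_codons(cds)) if codon in RARE_CODONS]


def rare_codon_runs(cds, min_length=2):
    """Return (start_index, length) for each run of successive rare codons.

    min_length is an assumed threshold for what counts as a cluster.
    """
    runs = []
    start = None
    codons = split_codons(cds)
    for i, codon in enumerate(codons + [""]):  # empty sentinel closes a trailing run
        if codon in RARE_CODONS:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_length:
                runs.append((start, i - start))
            start = None
    return runs
```

Such a scan only reports where rare codons and their clusters lie; it does not by itself predict the expression level.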
GC-rich regions also affect the translational efficiency in E. coli if a stable double-stranded RNA is formed in the mRNA secondary structure. This impact is greatest when the GC-rich regions of the mRNA are found in the RBS, in the direct proximity of the RBS, or in the direct proximity of the start codon (Makrides S C, Microbiological Reviews, 60:512-538 (1996); Baneyx F, Current Opinion in Biotechnology, 10:411-421 (1999)).
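The GC content in the proximity of the start codon can be estimated with a short Python sketch such as the following. The function names and the window sizes (`upstream`, `downstream`) are assumptions made for this illustration only. A high GC fraction merely indicates a region in which a stable double-stranded structure may form; it does not establish that such a structure is actually present.

```python
def gc_fraction(seq):
    """Return the fraction of G and C bases in a sequence (0.0 if it is empty)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def gc_around_start(seq, start_index, upstream=20, downstream=20):
    """Return the GC fraction of a window around the start codon.

    start_index is the 0-based position of the first base of the start codon.
    The window covers `upstream` bases before the start codon, the start codon
    itself and `downstream` bases after it, clipped to the sequence ends.
    The default window sizes are assumed values, not taken from the literature.
    """
    left = max(0, start_index - upstream)
    right = min(len(seq), start_index + 3 + downstream)
    return gc_fraction(seq[left:right])
```

For example, `gc_around_start(sequence, start_index)` could be compared between a native and a modified 5′ end to see whether a GC-rich stretch near the RBS and the start codon has been reduced.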
Several methods are known for predicting the secondary structure and calculating the minimal free energy of an individual RNA molecule, which is supposed to be the basic rule for determining the most stable/most probable structure (SantaLucia J Jr and Turner D H, Biopolymers, 44:309-319 (1997)). Reliable algorithms for predicting the correct secondary structure are not known, with the exception of some cases. There has been no evidence of a quantitative correlation between the predicted structure and the expression level (Smit M H and van Duin J, J. Mol. Biol., 244:144-150 (1994)). It is still impossible to predict the tertiary structures of RNA (Tinoco I and Bustamante C, J. Mol. Biol., 293:271-281 (1999)).
The increase in the expression level after optimization of the DNA sequence in the TIR region, in the RBS region and in the region between the start codon and the RBS region is described in McCarthy J E G and Brimacombe R, Trends Genet 10:402-407 (1994). In this case the expression level increased due to more efficient translation initiation and its smooth continuation in the mRNA coding region.
The production of amounts of hG-CSF adequate for performing in vitro biological studies by expression in E. coli is described in Souza L M et al, Science 232:61-65 (1986) and in Zsebo K M et al, Immunobiology 172:175-184 (1986). The hG-CSF expression level was lower than 1%.
U.S. Pat. No. 4,810,643 discloses the use of a synthetic gene coding for hG-CSF which was constructed primarily by replacing E. coli rare codons with E. coli preferred codons. The combination with a thermoinducible phage lambda promoter led to an expression level of 3 to 5% of hG-CSF relative to total cellular proteins. This level is not sufficient for the economical large-scale production of hG-CSF.
An accumulation of 8-10% of hG-CSF relative to total cellular proteins was reached by changing the first four codons in the 5′ end region of the gene coding for hG-CSF, as described in Wingfield P et al, Biochem. J., 256:213-218 (1988).
The expression of hG-CSF in E. coli with a yield of up to 17% of hG-CSF relative to total cellular bacterial proteins is described in Devlin P E et al, Gene 65:13-22 (1988). This yield was reached by partial optimization of the DNA sequence at the 5′ end of the G-CSF coding region (the codons coding for the first four amino acids), whereby the GC region was replaced with an AT region, and by the use of a relatively strong lambda phage promoter. This expression level is not very high, which leads to lower production yields and makes large-scale production less economical.
The use of a synthetic gene and an expression level of about 30% are described in Kang S H et al, Biotechnology Letters, 17(7):687-692 (1995). This level was attained by the introduction of E. coli preferred codons, by modifications in the TIR region, and by additional modifications of codon sets, whereby the 3′ end of the gene was not essentially changed. Thus, changes of the gene in the TIR region were needed to attain the stated expression level, and the expression level did not exceed 30%.
U.S. Pat. No. 5,840,543 describes a synthetic gene coding for hG-CSF which was constructed by introducing AT-rich regions at the 5′ end of the gene and by replacing E. coli rare codons with E. coli preferred codons. Under the control of the Trp promoter, expression with a yield of 11% of hG-CSF relative to total cellular proteins was reached. The addition of leucine, threonine or their combination to the fermentation medium in which the bacteria were cultivated led to an accumulation of up to 35% of hG-CSF relative to total cellular proteins. This expression level was therefore reached only by adding amino acids to the fermentation medium, which is an additional cost in the process for the production of hG-CSF and is not economical for industrial production. Optimization of the gene coding for hG-CSF alone did not enable a higher expression level of hG-CSF.
The highest accumulation of hG-CSF relative to total cellular proteins found in the prior art is 48% and is described in Jeong et al, Protein Expression and Purification 23:311-318 (2001). This accumulation was obtained by changes in the N-terminal end and by induction with 1 mM IPTG.
In general, there are no reports on possible predictions of the expression level of native human genes in prokaryotic organisms, e.g. the bacterium E. coli. The described expression levels are relatively low or difficult to detect even when expression plasmids with strong promoters, e.g. from lambda or T7 phage, are used. From the prior art literature it can be gathered that many parameters (rare codons or their clustering, regions rich in GC base pairs, unfavorable mRNA secondary structures, unstable mRNA) have an impact on the accumulation of a human protein in E. coli.
Until now, no fully developed rule has been known on how to combine codons in order to obtain secondary or tertiary mRNA structures which are optimal for expression. Although some mathematical and structural models exist for predicting the thermodynamic stability of secondary structures, they are too unreliable to predict the secondary structures themselves. There are no such models for predicting the tertiary structures. The currently accessible models therefore do not enable prediction of the impact of the codons on the expression level.
There are no reports in either the patent or the scientific literature on a more efficient way of solving the problem of the low expression level of the gene coding for hG-CSF in E. coli.