An increasing number of naturally occurring genes have been redesigned at the nucleotide level and synthesized in attempts to improve protein yields (Wu, G., Zheng, Y., Qureshi, I., Zin, H. T., Beck, T., Bulka, B., Freeland, S. J. (2006) SGDB: a database of synthetic genes re-designed for optimizing protein over-expression. Nucleic Acids Research, 2006, vol. 00, Database issue D1-D4 (Nucleic Acids Research Advance Access published Nov. 7, 2006)).
Essentially all methods for designing synthetic genes rely on some form of codon optimization, which requires an understanding of what an optimal codon is. This is usually based on the general idea that translationally optimal codons are those that match the most abundant tRNA species in a host cell.
A number of companies today offer gene synthesis services, and they frequently incorporate one or more additional ideas about gene optimization, such as mRNA structure, mRNA stability, or codon context.
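The basic idea of codon optimization described above can be illustrated with a minimal sketch, in which each amino acid is simply encoded by its most frequently used codon. The codon frequencies below are invented for illustration and are not measured data for any host; codon frequency is used here only as an assumed stand-in for tRNA abundance, and the names CODON_USAGE, most_frequent_codon and back_translate are hypothetical. Actual design methods typically weigh further factors, and this sketch is not the method of any particular study or supplier.

```python
# Illustrative relative codon frequencies per amino acid.
# ASSUMPTION: these values are made up for the example, not real host data.
CODON_USAGE = {
    "M": {"ATG": 1.00},
    "K": {"AAA": 0.70, "AAG": 0.30},
    "L": {"CTG": 0.50, "CTT": 0.30, "TTA": 0.20},
    "*": {"TAA": 0.60, "TGA": 0.25, "TAG": 0.15},  # stop codons
}


def most_frequent_codon(amino_acid, usage=CODON_USAGE):
    """Return the codon with the highest frequency for one amino acid."""
    if amino_acid not in usage:
        raise ValueError(f"no codon usage data for residue {amino_acid!r}")
    codons = usage[amino_acid]
    return max(codons, key=codons.get)


def back_translate(protein, usage=CODON_USAGE):
    """Encode a protein (one-letter codes, '*' = stop) with preferred codons."""
    return "".join(most_frequent_codon(aa, usage) for aa in protein)


designed_gene = back_translate("MKL*")
print(designed_gene)
```

Because every residue always receives the same codon, this strategy yields exactly one nucleotide sequence per protein; designs that differ from one another require a different selection rule.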
Codon usage in a host cell relative to the abundance of tRNA species therein is one of the clearly distinguishing sequence features that contribute to different levels of gene expression. However, in a database of synthetic genes from published, peer-reviewed studies, more than 20% of the genes did not increase protein yield after supposed codon optimization. It has been said that this figure of 20% negative optimization results is probably an underestimate of the problems in synthetic gene design, as peer-reviewed publications are likely to be biased towards reports of success (Wu et al., 2006; see above).
It has also been stated that a weakness of the current synthetic gene database is the lack of studies in which multiple versions of a synthetic gene have been designed in order to compare codon optimization algorithms directly, and the paper advocates research that provides data on multiple variations in coding strategy for a single protein product (Wu et al., 2006; see above).
The term “translational coupling” describes the observation that translational initiation can be affected by translational termination events close to the initiation codon, in particular the observation that translation termination enhances translation initiation of closely associated genes (Sprengel, R., Reiss, B., Schaller, H. (1985). Translationally coupled initiation of protein synthesis in Bacillus subtilis. Nucleic Acids Research, 13, 893-909, and references therein).
Translational coupling can be made solely responsible for translation of a specific open reading frame, as described in the study by Sprengel et al., 1985. They constructed various combinations of the B. licheniformis penP gene and the Tn5 neo gene, and investigated the expression of these constructs in B. subtilis. In the absence of translational coupling, the Tn5 neo gene was not expressed in B. subtilis. The study found that constructs differing in the distance between the penP termination codon and the neo initiation codon produced different amounts of neo gene product, and that activation of neo gene expression was highest when the neo ATG start codon overlapped with the TGA termination codon of the penP reading unit in the sequence ATGA. The study concluded that sequences immediately in front of the initiation codon of the translationally coupled gene are not involved in the initiation reaction following translational termination. For some specific constructs, the study estimated the frequency of ribosomes reinitiating at the neo initiation codon overlapping with the termination codon in the sequence ATGA at 5-10%.
The Sprengel study further used the translational coupling phenomenon as a tool to investigate the expression and regulation of the B. licheniformis penP gene in B. licheniformis itself, and mentioned its use for stabilizing highly unstable penP gene derivatives.
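The spacing between an upstream termination codon and a downstream initiation codon, which the Sprengel study varied, can be expressed as a simple offset. The sketch below is a minimal illustration under stated assumptions: the 20-nucleotide sequence is a made-up toy construct and not a penP or neo sequence, the function names are hypothetical, and the offset convention (position of the first nucleotide of the start codon minus that of the stop codon) is chosen here only for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(seq, orf_start):
    """Return the index of the first stop codon in the frame set by orf_start."""
    for i in range(orf_start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


def stop_to_start_offset(seq, upstream_orf_start, downstream_start):
    """Offset of the downstream start codon relative to the upstream stop codon.

    Under this convention an offset of -1 corresponds to ATGA (ATG and TGA
    sharing two nucleotides); an offset of 3 or more means no overlap.
    """
    stop = first_in_frame_stop(seq, upstream_orf_start)
    if stop is None:
        raise ValueError("no in-frame stop codon found for the upstream frame")
    return downstream_start - stop


# Toy construct (ASSUMPTION: invented sequence for illustration only).
# Upstream frame:   ATG GCT GCA TGA
# Downstream frame: ATG AAA CTG TAA, starting one nucleotide before the TGA
construct = "ATGGCTGCATGAAACTGTAA"

downstream_start = construct.find("ATGA")  # start codon of the overlap motif
offset = stop_to_start_offset(construct, 0, downstream_start)
print(downstream_start, offset)
```

In this toy construct the downstream ATG begins one nucleotide before the upstream TGA, which is the ATGA arrangement described above.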
Another publication (Zaghloul, T. I., Kawamura, F., Doi, R. H. (1985). Translational coupling in Bacillus subtilis of a heterologous Bacillus subtilis-Escherichia coli gene fusion. Journal of Bacteriology, 164, 550-555) reports that expression of a Tn9-derived cat gene in B. subtilis was dependent on a gene fusion, wherein translation initiating at the B. subtilis aprA gene and terminating within the cat ribosome binding site was responsible for translation initiation of the cat gene.
Peijnenburg et al. (Peijnenburg, A. A. C. M., Venema, G., Bron, S. (1990). Translational coupling in a penP-lacZ gene fusion in Bacillus subtilis and Escherichia coli. Use of AUA as a restart codon. Molecular and General Genetics, 221, 267-272) describe a situation in which translation of the E. coli lacZ gene is dependent on translation of an upstream B. licheniformis penP gene fragment, the genes being fused so that the stop codon (TAG) of the penP gene fragment overlaps a putative start codon (ATA) for the lacZ gene. Use of this particular codon for initiation was confirmed by N-terminal amino acid sequencing of the produced lacZ gene product.
Kojima et al. (Kojima, K. K. et al. (2005) Eukaryotic Translational Coupling in UAAUG Stop-Start Codons for the Bicistronic RNA Translation of the Non-Long Terminal Repeat Retrotransposon SART1, Molecular and Cellular Biology, 25(17): 7675-7686) disclose translational coupling in a eukaryotic system.
It has been our experience with synthetic gene design that different genes encoding the same protein product, and designed according to the same (or at least very similar) codon optimization rules, give significantly different protein yields when inserted into the desired expression organism. We currently have no explanation for these yield differences.
There is therefore a strong need for methods for rapid and efficient identification of those synthetic genes that give high protein yields.