Recombinant protein expression has become a major tool for analyzing intracellular processes. The expression of foreign genes in transformed organisms is now an indispensable method for obtaining proteins for purification and subsequent uses, such as protein characterization, protein identification, and the study of protein function and structure. Proteins must also be expressed at large scale for use as enzymes, nutritional proteins, and biopharmaceuticals (drugs). Escherichia coli (E. coli) is one of the most widely used protein expression host systems because it allows rapid expression and subsequent large-scale, cost-effective manufacturing of recombinant proteins. While most prokaryotic genes are readily expressed in a prokaryotic expression system such as E. coli, many eukaryotic genes cannot be expressed efficiently in a prokaryotic system. The completion of the human genome sequencing project has led to a rapid increase in genetic information, with tens of thousands of new proteins waiting to be expressed and explored. Efficiently expressing these proteins in a recombinant system, such as an E. coli cell, for further study and use has become a pressing issue.
Many sequence factors, such as codon usage, mRNA secondary structures, cis-regulatory sequences, GC content, and other similar variables, affect protein expression (Villalobos et al., 2006, “Gene Designer: a synthetic biology tool for constructing artificial DNA segments,” BMC Bioinformatics 7, 285). Methods have been developed to optimize one or more sequence elements to improve protein expression. For example, it has been demonstrated that codon optimization can increase the protein expression level (Pikaart et al., 1996, Expression and codon usage optimization of the erythroid-specific transcription factor cGATA-1 in baculoviral and bacterial systems, Protein Expression and Purification, vol. 8, pp. 469-475; and Hale et al., 1998, Codon optimization of the gene encoding a domain from human type 1 neurofibromin protein results in a threefold improvement in expression level in Escherichia coli, Protein Expression and Purification, vol. 12, pp. 185-188). However, the prior art methods are generally limited to the optimization of a particular sequence factor, e.g., codon usage, that improves recombinant expression of a particular protein in a specific host cell. There remains a need for a general method for sequence optimization that takes into account multiple or all sequence factors and is applicable to improved expression of any protein in any host cell.
Particle Swarm Optimization (PSO) is a population-based stochastic optimization technique modeled on swarm intelligence. It finds a solution to an optimization problem in a search space, or models and predicts social behavior in the presence of objectives. It was first developed by Dr. Eberhart and Dr. Kennedy in 1995, inspired by the social behavior of bird flocking or fish schooling (Proceedings of the IEEE International Conference on Neural Networks, 1942-1948). In PSO, the potential solutions, called particles, fly through a multidimensional problem space by following the current optimum particles. Each particle keeps track of the coordinates in the problem space that are associated with the best solution (fitness) it has achieved so far, the local best. Each particle also tracks the “best” value obtained so far by any particle in its neighborhood, the neighboring best. When a particle takes the whole population as its topological neighbors, the best value is a global best, which is known to all particles and is updated immediately when any particle finds a new best position in the problem space.
At each time step, particle swarm optimization changes the velocity of each particle toward its local best and neighboring best locations. The change in velocity is weighted by a random term, with separate random numbers generated for the change toward the local best location and the change toward the neighboring best location.
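The velocity update described above can be illustrated with a minimal sketch. It is not taken from the cited references and makes several assumptions: the whole swarm is treated as one neighborhood (so the neighboring best is the global best), the function and parameter names (pso_minimize, w, c1, c2) are hypothetical, and an inertia weight w is included, which is a later refinement rather than part of the 1995 formulation. The sphere function is only a toy objective and does not represent a sequence-scoring function.

```python
import random


def pso_minimize(fitness, dim, bounds, n_particles=20, n_steps=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `fitness` with a basic global-best particle swarm.

    w, c1, c2 are illustrative values (assumptions), not values from the text.
    """
    rng = random.Random(seed)
    lo, hi = bounds  # used only to place the initial positions

    # Random initial positions, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]

    # Each particle's own best position and value (the "local best").
    best_pos = [p[:] for p in pos]
    best_val = [fitness(p) for p in pos]

    # Best over the whole swarm (the "global best").
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]

    for _ in range(n_steps):
        for i in range(n_particles):
            for d in range(dim):
                # Separate random numbers weight the pull toward each best.
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]

            val = fitness(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                # The global best is updated as soon as any particle improves it.
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]

    return g_pos, g_val


def sphere(x):
    """Toy objective: sum of squares, minimized at the origin."""
    return sum(v * v for v in x)


best_position, best_value = pso_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
print(best_position, best_value)
```

In this sketch the fitness is minimized; a problem that rewards higher scores, such as a predicted expression score, could be handled by negating the score before passing it in.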
PSO has been reported to obtain better results faster and at lower cost than other methods. In addition, the PSO algorithm has few parameters to adjust. PSO can be used across a wide range of applications, as well as for specific applications focused on a specific requirement. In the past several years, PSO has been successfully applied in several research and application areas, such as bellow optimum design (Ying et al., 2007, Application of particle swarm optimization algorithm in bellow optimum design, Journal of Communication and Computer, 32, 50-56). It has also been used for optimization of codon usage (Cai et al., 2008, Optimizing the codon usage of synthetic gene with QPSO algorithm, Journal of Theoretical Biology, 254, 123-127).
Despite the exhaustive efforts of protein expression researchers and ever-increasing knowledge of protein expression, significant obstacles remain when one attempts to express a foreign or synthetic gene in a protein expression system such as E. coli. There is a need for a faster and simpler systematic sequence optimization method that coordinates various sequence factors, resulting in improved protein expression in a recombinant system. Such a method is described here.