Expressing stable, functional proteins at high levels remains the bottleneck of many scientific and biotechnological endeavors, including the determination of protein structures and the production of proteins for therapeutic purposes (Roodveldt et al., Current Opinion in Structural Biology, 15:50 (2005)). Large amounts of protein, usually between 5 and 50 mg, are required for every structural biology project (Edwards et al., Nat. Struct. Biol. 7:970 (2000)). Even greater amounts of proteins and peptides are required for industrial and pharmaceutical purposes. The inability to express proteins effectively in bacteria often leads to very high costs for pharmaceutical proteins and peptides. The cost of recombinant insulin in the US, for instance, is $3.3 billion a year. Another example is granulocyte-colony stimulating factor (G-CSF), which, under the trademark names Neupogen and Neulasta, is used to treat cancer and AIDS patients. The wild type protein has a high tendency to aggregate and needs to be stored under special conditions. The stability problems associated with the wild type G-CSF protein contribute to its phenomenal retail price of $500,000-$1,170,000 per gram. Despite its high cost, G-CSF is effective, and $4.28 billion of this drug was sold in 2007 (Amgen annual report 2007). The total global market for protein drugs was $47.4 billion in 2006.
Escherichia coli is the preferred host for recombinant protein expression for structural studies and pharmaceutical purposes because it is relatively easy to manipulate genetically, it is relatively inexpensive to culture, labeling protocols for structural studies are established, and expression is fast (Studier et al., Methods Enzymol. 185 (1990), pp. 60-89; Braun and LaBaer, Trends Biotechnol. 21 (2003), pp. 383-388; Peti and Page, Protein Expr. Purif. 51 (2007), pp. 1-10). A wide variety of commercial products is available for the E. coli expression system. However, there are disadvantages to using E. coli as an expression host. Many proteins fail to express in E. coli, or express only as insoluble inclusion bodies. Data from one bacterial species indicate that at least 50% of non-membrane genes will require further optimization to obtain soluble or stable proteins for crystallization (Christendat et al., Nat Struct Biol 7 (2000), pp. 903-909).
During the last few years, both the pharmaceutical industry and the structural genomics community have made significant efforts to develop methods to overcome these problems. Conventional approaches to the production of soluble and active proteins in heterologous expression systems include low-temperature expression, promoters with different strengths, modified growth media and a variety of solubility-enhancing fusion tags (reviewed in Makrides, Microbiol Rev 60 (1996), pp. 512-538; Braun and LaBaer, Trends Biotechnol 21 (2003), pp. 383-388; Marsischky and LaBaer, Genome Res 14 (2004), pp. 2020-2028; Pearlberg and LaBaer, Curr Opin Chem Biol 8 (2004), pp. 98-102). A series of vectors and fusion partners that can be screened for high-level functional expression of a target protein has been developed (Berthold et al., Protein Sci 12 (2003), pp. 124-134). In addition, a few E. coli strains have been developed that facilitate the expression of membrane proteins (Miroux and Walker, J. Mol. Biol. 260 (1996), pp. 289-298), proteins with rare codons (Brinkmann et al., Gene 85 (1989), pp. 109-114), proteins with disulfide bonds (Prinz et al., J. Biol. Chem. 272 (1997), pp. 15661-15667), and proteins that are otherwise toxic to the cell. This variety of expression vectors and cell lines now significantly enhances the likelihood of designing an E. coli protein expression protocol suitable for the production of the substantial amounts of protein required for structural studies (Hunt, Protein Expr. Purif 40 (2005), pp. 1-22). However, many challenges remain in getting a protein to express in a suitable form.
In the past several years, directed evolution has emerged as an alternative approach to rational design, enabling the improvement of structural and functional properties of proteins, such as stability and performance under different conditions (e.g., at extreme temperatures and pH, and in organic co-solvents), or changes in their reaction and substrate specificity (Tao and Cornish, Curr Opin Chem Biol 6 (2002), pp. 858-864). Rather than designing a limited number of site-directed mutants, directed evolution implements an iterative Darwinian optimization process, whereby the fittest variants are selected from an ensemble of random mutations. Improved variants are identified by screening or selection for the properties of interest, and their encoding genes are then used as parent genes for the following round of evolution (Roodveldt et al., supra). Individually testing all the available variants across the available expression constructs and bacterial strains often helps, and robotics has significantly eased this task.
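By way of illustration only, the iterative mutate, screen, and select cycle described above can be sketched as a short simulation. The sketch below is a toy model and not a method taken from the cited references: the function names, the mutation rate, the library size, and in particular the scoring function `toy_fitness` (which simply counts matches to an arbitrary target sequence) are all hypothetical stand-ins for a real laboratory screen.

```python
import random

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_fitness(seq, target):
    """Hypothetical screen: count positions that match an arbitrary target."""
    return sum(a == b for a, b in zip(seq, target))


def mutate(seq, rng, rate=0.05):
    """Return a copy of seq in which each position is redrawn at random
    with probability `rate` (an assumed, illustrative mutation rate)."""
    return "".join(
        rng.choice(AMINO_ACIDS) if rng.random() < rate else aa
        for aa in seq
    )


def directed_evolution(parent, target, rounds=10, library_size=200, seed=0):
    """Run repeated rounds of random mutagenesis followed by selection.

    Each round builds a library of random variants of the current parent,
    scores every variant, and carries the best one forward as the parent
    of the next round. The parent itself stays in the library, so the
    best score can never decrease from one round to the next.
    """
    rng = random.Random(seed)
    history = [toy_fitness(parent, target)]
    for _ in range(rounds):
        library = [mutate(parent, rng) for _ in range(library_size)]
        library.append(parent)
        parent = max(library, key=lambda s: toy_fitness(s, target))
        history.append(toy_fitness(parent, target))
    return parent, history


if __name__ == "__main__":
    start = "MKTAYIAKQR"   # arbitrary illustrative sequence
    goal = "MKVLYIGGQR"    # arbitrary illustrative target
    best, scores = directed_evolution(start, goal)
    print(best, scores)
```

In an actual experiment the scoring step would be a physical screen or selection (for example, for solubility or thermal stability), and the library would be built by mutagenesis of the encoding gene rather than of the amino acid sequence directly.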
Computational methods have also been used to predict protein stability. One method used to predict mutations that confer higher stability is the proprietary “protein design automation” methodology that is the subject of U.S. Pat. No. 6,627,186.
However, a new approach is clearly needed.