Field of the Invention
The present invention relates to genome-scale metabolic models for microbial strains. More particularly, the present invention relates to improvements in genome-scale metabolic models, namely models that identify and optimize metabolic flux states that minimize the cost of enzyme production while maximizing a desired cellular phenotype, such as cellular growth.
Description of Related Art
Petroleum and natural gas are used as the primary raw materials for the manufacturing of most industrial chemicals and polymers. Economic, environmental and geopolitical concerns are driving research efforts to replace fossil fuel-based chemical manufacturing with renewable, bio-based processes that are cheaper, greener and able to be carried out entirely domestically. The key components of these processes will be microorganisms that have been engineered to efficiently carry out a desired metabolism, converting inexpensive carbon substrates (e.g., glucose, CO2, lignocellulosic biomass) to valuable molecular products. In 2003, the successful engineering of Escherichia coli for the production of the monomer 1,3-propanediol by Genencor and DuPont marked an important milestone for metabolic engineering. According to DuPont, biologically produced 1,3-propanediol accounts for about 37% of the mass of DuPont's SORONA polymer fiber and is likely to become the first billion-dollar, non-pharmaceutical industrial biotechnology product. While the 1,3-propanediol bioprocess is a commercial success, the development of the engineered strain for the process required large investments of time and resources. These investments underscore the need to make the development of bio-based chemical manufacturing processes more efficient before such processes can begin to ease the demand for fossil fuels.
Effective design is the hallmark of a mature engineering discipline and is necessary for efficient product development. Recent advances in technologies for genome-scale characterization (systems biology), construction (synthetic biology) and modeling (computational biology) of biological systems provide the foundation for systems metabolic engineering applications. Experimental methodologies are poised to generate engineered biological strains for a bio-based chemical industry. The main impediment to realizing a bio-based chemical economy is the absence of methods for rigorous biological design, especially methods that account for multiple scales of biological components. Due to the complexity of cellular networks, the design of whole-cell metabolism must be model-guided to be effective.
Current approaches have used genome-scale models (GSMs) of intracellular chemical reactions as a design tool. GSMs are reconstructed from genomic information and the literature; reconstruction involves steps such as functional annotation of the genome, identification of the associated reactions and determination of their stoichiometry, assignment of localization, determination of the biomass composition, estimation of energy requirements, and definition of model constraints (see Baart G J et al., Genome-scale metabolic models: reconstruction and analysis, Methods Mol Biol. 2012, 799:107-26). FIG. 1 shows a timeline of major developments in GSM implementation. The first genome-scale metabolic model was built for Haemophilus influenzae in 1999 (see Edwards J S and Palsson B O, Systems properties of the Haemophilus influenzae Rd metabolic genotype, J Biol Chem. 1999, 274(25):17410-6 (“Edwards and Palsson, 1999”)), establishing the initial, constraint-based approach to genome-scale metabolic modeling (see Price N D et al., Genome-scale microbial in silico models: the constraints-based approach, Trends Biotechnol. 2003, 21(4):162-9) illustrated in FIG. 2. Work to add transcriptional regulation to GSMs was published in 2001 for Escherichia coli (see Covert M W et al., Regulation of gene expression in flux balance models of metabolism, J Theor Biol. 2001, 213(1):73-88). In 2003, proton balancing of all biochemical reactions was implemented for E. coli (see Reed J L et al., An expanded genome-scale model of Escherichia coli K-12 (iJR904 GSM/GPR), Genome Biol. 2003, 4(9):R54).
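In the constraint-based approach, a reconstructed network is represented by its stoichiometric matrix S, with one row per metabolite and one column per reaction, and a flux vector v is admissible only if it satisfies the steady-state mass balance S·v = 0 and lies within each reaction's flux bounds. The following minimal Python sketch illustrates that check on a hypothetical four-reaction toy network; the reaction names (EX_A, R1, R2, BIOMASS), the metabolites A and B, and the bound values are illustrative assumptions, not taken from any published model.

```python
from typing import List, Tuple

# Hypothetical toy network (illustrative only): EX_A takes up metabolite A,
# R1 and R2 both convert A to B, and BIOMASS drains B.
REACTIONS: List[str] = ["EX_A", "R1", "R2", "BIOMASS"]
METABOLITES: List[str] = ["A", "B"]

# S[i][j] is the coefficient of metabolite i in reaction j:
# negative = consumed, positive = produced.
S: List[List[float]] = [
    [1.0, -1.0, -1.0, 0.0],   # A
    [0.0, 1.0, 1.0, -1.0],    # B
]

# (lower, upper) flux bounds per reaction; uptake of A is capped at 10 units.
BOUNDS: List[Tuple[float, float]] = [
    (0.0, 10.0), (0.0, 1000.0), (0.0, 1000.0), (0.0, 1000.0)
]


def net_production(S: List[List[float]], v: List[float]) -> List[float]:
    """Return S·v, the net production rate of each metabolite."""
    return [sum(coef * flux for coef, flux in zip(row, v)) for row in S]


def is_feasible(S: List[List[float]], bounds: List[Tuple[float, float]],
                v: List[float], tol: float = 1e-9) -> bool:
    """True if v satisfies S·v = 0 (within tol) and every flux bound."""
    balanced = all(abs(rate) <= tol for rate in net_production(S, v))
    within_bounds = all(lo - tol <= flux <= hi + tol
                        for flux, (lo, hi) in zip(v, bounds))
    return balanced and within_bounds
```

Here `is_feasible(S, BOUNDS, [10.0, 10.0, 0.0, 10.0])` returns True, whereas `[10.0, 10.0, 0.0, 5.0]` fails because metabolite B accumulates. Real GSMs contain thousands of reactions and are usually handled with sparse matrices and dedicated modeling tools, but the admissibility condition is the same.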
Other additions to the genome-scale modeling approach include the formulation of signal transduction pathways (see Papin J A and Palsson B O, Topological analysis of mass-balanced signaling networks: a framework to obtain network properties including crosstalk, J Theor Biol. 2004, 227(2):283-97) and the specification of thermodynamic constraints (see Jol S J et al., Thermodynamic calculations for biochemical transport and reaction processes in metabolic networks, Biophys J. 2010, 99(10):3139-44). A semi-automated approach was also developed for the initial construction of the stoichiometric matrix for new organisms (see Henry C S, DeJongh M, Best A A, Frybarger P M, Linsay B, Stevens R L, High-throughput generation, optimization and analysis of genome-scale metabolic models, Nat Biotechnol. 2010, 28(9):977-82).
After construction of a model, a variety of techniques can be used to analyze a GSM (see Edwards and Palsson, 1999; Schilling C H et al., Towards metabolic phenomics: Analysis of genomic data using flux balances, Biotechnol Prog. 1999, 15(3):288-95; Varma A and Palsson B O, Metabolic Flux Balancing: Basic concepts, Scientific and Practical Use, Bio/Technology. 1994, 12:994-8). Flux balance analysis (FBA) is a common approach for studying GSMs that calculates the flow of metabolites through the metabolic network, thereby enabling the prediction of parameters such as the growth rate of an organism or the rate of production of a commercially significant metabolite (see Orth J D et al., What is flux balance analysis?, Nat Biotechnol. 2010, 28(3):245-8). Although GSMs have been used in conjunction with FBA to predict such phenotypes with some success, this approach has two underlying issues relevant to metabolic engineering. First, GSMs are underdetermined systems, so any FBA solution (a predicted metabolic flux state) is actually one of potentially hundreds of thousands of solutions (alternative flux states) that exhibit the same cellular phenotype. This flexibility makes it difficult to accurately predict actual in vivo metabolic states. Second, computational predictions that use growth as the objective are fundamentally at odds with metabolic engineering goals, in which material and energetic resources should go to a chemical product rather than to biomass.
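The degeneracy of FBA solutions can be seen even in a very small network. The minimal Python sketch below assumes a hypothetical toy network (not drawn from any published GSM) in which substrate A, with uptake capped at 10 units, is converted to a biomass precursor B by either of two parallel reactions, R1 or R2. Because the two routes are interchangeable, every split of flux between them satisfies the same constraints and reaches the same maximal biomass flux, so an FBA solver could return any one of them.

```python
from typing import List

# Hypothetical toy network (illustrative only): columns are EX_A, R1, R2, BIOMASS.
# R1 and R2 both convert A to B, so they are interchangeable routes.
S: List[List[float]] = [
    [1.0, -1.0, -1.0, 0.0],   # metabolite A
    [0.0, 1.0, 1.0, -1.0],    # metabolite B
]
BOUNDS = [(0.0, 10.0), (0.0, 1000.0), (0.0, 1000.0), (0.0, 1000.0)]
BIOMASS = 3  # column index of the biomass reaction


def is_feasible(v: List[float], tol: float = 1e-9) -> bool:
    """Check steady state (S·v = 0) and flux bounds for the toy network."""
    balanced = all(abs(sum(c * x for c, x in zip(row, v))) <= tol for row in S)
    within_bounds = all(lo - tol <= x <= hi + tol for x, (lo, hi) in zip(v, BOUNDS))
    return balanced and within_bounds


def alternative_optima(n_points: int) -> List[List[float]]:
    """Return n_points flux states that all reach the maximal biomass flux.

    Biomass is capped by the uptake bound of A; each state routes a different
    fraction of that flux through R1 and the remainder through R2.
    """
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    uptake = BOUNDS[0][1]
    states = []
    for k in range(n_points):
        frac = k / (n_points - 1)
        states.append([uptake, frac * uptake, (1.0 - frac) * uptake, uptake])
    return states


states = alternative_optima(5)
growth_rates = {v[BIOMASS] for v in states}  # a single value: {10.0}
```

All five sampled states are feasible and share a biomass flux of 10.0, yet they differ in which reaction carries the flux. In a genome-scale model, isozymes, parallel pathways and cycles multiply such alternatives.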
The foundations of a GSM are the stoichiometric matrix, which represents all of the biochemical capabilities of the organism, and the gene-protein-reaction (GPR) relationships, which connect genotype to biochemical phenotype. Simulations with a GSM are typically run using linear programming algorithms to find flux distributions that maximize or minimize an objective. The most common objective is a biomass objective that represents cellular growth. Simulations that maximize growth assume that cells use all of their resources to achieve the fastest possible growth. This assumption is not always valid, and a study that considered additional possible cellular objectives found that cells do not solely maximize growth: at least some component of cellular function takes energetic costs in terms of ATP into account (see Schuetz R et al., Multidimensional optimality of microbial metabolism, Science. 2012, 336(6081):601-4). While this result may be biologically intuitive, it points to a shortcoming in the way current GSMs are formulated and employed, since cellular energetics are accounted for only in a single maintenance pool of ATP that is typically included in the definition of the biomass objective.
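For reference, the linear program solved in a standard FBA simulation can be written as follows, where S is the m × n stoichiometric matrix, v is the vector of n reaction fluxes with lower and upper bounds, and c is an objective vector that is typically 1 for the biomass reaction and 0 elsewhere. The last line sketches, in schematic form, how a biomass reaction commonly lumps growth-associated ATP maintenance into a single coefficient g alongside the precursor coefficients d_i for biomass precursors X_i; these symbols are illustrative and are not drawn from a specific model.

```latex
\begin{aligned}
&\max_{v}\; Z = c^{\mathsf{T}} v \\
&\text{subject to}\quad S\,v = 0, \qquad v_j^{\min} \le v_j \le v_j^{\max}, \quad j = 1,\dots,n \\[6pt]
&\text{biomass reaction:}\quad \sum_i d_i\,X_i + g\,\mathrm{ATP} + g\,\mathrm{H_2O}
\longrightarrow \text{biomass} + g\,\mathrm{ADP} + g\,\mathrm{P_i} + g\,\mathrm{H^{+}}
\end{aligned}
```

Apart from the lumped ATP term g, nothing in this formulation assigns a cost to synthesizing the enzymes that catalyze individual reactions.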
Previous efforts to provide methods for metabolic modeling include those described in U.S. Patent Application Publication Nos. 2013/0095566 and 2007/0038419, which references are hereby incorporated by reference herein in their entireties. However, there is currently no mechanism for accounting for the cost of producing individual proteins in a GSM simulation. In the current modeling paradigm, any gene identified in the genome is available as a protein in any amount, with no consideration given to the cellular cost of producing the protein needed to carry out a biochemical reaction. No preference or consideration is given to the size of a protein or the number of reactions in a pathway. Thus, there is a need for more accurate GSM methods and tools for bioengineering applications that account for the metabolic expenditures of protein production.
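The consequence of ignoring protein cost can be illustrated with a minimal Python sketch. It assumes a hypothetical toy network in which a substrate reaches a biomass precursor through either of two reactions, R1 or R2, whose catalyzing enzymes are assigned hypothetical cost weights (for example, protein length in amino acids). These weights and the flux-weighted cost function `protein_cost` are illustrative assumptions, not a definitive enzyme-cost model. A growth-maximizing simulation scores the three candidate flux states below identically, whereas even this crude per-reaction cost separates them.

```python
from typing import Dict, List

# Column order of each flux vector in the hypothetical toy network.
REACTIONS: List[str] = ["EX_A", "R1", "R2", "BIOMASS"]

# Hypothetical enzyme cost weights (e.g. protein length in amino acids);
# illustrative numbers only. Exchange and biomass pseudo-reactions cost nothing here.
ENZYME_COST: Dict[str, float] = {"EX_A": 0.0, "R1": 300.0, "R2": 900.0, "BIOMASS": 0.0}


def protein_cost(v: List[float]) -> float:
    """Flux-weighted enzyme cost: sum over reactions of cost_j * |v_j|."""
    return sum(ENZYME_COST[r] * abs(x) for r, x in zip(REACTIONS, v))


# Three flux states with the same biomass flux (last entry) of 10.0.
candidates: Dict[str, List[float]] = {
    "all via R1": [10.0, 10.0, 0.0, 10.0],
    "even split": [10.0, 5.0, 5.0, 10.0],
    "all via R2": [10.0, 0.0, 10.0, 10.0],
}

costs = {name: protein_cost(v) for name, v in candidates.items()}
cheapest = min(costs, key=costs.get)  # "all via R1" (cost 3000.0)
```

Taking the absolute value charges flux in either direction, since an enzyme is required regardless of direction. Ranking a few precomputed states is only for illustration; in practice such a cost would more naturally enter the optimization itself.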