Lignocellulosic feedstocks are a promising alternative to corn starch for the production of fuel ethanol. These raw materials are widely available and inexpensive, and several studies have concluded that cellulosic ethanol generates close to zero net greenhouse gas emissions.
However, these feedstocks are not easily broken down into their constituent sugar molecules. The recalcitrance of lignocellulose can be partially overcome by physical and/or chemical pretreatment. An example of a chemical pretreatment is steam explosion in the presence of dilute sulfuric acid (U.S. Pat. No. 4,461,648). This process removes most of the hemicellulose but converts little of the cellulose to glucose. The pretreated material may then be hydrolyzed by cellulase enzymes.
The term cellulase broadly refers to enzymes that catalyze the hydrolysis of the beta-1,4-glucosidic bonds joining individual glucose units in the cellulose polymer. Cellulases belong to the larger group of glycosyl hydrolases (GHs), which are organized into families and clans based on structural homology (Davies and Henrissat, 1995, Structure 3:853; Carbohydrate Active Enzymes database, Cantarel et al., 2009, Nucleic Acids Res., 37:D233). An updated listing of the members of the more than 100 GH families may be found at URL: www.cazy.org/Glycoside-Hydrolases/html. GHs also include enzymes that catalyze the hydrolysis of other oligo- and polysaccharides (e.g. glucanases, xylanases, mannosidases, galactosidases, etc.).
The conversion of cellulose to glucose involves the synergistic actions of endoglucanases (E.C. 3.2.1.4), cellobiohydrolases (E.C. 3.2.1.91) and beta-glucosidases (E.C. 3.2.1.21) (Henrissat et al., 1994; Knowles et al., 1987; Lynd et al., 2002; Teeri, 1997; Wood and Garcia-Campayo, 1990; Zhang and Lynd, 2004). Endoglucanases hydrolyze accessible glycosidic bonds in the middle of the cellulose chain, creating new chain ends, while cellobiohydrolases processively release cellobiose from chain ends. Beta-glucosidases hydrolyze cellobiose to glucose, thus minimizing product inhibition of the cellobiohydrolases and endoglucanases.
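The division of labor among the three enzyme classes can be illustrated with a deliberately crude, hypothetical toy model in Python. It is not a kinetic model from the cited literature: chains are represented only by their lengths in glucose units, the hypothetical `endoglucanase` function splits chains internally (creating new ends), `cellobiohydrolase` removes one cellobiose per chain per round (one attackable end per chain), and `beta_glucosidase` converts cellobiose to glucose. Crystallinity, accessibility, enzyme loading and product inhibition are all ignored.

```python
"""Toy model of endoglucanase / cellobiohydrolase / beta-glucosidase synergy.

Hypothetical illustration only: chains are represented by their length in
glucose units, and every rule below is a simplifying assumption.
"""


def endoglucanase(chains, cuts):
    """Split the longest chain near its middle, `cuts` times.

    Each internal cut turns one chain into two, creating new chain ends.
    """
    chains = sorted(chains)
    for _ in range(cuts):
        longest = chains.pop()
        if longest < 4:  # too short to cut internally in this toy model
            chains.append(longest)
            break
        half = longest // 2
        chains.extend([half, longest - half])
        chains.sort()
    return chains


def cellobiohydrolase(chains, rounds):
    """Each round, every chain of >= 2 units releases one cellobiose from one end."""
    chains = list(chains)
    cellobiose = 0
    for _ in range(rounds):
        for i, length in enumerate(chains):
            if length >= 2:
                chains[i] = length - 2
                cellobiose += 1
    return chains, cellobiose


def beta_glucosidase(cellobiose):
    """Hydrolyze each cellobiose (a glucose dimer) into two glucose units."""
    return 2 * cellobiose


substrate = [100] * 10  # ten chains of 100 glucose units each

_, cb_alone = cellobiohydrolase(substrate, rounds=5)
_, cb_with_eg = cellobiohydrolase(endoglucanase(substrate, cuts=10), rounds=5)

print(beta_glucosidase(cb_alone), beta_glucosidase(cb_with_eg))  # 100 200
```

In this model, ten endoglucanase cuts double the number of chain ends and therefore double the glucose released in the same number of rounds; real endo/exo synergy is considerably less tidy.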
Although cellulases drive the hydrolysis of cellulose to glucose, additional enzymes have been discovered that enhance the efficiency of a cellulase system. These enzymes may include hemicellulases, which break down xylan and other hemicellulosic material in biomass (Maheshwari et al., 2000, Microbiol Mol Biol Rev. 64:461); swollenins and expansins, which rearrange the structure of cellulose (Saloheimo et al., 2002, Eur. J. Biochem. 269:4202; Sampedro and Cosgrove, 2005, Genome Biol. 6:242); and partially characterized or uncharacterized activities such as the GH Family 61 enzymes (Harris et al., 2010, Biochemistry 49:3305) and the cellulose-induced proteins (CIPs; Foreman et al., 2003, J. Biol. Chem. 278:31988). High-efficiency cellulase systems for the conversion of lignocellulosic substrates may incorporate any or all of these enzymes, depending on the composition of the biomass and the process conditions (Henrissat et al., 1985, Bio/technology 3:722; Baker et al., 1998, Appl. Biochem. Biotechnol. 70-72:395; Boisset et al., 2001, Biotechnol. Bioeng. 72:339; Berlin et al., 2007, Biotechnol. Bioeng. 97:287; Gusakov et al., 2007, Biotechnol. Bioeng. 97:1028; WO2008/025165; WO2009/026722; Meyer et al., 2009, J. Cereal Sci. 50:337).
Cellulases, as well as other GH enzymes, share common gross structures and mechanisms of catalysis (Teeri et al., 1992, Biotechnology 21:417). All GH enzymes have a catalytic domain (CD), and the particular structure of this domain determines its GH Family designation, of which there are more than 100. Two general catalytic mechanisms have been identified for GHs, and all enzymes in a given family share the same mechanism (McCarter and Withers, 1994, Curr. Opin. Struct. Biol. 4:885; Zechel and Withers, 2000, Acc. Chem. Res. 33:11). Retaining enzymes, which retain the anomeric configuration of the newly formed reducing end, hydrolyze by means of a double displacement reaction: the reducing-side portion of the target linkage is first displaced, leaving the non-reducing-side sugar covalently attached to an acidic residue in the active site, and a second displacement, usually by water (though possibly by other hydroxyl-containing compounds, including sugars), then releases that sugar from the enzyme (White and Rose, 1997, Curr. Opin. Struct. Biol. 7:645). Inverting enzymes, which invert the configuration of the anomeric carbon, act by a single displacement in which an activated water molecule directly displaces the reducing-side portion of the target linkage.
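The stereochemical bookkeeping behind the two mechanisms can be sketched in a few lines of Python, under the simplifying assumption that each displacement at the anomeric carbon inverts its configuration, so two displacements give net retention and one gives inversion. The function name and string labels are hypothetical.

```python
"""Anomeric outcome of retaining vs. inverting glycoside hydrolysis (toy sketch)."""

FLIP = {"alpha": "beta", "beta": "alpha"}

# Number of nucleophilic displacements at the anomeric carbon per mechanism.
DISPLACEMENTS = {"retaining": 2, "inverting": 1}


def new_reducing_end(linkage, mechanism):
    """Anomeric configuration of the newly formed reducing end.

    Each displacement is assumed to invert the anomeric carbon, so an even
    number of displacements retains the original configuration.
    """
    anomer = linkage
    for _ in range(DISPLACEMENTS[mechanism]):
        anomer = FLIP[anomer]
    return anomer


# Glucose units in cellulose are beta-1,4 linked.
print(new_reducing_end("beta", "retaining"))  # beta  (e.g. GH Family 7 CBH1)
print(new_reducing_end("beta", "inverting"))  # alpha (e.g. GH Family 6 CBH2)
```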
Cellulases, as well as many hemicellulases and enzymes accessory to cellulose hydrolysis, often have a carbohydrate binding module (CBM), also referred to as a cellulose binding domain (CBD) in the case of cellulases. One function of the CBM is to facilitate contact of the CD with the substrate. Some research suggests that certain CBMs may also disrupt cellulose structure and thus facilitate catalytic activity by the CD (Din et al., 1994, Proc. Natl. Acad. Sci. U.S.A. 91:11383; Teeri et al., 1992, Biotechnology 21:417). There are three general types of CBM, represented in more than 40 different families. Of these, Type I CBMs bind to the surface of crystalline cellulose; Family 1 CBMs belong to Type I and are the only type found in glycosyl hydrolases from filamentous fungi. Structurally, a CBM/CBD may be nearly contiguous with the CD, but usually these domains are connected by an unstructured, often glycosylated, linker peptide, typically 10 to 50 residues long, that may be involved in cellulose interactions (Srisodsuk et al., 1993, J. Biol. Chem. 268:20756).
The cellobiohydrolases are the primary drivers of cellulose hydrolysis. These enzymes bind to the free ends of cellulose chains and catalyze, often processively, the cleavage of cellobiose units from the chain ends. Thus, cellobiohydrolases catalyze the majority of reactions that release soluble oligosaccharides from the solid cellulose substrate and make them available for further hydrolysis to glucose. There are two major classes of CBH (Barr et al., 1996, Biochemistry 35:586). Cellobiohydrolase 2 (CBH2 or CBHII) enzymes are inverting enzymes that hydrolyze from the non-reducing end of the cellulose chain; most CBH2 enzymes are found in GH Family 6. Cellobiohydrolase 1 (CBH1 or CBHI) enzymes are retaining enzymes that hydrolyze from the reducing end of the cellulose chain; most CBH1 enzymes are found in GH Family 7. CBH-like enzyme activities also exist in other GH Families (e.g. Families 9 and 48). The activities of, and synergy between, CBH1 and CBH2 account for the majority of total cellulase activity in most fungal systems.
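The opposite directionality of the two CBH classes can be pictured with a list of residues ordered from the non-reducing end (index 0) to the reducing end (last index). The hypothetical `cbh_step` helper below releases one cellobiose per call from the end each class is described as attacking; it is a sketch of directionality only and says nothing about rates, binding or processivity.

```python
"""Which chain end each cellobiohydrolase class attacks (toy sketch)."""


def cbh_step(chain, enzyme_class):
    """Release one cellobiose from the end attacked by `enzyme_class`.

    `chain` lists residues from the non-reducing end (index 0) to the
    reducing end (index -1). Returns (remaining_chain, released_dimer).
    """
    if len(chain) < 2:
        return list(chain), None
    if enzyme_class == "CBH1":  # mostly GH Family 7: acts from the reducing end
        return chain[:-2], tuple(chain[-2:])
    if enzyme_class == "CBH2":  # mostly GH Family 6: acts from the non-reducing end
        return chain[2:], tuple(chain[:2])
    raise ValueError(f"unknown enzyme class: {enzyme_class}")


chain = ["G1", "G2", "G3", "G4", "G5", "G6"]  # G1 = non-reducing end
print(cbh_step(chain, "CBH1"))  # (['G1', 'G2', 'G3', 'G4'], ('G5', 'G6'))
print(cbh_step(chain, "CBH2"))  # (['G3', 'G4', 'G5', 'G6'], ('G1', 'G2'))
```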
Glycoside hydrolase enzymes comprising Family 7 CDs are found in eukaryotes, primarily fungi, and include both exo- and endoglucanases. GH Family 7 enzymes are mostly cellulases, although chitosanase and xylanase activities have also been reported. The family is distinguished by a beta-jelly roll core structure, with much of the protein in random coil held together by disulfide bonds. Exoglucanases of this family, in contrast to endoglucanases, have peptide loops that cover the active site cleft, turning it into a closed tunnel that channels a cellulose chain past the active site residues and enables high processivity (Kleywegt et al., 1997, J Mol Biol. 272:383). The active site tunnel or cleft consists of a series of monomer (glucose in the case of cellulose) binding sites that associate with the polysaccharide chain by means of ring stacking interactions between the sugar residues and aromatic side chains in the cleft. In the case of cellulases, the bound cellulose is thus correctly aligned with two acidic residues at the active site, which catalyze the double displacement hydrolysis. The monomer binding sites are numbered from the point of catalysis, with positive numbers proceeding in the direction of the reducing end of the cellulose chain and negative numbers in the direction of the non-reducing end. Therefore, the catalytic site lies between the −1 and +1 sites, and for the Cel7 cellobiohydrolases, the product binding sites are +1 and +2, which are occupied by cellobiose after the initial displacement (Divne et al., 1998, J. Mol. Biol. 275:309).
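The subsite numbering convention can be made concrete with a small, hypothetical helper: given a bound stretch of residues ordered from the non-reducing to the reducing end and the position of the scissile bond, it assigns negative labels toward the non-reducing end and positive labels toward the reducing end, with no subsite 0. The nine-residue stretch in the example is arbitrary and is not meant to describe the actual length of a Cel7 tunnel.

```python
"""Subsite labels for a chain bound across a glycoside hydrolase active site."""


def subsite_labels(n_residues, cut_after):
    """Return the subsite number of each bound residue.

    Residues are indexed 0..n_residues-1 from the non-reducing end to the
    reducing end; the scissile bond lies between residues `cut_after` and
    `cut_after + 1`. The residue just before the bond is -1, the one just
    after it is +1, and numbering grows outward in both directions.
    """
    if not 0 <= cut_after < n_residues - 1:
        raise ValueError("the scissile bond must lie between two bound residues")
    return [i - cut_after - 1 if i <= cut_after else i - cut_after
            for i in range(n_residues)]


labels = subsite_labels(9, cut_after=6)
print(labels)  # [-7, -6, -5, -4, -3, -2, -1, 1, 2]

# For a Cel7 cellobiohydrolase, the released cellobiose occupies +1 and +2,
# i.e. the two residues at the reducing end of the bound stretch.
product = [i for i, site in enumerate(labels) if site in (1, 2)]
print(product)  # [7, 8]
```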
Cellulases and other GHs are produced by a wide range of organisms for a variety of natural purposes. Cellulases are predominantly secreted by microorganisms, namely bacteria and fungi, to obtain nutrients from the environment. Bacteria often secrete enzymes that are linked together in an extended, non-covalently associated structure called the cellulosome (Fontes and Gilbert, 2010, Annu Rev Biochem. 79:655). Fungi typically express individual enzymes, though some fungi, such as those living in the rumen of ruminant animals (e.g. sheep and cattle), can also form extended cellulosome-like structures (Ljungdahl, 2008, Ann. N.Y. Acad. Sci. 1125:308). The precise enzymes and enzyme ratios that an organism expresses are determined by the substrates on which it evolved and the substrate on which it is currently growing. Expression of most cellulases and hemicellulases is induced by small molecules related to the target substrate (Schmoll and Kubicek, 2003, Acta Microbiol Immunol Hung. 50:125; Mach and Zeilinger, 2003, Appl Microbiol Biotechnol. 60:515).
The mesophilic fungus Trichoderma reesei (the anamorph of Hypocrea jecorina) and the thermophilic fungus Myceliophthora thermophila (the anamorph of Thielavia heterothallica) are major sources of industrially useful glycoside hydrolases. Both secrete large amounts of protein, composed mostly of hydrolytic enzymes, and for this reason are useful production hosts for industrial enzymes.
T. reesei secretes two GH Family 7 enzymes, CBH1 (Cel7A) and EG1 (Cel7B), of which Cel7A is the major secreted protein. The three-dimensional structures of the catalytic domains of both Cel7A and Cel7B have been solved (Divne et al., 1998, J. Mol. Biol. 275: 309-325; Kleywegt et al., 1997, J. Mol. Biol. 272: 383-397), as have structures for several other Family 7 enzymes. M. thermophila secretes at least three GH Family 7 enzymes. Other industrially relevant cellulases come from fungi including, but not limited to, species of Aspergillus, Chaetomium, Chrysosporium, Coprinus, Corynascus, Fomitopsis, Fusarium, Humicola, Magnaporthe, Melanocarpus, Myceliophthora, Neurospora, Phanerochaete, Podospora, Rhizomucor, Sporotrichum, Talaromyces, Thermoascus, Thermomyces and Thielavia.
GHs, particularly cellulases and hemicellulases, have many useful applications in industry. Cellulases are used in the textile industry for biopolishing and denim abrasion, and in detergent applications (e.g. Anish et al., 2007, Biotechnol Bioeng. 96:48; Montazer et al., 2010, Appl Biochem Biotechnol. 160:2114; Shimonaka et al., 2006, Biosci Biotechnol Biochem. 70:1013). Glucanases and xylanases are used in the brewing and baking industries to reduce viscosity and improve product texture (e.g. Bai et al., 2010, Appl Microbiol Biotechnol. 87:251). Hemicellulases, particularly xylanases, are used in the pulp and paper industry to improve bleachability, improve process efficiency and modify paper quality and attributes (e.g. Suurnäkki et al., 1997, Adv Biochem Eng Biotechnol. 57:261). Finally, cellulases are being used to hydrolyze cellulose to sugars for fermentation to value-added products, particularly biofuels and fuel-grade ethanol (Dashtban et al., 2009, Int J Biol Sci. 5:578). Because GH Family 7 cellobiohydrolases are recognized as primary drivers of cellulose hydrolysis in cellulase enzyme systems, intense efforts have been made to improve these enzymes using the methods of modern molecular biology.
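The ceiling on what cellulose hydrolysis can deliver for ethanol production follows from standard stoichiometry: each anhydroglucose unit (C6H10O5) gains one water on hydrolysis to glucose (C6H12O6), and fermentation converts one glucose into two ethanol plus two CO2. The Python sketch below computes these theoretical maxima from standard atomic masses; actual process yields are lower, and the helper names are illustrative.

```python
"""Theoretical (stoichiometric) ethanol yield from cellulose."""

ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}  # g/mol


def molar_mass(formula):
    """Molar mass (g/mol) of a formula given as {element: count}."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


ANHYDROGLUCOSE = {"C": 6, "H": 10, "O": 5}  # one glucose unit within cellulose
GLUCOSE = {"C": 6, "H": 12, "O": 6}
ETHANOL = {"C": 2, "H": 6, "O": 1}

# Hydrolysis: (C6H10O5)n + n H2O -> n C6H12O6
glucose_per_cellulose = molar_mass(GLUCOSE) / molar_mass(ANHYDROGLUCOSE)
# Fermentation: C6H12O6 -> 2 C2H5OH + 2 CO2
ethanol_per_glucose = 2 * molar_mass(ETHANOL) / molar_mass(GLUCOSE)
ethanol_per_cellulose = glucose_per_cellulose * ethanol_per_glucose

print(f"{glucose_per_cellulose:.3f} g glucose per g cellulose")  # 1.111
print(f"{ethanol_per_glucose:.3f} g ethanol per g glucose")      # 0.511
print(f"{ethanol_per_cellulose:.3f} g ethanol per g cellulose")  # 0.568
```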
Targets for enzyme improvement depend upon the process conditions and the end goal of the enzyme application. Common improvement targets are thermostability and thermophilicity, which enable enzymes to work at high process temperatures. Higher process temperatures are favored because they increase reaction rates and decrease the likelihood of microbial contamination. Another common target is the pH optimum and range, which may need to be aligned between enzyme and process. Reducing enzyme inhibition and inactivation by process-specific factors, including product inhibition, may be important for certain process configurations. Increasing the specific activity of an enzyme under process conditions is also always desirable. Targets for enzyme modification, as distinct from improvement, are far-ranging and highly specific to the process and end goal. One general target may be broadening, narrowing, or changing the substrate specificity; another might be restricting the stereochemistry of a reaction.
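The rate benefit of higher process temperatures is commonly estimated with the Arrhenius equation, k = A·exp(−Ea/RT), which gives a rate ratio between two temperatures of k2/k1 = exp[(Ea/R)(1/T1 − 1/T2)]. The sketch below evaluates that ratio. The activation energy of 50 kJ/mol is a hypothetical placeholder, not a measured value for any cellulase, and the calculation assumes the enzyme remains fully active at both temperatures, which is the condition thermostability engineering aims to satisfy.

```python
"""Arrhenius estimate of how much a reaction speeds up with temperature."""

import math

R = 8.314  # gas constant, J/(mol*K)


def rate_ratio(ea_j_per_mol, t1_celsius, t2_celsius):
    """Return k(T2)/k(T1) for activation energy `ea_j_per_mol`.

    Assumes Arrhenius behavior and no thermal inactivation of the enzyme.
    """
    t1 = t1_celsius + 273.15
    t2 = t2_celsius + 273.15
    return math.exp(ea_j_per_mol / R * (1 / t1 - 1 / t2))


# Hypothetical activation energy of 50 kJ/mol, process raised from 50 to 60 degC.
print(round(rate_ratio(50_000, 50, 60), 2))  # 1.75
```

Under these assumptions, a 10 °C increase speeds the reaction roughly 1.75-fold; an enzyme that loses activity at the higher temperature would forfeit part or all of that gain.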
Many approaches have been developed to improve and/or modify the attributes of an enzyme. These range from rational design to directed evolution. In rational design, the structure/function relationship of the protein is carefully considered and deliberate design changes are made based on an understanding of protein biochemistry (e.g. Wohlfahrt et al., 2003, Biochemistry 42:10095). In directed evolution, a library of enzyme variants carrying random changes throughout the amino acid sequence is made, and the library is screened by means of an assay to identify improved/altered variants (Arnold and Moore, 1997, Adv Biochem Eng Biotechnol. 58:1; Kim et al., 2000, Appl Environ Microbiol. 66:788). Many hybrid approaches, sometimes referred to as “semi-rational” design, also exist. For example, it has long been known that a protein built from the consensus sequence of its family is often more stable than individual family members (Lehmann et al., 2000, Biochim Biophys Acta. 1543:408; Lehmann and Wyss, 2001, Curr Opin Biotechnol. 12:371). Therefore, one approach to generating more stable enzymes is to mutate non-consensus residues to the consensus residue. Another example is the SCHEMA approach, which involves the random swapping of structurally defined domains from several members of a common protein family, followed by screening for improved/altered variants by means of an assay (Silberg et al., 2004, Methods Enzymol. 388:35; Heinzelman et al., 2009, Proc Natl Acad Sci USA. 106:5610). A third example is the ProSar algorithm, which uses information from initial random screens to design secondary and tertiary recombinants for screening (Fox et al., 2003, Protein Eng. 16:589).
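At its simplest, the consensus approach reduces to a column-wise majority vote over a multiple sequence alignment. The Python sketch below (hypothetical helper names; invented five-residue toy sequences rather than real Cel7 sequences) computes a consensus and lists the positions where a target sequence departs from it, i.e. candidate back-to-consensus substitutions. Practical consensus design also weights sequences, handles gaps and ties more carefully, and filters candidates structurally; none of that is attempted here.

```python
"""Candidate back-to-consensus substitutions from an alignment (toy sketch)."""

from collections import Counter


def consensus(alignment):
    """Most frequent symbol in each column of an equal-length alignment.

    Ties go to the symbol encountered first in that column.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*alignment))


def consensus_mutations(target, alignment):
    """List substitutions (e.g. 'I3L') that move `target` to the consensus.

    Positions are 1-based alignment columns; gap columns ('-') are skipped.
    """
    cons = consensus(alignment)
    return [
        f"{t}{pos}{c}"
        for pos, (t, c) in enumerate(zip(target, cons), start=1)
        if t != c and "-" not in (t, c)
    ]


family = ["MKLVA", "MKIVA", "MRLVA", "MKLVS"]  # invented toy alignment
print(consensus(family))                     # MKLVA
print(consensus_mutations("MRIVS", family))  # ['R2K', 'I3L', 'S5A']
```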
GH Family 7 enzymes have been an area of intense investigation and development for commercial applications. For example, the Trichoderma Cel7A has been mutated by rational design to alter the pH optimum and thermostability of the enzyme (Becker et al., 2001, Biochem J. 356:19; Boer and Koivula, 2003, Eur J Biochem. 270:841). A Cel7A consensus sequence has been constructed and expressed, and has been shown to be more thermostable than the Trichoderma Cel7A enzyme (U.S. Publication No. 2005/0054039). The SCHEMA approach has been applied to create hybrid Cel7A enzymes of increased thermostability (Heinzelman et al., 2010, Protein Eng Des Sel. 23:871). Random mutagenesis has also been applied to identify improved Trichoderma Cel7A variants. Both rational design and directed evolution have been applied to improve the thermostability of Cel7B from Melanocarpus (Voutilainen et al., 2007, Enz Microb Technol. 41:234; Voutilainen et al., 2009, Appl Microbiol Biotechnol. 83:261). A similar rational design approach was applied to stabilize Cel7A from Talaromyces and, in one instance, serendipitously also improved the specific activity (Voutilainen et al., 2010, Protein Eng Des Sel. 23:69). The CBM of the Trichoderma Cel7A has been engineered to make binding to cellulose pH-sensitive and thus reversible under process conditions, potentially enabling enzyme recycling (Reinikainen et al., 1992, Proteins 14:475; Reinikainen et al., 1995, Proteins 22:392; Linder et al., 1999, FEBS Lett. 447:13). The KM of the Humicola Cel7 endoglucanase was lowered by rational design mutations that create an additional sugar monomer binding site (Davies et al., 1997, J Biotechnol. 57:91).
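Why a lower KM matters can be seen from the Michaelis-Menten rate law, v = Vmax[S]/(KM + [S]): at substrate concentrations near or below KM, lowering KM raises the rate even when Vmax is unchanged. The numbers below are arbitrary placeholders rather than values for the Humicola enzyme, and the rate law is an idealization that applies only approximately to an insoluble substrate such as cellulose.

```python
"""Effect of lowering KM on the Michaelis-Menten rate (illustrative numbers)."""


def mm_rate(substrate, vmax, km):
    """Initial rate v = vmax * [S] / (km + [S]); units follow the inputs."""
    return vmax * substrate / (km + substrate)


S = 1.0     # substrate concentration, arbitrary units (hypothetical)
VMAX = 1.0  # maximal rate, arbitrary units (hypothetical)

before = mm_rate(S, VMAX, km=2.0)  # hypothetical parent KM
after = mm_rate(S, VMAX, km=1.0)   # hypothetical KM after halving
print(f"{before:.3f} -> {after:.3f}")  # 0.333 -> 0.500
```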
Well-designed assays are key to the successful identification of improved enzymes, particularly in the case of stochastic methods (e.g. directed evolution). Although soluble chromogenic and fluorogenic substrates have been developed for detecting the activity of GH enzymes and may be used in some screening assays, the performance of a GH enzyme on these artificial substrates often does not correlate with its activity on a native or technical substrate such as cellulose or xylan. Instances where improvement on one substrate did not correlate with improvement on another have been documented (Teeri et al., 1998, Biochem Soc Trans. 26:173; Voutilainen et al., 2010, Protein Eng Des Sel. 23:69; Kurašin and Väljamäe, 2011, J Biol Chem. 286:169). Therefore, when screening for improved enzymes, it is critical to use process-relevant substrates and conditions.
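One simple way to check whether a surrogate assay tracks a process-relevant one is to measure the same set of variants in both assays and compute a rank correlation. The sketch below implements Spearman's rho in plain Python as the Pearson correlation of average ranks; the function names are hypothetical, and a value near zero or negative would suggest that the soluble-substrate screen is a poor proxy for activity on cellulose.

```python
"""Spearman rank correlation between two assays on the same variants."""

import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank vectors of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one assay is constant")
    return sxy / math.sqrt(sxx * syy)
```

For example, `spearman(fold_improvement_soluble, fold_improvement_cellulose)` would compare two hypothetical lists holding per-variant fold improvements over the parent enzyme, listed in the same variant order.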
Variants of T. reesei (H. jecorina) Cel7A comprising a number of amino acid substitutions are disclosed in U.S. Pat. No. 7,951,570, U.S. Publication No. 2011/0229956, and U.S. Publication No. 2009/0075336. Variants of M. thermophila CBH1a comprising a number of amino acid substitutions are disclosed in U.S. Publication No. 2012/0003703. However, these variants were isolated by screening for improved thermostability or thermophilicity using soluble substrates rather than a cellulosic substrate. In some cases, the thermostable variants were subsequently characterized for their cellulose-hydrolyzing activity.
Here we present variants of T. reesei Cel7A isolated by screening for improved activity using process-relevant substrates under process-relevant conditions. Specific mutations conferring improvement were mapped to Cel7A cellobiohydrolases from other organisms to demonstrate that the improvements can be generalized to GH Family 7 enzymes.