More than 50% of organic carbon on earth is found in the cell walls of plants. Plant cell walls consist mainly of three compounds: cellulose, hemicellulose, and lignin. Collectively these compounds are called “lignocellulose,” and they represent a potential source of sugars and other organic molecules for fermentation to ethanol or to other high-value products.
The conversion of lignocellulose (or lignocellulosic biomass) to ethanol has become a key feature of emerging energy policies due to the environmentally favorable and sustainable nature of cellulosic ethanol. There are several technologies being developed for cellulose conversion. Of interest here is a method by which lignocellulosic biomass is subjected to a pretreatment that increases its susceptibility to hydrolytic enzymes, followed by enzymatic hydrolysis of the pretreated lignocellulose to sugars and the fermentation of those sugars to ethanol or other high-value organic molecules (e.g. butanol). Common pretreatment methods include dilute acid steam explosion (U.S. Pat. No. 4,461,648), ammonia freeze explosion (AFEX; Holtzapple et al., 1991), and organosolv extraction (U.S. Pat. No. 4,409,032). Hydrolysis and fermentation systems may be either separate (sequential hydrolysis and fermentation; SHF) or coincident (simultaneous saccharification and fermentation; SSF). In all instances, the hemicellulose and cellulose are broken down to sugars that may be fermented, while the lignin becomes separated and may be used either as a solid fuel or as a source for other organic molecules.
The enzymatic hydrolysis of the pretreated lignocellulose is carried out by cellulase enzymes. The term cellulase (or cellulase enzymes) broadly refers to a class of glycoside hydrolase enzymes (or glycosidases) that catalyze the hydrolysis of the beta-1,4-glucosidic bonds joining individual glucose units in the cellulose polymer. The catalytic mechanism involves the synergistic actions of endoglucanases (Enzyme Commission number E.C. 3.2.1.4), cellobiohydrolases (E.C. 3.2.1.91) and beta-glucosidases (E.C. 3.2.1.21). Endoglucanases hydrolyze accessible glucosidic bonds in the middle of the cellulose chain, creating new chain ends, while cellobiohydrolases processively release cellobiose from these chain ends. Beta-glucosidases hydrolyze cellobiose to glucose and, in so doing, minimize product inhibition of the cellobiohydrolases. Collectively, the enzymes operate as a system that can hydrolyze a cellulose substrate.
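The division of labor among the three activities can be illustrated with a deliberately simplified sketch. It assumes a cellulose chain can be represented by nothing more than its length in glucose units, and it ignores kinetics, substrate accessibility and product inhibition. All function names are hypothetical and chosen only for illustration. Treating a single leftover unit as free glucose is likewise a simplifying assumption.

```python
# Toy model: a cellulose chain is represented only by its length in glucose units.

def endoglucanase_cut(chain_length, bond):
    """Hydrolyze one internal bond, splitting a chain into two shorter chains.

    `bond` is the number of glucose units left of the cut (1 .. chain_length - 1).
    """
    if not 0 < bond < chain_length:
        raise ValueError("bond must lie inside the chain")
    return [bond, chain_length - bond]


def cellobiohydrolase_run(chain_length):
    """Release cellobiose (two glucose units) from a chain end until fewer
    than two units remain.

    Returns (cellobiose_released, leftover_glucose_units).
    """
    return divmod(chain_length, 2)


def beta_glucosidase(cellobiose_count):
    """Hydrolyze each cellobiose into two glucose molecules."""
    return 2 * cellobiose_count


def hydrolyze(chain_length, bond):
    """Apply the three activities in sequence and return the glucose units freed."""
    glucose = 0
    for fragment in endoglucanase_cut(chain_length, bond):
        cellobiose, leftover = cellobiohydrolase_run(fragment)
        # Assumption: a single leftover unit is counted as free glucose.
        glucose += beta_glucosidase(cellobiose) + leftover
    return glucose
```

In this sketch, every glucose unit in the starting chain is recovered as glucose, which reflects only the stoichiometry of complete hydrolysis and not the rate at which a real enzyme system achieves it.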
Cellulase enzymes, as well as other glycoside hydrolases or glycosidases that hydrolyze poly- or oligo-saccharides, typically have a similar modular structure, consisting of one or more catalytic domain(s) and one or more carbohydrate-binding modules (CBM) joined together by flexible linker peptide(s). Many hemicellulases, e.g., xylanases (E.C. 3.2.1.8), mannanases (E.C. 3.2.1.78) and arabinofuranosidases (E.C. 3.2.1.55), are known to have a similar modular structure of a catalytic domain joined to a CBM via a flexible linker. Hemicellulases are enzymes that catalyze hydrolysis of the glycosidic linkages in the xylan backbone polysaccharide of hemicellulose or glycosidic linkages between xylose units in the xylan backbone and other sugars attached to the backbone.
The catalytic domain is a distinct structural domain that catalyzes the hydrolysis of the glycosidic linkages in the substrate. Many glycoside hydrolase catalytic domains have been isolated and characterized. The catalytic domain is typically, though not necessarily, the larger of the two domains. Glycoside hydrolases sharing a common three-dimensional structure and catalytic mechanism, though not necessarily substrate specificity, have been grouped into Families (Davies and Henrissat, 1995). To date, there are over 150 Glycoside Hydrolase (GH) families. Cellulase enzymes are found in many GH families including, but not limited to, Family 5, 6, 7, 8, 9, 12, 44, 45, 48, 51, 61 and 74; xylanase enzymes are found in Family 5, 8, 10, 11 and 43; mannanase enzymes are found in Family 5, 26 and 113; arabinofuranosidase enzymes are found in Family 3, 43, 51, 54 and 62; and beta-glucosidase enzymes are found in Family 1 and 3.
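Because one GH family can contain enzymes of different substrate specificity, the family assignments listed above are conveniently held as a lookup table. The following is a minimal sketch that records only the families named in the preceding paragraph (the list for cellulases is stated there to be non-exhaustive); the names `GH_FAMILIES`, `activities_in_family` and `shared_families` are hypothetical.

```python
# GH family membership exactly as listed in the text above (not exhaustive).
GH_FAMILIES = {
    "cellulase": {5, 6, 7, 8, 9, 12, 44, 45, 48, 51, 61, 74},
    "xylanase": {5, 8, 10, 11, 43},
    "mannanase": {5, 26, 113},
    "arabinofuranosidase": {3, 43, 51, 54, 62},
    "beta-glucosidase": {1, 3},
}


def activities_in_family(family):
    """Return the enzyme activities listed for a given GH family number."""
    return sorted(name for name, fams in GH_FAMILIES.items() if family in fams)


def shared_families(activity_a, activity_b):
    """Return the GH families listed for both enzyme activities."""
    return sorted(GH_FAMILIES[activity_a] & GH_FAMILIES[activity_b])
```

For instance, `activities_in_family(5)` returns cellulase, mannanase and xylanase, illustrating that family membership follows structure and mechanism rather than substrate specificity.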
Linker peptides are extended yet flexible structures that maintain the spatial orientation of the catalytic domain relative to the CBM (Shen et al., 1991; Receveur et al., 2002; Boisset et al., 1995). Naturally-occurring linker peptides in cellulase and hemicellulase enzymes, whether from bacterial or fungal sources, vary from 6-60 amino acids in length. These peptides are similar in their chemical properties and amino acid composition, if not their specific sequences, with the amino acids serine, threonine, and proline accounting for more than 50% of the amino acids in the linker peptide (reviewed in Gilkes et al., 1991). Linkers also contain several charged residues of a common type, either all negative (such as Glu or Asp) or all positive (such as Lys, Arg or His). The serine and threonine residues may be modified with O-linked glycans, which, in fungi, are predominantly mannose (Fagerstam et al., 1984). Results from small-angle x-ray or dynamic light scattering suggest that glycosylation of the linker peptide favors a more extended conformation, altering the relative positioning of the catalytic domain and CBM.
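The compositional features described above (length of 6-60 residues, more than 50% serine, threonine and proline, and charged residues of a single type) can be checked for a candidate sequence in one-letter code. The sketch below is illustrative only: the function names are hypothetical, the thresholds are taken directly from the preceding paragraph, and passing the check does not establish that a sequence functions as a linker.

```python
NEGATIVE = set("DE")   # Asp, Glu
POSITIVE = set("KRH")  # Lys, Arg, His


def stp_fraction(linker):
    """Fraction of residues that are Ser, Thr or Pro (one-letter codes)."""
    linker = linker.upper()
    if not linker:
        raise ValueError("empty sequence")
    return sum(linker.count(aa) for aa in "STP") / len(linker)


def charge_types(linker):
    """Return which kinds of charged residues occur in the sequence."""
    residues = set(linker.upper())
    types = set()
    if residues & NEGATIVE:
        types.add("negative")
    if residues & POSITIVE:
        types.add("positive")
    return types


def has_linker_like_composition(linker):
    """Apply the length, Ser/Thr/Pro and single-charge-type criteria."""
    return (
        6 <= len(linker) <= 60
        and stp_fraction(linker) > 0.5
        and len(charge_types(linker)) <= 1  # no mixture of charge types
    )
```

As a usage example with made-up sequences, `has_linker_like_composition("STPSTPGGDE")` is true (60% Ser/Thr/Pro, only negative charges), whereas replacing the final Glu with Lys mixes charge types and fails the check.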
The carbohydrate binding module (CBM) is typically, though not always, smaller than the catalytic domain. The role of the CBM is to bring the enzyme into close and prolonged contact with the carbohydrate substrate and to increase the rate of substrate degradation. CBMs are found in a variety of enzymes involved in the degradation of carbohydrate substrates, including cellulases, hemicellulases, glucanases, amylases, glucoamylases, chitinases and the like. Thus, CBMs can recognize and bind to crystalline cellulose, non-crystalline cellulose, chitin, beta-1,3 glucans, mixed beta-1,3-1,4 glucans, xylan, mannan, galactan, and starch.
As is the case for catalytic domains, CBMs assume a variety of structures that govern their substrate binding affinities and can therefore also be classified into Families based on their structural and functional relationships. To date there are 59 known CBM Families (see URL cazy.org/fam/acc_CDM.html). Much research has been conducted over the past two decades to elucidate the function and structure of CBMs (as reviewed by Boraston et al., 2004; Hashimoto, 2006; and Shoseyov et al., 2006).
The present application relates to Family 1 CBMs. These CBMs are found almost exclusively in fungal enzymes, including cellulase and hemicellulase enzymes produced by Trichoderma spp., Aspergillus spp., Hypocrea spp., Humicola spp., Neurospora spp., Orpinomyces spp., Gibberella spp., Emericella spp., Chaetomium spp., Chrysosporium spp., Fusarium spp., Penicillium spp., Magnaporthe spp., Phanerochaete spp., Trametes spp., Lentinula edodes, Gloeophyllum trabeum, Ophiostoma piliferum, Coprinus cinereus, Geomyces pannorum, Cryptococcus laurentii, Aureobasidium pullulans, Amorphotheca resinae, Leucosporidium scottii, Cunninghamella elegans, Thermomyces lanuginosus, Sporotrichum thermophile, and Myceliophthora thermophila.
Family 1 CBMs were initially identified as cellulose binding domains (or CBDs) of fungal cellulases. Family 1 CBMs comprise approximately 40 amino acids and may be found at either the N- or C-terminus of the enzyme. Family 1 CBMs assume a small, wedge-shaped beta-sandwich structure with a flat binding surface containing three aromatic amino acids (usually tryptophan) spaced about 10 angstroms apart (Kraulis et al., 1989; Mattinen et al., 1997). These aromatic residues facilitate binding to the surfaces of crystalline substrates such as cellulose and chitin via van der Waals contacts with the substrate surface (Mattinen et al., 1997; Reinikainen et al., 1992; Tormo et al., 1996).
The enzymatic hydrolysis of pretreated lignocellulosic feedstocks is an inefficient step in the production of cellulosic ethanol and its cost constitutes one of the major barriers to commercial viability. Improving the enzymatic activity of cellulases or increasing cellulase production efficiency has been widely regarded as an opportunity for significant cost savings.
The negative effects of lignin on cellulase enzyme systems are well documented. Removal of lignin from hardwood (aspen) was shown to increase sugar yield by enzymatic hydrolysis (Kong et al., 1992). Similarly, removal of lignin from softwood (Douglas fir) was shown to improve enzymatic hydrolysis of the cellulose, an effect attributed to improved accessibility of the enzymes to the cellulose (Mooney et al., 1998). Other groups have demonstrated that cellulases purified from Trichoderma reesei bind to isolated lignin (Chernoglazov et al., 1988) and have speculated on the role of the different binding domains in the enzyme-lignin interaction (Palonen et al., 2004). Binding to lignin and inactivation of Trichoderma reesei cellulases has been observed when lignin is added back to a pure cellulose system (Escoffier et al., 1991). Another study showed that lignin did not have any significant effect on cellulases (Meunier-Goddik and Penner, 1999), and other reports suggest that some hemicellulases may be resistant to, and even activated by, lignin and lignin breakdown products (Kaya et al., 2000). Nonetheless, it is generally recognized that lignin is a serious limitation to enzymatic hydrolysis of cellulose.
Cellulases purified from Trichoderma reesei have been shown to bind to isolated lignin (Chernoglazov et al., 1988). Further work has shown that all three domains, catalytic core, linker and CBM, will bind to lignin (Palonen et al., 2004). For example, Cel7B from Humicola sp., which exists naturally as just a catalytic domain without a CBM, is bound extensively by lignin (Berlin et al., 2005). The Trichoderma Cel5A core, devoid of a CBM, does not bind enzymic lignin, but it does bind alkali-extracted lignin, though to a lesser extent than does the full-length protein (Palonen et al., 2004). CBMs are reportedly involved in lignin binding. For example, removal of the CBM from Trichoderma Cel7A essentially eliminates binding to alkali-extracted lignin and to residual lignin prepared by enzyme hydrolysis (Palonen et al., 2004).
The absence of lignin-resistant cellulases represents a large hurdle in the commercialization of cellulose conversion to soluble sugars, including glucose, for the production of ethanol and other products. Any lignin-resistant enzymes that are developed must retain their cellulolytic activity. A variety of methods have been suggested to reduce the negative impact of lignin on the cellulase system. Non-specific binding proteins (e.g. bovine serum albumin; BSA) have been shown to block interactions between cellulases and lignin surfaces (Yang and Wyman, 2006; U.S. Publication No. 2004/0185542A1, U.S. Publication No. 2006/0088922A1; WO05024037 A2, A3; WO09429474 A1). Other chemical blocking agents and surfactants have been shown to have a similar effect (Tu et al., 2007; U.S. Pat. No. 7,354,743).
Modified glycosidase enzymes and methods for modification have been extensively described. In most instances, mutations are specifically directed to the catalytic domain of the enzyme. For example, variants of Trichoderma reesei Cel7A and Cel6A catalytic domains to improve thermostability have been reported (U.S. Pat. No. 7,375,197; WO 2005/028636; U.S. Publication No. 2007/0173431; Publication No. 2008/167214; WO 2006/074005; Publication No. 2006/0205042; U.S. Pat. No. 7,348,168; WO 2008/025164). In particular, substitution of the amino acid at the equivalent of position 413 in T. reesei Cel6A with a proline in Family 6 cellulases, e.g., a S407P mutation in the Phanerochaete chrysosporium Cel6A, confers increased thermostability (WO 2008/025164). Mutations at the equivalent of positions 103, 136, 186, 365 and 410 within the catalytic domain of T. reesei Cel6A and other Family 6 cellulases have been shown to lead to reduced inhibition by glucose (U.S. Publication No. 2009/0186381A1). Variants with resistance to proteases and to surfactants for detergent formulations have been created for textile applications (WO 99/01544; WO 94/07998; and U.S. Pat. No. 6,114,296).
Recently, modified cellulases exhibiting reduced interactions with, or reduced inactivation by, lignin have been reported. For example, WO2010/012102 reports that mutations at the equivalent of positions 129, 322, 363, 365 and 410 within the catalytic domain of T. reesei Cel6A (TrCel6A) and other Family 6 cellulases result in increased hydrolytic activity in the presence of lignin. Similarly, WO2009/149202 discloses cellulase variants with mutations that remove positive charges or introduce negative charges at the equivalents of positions 63, 77, 129, 147, 153, 161, 194, 197, 203, 237, 247, 254, 281, 285, 289, 294, 327, 339, 344, 356, 378 and 382 in the linker peptide and catalytic domain of Cel6A from Hypocrea jecorina. Such cellulase variants show reduced affinity to lignin, ethanol or heat treatment.
Only in a few instances has the linker peptide been identified as playing a critical role or as a target for modification. The linker peptide of the Humicola Family 45 endoglucanase was modified to reduce proteolysis (WO 94/07998; U.S. Pat. No. 6,114,296) and the linker peptide of the Trichoderma Cel7A was modified to promote thermostability (U.S. Pat. No. 7,375,197). U.S. Publication No. 2010/0221778A1 reports that mutations that reduce the isoelectric point and/or increase the Ser/Thr ratio of the linker peptide can also lead to increased hydrolytic activity in the presence of lignin.
There are relatively few reports of modifying CBMs. In one instance, Linder et al. (1995) showed that mutations of the tyrosine residues on the binding face of the Family 1 CBM from T. reesei Cel7A significantly reduce its binding to cellulose, while mutations at other highly conserved, but non-aromatic, amino acids on the binding surface result in less of a reduction of cellulose binding. In another instance, it was reported that substitution of the tyrosine residue at the “tip” of the wedge-shaped structure, equivalent to Tyr33 in the TrCel6A-CBM, to a histidine resulted in pH-dependent binding to cellulose (Linder et al., 1999). However, while it has been observed that Family 1 CBMs interact with lignin, there are no reports on the development of modified Family 1 CBMs with reduced binding to lignin.