The present invention, in some embodiments thereof, relates to computational chemistry and computational protein design and, more particularly, but not exclusively, to proteins designed for stability and a method of computationally designing and selecting an amino-acid sequence having desired properties.
Evolutionary processes have been shown to produce a myriad of protein families, the members of which differ by more than 40% in amino acid sequence, yet share common folds and sometimes similar functional activity. While fascinating in their simplicity and diversity, such evolutionary processes are not regarded as efficient or optimal in terms of the number and type of mutations required to alter a protein's sequence in order to change its function. Yet, when attempted in the laboratory, human reasoning and the best available computational and experimental tools and methodologies generally fail to improve upon the function of a protein even with a relatively small number of site-directed mutations, let alone with more than 10 mutations in a single sequence; such attempts rarely result in a protein that can be expressed or that folds correctly.
Most proteins must fold independently into their native conformation in order to perform their molecular function, and natural selection has acted to stabilize such proteins only to the level required in their respective environments. However, to be useful under the stringent demands of research, biotechnology, and pharmacology, proteins must be produced and must function in non-natural conditions, including non-native and heterologous expression systems, elevated temperatures, non-physiological pH, and the presence of proteases, all of which can abolish production and activity or reduce protein half-lives.
While proteins hold great potential for extensive use in research, industry and pharmaceutics, their use is often hampered by instability, low denaturation temperature (Tm), low expression levels, low solubility, misfolding, aggregation, lipid encapsulation and short half-life. Computational and experimental techniques for protein stabilization have been in use for decades, but their predictive power is low; typically, they misclassify single-point deleterious mutations as stabilizing with a probability of about 20%. In addition, a stabilizing mutation may still reduce or even abrogate function, since stability and activity trade off in some cases.
Due to the importance of protein stability, there have been numerous research efforts in this field over the past decades. State-of-the-art strategies include sequence-statistics-based approaches, such as back-to-consensus/ancestral mutation, and other computational algorithms [Steipe, B. et al., J Mol Biol, 1994, 240(3):188-92; Lehmann, M. et al., Biochim Biophys Acta, 2000, 1543(2):408-415; Lehmann, M. et al., Curr Opin Biotechnol, 2001, 12(4):371-5; Knappik, A. et al., J Mol Biol, 2000, 296(1):57-86; Binz, H. K. et al., J Mol Biol, 2003, 332(2):489-503; Sullivan, B. J. et al., J Mol Biol, 2011, 413(1):195-208; Sullivan, B. J. et al., J Mol Biol, 2012, 420(4-5):384-99; Iwabata, H. et al., FEMS Microbiol Lett, 2005, 243(2):393-8; and Watanabe, K. et al., J Mol Biol, 2006, 355(4):664-74]. However, no existing method has been able to predict large combinatorial mutants that are free of deleterious mutations, i.e., mutations that disrupt the protein structure rather than improve any of its functions [Rees, D. O. et al., Protein Sci, 2001, 10(6):1187-1194].
Computational algorithms typically use an energy function to predict the change in folding free energy (ΔΔG) upon introducing one or more mutations. Most currently available computational algorithms aim to predict the effects of single-point mutations only, and provide a list of mutations that are not necessarily compatible with one another [Schymkowitz, J. et al., Nucleic Acids Res, 2005, 33:W382-8; Capriotti, E. et al., Nucleic Acids Res, 2005, 33:W306-10; Benedix, A. et al., Nat Methods, 2009, 6(1):3-4; and Pokala, N. et al., J Mol Biol, 2005, 347(1):203-27].
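For reference, one common convention for the quantity such energy functions estimate is written out below. Sign conventions differ between tools; under the convention shown here, a negative ΔΔG indicates a stabilizing mutation. The third line is an additivity approximation, stated only as an illustrative assumption and not as part of the cited methods, for estimating the effect of n mutations combined.

```latex
\begin{aligned}
\Delta G_{\mathrm{fold}} &= G_{\mathrm{folded}} - G_{\mathrm{unfolded}} \\
\Delta\Delta G &= \Delta G_{\mathrm{fold}}^{\mathrm{mut}} - \Delta G_{\mathrm{fold}}^{\mathrm{wt}} \\
\Delta\Delta G_{1,\dots,n} &\approx \sum_{i=1}^{n} \Delta\Delta G_{i}
\end{aligned}
```

The additivity approximation fails whenever mutations interact with one another, which is one reason why a list of individually favorable single-point mutations is not necessarily compatible in combination.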
In general, presently known computational structure-stabilization methodologies suffer from poor prediction accuracy, below 60% [Potapov, V. et al., Protein Eng Des Sel, 2009, 22(9):553-60; and Kellogg, E. H. et al., Proteins, 2011, 79(3):830-8], and therefore require high-throughput experimental procedures to obtain significantly more stable protein variants. In addition, these methods are ineffective for large and highly challenging proteins.
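To illustrate why limited per-mutation accuracy makes large combinatorial designs difficult, the following minimal sketch assumes, purely for illustration, that every mutation in a design is predicted correctly with the same probability and independently of the others. Neither assumption is taken from the cited studies, and the helper name `prob_all_correct` is hypothetical.

```python
def prob_all_correct(per_mutation_accuracy: float, n_mutations: int) -> float:
    """Probability that all n mutations in a design are predicted correctly.

    Hypothetical helper: assumes each prediction is correct with the same
    probability and independently of the others (an illustrative simplification).
    """
    if not 0.0 <= per_mutation_accuracy <= 1.0:
        raise ValueError("per_mutation_accuracy must lie in [0, 1]")
    if n_mutations < 0:
        raise ValueError("n_mutations must be non-negative")
    # For independent events, the joint probability is the product of the
    # individual probabilities, i.e. accuracy raised to the power n.
    return per_mutation_accuracy ** n_mutations


# Compare a few design sizes at an assumed 60% per-mutation accuracy.
for n in (1, 5, 10, 20):
    print(f"{n:>2} mutations: {prob_all_correct(0.6, n):.4f}")
```

Under these assumptions, a ten-mutation design predicted with 60% per-mutation accuracy would be correct at every position with probability 0.6^10 ≈ 0.006, i.e., only about 0.6% of the time, which is consistent with the reliance on high-throughput screening noted above.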
RosettaVIP (void identification and packing) has been developed to improve the core packing of poorly packed proteins [Borgo, B. et al., Proc Natl Acad Sci USA, 2012, 109(5):1494-9]. The protocol identifies voids within the protein core and then selects small sets of mutations that reduce void volumes. This methodology successfully stabilized methionine aminopeptidase from E. coli. Another approach proposed a method for combinatorial design based on iterations between sequence redesign and backbone minimization, implemented in the Rosetta suite [Korkegian, A. et al., Science, 2005, 308(5723):857-60]. This methodology successfully stabilized yeast cytosine deaminase. Notably, both methodologies were applied to relatively small, generally stable proteins having a wild-type Tm above 50° C. In addition, both studies examined each individual result and hand-picked subsets of mutations for in-vitro experiments. In both methods, fewer than 10 mutations were introduced at once.
Additional background art includes U.S. Pat. Nos. 4,908,773 and 7,037,894 and U.S. Patent Application Nos. 20120171693 and 20130281314, which are incorporated herein by reference.