De novo protein design has received considerable attention recently, and significant advances have been made toward the goal of producing stable, well-folded proteins with novel sequences. Efforts to design proteins rely on knowledge of the physical properties that determine protein structure, such as the patterns of hydrophobic and hydrophilic residues in the sequence, salt bridges and hydrogen bonds, and secondary structural preferences of amino acids. Various approaches to applying these principles have been attempted. For example, the construction of α-helical and β-sheet proteins with native-like sequences was attempted by individually selecting the residue required at every position in the target fold (Hecht, et al., Science 249:884-891 (1990); Quinn, et al., Proc. Natl. Acad. Sci. USA 91:8747-8751 (1994)). Alternatively, a minimalist approach was used to design helical proteins, where the simplest possible sequence believed to be consistent with the folded structure was generated (Regan, et al., Science 241:976-978 (1988); DeGrado, et al., Science 243:622-628 (1989); Handel, et al., Science 261:879-885 (1993)), with varying degrees of success. An experimental method that relies on the hydrophobic and polar (HP) pattern of a sequence was developed, in which a library of sequences with the correct pattern for a four-helix bundle was generated by random mutagenesis (Kamtekar, et al., Science 262:1680-1685 (1993)). Among non-de novo approaches, domains of naturally occurring proteins have been modified or coupled together to achieve a desired tertiary organization (Pessi, et al., Nature 362:367-369 (1993); Pomerantz, et al., Science 267:93-96 (1995)).
Though the correct secondary structure and overall tertiary organization seem to have been attained by several of the above techniques, many designed proteins appear to lack the structural specificity of native proteins. The complementary geometric arrangement of amino acids in the folded protein is the root of this specificity and is encoded in the sequence.
Several groups have applied and experimentally tested systematic, quantitative methods to protein design with the goal of developing general design algorithms (Hellinga, et al., J. Mol. Biol. 222:763-785 (1991); Hurley, et al., J. Mol. Biol. 224:1143-1154 (1992); Desjarlais, et al., Protein Science 4:2006-2018 (1995); Harbury, et al., Proc. Natl. Acad. Sci. USA 92:8408-8412 (1995); Klemba, et al., Nat. Struct. Biol. 2:368-373 (1995); Nautiyal, et al., Biochemistry 34:11645-11651 (1995); Betz, et al., Biochemistry 35:6955-6962 (1996); Dahiyat, et al., Protein Science 5:895-903 (1996); Jones, Protein Science 3:567-574 (1994); Kono, et al., Proteins: Structure, Function and Genetics 19:244-255 (1994)). These algorithms consider the spatial positioning and steric complementarity of side chains by explicitly modeling the atoms of sequences under consideration. To date, such techniques have typically focused on designing the cores of proteins and have scored sequences with van der Waals and sometimes hydrophobic solvation potentials.
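By way of illustration only, the kind of van der Waals scoring referred to above can be sketched with a generic 12-6 Lennard-Jones potential summed over atom pairs. The sketch below is not taken from any of the cited works. The function names, the single shared atom type, and the parameter values (r_min, well_depth) are assumptions chosen for the example, and no hydrophobic solvation term is included.

```python
import math

def lennard_jones(r, r_min=4.0, well_depth=0.1):
    """12-6 Lennard-Jones energy for two atoms separated by distance r.

    r_min (angstroms) and well_depth (kcal/mol) are hypothetical
    values for illustration, not parameters of a published force field.
    """
    ratio = r_min / r
    return well_depth * (ratio ** 12 - 2.0 * ratio ** 6)

def vdw_score(coords, r_min=4.0, well_depth=0.1):
    """Sum the pairwise energy over all unique atom pairs.

    coords is a list of (x, y, z) tuples; every atom is treated as the
    same type, which is a simplifying assumption.
    """
    total = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            r = math.dist(coords[i], coords[j])
            total += lennard_jones(r, r_min, well_depth)
    return total

# Two atoms at the optimal separation versus two atoms that overlap.
packed = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
clashing = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
```

Under these assumptions, the pair at the optimal separation receives a favorable (negative) score, while the overlapping pair is heavily penalized. This is how a potential of this type rewards steric complementarity among explicitly modeled side-chain atoms.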
In addition, the qualitative nature of many design approaches has hampered the development of improved, second-generation proteins because there are no objective methods for learning from past design successes and failures.
Thus, it is an object of the invention to provide computational protein design and optimization via an objective, quantitative design technique implemented in connection with a general purpose computer.