Depending on the primary structure and the environment, a protein folds into a three-dimensional (3D) structure built from recurring motifs that pack together. The most commonly observed motifs are the α-helix, the β-turn, and parallel and anti-parallel β-sheets.
The 3D structure of a protein may be characterized as having internal surfaces, which are the areas buried within the structure and thus directed away from the aqueous environment in which the protein is normally found; external surfaces, which are the areas exposed to the aqueous environment; and intermediate or boundary surfaces. Through the study of many natural proteins, researchers have discovered that hydrophobic residues are most frequently found on the internal surfaces of water-soluble protein molecules, while hydrophilic residues are most frequently found on the external surfaces.
It has been established that while the biological properties of a protein depend directly on the protein's 3D conformation, only some of the information in the protein's sequence is necessary to specify its fold, i.e. a given native structure may be formed from many different sequences [Lau K. F. and Dill K. A. PNAS USA 87:638-652 (1990)]. The different sequences compatible with a given 3D structure are referred to as the structure's Sequence Space. The finding that a number of amino acid sequences may fold into the same basic 3D structure has focused attention on a new field commonly referred to as “inverse protein folding” or “de novo protein design”. While conventional protein folding methods try to predict the tertiary structure of a protein from its amino acid sequence, protein design methods look for a sequence that will stabilize a given fold, using the same principles.
Reports of experimentally predicted amino acid sequences which adopt an intended fold and possess physical properties similar, at least in part, to those of natural proteins are appearing with increasing frequency [Kortemme T. et al. Science 281:253-256 (1998); Kuroda Y. et al. J. Mol. Biol. 236:862-868 (1994); Quinn T. P. et al. PNAS USA 91:8747-8751 (1994); Fezoui Y. et al. PNAS USA 91:3675-3679 (1994); Betz S. F. et al. Curr. Opin. Struc. Biol. 5:457-463 (1995); Raleigh D. P. et al. J. Am. Chem. Soc. 117:755-7559 (1995); Regan L. & DeGrado W. F. Science 241:976-978 (1988); Hecht M. H. et al. Science 249:884-891; Beauregard M. et al. Protein Eng. 4:745-749 (1991); Kamtekar S. et al. Science 262:1680-1685 (1993)]. These studies have been predominantly experimental and have relied on knowledge of the physical properties that determine a protein's structure, such as the pattern of hydrophobic and hydrophilic residues in the sequence.
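The notion of a hydrophobic/hydrophilic residue pattern can be illustrated with a minimal sketch. The set of residues treated as hydrophobic below is an assumption made for illustration only (classification schemes differ), and the function name and example sequences are hypothetical rather than taken from the cited studies.

```python
# Assumed hydrophobic residue set (one-letter codes); other schemes differ.
HYDROPHOBIC = set("AVLIMFWC")


def hp_pattern(sequence):
    """Reduce a one-letter amino acid sequence to an H/P pattern string.

    'H' marks a residue in the assumed hydrophobic set, 'P' any other residue.
    """
    return "".join("H" if aa in HYDROPHOBIC else "P" for aa in sequence.upper())


# Two different (made-up) sequences that share one binary pattern.
seq_a = "LKELLKKL"
seq_b = "IEKLIEEL"
same_pattern = hp_pattern(seq_a) == hp_pattern(seq_b)
```

Here both example sequences reduce to the same H/P string although they differ at every position, which is the sense in which a pattern, rather than an exact sequence, may be associated with a fold.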
Several groups have applied experimentally tested, systematic, quantitative methods to protein design with the goal of developing general design algorithms. Desjarlais and Handel(1) were the first to experimentally investigate predictions generated by genetic algorithms (GA). They developed ROC (“Repacking of Cores”), a computer program that attempts to find novel core sequences given the backbone structure of the protein of interest. In separate but related work, a modification of ROC was applied to the secondary structure of the αβ protein ubiquitin(2). The program used a genetic algorithm to optimize the search for alternative core structures for a given protein. Other experimentally tested methods applied to protein design are described elsewhere(3-11). Thus, in some cases, uniquely folded and even functional globular proteins may be obtained using highly simplified, minimally designed cores. The algorithms consider the spatial positioning and steric complementarity of side chains by explicitly modeling the atoms of the sequences under consideration. However, despite the success of these studies, a full predictive understanding of hydrophobic core packing in proteins has not yet been achieved, and de novo design of stable and unique proteins remains a challenging problem.
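The general shape of a genetic-algorithm search over core sequences can be sketched as follows. This is not the ROC program: the residue alphabet, the size table (arbitrary units, not measured volumes), the scoring function, and all parameter values are hypothetical stand-ins for the atom-level packing energy that a real design program would evaluate.

```python
import random

CORE_ALPHABET = "AVLIFM"  # assumed set of residues allowed at core positions
# Arbitrary illustrative size units; these are NOT measured side-chain volumes.
TOY_SIZE = {"A": 1, "V": 3, "L": 4, "I": 4, "F": 5, "M": 4}


def toy_score(seq, target=20):
    """Hypothetical fitness (higher is better): how closely the summed
    toy sizes fill an assumed core capacity `target`."""
    return -abs(sum(TOY_SIZE[aa] for aa in seq) - target)


def mutate(seq, rng, rate=0.2):
    """Replace each position with a random core residue with probability `rate`."""
    return "".join(rng.choice(CORE_ALPHABET) if rng.random() < rate else aa
                   for aa in seq)


def crossover(a, b, rng):
    """Single-point crossover of two equal-length sequences (length >= 2)."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def genetic_search(n_positions=6, pop_size=20, generations=30, seed=0,
                   score=toy_score):
    """Evolve core sequences; return the best one and the per-generation best score."""
    rng = random.Random(seed)
    pop = ["".join(rng.choice(CORE_ALPHABET) for _ in range(n_positions))
           for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        history.append(score(pop[0]))
        parents = pop[: pop_size // 2]  # the better half survives unchanged
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=score), history


best_core, best_per_generation = genetic_search()
```

Because the better half of each generation is carried over unchanged, the best score recorded per generation never decreases. A search of this kind samples sequence space stochastically and therefore gives no guarantee of reaching the global optimum.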
A major breakthrough was achieved with the Dead-End Elimination (DEE) algorithm of Desmet et al.(12), which was originally developed for homology modeling. DEE finds and eliminates rotamers that can be mathematically proven to be inconsistent with (i.e., dead-ending with respect to) the global minimum energy solution of the system.
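The elimination step can be sketched in code using the single-rotamer criterion as it is commonly stated: a rotamer r at position i is discarded if its best-case energy (its self energy plus, for every other position, the minimum pair energy it can achieve) is still higher than the worst-case energy of some other rotamer t at the same position (self energy plus the maximum pair energies). The table layout, the function names, and the small energy tables below are assumptions made for illustration and are not taken from the cited work.

```python
from itertools import product


def pair(pair_e, i, r, j, s):
    """Pair energy E(i_r, j_s); the table is stored only for positions i < j."""
    return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]


def total_energy(assign, self_e, pair_e):
    """Total energy of one rotamer assignment (one rotamer index per position)."""
    n = len(assign)
    e = sum(self_e[i][assign[i]] for i in range(n))
    e += sum(pair(pair_e, i, assign[i], j, assign[j])
             for i in range(n) for j in range(i + 1, n))
    return e


def bound(self_e, pair_e, alive, i, r, pick):
    """Best-case (pick=min) or worst-case (pick=max) energy of rotamer r at i,
    taken over the rotamers still alive at every other position."""
    others = (j for j in range(len(self_e)) if j != i)
    return self_e[i][r] + sum(
        pick(pair(pair_e, i, r, j, s) for s in alive[j]) for j in others)


def dead_end_elimination(self_e, pair_e):
    """Repeat the single-rotamer criterion until no rotamer can be removed."""
    alive = [set(range(len(rots))) for rots in self_e]
    changed = True
    while changed:
        changed = False
        for i in range(len(self_e)):
            for r in sorted(alive[i]):
                best_r = bound(self_e, pair_e, alive, i, r, min)
                if any(best_r > bound(self_e, pair_e, alive, i, t, max)
                       for t in alive[i] - {r}):
                    alive[i].discard(r)  # r cannot be in the global minimum
                    changed = True
    return alive


def brute_force_minimum(self_e, pair_e):
    """Exhaustive search, used here only to check the pruning on a tiny case."""
    choices = product(*(range(len(rots)) for rots in self_e))
    return min(choices, key=lambda a: total_energy(a, self_e, pair_e))


# Made-up energies: three positions with two rotamers each.
SELF_E = [[0.0, 5.0], [1.0, 0.0], [0.0, 2.0]]
PAIR_E = {
    (0, 1): [[0.0, 1.0], [0.5, 0.5]],
    (0, 2): [[0.0, 0.5], [1.0, 0.0]],
    (1, 2): [[0.0, 2.0], [1.0, 0.0]],
}
surviving = dead_end_elimination(SELF_E, PAIR_E)
global_min = brute_force_minimum(SELF_E, PAIR_E)
```

In this sketch every rotamer of the exhaustively found minimum remains in `surviving`, because the criterion only removes rotamers that are provably worse than an alternative at the same position. Exhaustive enumeration is feasible only for very small cases such as this one, which is why pruning of this kind matters in practice.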
Dahiyat and Mayo(12) further adapted the algorithm of Desmet et al. for the explicit exploration of sequence space using semi-empirical potential functions and stereochemical constraints, which were intended to capture most of the known contributions to protein stability. With this design strategy they succeeded in expanding the range of computational protein design to residues in all parts of the protein: the buried core, the solvent-exposed surface, and the boundary between core and surface.