There has been considerable recent success in the development of computational methods for the design of protein sequences, with varying degrees of sophistication. Several groups have presented results in which computer algorithms were used to design novel hydrophobic cores for proteins (Dahiyat & Mayo, 1996; Dahiyat & Mayo, 1997b; Desjarlais & Handel, 1995; Hellinga & Richards, 1994; Kono & Doi, 1994; Lazar et al., 1997). In many cases the designed proteins were validated experimentally by biophysical and/or structural methods (Dahiyat & Mayo, 1996; Dahiyat & Mayo, 1997b; Desjarlais & Handel, 1995; Johnson et al., 1999; Kono et al., 1998; Lazar et al., 1997; Lazar et al., 1999).
Mayo and colleagues have pioneered the development of algorithms for non-core (Dahiyat et al., 1997a) and full sequence design (Dahiyat & Mayo, 1997a; Dahiyat et al., 1997b), using parameterized force fields and sophisticated optimization methods based on the Dead-End Elimination (DEE) theorem (Desmet et al., 1992; Goldstein, 1994). These methods were used successfully to design a sequence that adopts the zinc finger fold without requiring zinc binding (Dahiyat & Mayo, 1997a). The force fields used in these design processes have been parameterized iteratively by comparing the calculated and experimentally determined folding stabilities of the designed proteins, a process referred to as the design cycle (Dahiyat & Mayo, 1996; Gordon et al., 1999; Hellinga, 1997; Street & Mayo, 1999). A patent related to these studies is U.S. Pat. No. 6,188,965, incorporated herein by reference.
A significant limitation (and criticism) of extant protein design methodologies is the lack of a generally applicable method for incorporating backbone flexibility into the design simulation. Although some approaches along these lines have been explored (Desjarlais & Handel, 1999; Harbury et al., 1995; Su & Mayo, 1997), they are limited in scope.
A second limitation of many design methods is that they do not provide a comprehensive measure of the sequence space that is consistent with a three-dimensional protein fold. In this context, sequence space means all amino acid sequences that can spontaneously fold into the target three-dimensional structure. Knowledge of the viable sequence space is crucial to the rational design of combinatorial protein libraries that can be used to search for proteins with improved properties. Again, some efforts along these lines have been pursued, for instance by designing multiple sequences using heuristic (Monte Carlo or genetic algorithm) methods (Dahiyat et al., 1997b; Desjarlais & Handel, 1995; Kuhlman & Baker, 2000). Such methods partially explore the sequence space of a fold, but do not necessarily yield quantitatively robust information. Self-consistent mean field methods (Delarue & Koehl, 1997; Koehl & Delarue, 1994; Lee, 1994) show some promise for exploring sequence space (Voigt et al., 2001), but this class of methods has significant limitations that call into question its ability to fully explore the appropriate space (Voigt et al., 2000). Furthermore, these methods have not yet been shown to yield physically viable designed proteins.
In view of the demands and limitations in the field of protein design discussed above, there is a need for improved protein design and evaluation methodology. Accordingly, it is an object of the invention to provide a computational protein design procedure that is capable of incorporating backbone flexibility in a general way and of providing a superior exploration of the amino acid sequence space consistent with a protein structural state. Another object of the invention is to provide a novel approach to the evaluation and parameterization of protein design algorithms that is more efficient than approaches relying on feedback from experimental stability data alone. Yet another object of the invention is to provide a method for analyzing the ability of protein design algorithms to design amino acid sequences that are similar to those that exist naturally for a given protein class. These and other objects and advantages of the invention, and equivalents thereof, are described and provided in the drawings and descriptions that follow and are manifest in the appended claims.