Protein design has long been known to be a difficult task, if for no other reason than the combinatorial explosion of possible molecules that constitute the searchable sequence space. The protein design problem was recently shown to belong to the class of problems known as NP-hard (Pierce, et al. (2002) “Protein Design is NP-hard,” Prot. Eng. 15(10):779-782), meaning that no known algorithm can solve such problems in polynomial time. Because of this complexity, many approximate methods have been used to design better proteins; chief among them is directed evolution. Directed evolution of proteins is today dominated by various high throughput screening and recombination formats, often performed iteratively.
Sequence space can be described as a space where all possible protein neighbors can be obtained by a series of single point mutations (Smith (1970) “Natural selection and the concept of a protein space,” Nature, 225(232):563-4). For example, a 100 residue long protein would be a 100 dimensional object with 20 possible values, i.e., the 20 naturally occurring amino acids, in each dimension, giving 20^100 (roughly 10^130) possible sequences. Each one of these proteins has a corresponding fitness on some complex landscape. Models of such “fitness landscapes” were first studied by Sewall Wright (Wright (1932) “The roles of mutation, inbreeding, crossbreeding and selection in evolution,” Proceedings of 6th International Conference on Genetics, 1:356-366) but have since been expanded on by others (Eigen, M. (1971) “Self organization of matter and the evolution of biological macromolecules,” Naturwissenschaften, 58(10):465-523; Kauffman, S. et al. (1987) “Towards a general theory of adaptive walks on rugged landscapes,” J. Theor. Biol., 128(1):11-45; Kauffman, E. S., et al. (1989) “The NK model of rugged fitness landscapes and its application to maturation of the immune response,” J. Theor. Biol., 141(2):211-45; Schuster, P., et al. (1994) “Landscapes: complex optimization problems and biopolymer structures,” Comput. Chem., 18(3):295-324; Govindarajan, S. et al. (1997) “Evolution of model proteins on a foldability landscape,” Proteins, 29(4):461-6). The sequence space of proteins is immense and cannot be explored exhaustively. Accordingly, new ways to efficiently search sequence space to identify functional proteins would be highly desirable.
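The size of sequence space and the notion of single point mutation neighbors described above can be illustrated with a minimal sketch. The function names and the one-letter amino acid alphabet string are assumptions introduced here for illustration only; they are not taken from the cited references.

```python
# Illustrative sketch only: names below are hypothetical, chosen for this example.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes, 20 naturally occurring amino acids


def sequence_space_size(length, alphabet_size=len(AMINO_ACIDS)):
    """Number of distinct sequences of the given length (alphabet_size ** length)."""
    return alphabet_size ** length


def single_point_neighbors(seq, alphabet=AMINO_ACIDS):
    """Yield every sequence that differs from seq at exactly one position."""
    for i, original in enumerate(seq):
        for residue in alphabet:
            if residue != original:
                yield seq[:i] + residue + seq[i + 1:]


def hamming_distance(a, b):
    """Minimum number of single point mutations needed to turn a into b."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    total = sequence_space_size(100)
    # Python integers are arbitrary precision, so the exact value is available.
    print("digits in 20**100:", len(str(total)))
    print("neighbors of a 100-mer:", 100 * (len(AMINO_ACIDS) - 1))
```

Under these assumptions, each sequence of length L has only 19 × L immediate neighbors, while the full space holds 20^L sequences, which is why any two sequences of equal length are connected by a short series of point mutations even though the space as a whole cannot be enumerated.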