Over the past 100 years, intensive research efforts have been directed to elucidating and understanding the structures and functions of biopolymers, and biopolymer structure determination and structural information continue to play a key role in many major scientific areas, including biochemistry, molecular biology, biophysics, genomics, proteomics, and structural biology. Rapid progress has been made in developing powerful and refined tools, including x-ray crystallography and nuclear magnetic resonance, to rapidly determine the atomic-level structure of small molecules and of many classes of biopolymers, including globular proteins that perform critical catalytic, signaling, and transport functions within living cells. As a greater number of 3-dimensional globular protein structures have become available, structural biologists have begun to determine many of the principles that govern, and the structural motifs common to, globular protein conformations in various chemical environments, including aqueous solutions, the concentrated, complex solutions found in cells, and environments within protein crystals.
Because not all known proteins are amenable to structure determination by currently available technologies, and because enormous amounts of protein-sequence data are becoming available through the rapid determination of genome sequences, a large effort is underway to develop methods for accurately predicting 3-dimensional protein structures based on the amino-acid sequence of proteins. Although progress has been made, methods for accurately predicting 3-dimensional protein structure over a wide variety of protein types and sizes have not yet been developed. More recently, efforts have been undertaken to develop computational methods for designing the amino-acid sequences of artificial proteins, and for modifying the amino-acid sequences of naturally occurring proteins, in order to produce artificial protein molecules and modified naturally occurring protein molecules with initially specified 3-dimensional conformations. There are many uses for such methods. A number of uses involve designing and testing small sequences in order to further elucidate the structural motifs of, and sequence-dependent effects on, naturally occurring 3-dimensional protein structures. The ability to design proteins with specific 3-dimensional conformations may also facilitate elucidation of various aspects of enzyme catalysis, allosteric regulation of enzymes, and various types of conformation-related associations between proteins and between proteins and other types of biopolymers. Methods for designing sequences to produce specified, stable 3-dimensional protein conformations may also find wide applicability in the development of protein catalysts and of signaling and binding molecules for use in diagnostics, therapeutics, nanotechnology, and other areas related to medicine, molecular electronics, and materials science.
While, in general, the amino-acid sequence of a globular protein determines the stable conformation or conformations of the protein in aqueous solution and in the complex environments of living cells, it is a computationally difficult problem to predict those stable conformations from the amino-acid sequence alone. One brute-force approach is to evaluate every possible 3-dimensional conformation for a polypeptide having a particular amino-acid sequence, compute the free energy for each possible 3-dimensional conformation, and select, as the predicted 3-dimensional conformation or conformations, one or several of the lowest free-energy conformations. Unfortunately, the conformation state space is enormous. Even for relatively small polypeptides, there are far more possible conformations than there are elementary particles in the known universe. A trial-and-error method for designing a protein with a specific, target 3-dimensional conformation, in which the 3-dimensional conformations of a large number of polypeptides with different amino-acid sequences are predicted, and then compared to the target 3-dimensional conformation, involves searching a sequence/conformation state space of far greater size than that of a conformation state space. There are, for example, 20^n different possible amino-acid sequences for a polypeptide composed of a linear sequence of n amino acids, each drawn from the 20 commonly encountered amino acids. For this reason, a directed, constrained search for polypeptides of specific amino-acid sequences that adopt specified 3-dimensional conformations is needed. Current methods for predicting conformation changes resulting from amino-acid substitutions in naturally occurring proteins with determined 3-dimensional structures are an example of a highly constrained search.
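To give a sense of the scale involved, the following minimal Python sketch counts the amino-acid sequences of length n, that is, 20 raised to the power n, and finds the shortest length at which that count exceeds a particle-count bound. The function names are hypothetical, and the bound of 10 to the power 80 is only a commonly quoted rough estimate for the number of elementary particles in the observable universe; it is an assumption of this sketch and does not appear in the text above. The sketch counts sequences only, not conformations.

```python
# Illustrative sketch only: counts sequences, not conformations.
ALPHABET_SIZE = 20  # the 20 commonly encountered amino acids

# Assumption: a commonly quoted order-of-magnitude estimate for the number
# of elementary particles in the observable universe.
PARTICLE_ESTIMATE = 10 ** 80


def sequence_space_size(n: int) -> int:
    """Return the number of distinct amino-acid sequences of length n."""
    if n < 0:
        raise ValueError("sequence length must be non-negative")
    return ALPHABET_SIZE ** n  # exact, since Python integers are unbounded


def smallest_length_exceeding(bound: int) -> int:
    """Return the smallest length n for which 20**n is greater than bound."""
    n = 0
    while sequence_space_size(n) <= bound:
        n += 1
    return n


if __name__ == "__main__":
    for length in (10, 50, 100):
        # Formatting converts the exact integer to a float for display.
        print(f"n = {length:3d}: {sequence_space_size(length):.3e} sequences")
    print("shortest length exceeding the particle estimate:",
          smallest_length_exceeding(PARTICLE_ESTIMATE))
```

Under that assumed bound, the count of sequences alone passes the particle estimate at a length of only a few dozen residues, which is why exhaustive enumeration is impractical and a directed, constrained search is needed.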
The problem of designing a polypeptide sequence to produce an initially specified 3-dimensional conformation falls into the larger class of problems that involve mathematical optimization of multi-variable functions, and, in particular, mathematical optimization in high-dimensional problem domains. Computer scientists and mathematicians seek general optimization techniques for determining optimal solutions for systems described by functions with large numbers of variables, and research scientists and technologists continue to seek methods for addressing the problem of designing amino-acid sequences in order to produce polypeptides that adopt specified, desired 3-dimensional conformations in solution.