Directed molecular evolution can be used to create proteins and enzymes with novel functions and properties. Starting with a known natural protein, several rounds of mutagenesis, functional screening, and propagation of successful sequences are performed. The advantage of this process is that it can be used to rapidly evolve any protein without knowledge of its structure. Several different mutagenesis strategies exist, including point mutagenesis by error-prone PCR, cassette mutagenesis, and DNA shuffling. These techniques have had many successes; however, they are all handicapped by their inability to produce more than a tiny fraction of the potential changes. For example, there are 20^500 possible amino acid changes for an average protein approximately 500 amino acids long. Clearly, the mutagenesis and functional screening of so many mutants is impossible; directed evolution provides a very sparse sampling of the possible sequences and hence examines only a small portion of possible improved proteins, typically point mutants or recombinations of existing sequences. By sampling randomly from the vast number of possible sequences, directed evolution is unbiased and broadly applicable, but inherently inefficient because it ignores all structural and biophysical knowledge of proteins.
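The scale of this sampling problem can be illustrated with a short calculation. The following is a minimal sketch, not part of any described method; the function names are hypothetical, and the only assumptions are an alphabet of 20 amino acids and a protein length of 500 residues, as in the example above. It compares the full count of 20 raised to the power 500 with the number of single point mutants of one starting sequence.

```python
import math

AMINO_ACIDS = 20  # assumed alphabet: the 20 standard amino acids


def sequence_space_size(length, alphabet=AMINO_ACIDS):
    """Number of distinct sequences of the given length (exact integer)."""
    return alphabet ** length


def log10_sequence_space(length, alphabet=AMINO_ACIDS):
    """Base-10 logarithm of the sequence-space size, computed without overflow."""
    return length * math.log10(alphabet)


def single_point_mutants(length, alphabet=AMINO_ACIDS):
    """Number of sequences differing from one parent at exactly one position."""
    return length * (alphabet - 1)


if __name__ == "__main__":
    n = 500
    print("log10 of sequence space:", round(log10_sequence_space(n), 2))
    print("decimal digits in 20**500:", len(str(sequence_space_size(n))))
    print("single point mutants:", single_point_mutants(n))
```

Under these assumptions the full space is on the order of 10^650 sequences, whereas a 500-residue parent has only 9,500 single point mutants, which is consistent with the statement that experimental libraries sample the space very sparsely.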
In contrast, computational methods can be used to screen enormous sequence libraries (up to 10^80 in a single calculation), overcoming the key limitation of experimental library screening methods such as directed molecular evolution. A wide variety of methods are known for generating and evaluating sequences. These include, but are not limited to, sequence profiling (Bowie and Eisenberg, Science 253(5016): 164-70 (1991)); rotamer library selections (Dahiyat and Mayo, Protein Sci 5(5): 895-903 (1996); Dahiyat and Mayo, Science 278(5335): 82-7 (1997); Desjarlais and Handel, Protein Science 4: 2006-2018 (1995); Harbury et al., PNAS USA 92(18): 8408-8412 (1995); Kono et al., Proteins: Structure, Function and Genetics 19: 244-255 (1994); Hellinga and Richards, PNAS USA 91: 5803-5807 (1994)); and residue pair potentials (Jones, Protein Science 3: 567-574 (1994)).
In particular, U.S. Ser. Nos. 60/061,097, 60/043,464, 60/054,678, and 09/127,926 (now U.S. Pat. No. 6,269,312), and PCT US98/07254, describe a method termed “Protein Design Automation”, or PDA, that utilizes a number of scoring functions to evaluate sequence stability.
It is an object of the present invention to provide computational methods for prescreening sequence libraries to generate and select secondary libraries, which can then be made and evaluated experimentally.