Various protein engineering methods are known to those skilled in the art. In general, proteins are modified in order to obtain desired protein properties. In most methods, the nucleotide sequence of a cloned gene encoding a protein is mutated, and the modified gene is expressed to produce mutants, which are screened for activities of interest. Often, the properties of the mutants are compared with those of the wild-type protein.
Historically, the protein design process has been approached as equivalent to the problem of finding, in all of protein space, the one best sequence for the desired application. This problem is extremely difficult and is "NP-hard." In complexity theory, problems in class P are considered easy, because efficient, polynomial-time algorithms exist for their solution. NP-hard problems are problems for which no efficient polynomial-time algorithms are currently known; moreover, if any NP-hard problem could be solved in polynomial time, every problem in the class NP could also be solved in polynomial time (See e.g., Pierce and Winfree, Protein Engineer., 15:779-782, 2002). Current strategies for building and screening libraries generally involve generating protein sequence diversity either randomly across the whole sequence or in a controlled random fashion at defined positions within the protein. These libraries generally contain a large number of members that are "negative" with respect to the primary property of interest, so large numbers of members must be screened in order to find the relatively small number of positive mutations. Generally, negative mutations are ignored, and sequence information is obtained only for the positive members.
Saturation mutagenesis (Estell et al., in World Biotech Report 1984, vol. 2, USA, Online Publications, London, pp. 181-187, 1984; and Wells et al., Gene, 34:315-323, 1985) is one technique that can be used to search protein space for mutations that optimize several properties of a protein. Several groups have developed strategies for identifying sites to be changed by saturation mutagenesis (Reetz et al., Angew. Chem. Int. Ed., 44:4192-4196, 2005; Kato et al., J. Mol. Biol., 351:683-692, 2005; and Sandberg et al., Proc. Natl. Acad. Sci. USA, 90:8367-8371, 1993), but no general system for site identification has been proposed.
In addition, because most protein engineering methods produce a great number of amino acid mutation options, a large number of variants generally must be screened to obtain a protein having a desired property. Screening is typically repeated multiple times before a beneficial variant is obtained. Thus, most methods are laborious and time-consuming. There is a continuing need in the art for protein engineering methods that are efficient and produce the desired results.