Various techniques of in silico and in vitro based directed evolution of protein function have allowed the generation of proteins with novel properties. For example, cytochrome P450 enzymes have been evolved to have activity against substrates not normally recognized by the naturally occurring enzyme (see, e.g., Landwehr et al., 2007, Chem Biol 14(3):269-78; Kubo et al., 2006, Chemistry 12(4):1216-20). Typically, for generating such new enzymes, a polynucleotide encoding a reference polypeptide, such as a wild type enzyme, is subjected to mutagenesis to generate polynucleotides encoding polypeptide variants with changes in amino acid sequence. Screening of the variants for a desired property, such as an improvement in enzyme stability or activity against new substrates, allows the identification of the amino acid residues associated with the changed property. However, not all combinations of the mutations will be present in the population of screened variants. For example, a mutation associated with thermal stability of an enzyme may not be found in association with a mutation associated with a change in substrate specificity. This bias in the population can arise from various factors, including, among others, the parental amino acid sequence encoded by the polynucleotide used for mutagenesis, possible selection against the combination during in vivo propagation of the polynucleotide, and bias in the technique used for mutagenesis (e.g., use of polymerases to introduce errors).
Because the mutations at defined amino acid residue positions of a reference polypeptide sequence can provide a wealth of information about the polypeptide's biological activities, once mutations have been initially identified, it is desirable to prepare various combinations of the mutations not found in the initial set of screened variants so that they can be tested for the desired property. In silico based selection of defined mutations or sets of mutations provides a framework for generating a large number of possible mutation combinations. For example, mutations affecting substrate specificity can be combined with mutations affecting other enzyme properties, including, among others, enzyme activity, thermal stability, and inhibitor resistance. Typically, the approach to generating these polypeptides having novel combinations of mutations is to synthesize individual species (i.e., synthesis of each polynucleotide encoding the mutant gene). This can be accomplished by chemical and/or enzymatic synthesis of the polynucleotide in combination with standard recombination techniques. Such de novo synthesis techniques require whole gene synthesis of each polynucleotide variant and/or synthesis of large numbers of oligonucleotide primers, which are then used to synthesize the whole polynucleotide variant (e.g., via ION-PCR). These techniques require more oligonucleotide syntheses and result in lower yields of variants having the correct sequence. Consequently, if the data set of mutations is large, the cost and low efficiency of generating the mutation combinations can limit the ability to screen a large number of novel combinations. Thus, efficient and cost-effective methods of generating polynucleotides encoding combinations of defined mutations are desirable.
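By way of illustration only, the in silico selection of mutation combinations absent from an initial screen can be sketched as follows. This is a minimal sketch under stated assumptions, not a method described above: the mutation labels (e.g., "A82L"), the function names `position` and `novel_combinations`, and the `max_size` parameter are all hypothetical, and mutations are assumed to be written as wild-type residue, position, and substituted residue.

```python
from itertools import combinations


def position(mutation: str) -> int:
    """Return the residue position from a label such as 'A82L'.

    Assumes the hypothetical format: one-letter wild-type residue,
    position number, one-letter substituted residue.
    """
    return int(mutation[1:-1])


def novel_combinations(mutations, screened, max_size=3):
    """Yield mutation combinations of size 2..max_size not present in `screened`.

    `screened` is an iterable of mutation sets already observed among the
    screened variants; `max_size` is an illustrative cap on combination size.
    """
    seen = {frozenset(variant) for variant in screened}
    ordered = sorted(mutations, key=position)
    for size in range(2, max_size + 1):
        for combo in combinations(ordered, size):
            # Two different substitutions cannot occupy the same residue position.
            if len({position(m) for m in combo}) < size:
                continue
            if frozenset(combo) not in seen:
                yield combo


# Hypothetical example data: four defined mutations, two screened variants.
defined = ["A82L", "F87V", "F87A", "T268A"]
already_screened = [{"A82L", "F87V"}, {"T268A"}]

for combo in novel_combinations(defined, already_screened):
    print("+".join(combo))
```

The number of combinations grows rapidly with the number of defined mutations, which is why the cost of synthesizing each encoding polynucleotide individually becomes limiting for large data sets.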