The sequences of genes encoding many important proteins have been determined rapidly, owing to fast progress in the field of genomics. The three-dimensional structures of thousands of proteins have been determined by X-ray crystallography and other biophysical and biochemical methods, and many more polypeptide sequences critical for the biological function of the proteins have also been determined. However, to a large extent, the correlation between protein primary sequence, tertiary structure, and biological function remains elusive.
Proteins can generally tolerate a certain level of amino acid substitutions without severe consequences for folding or stability (Axe et al. (1996) Proc. Natl. Acad. Sci. U S A 93:5590-5594; Bowie et al. (1990) Science 247:1306-1310; Gassner et al. (1996) Proc. Natl. Acad. Sci. U S A 93:12155-12158; Baldisseri et al. (1991) Biochem. 30:3628-33; Huang et al. (1996) J. Mol. Biol. 258:688-703; Rennel et al. (1991) J. Mol. Biol. 222:67-88; Shortle (1995) Curr. Opin. Biotechnol. 6:387-393). On the other hand, for many proteins, a single particular residue can be critical to function and/or stability (Philippon et al. (1998) Cell Mol. Life Sci. 54:341-346). Although it is desirable to be able to predict a protein's folding pattern from its primary sequence and to correlate its structure with its function in vivo, this has proven to be a formidable task.
One approach to studying protein structure and function is site-directed mutagenesis. It is an important but cumbersome approach to compiling an overall picture of a protein's functional character, let alone its stability and regulatory characteristics in vivo. For example, serine beta-lactamases have been found to exhibit very diverse primary structures and catalytic profiles, but almost all of the known three-dimensional structures for serine beta-lactamases exhibit a high degree of similarity, with apparently equivalent chemical functionalities in the same strategic positions (Philippon et al. (1998) Cell Mol. Life Sci. 54:341-346).
The apparent complexity of macromolecular structure-function correlation has made random mutagenesis an attractive approach to redesigning proteins. Many of the random mutagenesis methods developed so far are designed to introduce random base-pair substitutions.
Methods of saturation mutagenesis utilizing random or partially degenerate primers that incorporate restriction sites have been described (Hill et al. (1987) Methods Enzymol. 155:558-568; Reidhaar-Olson et al. (1991) Methods Enzymol. 208:564-586; Oliphant et al. (1986) Gene 44:177-183).
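As a rough illustration of how a partially degenerate primer defines a pool of sequences, the following sketch counts and enumerates the sequences encoded by a primer written in the standard IUPAC nucleotide ambiguity codes. The primer strings and the function names `degeneracy` and `expand` are hypothetical and chosen only for illustration; they are not taken from the cited methods.

```python
from itertools import product
from math import prod

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())


def expand(primer: str) -> list[str]:
    """Enumerate every sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    # Hypothetical primer: two fixed bases, three degenerate, two fixed.
    print(degeneracy("GCNNKTT"))  # 4 * 4 * 2 = 32
    print(expand("ARG"))          # ['AAG', 'AGG']
```

The count grows multiplicatively with each degenerate position, so enumeration is only practical for short randomized stretches; for longer ones, `degeneracy` alone gives the theoretical pool size.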
Error-prone polymerase chain reaction is another methodology for randomly mutating genes, in which the concentrations of the respective dNTPs are altered in the presence of dITP (Leung, S. et al. (1989) Nucleic Acids Res. 17:1177-1195; Caldwell and Joyce (1992) In PCR Methods Application 2:28-33; Spee et al. (1993) Nucleic Acids Res. 21:777-778).
"Cassette" mutagenesis is another method for creating libraries of mutant proteins (Huebner et al. (1988) Gene 73:319-325; Hill et al. (1987) Methods Enzymol. 155:558-568; Shiraishi and Shimura (1988) Gene 64:313-319; U.S. Pat. Nos. 5,830,720; 5,830,721; 5,830,722; 5,830,728; 5,830,740; 5,830,741; and 5,830,742). Cassette mutagenesis typically replaces a sequence block of a template with a partially randomized sequence. The maximum information content that can be obtained is thus statistically limited to the number of random sequences in the randomized portion of the cassette.
A protocol has also been developed by which synthesis of an oligonucleotide is "doped" with non-native phosphoramidites, resulting in randomization of the gene section targeted for random mutagenesis (Wang and Hoover (1997) J. Bacteriol. 179: 5812-5819). This method allows control of position selection, while retaining a random substitution rate.
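The idea of controlling where mutations occur while leaving the substitutions themselves random can be sketched as a small simulation. This is a toy model under stated assumptions, not the protocol of the cited work: each position inside a chosen window is independently replaced by a different base with probability `dope_rate`, and the names `dope_oligo` and `mutate_window`, the example gene, and the window coordinates are all hypothetical.

```python
import random

BASES = "ACGT"


def dope_oligo(template: str, dope_rate: float, rng: random.Random) -> str:
    """Return a copy of template in which each base is swapped for a
    different, randomly chosen base with probability dope_rate."""
    out = []
    for base in template:
        if rng.random() < dope_rate:
            # Assumption: a doped position never regenerates the native base.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def mutate_window(gene: str, start: int, end: int,
                  dope_rate: float, rng: random.Random) -> str:
    """Randomize only gene[start:end]; flanking sequence is left intact."""
    return gene[:start] + dope_oligo(gene[start:end], dope_rate, rng) + gene[end:]


if __name__ == "__main__":
    rng = random.Random(0)            # seeded for reproducibility
    gene = "ATGGCTAGCAAAGGAGAAGAA"    # hypothetical sequence
    for _ in range(3):
        print(mutate_window(gene, 6, 15, 0.3, rng))
```

Here the window boundaries stand in for position selection and `dope_rate` stands in for the substitution rate, so the two can be varied independently.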
Zaccolo and Gherardi (1999) describe a method of random mutagenesis utilizing pyrimidine and purine nucleoside analogs (Zaccolo and Gherardi (1999) J. Mol. Biol. 285: 775-783). This method was successful in achieving substitution mutations that yielded a beta-lactamase with an increased catalytic rate against the cephalosporin cefotaxime. Crea describes a "walk through" method, wherein a predetermined amino acid is introduced into a targeted sequence at pre-selected positions (U.S. Pat. No. 5,798,208).
Methods for mutating a target gene by insertion and/or deletion mutations have also been developed. It has been demonstrated that insertion mutations could be accommodated in the interior of staphylococcal nuclease (Keefe et al. (1994) Protein Sci. 3:391-401). Another insertional mutagenesis method involves partial fragmentation by a high-frequency cutting restriction endonuclease, phosphatase treatment, and circularization by appropriate linkers (Fitzgerald et al. (1994) Protein Sci. 3:391-401). Examples of deletional mutagenesis methods include the use of an exonuclease (such as exonuclease III or Bal31) and oligonucleotide-directed deletions incorporating point deletions (Ner et al. (1989) Nucleic Acids Res. 17:4015-4023).
Methods have also been developed to create molecular libraries as a part of the process of engineering the evolution of molecules with desired characteristics. Termed "directed evolution" or some variant thereof, protocols describing this type of technology typically involve the reassembly of fragments of DNA, representing a "shuffled" pool, in effect accelerating the recombinatorial process that leads to molecules with desired and/or enhanced characteristics (Stemmer (1994) Nature 370: 389-391; Zhang et al. (1997) Proc. Natl. Acad. Sci. USA 94:4504-4509). Such "directed molecular evolution" approaches have been utilized to mutagenize enzymes (Gulik & Fahl (1995) Proc. Natl. Acad. Sci. USA 92: 8140-8144; Stemmer (1994) Nature 370: 389-391; You & Arnold (1996) Protein Eng. 9:77-83; Zhang et al. (1997) Proc. Natl. Acad. Sci. USA 94:4504-4509), antibodies (Barbas et al. (1994) Proc. Natl. Acad. Sci. USA 91: 3809-3813; Crameri et al. (1997) Nature Biotech. 15:436-438), fluorescent proteins (Heim & Tsien (1996) Curr. Biol. 6:178-182; Siemering et al. (1996) Curr. Biol. 6:1653-1663), and entire operons (Crameri et al. (1996) Nature Med. 2: 100-102).
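The recombination idea behind a "shuffled" pool can be illustrated with a deliberately simplified model. The sketch below assumes pre-aligned parent sequences of equal length and builds a chimera by choosing random crossover points and drawing each segment from a randomly chosen parent. It does not model fragmentation, annealing, or polymerase-driven reassembly as used in the cited protocols, and the function name `shuffle_parents` and the example sequences are hypothetical.

```python
import random


def shuffle_parents(parents: list[str], n_crossovers: int,
                    rng: random.Random) -> str:
    """Build one chimeric sequence from aligned, equal-length parents.

    Crossover points are chosen at random; each resulting segment is
    copied from a randomly chosen parent at the same coordinates.
    """
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("parents must be aligned and of equal length")
    if not 0 <= n_crossovers < length:
        raise ValueError("n_crossovers must be in [0, length - 1]")

    cuts = sorted(rng.sample(range(1, length), n_crossovers))
    bounds = [0] + cuts + [length]

    pieces = []
    for lo, hi in zip(bounds, bounds[1:]):
        donor = rng.choice(parents)
        pieces.append(donor[lo:hi])
    return "".join(pieces)


if __name__ == "__main__":
    rng = random.Random(1)                   # seeded for reproducibility
    parents = ["AAAAAAAAAA", "CCCCCCCCCC"]   # hypothetical parent sequences
    library = [shuffle_parents(parents, 3, rng) for _ in range(5)]
    for chimera in library:
        print(chimera)
```

Because every position in a chimera is inherited from one of the parents at the same coordinate, this model recombines existing variation without introducing new point mutations.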