Mutagenesis is a powerful tool in the study of protein structure and function. Mutations can be made in the nucleotide sequence of a cloned gene encoding a protein of interest and the modified gene can be expressed to produce mutants of the protein. By comparing the properties of a wild-type protein and the mutants generated, it is often possible to identify individual amino acids or domains of amino acids that are essential for the structural integrity and/or biochemical function of the protein, such as its binding and/or catalytic activity.
Mutagenesis, however, is beset by several limitations. Among these are the large number of mutants that can be generated and the practical inability to select from among them the mutants that will be informative or have a desired property. For instance, there is no reliable way to predict whether the substitution, deletion or insertion of a particular amino acid in a protein will have a local or global effect on the protein, and therefore whether it is likely to yield useful information or a functional mutant.
Because of these limitations, attempts to improve the properties of a protein by mutagenesis have relied mostly on the generation and analysis of mutations that are restricted to specific, putatively important regions of the protein, such as regions at or around its active site. But even when mutations are restricted to certain regions of a protein, the number of potential mutations can be extremely large, making it difficult or impossible to identify and evaluate those produced. For example, substituting a single amino acid position with each of the other naturally occurring amino acids yields 19 different variants of a protein. If several positions are substituted at once, the number of variants increases exponentially. Substituting all amino acids at seven positions of a protein generates $19 \times 19 \times 19 \times 19 \times 19 \times 19 \times 19 = 19^{7} \approx 8.9 \times 10^{8}$ variants, from which useful mutants must be selected. It follows that, for mutagenesis to be used effectively, the type and number of mutations must be subjected to restrictive criteria that keep the number of mutant proteins generated to a number suitable for screening.
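The arithmetic behind these figures can be checked with a few lines of Python. This is a minimal sketch that counts only variants in which every chosen position carries a non-wild-type residue; the function name `fully_substituted_variants` is illustrative.

```python
def fully_substituted_variants(n_positions: int, alternatives_per_position: int = 19) -> int:
    """Number of variants when each of n positions is replaced by one of the
    other naturally occurring amino acids (19 alternatives to the wild-type residue)."""
    return alternatives_per_position ** n_positions


for n in (1, 2, 7):
    count = fully_substituted_variants(n)
    print(f"{n} position(s): {count:,} variants ({count:.1e})")
```

For seven positions this gives 893,871,739 variants, the roughly $8.9 \times 10^{8}$ quoted above.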
A method of mutagenesis that has been developed to produce very specific mutations in a protein is site-directed mutagenesis. The method is most useful for studying small sites known or suspected to be involved in a particular protein function. In this method, nucleotide substitutions (point mutations) are made at defined locations in a DNA sequence in order to bring about a desired substitution of one amino acid for another in the encoded amino acid sequence. The method is oligonucleotide-mediated. A synthetic oligonucleotide is constructed that is complementary to the DNA encoding the region of the protein where the mutation is to be made, but that carries one or more mismatched bases at the positions of the desired base substitutions. The mutagenic oligonucleotide is used to prime the synthesis of a new DNA strand that incorporates the change(s) and therefore leads to the synthesis of the mutant gene. See Zoller, M. J. and Smith, M., Meth. Enzymol. 100:468 (1983).
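Designing the mutagenic oligonucleotide amounts to copying the stretch of the gene around the target codon with the new codon swapped in, then taking the complement. The Python sketch below assumes the template is the coding (sense) strand, so the primer is the reverse complement of the mutated window; the function names and the 12-nucleotide default flank are illustrative choices, not values from the cited protocol.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def mutagenic_oligo(sense: str, codon_index: int, new_codon: str, flank: int = 12) -> str:
    """Oligonucleotide complementary to the coding strand around codon `codon_index`
    (0-based) that carries `new_codon`, so it mismatches the template at the
    substituted bases. Flanks are truncated at the ends of the sequence."""
    start = codon_index * 3
    if len(new_codon) != 3 or start < 0 or start + 3 > len(sense):
        raise ValueError("codon index out of range or codon not 3 nt")
    mutated = sense[:start] + new_codon + sense[start + 3:]
    window = mutated[max(0, start - flank): start + 3 + flank]
    return reverse_complement(window)


# Hypothetical fragment Met-Ala-Tyr-Gly; replace Tyr (TAC) with Ala (GCC).
print(mutagenic_oligo("ATGGCTTACGGT", codon_index=2, new_codon="GCC", flank=3))
```

Here the primer `ACCGGCAGC` pairs with the template everywhere except the two bases that turn TAC into GCC.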
Variations of site-directed mutagenesis have been developed to optimize aspects of the procedure. For the most part, they are based on the original methods of Hutchison, C. A. et al., J. Biol. Chem. 253:6551 (1978) and Razin, A. et al., Proc. Natl. Acad. Sci. USA 75:4268 (1978). For an extensive description of site-directed mutagenesis, see Sambrook, Fritsch and Maniatis, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor, N.Y. (1989), chapter 15.
A method of mutagenesis designed to produce a larger number of mutations is “saturation” mutagenesis. This process is also oligonucleotide-mediated. In this method, all possible point mutations (nucleotide substitutions) are made at one or more positions within DNA encoding a given region of a protein. These mutations are made by synthesizing a single mixture of oligonucleotides, which is inserted into the gene in place of the natural segment of DNA encoding the region. At each step in the synthesis, the three non-wild-type nucleotides are incorporated into the oligonucleotides along with the wild-type nucleotide. The non-wild-type nucleotides are incorporated at a predetermined percentage, so that all possible variations of the sequence are produced with anticipated frequency. In this way, all possible nucleotide substitutions are made within a defined region of a gene, resulting in the production of many mutant proteins in which the amino acids of a defined region vary randomly (Oliphant, A. R. et al., Meth. Enzymol. 155:568 (1987)).
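The effect of such a doped synthesis can be modeled with a short simulation. The Python sketch below assumes, for illustration only, that each position independently receives a non-wild-type base with probability `dope` and that the three non-wild-type bases share that probability equally; the number of substitutions per oligonucleotide is then binomially distributed. The 10% doping level and the 21-nucleotide region are hypothetical.

```python
import random
from math import comb

BASES = "ACGT"


def doped_oligo(wild_type: str, dope: float, rng: random.Random) -> str:
    """Simulate one oligonucleotide from a doped synthesis (hypothetical model):
    each position keeps the wild-type base with probability 1 - dope and otherwise
    receives one of the three non-wild-type bases with equal probability."""
    out = []
    for base in wild_type:
        if rng.random() < dope:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def substitution_count_distribution(length: int, dope: float) -> list[float]:
    """Binomial probability that an oligonucleotide of `length` bases carries exactly
    k substitutions, for k = 0..length, assuming positions are doped independently."""
    return [comb(length, k) * dope**k * (1 - dope) ** (length - k) for k in range(length + 1)]


rng = random.Random(0)
wild_type = "GCTTACGGTTGGAGCGATCAG"  # hypothetical 21-nt (7-codon) region
print(doped_oligo(wild_type, 0.10, rng))
dist = substitution_count_distribution(len(wild_type), 0.10)
print(f"P(no substitution) = {dist[0]:.3f}")
print(f"mean substitutions = {sum(k * p for k, p in enumerate(dist)):.2f}")
```

Under this model a 21-nucleotide region doped at 10% carries 2.1 substitutions on average, and about 11% of the oligonucleotides carry none at all.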
Methods of random mutagenesis, such as saturation mutagenesis, are designed to compensate for the inability to predict where mutations should be made to yield useful information or functional mutants. The methods are based on the principle that, by generating all or a large number of the possible variants of relevant protein domains, the proper arrangement of amino acids is likely to be produced as one of the randomly generated mutants. However, for completely random combinations of mutations, the number of mutants generated can overwhelm the capacity to select meaningfully. In practice, the number of random mutations generated must be large enough to be likely to yield the desired mutations, but small enough that the capacity of the selection system is not exceeded. This is not always possible given the size and complexity of most proteins.
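How large a library can usefully be screened depends on how many clones must be examined to be likely to encounter a given variant. A common back-of-the-envelope estimate, not taken from this document, treats every variant as equally likely and each screened clone as an independent draw; the Python sketch below computes the expected fraction of distinct variants observed at least once under that assumption. The screening capacity of $10^{7}$ clones is a hypothetical figure.

```python
import math


def expected_coverage(n_variants: int, n_clones: int) -> float:
    """Expected fraction of distinct variants observed at least once when n_clones are
    drawn independently from n_variants equally likely variants: 1 - (1 - 1/V)**N."""
    if n_variants < 1 or n_clones < 0:
        raise ValueError("need n_variants >= 1 and n_clones >= 0")
    if n_variants == 1:
        return 1.0 if n_clones > 0 else 0.0
    # log1p/expm1 keep the result accurate when 1/V is tiny
    return -math.expm1(n_clones * math.log1p(-1.0 / n_variants))


variants = 19**7   # seven fully substituted positions
capacity = 10**7   # hypothetical screening capacity
print(f"{expected_coverage(variants, capacity):.2%} of variants expected to be sampled")
```

Under these assumptions, screening $10^{7}$ clones from a library of $19^{7}$ variants samples only a little over 1% of them, and sampling about 95% requires roughly three times as many clones as there are variants.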
This invention pertains to a method of mutagenesis for the generation of novel or improved proteins (or polypeptides) and to libraries of mutant proteins and specific mutant proteins generated by the method. The protein, peptide or polypeptide targeted for mutagenesis can be a natural, synthetic or engineered protein, peptide or polypeptide or a variant (e.g., a mutant). In one embodiment, the method comprises introducing a predetermined amino acid into each and every position in a predefined region (or several different regions) of the amino acid sequence of a protein. A protein library is generated which contains mutant proteins having the predetermined amino acid in one or more positions in the region and, collectively, in every position in the region. The method can be referred to as “walk-through” mutagenesis because, in effect, a single, predetermined amino acid is substituted position-by-position throughout a defined region of a protein. This allows for a systematic evaluation of the role of a specific amino acid in the structure or function of a protein.
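At the amino acid level, the designed walk-through library is the set of sequences in which each position of the region carries either its original residue or the predetermined residue. The Python sketch below enumerates that set; it deliberately ignores the additional residues that a mixed-nucleotide synthesis can encode at the DNA level, and the region `DYWSN` and target alanine are hypothetical.

```python
from itertools import product


def walk_through_library(region: str, target: str) -> list[str]:
    """All sequences in which each position of `region` carries either its original
    residue or the predetermined `target` residue (amino acid level only)."""
    choices = [(aa,) if aa == target else (aa, target) for aa in region]
    return ["".join(combo) for combo in product(*choices)]


library = walk_through_library("DYWSN", "A")  # hypothetical five-residue region
print(len(library))  # 2**5 = 32 variants, including the unchanged region
```

Positions that already hold the predetermined residue contribute no extra choice, so a region of n positions yields at most $2^{n}$ designed variants.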
The library of mutant proteins can be generated by synthesizing a single mixture of oligonucleotides which encodes all of the designed variations of the amino acid sequence for the region containing the predetermined amino acid. This mixture of oligonucleotides is synthesized by incorporating in each condensation step of the synthesis both the nucleotide of the sequence to be mutagenized (for example, the wild type sequence) and the nucleotide required for the codon of the predetermined amino acid. Where a nucleotide of the sequence to be mutagenized is the same as a nucleotide for the predetermined amino acid, no additional nucleotide is added. In the resulting mixture, oligonucleotides which contain at least one codon for the predetermined amino acid make up from about 12.5% to 100% of the constituents. In addition, the mixture of oligonucleotides encodes a statistical (in some cases Gaussian) distribution of amino acid sequences containing the predetermined amino acid in a range of no positions to all positions in the sequence.
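The per-codon consequences of adding the target nucleotide at each condensation step can be worked out directly. The Python sketch below assumes a 1:1 mix of the two nucleotides wherever the wild-type and target codons differ (consistent with the lower bound of about 12.5%, one codon in eight), uses the standard genetic code, and treats codon positions as independent; the chosen alanine codon `GCT` and the wild-type codons are hypothetical. It reports which amino acids each mixed codon can encode, the fraction encoding the predetermined amino acid, and the distribution of the number of such codons per oligonucleotide.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each codon position ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def mixed_codons(wt_codon: str, target_codon: str) -> list[str]:
    """Codons produced when each position receives the wild-type base plus, where it
    differs, the target base (no extra base where the two already agree)."""
    per_position = [sorted({w, t}) for w, t in zip(wt_codon, target_codon)]
    return ["".join(c) for c in product(*per_position)]


def residue_frequency(wt_codon: str, target_codon: str) -> float:
    """Fraction of the mixed codons that encode the predetermined amino acid, assuming
    a 1:1 mix at every mixed position (all mixed codons equally likely)."""
    codons = mixed_codons(wt_codon, target_codon)
    target_aa = CODON_TABLE[target_codon]
    return sum(CODON_TABLE[c] == target_aa for c in codons) / len(codons)


def target_count_distribution(freqs: list[float]) -> list[float]:
    """P(exactly k codons for the predetermined amino acid), k = 0..n, for independent
    codon positions with frequencies `freqs` (a Poisson-binomial distribution)."""
    dist = [1.0]
    for f in freqs:
        nxt = [0.0] * (len(dist) + 1)
        for k, p in enumerate(dist):
            nxt[k] += p * (1 - f)
            nxt[k + 1] += p * f
        dist = nxt
    return dist


TARGET = "GCT"                  # hypothetical choice of alanine codon
region = ["TAT", "ACT", "GCT"]  # hypothetical wild-type codons: Tyr, Thr, Ala
for codon in region:
    encoded = sorted({CODON_TABLE[c] for c in mixed_codons(codon, TARGET)})
    print(codon, f"{residue_frequency(codon, TARGET):.1%}", encoded)
print(target_count_distribution([residue_frequency(c, TARGET) for c in region]))
```

The Tyr codon yields a mixture encoding Tyr, Ser, Asp and Ala, with Ala at 25%. Because the third codon already encodes alanine, every oligonucleotide in this hypothetical mixture contains at least one alanine codon, illustrating the 100% end of the range.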
The mixture of oligonucleotides is inserted into a gene encoding the protein to be mutagenized (such as the wild type protein) in place of the DNA encoding the region. The recombinant mutant genes are cloned in a suitable expression vector to provide an expression library of mutant proteins that can be screened for proteins that have desired properties. The library of mutant proteins produced by this oligonucleotide-mediated procedure contains a larger ratio of informative mutants (those containing the predetermined amino acid in the defined region) relative to noninformative mutants than libraries produced by methods of saturation mutagenesis. For example, preferred libraries are made up of mutants which have the predetermined amino acid in essentially each and every position in the region at a frequency ranging from about 12.5% to 100%.
This method of mutagenesis can be used to generate libraries of mutant proteins which are of a practical size for screening. The method can be used to study the role of specific amino acids in protein structure and function and to develop new or improved proteins and polypeptides such as enzymes, antibodies, binding fragments or analogues thereof, single chain antibodies and catalytic antibodies.
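To give a sense of scale, the Python sketch below compares upper-bound library sizes for a region of n codons. It assumes, for illustration, that each walk-through codon mixes at most two nucleotides at each of its three positions, and that the comparison library is fully randomized (any of the 20 amino acids, or any of the 64 codons, at every position). These are bounds under stated assumptions, not measured library sizes.

```python
def library_sizes(n_positions: int) -> dict[str, int]:
    """Upper-bound library sizes for a region of n codons (illustrative assumptions)."""
    return {
        # each position: original residue or the predetermined residue
        "walk-through, designed residues": 2**n_positions,
        # at most two bases at each of the three positions of every codon
        "walk-through, oligonucleotide sequences": 8**n_positions,
        # any of the 20 amino acids at every position
        "fully random, residues": 20**n_positions,
        # any of the four bases at every nucleotide (64 codons)
        "fully random, oligonucleotide sequences": 64**n_positions,
    }


for name, size in library_sizes(7).items():
    print(f"{name}: {size:,}")
```

For a seven-codon region, the walk-through design has 128 intended residue combinations and at most 2,097,152 oligonucleotide sequences, compared with 1,280,000,000 residue combinations for full randomization.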