Current methods for creating mutant proteins in a library format include random libraries generated by error-prone polymerase chain reaction (PCR) (primer- or polymerase-based) or by error-prone cell lines, combinatorial libraries, and libraries limited to a small number of positions or a subset of amino acids.
Random mutagenesis involves randomly distributing mutations throughout the length of the parental sequence. The most commonly used random mutagenesis method is error-prone PCR, which introduces random mutations during PCR by reducing the fidelity of the DNA polymerase. Though random mutagenesis methods are relatively inexpensive and easy to set up in the laboratory, they sample only sequence diversity close to the parental sequence. For example, amino acid changes that require 2 or 3 base pair changes occur less frequently in the population than those requiring a single change, resulting in incomplete coverage unless much larger libraries are used to cover all possible changes. Further, due to redundancies in the codon representation (i.e., 64 codons for 20 amino acids), amino acids with larger codon sets mutate less often, resulting in biased mutational frequencies when this method is used.
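The coverage gap and the codon-redundancy bias described above can be illustrated with a short sketch. It assumes the standard genetic code and treats every single-base substitution as equally likely, which is a simplification (real error-prone PCR has its own substitution biases). The function names are hypothetical and chosen only for this illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" = stop).
BASES = "TCAG"
AA_STRING = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AA_STRING)
}


def single_base_neighbors(codon):
    """Yield the nine codons that differ from `codon` at exactly one position."""
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                yield codon[:pos] + base + codon[pos + 1:]


def reachable_amino_acids(codon):
    """Amino acids reachable by one base change (parent and stop excluded)."""
    parent = CODON_TABLE[codon]
    neighbors = {CODON_TABLE[n] for n in single_base_neighbors(codon)}
    return neighbors - {parent, "*"}


def silent_neighbor_count(codon):
    """Number of the nine single-base changes that leave the amino acid unchanged."""
    parent = CODON_TABLE[codon]
    return sum(CODON_TABLE[n] == parent for n in single_base_neighbors(codon))


if __name__ == "__main__":
    for codon in ("TGG", "CTG"):  # Trp (1 codon) and Leu (6 codons)
        reachable = sorted(reachable_amino_acids(codon))
        print(codon, CODON_TABLE[codon], reachable, silent_neighbor_count(codon))
```

Under these assumptions, the tryptophan codon TGG reaches only 5 of the 19 other amino acids by a single base change, and 4 of the 9 single-base changes to the leucine codon CTG are silent, so a leucine position changes its amino acid less often than a tryptophan position.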
Combinatorial libraries involve the synthesis and display of a large number of molecules (see, e.g., Rajpal et al., Proc. Nat'l Acad. Sci. USA 102:8466-8471 (2005); see also U.S. Pat. No. 5,798,208 and US 2006/0024308). Such a library can consist of thousands to millions of compounds or proteins. For example, for a small 50-residue protein, 20^50 (roughly 10^65) different designs are possible. Thus, combinatorial libraries can be difficult to construct and screen for proteins with the desired characteristics.
Other commonly used methods are limited to mutations at a small number of positions or to a subset of amino acids. However, desirable mutations can be overlooked when such methods are applied.
The goal of a simple, non-labor-intensive method for determining the effect of all possible amino acid substitutions for each amino acid in a region or domain of a protein is well appreciated in the field, but only limited progress has been made to date. The most significant effort so far is that of Pal et al. (J. Biol. Chem. 281:22378-22388 (2006)), called “Quantitative Saturation Scanning Mutagenesis.” This method uses multiple, very large combinatorial bacteriophage-displayed libraries to completely scan a protein/protein interface. The design and construction of the libraries require some prior knowledge of the interface, as well as molecular biology skills beyond the state of the art in most labs.
The compositions and methods provided herein address these and other needs by providing methods to design, construct, and screen protein libraries of a manageable size such that the effect of every possible mutation in the library is determined. A single library can be used to examine the effect of substitution of all twenty amino acids at each position in a domain or region of a protein, or even the entire protein. The library is small, easy to assemble and requires no prior knowledge of the protein/protein interface.
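To give a sense of scale for a "manageable" library, the following sketch compares a full combinatorial library with a library that holds each single amino acid substitution once. The second count (19 alternatives per position) is an assumption made for illustration; the text does not state the library size, and the function names are hypothetical.

```python
NUM_AMINO_ACIDS = 20


def combinatorial_designs(length):
    """All sequences of `length` residues: every position varies independently."""
    return NUM_AMINO_ACIDS ** length


def single_substitution_variants(length):
    """One variant per non-parental amino acid at each position (assumed design)."""
    return (NUM_AMINO_ACIDS - 1) * length


if __name__ == "__main__":
    residues = 50  # the small protein used as an example in the text
    print(f"combinatorial: {combinatorial_designs(residues):.3e}")
    print(f"single-substitution: {single_substitution_variants(residues)}")
```

For the 50-residue example, the full combinatorial space is on the order of 10^65 sequences, whereas a library of every single substitution would contain 950 variants under the stated assumption.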