The invention relates to a method for generating a variant library of DNA sequences.
Methods for generating specific individual variants of a DNA sequence are described in the prior art; see, for example, L. Ge et al., Biotechniques, 22(1), 1997, 28-30; E. P. Weisberg et al., Biotechniques, 15(1), 1993, 68-76; R. M. Shayiq et al., Analytical Biochemistry, 221(1), 1994, 206-208; or W. Ito et al., Gene, 102(1), 1991, 67-70. In these methods, multiple mutations are introduced simultaneously into a DNA sequence: individual fragments of the DNA sequence are generated by PCR and then reconnected. The mutations are introduced via mismatch primers, and only one primer binds at each mutation site. However, these methods do not generate a variant library; they create a single, defined variant of a DNA sequence.
Methods for generating a variant library of DNA sequences are likewise known in the prior art; see, for example, WO 01/12802; WO 02/34762; T. S. Wong et al., “The Diversity Challenge in Directed Protein Evolution”, Combinatorial Chemistry & High Throughput Screening, 9(4), 2006, 271-288; and C. Neylon, Nucleic Acids Research, 32(4), 2004, 1448-1459.
WO 01/12802 and WO 02/34762 disclose methods for generating a variant library of DNA sequences. These methods use oligonucleotide mixtures and a particular type of PCR in which only one primer binds at each mutation site. The starting sequence is not divided into multiple sequence segments, and no such segments are amplified.
A further method known from the prior art is “DNA shuffling” (cf., for example, WO 98/32845, WO 98/58080, WO 2006/047669), in which spatially separated segments are recombined with one another (shuffled). No mutations are introduced via oligonucleotides, and any oligonucleotides used do not bind to mutation sites.
Proteins have a wide variety of applications in research and industry. Many technical applications require the properties of natural proteins to be adapted to the particular requirements of the respective application. For this purpose, artificial modifications are introduced into the proteins that bring about the desired improvement in a property of the protein. This approach is called “protein engineering”.
Structural elucidation of native proteins has in recent years made detailed information about sequence, structure, and structure-activity relationships available for a very large number of proteins. Nevertheless, attempts to modify protein properties by rational protein design often fail. Instead, statistical approaches are frequently used, in which mutagenesis methods generate a variant library comprising a large number of protein variants; this library is then screened for protein variants having improved properties.
Given the detailed structural information available for native proteins, it is in many cases sensible not to subject the wild-type sequence to completely random mutagenesis but to limit the mutations to certain amino acid positions of the protein (focused mutagenesis). This restricts the theoretically possible complexity of the library. For example, amino acid positions responsible for a particular binding capability or, in enzymes, for substrate recognition can be identified. Particularly relevant amino acid positions can also be identified for more global properties such as protein stability.
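The reduction in theoretical library complexity achieved by focusing can be illustrated with a simple count: if mutations are confined to a set of positions and each position permits a certain number of residues (wild type included), the number of distinct protein variants is the product of those numbers. The sketch below assumes that positions vary independently and counts protein variants only, ignoring codon degeneracy at the DNA level; the function name and the per-position counts are hypothetical.

```python
from math import prod


def library_complexity(options_per_position):
    """Theoretical number of distinct protein variants when each mutated
    position independently takes any of its permitted residues.

    Assumption: positions are independent; codon degeneracy is ignored.
    """
    return prod(options_per_position)


# Hypothetical example with five mutated positions.
full_random = library_complexity([20] * 5)       # all 20 amino acids everywhere
focused = library_complexity([3, 2, 4, 2, 3])    # a few permitted residues each
print(full_random, focused)
```

Restricting five positions from 20 residues each to between two and four permitted residues shrinks the theoretical library from 3,200,000 to 144 variants in this hypothetical case.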
Likewise, it is desirable to limit the mutations permitted at individual amino acid positions to certain amino acid substitutions. For example, the desired target property of the modified protein may itself justify restricting the permitted amino acids. A very plausible restriction of the permitted amino acids at individual positions can also be derived from the extensive sequence data in sequence databases: sequence alignments reveal, for individual amino acid positions, which amino acids naturally occurring proteins have at that site. The amino acid substitutions at these positions can then be restricted during mutagenesis to the naturally occurring amino acids.
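Reading the permitted amino acids off a sequence alignment can be sketched as follows. This is a minimal illustration, assuming the alignment is given as equal-length gapped strings whose column indices (0-based) already correspond to the protein positions of interest; the function name, the `min_count` threshold, and the toy alignment are hypothetical.

```python
from collections import Counter


def permitted_residues(alignment, positions, min_count=1):
    """For each column index in `positions`, return the set of amino acids
    observed at least `min_count` times in the alignment.

    Gap characters ('-') are ignored. Assumes all sequences have equal length.
    """
    result = {}
    for pos in positions:
        counts = Counter(seq[pos] for seq in alignment if seq[pos] != "-")
        result[pos] = {aa for aa, n in counts.items() if n >= min_count}
    return result


# Hypothetical toy alignment of homologous sequences.
alignment = [
    "MKTAYIA",
    "MKSAYLA",
    "MRTA-IA",
    "MKTGYVA",
]
print(permitted_residues(alignment, positions=[1, 3, 5]))
```

Raising `min_count` discards rarely observed residues, which narrows the permitted set further and, with it, the library complexity.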
When generating focused variants, it is therefore desirable to achieve a very high degree of control, not only over the relative location of the modifications within the entire sequence, but also over the number of modifications per sequence and the types of modifications.
A range of mutagenesis methods is known in the prior art. One simple option is error-prone PCR, in which a polymerase misincorporates nucleotides during DNA amplification and thereby generates mutations. A further possibility for introducing mutations into DNA sequences is DNA shuffling.
Mutagenesis methods that rely on oligonucleotides are also described; see, for example, WO 02/34762 and WO 01/12802. In general, such methods introduce mutations into DNA sequences via mismatch positions in the oligonucleotides. For example, all 20 natural amino acids can be introduced at an amino acid position with the aid of a randomized oligonucleotide (saturation mutagenesis). A particular embodiment of this type of mutagenesis is QuikChange mutagenesis, which can also be carried out as multiple QuikChange mutagenesis. A similar method is massive mutagenesis. Both methods use anywhere from a few to many oligonucleotides in the polymerase-mediated amplification of the DNA sequence to be mutated.
A disadvantage of both methods is that it is not possible to control how many oligonucleotides are incorporated into each amplicon, because the polymerase can skip individual mutation sites. As a result, the number of mutations per variant varies strongly and cannot be controlled, and the theoretical total complexity of the library cannot be sufficiently restricted. A further disadvantage arises when several oligonucleotides can bind to the same mutation site: differences in the affinity of the individual oligonucleotides for that site cannot be compensated for. In addition, as the number of mutation sites increases, the average number of mutations per variant falls increasingly short of the number of mutation sites. Consequently, libraries intended to combine multiple mutations can only be generated at very high complexities, in which variants carrying the desired combination of mutations make up only a small fraction of the library.
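Why uncontrolled incorporation penalizes multi-site libraries can be seen in a simplified model. Assume, purely for illustration, that each of n mutation sites is incorporated independently with the same probability p; the number of mutations per variant is then binomially distributed, the mean is n·p, and the fraction of variants carrying all n mutations is p^n. This model, its function names, and the value p = 0.7 are assumptions for illustration only; it does not capture affinity differences between competing oligonucleotides.

```python
from math import comb


def mutation_count_distribution(n_sites, p_incorp):
    """Binomial probability that a variant carries exactly k of n_sites mutations.

    Assumption: every site is incorporated independently with probability p_incorp.
    """
    return [
        comb(n_sites, k) * p_incorp**k * (1 - p_incorp) ** (n_sites - k)
        for k in range(n_sites + 1)
    ]


def summarize(n_sites, p_incorp):
    """Return (expected mutations per variant, fraction carrying all mutations)."""
    dist = mutation_count_distribution(n_sites, p_incorp)
    mean = sum(k * pk for k, pk in enumerate(dist))
    return mean, dist[-1]


for n in (2, 5, 10):
    mean, full = summarize(n, p_incorp=0.7)  # 0.7 is an arbitrary example value
    print(f"{n:2d} sites: mean {mean:.1f} mutations, {full:.1%} carry all {n}")
```

Under these assumptions the absolute shortfall n·(1 − p) grows with n, while the fraction of fully mutated variants drops geometrically, from 49% at two sites to below 3% at ten.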
There is a need for methods for generating a variant library of DNA sequences:
- in which the mutations can be restricted to a set of precisely defined positions of the DNA sequence (or protein sequence),
- in which the mutation sites can be distributed across the entire DNA sequence,
- in which it can be specified which mutations are permitted at which position,
- which can be carried out easily and with standard laboratory methods,
- in which the distribution of individual mutations in the library can be controlled precisely,
- in which neighboring mutations can also be introduced in a controlled manner,
- in which the total number of mutations per variant can be controlled exactly, and
- in which the complexity of the library generated can be restricted with precision.
More particularly, there is a need for methods with which it is possible to generate variant libraries having a high proportion of variants in which multiple mutations are combined with one another.