DNA shuffling is a powerful tool for obtaining recombinants between two or more DNA sequences to evolve them in an accelerated manner. The parental, or input, DNAs for the process of DNA shuffling are typically mutants or variants of a given gene that have some improved character over the wild-type. The products of DNA shuffling represent a pool of essentially random reassortments of gene sequences from the parental nucleic acids that can then be analyzed for additive or synergistic effects resulting from new sequence combinations.
Recursive sequence reassortment is analogous to an evolutionary process where only variants with suitable properties are allowed to contribute their genetic material to the production of the next generation. Optimized variants are generated through DNA shuffling-mediated sequence reassortment followed by testing for incremental improvements in performance. Additional cycles of reassortment and testing lead to the generation of genes that contain new combinations of the genetic improvements identified in previous rounds of the process. Reassorting and combining beneficial genetic changes allows an optimized sequence to arise without having to individually generate and screen all possible sequence combinations.
Shuffling differs sharply from random mutagenesis, where subsequent improvements to an already improved sequence result largely from serendipity. For example, in order to obtain a protein that has a desired set of enhanced properties, it may be necessary to identify a mutant that contains a combination of various beneficial mutations. If no process is available for combining these beneficial genetic changes, further random mutagenesis will be required. However, random mutagenesis requires repeated cycles of generating and screening large numbers of mutants, resulting in a process that is tedious and highly labor intensive. Moreover, the rate at which sequences incur mutations with undesirable effects increases with the information content of a sequence. Hence, as the information content, library size, and mutagenesis rate increase, the ratio of deleterious mutations to beneficial mutations also increases, progressively masking the selection of further improvements. Lastly, some computer simulations have suggested that point mutagenesis alone may often be too gradual to allow the large-scale block changes that are required for continued and dramatic sequence evolution.
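The scale of the exhaustive alternative can be illustrated with simple arithmetic. The following Python sketch is an illustration only, under the assumption that each of n beneficial mutations occupies its own site and is either present or absent; the function names are hypothetical and are not part of the process described here.

```python
from itertools import product


def exhaustive_combinations(n_mutations: int) -> int:
    """Count the genotypes when each mutation is independently present or absent."""
    return 2 ** n_mutations


def enumerate_genotypes(n_mutations: int):
    """List every presence/absence pattern; only practical for small n."""
    return list(product((0, 1), repeat=n_mutations))


if __name__ == "__main__":
    # The count doubles with every additional mutation.
    for n in (5, 10, 20):
        print(n, exhaustive_combinations(n))
```

Under this assumption, twenty candidate mutations already correspond to more than one million distinct combinations, which is why combining improvements over successive rounds is attractive compared with constructing and screening each combination individually.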
Error-prone PCR uses low-fidelity polymerization conditions to introduce a low level of point mutations randomly over a sequence. A limitation to this method, however, is that published error-prone PCR protocols suffer from a low processivity of the polymerase, making this approach inefficient at producing random mutagenesis in an average-sized gene.
In oligonucleotide-directed random mutagenesis, a short sequence is replaced with a synthetically mutagenized oligonucleotide. To generate combinations of distant mutations, different sites must be addressed simultaneously by different oligonucleotides. The limited library size that is obtained in this manner, relative to the library size required to saturate all sites, means that many rounds of selection are required for optimization. Mutagenesis with synthetic oligonucleotides requires sequencing of individual clones after each selection round, followed by grouping them into families, arbitrarily choosing a single family, and reducing it to a consensus motif. Such a motif is resynthesized and reinserted into a single gene, followed by additional selection. This step creates a statistical bottleneck, is labor intensive, and is not practical for many rounds of mutagenesis.
For these reasons, error-prone PCR and oligonucleotide-directed mutagenesis can be used for mutagenesis protocols that require relatively few cycles of sequence alteration, such as for sequence fine-tuning, but are limited in their usefulness for procedures requiring numerous mutagenesis and selection cycles, especially on large gene sequences.
As discussed above, prior methods for producing improved gene products from randomly mutated genes are of limited utility. One recognized method for producing randomly reassorted gene sequences uses enzymes to cleave a long nucleotide chain into shorter pieces. The cleaving agents are then separated from the genetic material, and the material is amplified in such a manner that the genetic material is allowed to reassemble as chains of polynucleotides, where their reassembly is either random or according to a specific order. The method requires several rounds of amplification to assemble variants of genes that were broken into random fragments (Stemmer, 1994a; Stemmer, 1994b; U.S. Pat. No. 5,605,793; U.S. Pat. No. 5,811,238; U.S. Pat. No. 5,830,721; U.S. Pat. No. 5,928,905; U.S. Pat. No. 6,096,548; U.S. Pat. No. 6,117,679; U.S. Pat. No. 6,165,793; U.S. Pat. No. 6,153,410). A variation of this method uses primers and limited polymerase extensions to generate the fragments prior to reassembly (U.S. Pat. No. 5,965,408; U.S. Pat. No. 6,159,687).
However, both methods have limitations. Both are technically complex, which limits their applicability to facilities that have sufficiently experienced staff. In addition, complications arise from the reassembly of molecules from fragments, including unintended mutagenesis and the increasing difficulty of reassembling target molecules as their size increases, which limits the utility of these methods for reassembling long polynucleotide strands.
Another limitation of these methods of fragmentation and reassembly-based gene shuffling is encountered when the parental template polynucleotides are increasingly heterogeneous. In the annealing step of those processes, the small polynucleotide fragments depend upon stabilizing forces that result from base-pairing interactions to anneal properly. As the small regions of annealing have limited stabilizing forces due to their short length, annealing of highly complementary sequences is favored over more divergent sequences. In such instances these methods have a strong tendency to regenerate the parental template polynucleotides due to annealing of complementary single strands from a particular parental template. Therefore, the parental templates essentially reassemble themselves, creating a background of unchanged polynucleotides in the library that increases the difficulty of detecting recombinant molecules. This problem becomes increasingly severe as the parental templates become more heterogeneous, that is, as the percentage of sequence identity between the parental templates decreases. This outcome was demonstrated by Kikuchi et al. (Gene 243:133–137, 2000), who attempted to generate recombinants between xylE and nahH using previously reported methods of family shuffling (Patten et al., 1997; Crameri et al., 1998; Harayama, 1998; Kumamaru et al., 1998; Chang et al., 1999; Hansson et al., 1999). Kikuchi et al. found that essentially no recombinants (<1%) were generated. They also disclosed a method to improve the formation of chimeric genes by fragmentation and reassembly of single-stranded DNAs. Using this method, they obtained chimeric genes at a rate of 14 percent, with the other 86 percent being parental sequences.
This low efficiency of recombinant recovery limits the utility of these methods for generating novel polynucleotides from parental templates with a lower percentage of sequence identity, that is, parental templates that are more diverse.
Accordingly, there is a need for a method of generating gene sequences that addresses these limitations. A method has been developed for reassorting mutations among related polynucleotides, in vitro, by forming heteroduplex molecules and then addressing the mismatches such that sequence information at sites of mismatch is transferred from one strand to the other. The mismatches are addressed by incubating the heteroduplex molecules in a reaction containing a) an enzyme that recognizes and nicks a sequence strand at a mismatch site, b) a polymerase with a proofreading activity in the presence of dNTPs, and c) a ligase. These respective activities act in concert such that, at a given site of mismatch, the heteroduplex is nicked, unpaired bases are excised from one of the strands, then replaced using the opposite strand as a template, and nicks are sealed. Output polynucleotides may be amplified before cloning, or cloned directly and tested for improved properties. Additional cycles of mismatch resolution reassortment and testing may lead to further improvement.
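The notion of percentage of sequence identity between two parental templates can be made concrete with a short calculation. The Python sketch below is a simplified illustration, assuming the two sequences are already aligned, of equal length, and free of gaps; the function name and the example sequences are hypothetical and are not taken from the cited work.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which two sequences carry the same base.

    Assumes pre-aligned, ungapped sequences of equal length (a simplification).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for x, y in zip(seq_a, seq_b) if x == y)
    return 100.0 * matches / len(seq_a)


if __name__ == "__main__":
    # Hypothetical ten-base parents differing at two positions.
    print(percent_identity("ATGCATGCAT", "ATGCTTGCAA"))  # 80.0
```

The lower this value, the fewer short stretches of perfect complementarity are shared between different parents, which is consistent with the tendency described above for fragments to re-anneal with their own parental template.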
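The net effect of mismatch resolution on sequence information can be sketched as a toy model in Python. This is an illustration under stated assumptions, not a description of the enzymatic reaction: both strands are written in the same sense so that aligned positions can be compared directly, each mismatch is resolved independently of its neighbors, and the strand that is nicked and repaired is chosen with equal probability. In practice, excision may remove a tract of bases so that nearby mismatches are resolved together. The function names are hypothetical.

```python
import random


def mismatch_sites(strand_a: str, strand_b: str):
    """Return the aligned positions at which the two strands differ."""
    return [i for i, (x, y) in enumerate(zip(strand_a, strand_b)) if x != y]


def resolve_heteroduplex(strand_a: str, strand_b: str, rng: random.Random):
    """Toy model: at each mismatch, one strand is rewritten to match the other.

    Assumption: the nicked strand is chosen at random with probability 0.5,
    independently at every mismatch site.
    """
    if len(strand_a) != len(strand_b):
        raise ValueError("strands must be aligned to the same length")
    a, b = list(strand_a), list(strand_b)
    for i in mismatch_sites(strand_a, strand_b):
        if rng.random() < 0.5:
            a[i] = b[i]  # strand a nicked and repaired using b as template
        else:
            b[i] = a[i]  # strand b nicked and repaired using a as template
    return "".join(a), "".join(b)


if __name__ == "__main__":
    parent_a = "ATGCATGCATGC"
    parent_b = "ATGAATGTATGG"  # hypothetical variant differing at three sites
    out_a, out_b = resolve_heteroduplex(parent_a, parent_b, random.Random(0))
    print(out_a, out_b)
```

In this model the two output strands are identical after resolution, and each formerly mismatched position carries the base of one parent or the other, so a pool of resolved heteroduplexes contains mosaics of the parental sequences.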
This method utilizes a mismatch endonuclease that is capable of recognizing and nicking at the site of a mismatch between a base or a sequence of bases along opposite strands of a nucleic acid sequence.
To address the need for enzymes that will recognize a mismatch, we have cloned the gene for the CEL I enzyme and a novel enzyme we refer to as RES I. Both of these enzymes are mismatch endonucleases and both are particularly suited to recognizing a base pair mismatch along a nucleic acid sequence, such as a chromosome, a plasmid, a gene, a portion of a gene or any artificial sequence of nucleic acids.