1. Field of the Invention
The invention relates generally to molecular biology and more specifically to methods of generating populations of related nucleic acid molecules.
2. Background Information
DNA shuffling is a powerful tool for obtaining recombinants between two or more DNA sequences to evolve them in an accelerated manner. The parental, or input, DNAs for the process of DNA shuffling are typically mutants or variants of a given gene that have some improved character over the wild-type. The products of DNA shuffling represent a pool of essentially random reassortments of gene sequences from the parental DNAs that can then be analyzed for additive or synergistic effects resulting from new sequence combinations.
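The reassortment described above can be illustrated with a minimal toy sketch. It assumes, for illustration only, that the parental sequences are pre-aligned to equal length and that a chimera is built by cutting at random positions and copying each segment from a randomly chosen parent. The function name `shuffle_parents` and the example parental sequences are hypothetical and do not represent any particular laboratory protocol.

```python
import random


def shuffle_parents(parents, n_cuts, rng):
    """Return one chimeric sequence assembled from aligned parental sequences.

    Toy model (assumption): the parents are pre-aligned to equal length, the
    chimera is cut at n_cuts random positions, and each resulting segment is
    copied from a randomly chosen parent. Adjacent segments may come from the
    same parent, so the number of true crossovers can be smaller than n_cuts.
    """
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("parents must be aligned to equal length")
    cuts = sorted(rng.sample(range(1, length), n_cuts))
    bounds = [0] + cuts + [length]
    segments = []
    for start, end in zip(bounds, bounds[1:]):
        donor = rng.choice(parents)
        segments.append(donor[start:end])
    return "".join(segments)


# Build a small library from two hypothetical parental variants.
rng = random.Random(42)
parents = ["ATGGCTAAACTGGTT", "ATGGCAAAGCTCGTA"]
library = [shuffle_parents(parents, 3, rng) for _ in range(5)]
for member in library:
    print(member)
```

Every position in each library member is inherited from one of the parents at that same position; only the combination of parental segments changes.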
Recursive sequence reassortment is analogous to an evolutionary process where only variants with suitable properties are allowed to contribute their genetic material to the production of the next generation. Optimized variants are generated through DNA shuffling-mediated sequence reassortment followed by testing for incremental improvements in performance. Additional cycles of reassortment and testing lead to the generation of genes that contain new combinations of the genetic improvements identified in previous rounds of the process. Reassorting and combining beneficial genetic changes allows an optimized sequence to arise without having to individually generate and screen all possible sequence combinations.
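The cycle of reassortment followed by testing can be sketched as a simple selection loop. In this hypothetical model, each variant is a tuple of 20 sites where 1 marks a beneficial mutation, the "screen" simply counts beneficial mutations, and recombination is uniform crossover between two selected variants. Carrying the parents into each new library is an added assumption (elitism) so that the best score never decreases; none of these names or parameters come from the text.

```python
import random

N_SITES = 20  # hypothetical number of sites where a beneficial mutation can occur


def recombine(a, b, rng):
    """Uniform crossover: each site is inherited from one of the two parents."""
    return tuple(rng.choice(pair) for pair in zip(a, b))


def score(variant):
    """Hypothetical screen: performance = number of beneficial mutations carried."""
    return sum(variant)


def evolve(pool, rounds, library_size, keep, rng):
    """Repeat reassortment and screening; only the best `keep` variants breed.

    Parents are carried into each library (an elitism assumption), so the best
    score never decreases from one round to the next.
    """
    history = [max(score(v) for v in pool)]
    for _ in range(rounds):
        library = [recombine(rng.choice(pool), rng.choice(pool), rng)
                   for _ in range(library_size)]
        library.extend(pool)
        library.sort(key=score, reverse=True)
        pool = library[:keep]
        history.append(score(pool[0]))
    return pool, history


# Start from variants that each carry a single, different beneficial mutation.
start = [tuple(int(i == j) for i in range(N_SITES)) for j in range(N_SITES)]
final_pool, best_per_round = evolve(start, rounds=12, library_size=500, keep=30,
                                    rng=random.Random(1))
print(best_per_round)
```

With these settings the loop screens 6,000 recombinants in total, compared with 2**20 (1,048,576) possible combinations of the 20 sites, which mirrors the point that an optimized sequence can arise without screening every combination.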
This differs sharply from random mutagenesis, where subsequent improvements to an already improved sequence result largely from serendipity. For example, in order to obtain a protein that has a desired set of enhanced properties, it may be necessary to identify a mutant that contains a combination of various beneficial mutations. If no process is available for combining these beneficial genetic changes, further random mutagenesis will be required. However, random mutagenesis requires repeated cycles of generating and screening large numbers of mutants, resulting in a process that is tedious and highly labor intensive. Moreover, the rate at which sequences incur mutations with undesirable effects increases with the information content of a sequence. Hence, as the information content, library size, and mutagenesis rate increase, the ratio of deleterious mutations to beneficial mutations will increase, increasingly masking the selection of further improvements. Lastly, some computer simulations have suggested that point mutagenesis alone may often be too gradual to allow the large-scale block changes that are required for continued and dramatic sequence evolution.
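The masking effect of deleterious mutations can be made concrete with a simple probability model. The sketch assumes, purely for illustration, that the number of mutations per gene is Poisson-distributed with mean `m` and that each mutation is independently beneficial (fraction `f_ben`) or deleterious (fraction `f_del`). The function names and the 1% and 30% fractions are hypothetical, not measured values.

```python
import math


def p_useful(m, f_ben, f_del):
    """Probability that a variant carries >= 1 beneficial and 0 deleterious mutations.

    Assumes mutations per gene are Poisson with mean m and that each mutation is
    independently beneficial with probability f_ben or deleterious with f_del,
    so beneficial and deleterious counts are independent Poisson variables.
    """
    return math.exp(-m * f_del) * (1 - math.exp(-m * f_ben))


def best_load(f_ben, f_del):
    """Mean mutation load m that maximizes p_useful (derivative set to zero)."""
    return math.log((f_del + f_ben) / f_del) / f_ben


# Hypothetical fractions: 1% of mutations beneficial, 30% deleterious.
for m in (1, 3, 10, 30):
    print(m, round(p_useful(m, 0.01, 0.30), 5))
print("optimum m ~", round(best_load(0.01, 0.30), 2))
```

Under these assumptions, raising the mutation load beyond a modest optimum reduces the fraction of variants that carry a beneficial change without an accompanying deleterious one.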
There are a number of different techniques used for random mutagenesis. For example, one method utilizes error-prone polymerase chain reaction (PCR) to create mutant genes in a library format (Cadwell and Joyce, 1992; Gram et al., 1992). Another method is cassette mutagenesis (Arkin and Youvan, 1992; Delagrave et al., 1993; Delagrave and Youvan, 1993; Goldman and Youvan, 1992; Hermes et al., 1990; Oliphant et al., 1986; Stemmer et al., 1993), in which the specific region to be optimized is replaced with a synthetically mutagenized oligonucleotide.
Error-prone PCR uses low-fidelity polymerization conditions to introduce a low level of point mutations randomly over a sequence. A limitation of this method, however, is that published error-prone PCR protocols suffer from low polymerase processivity, making the approach inefficient at randomly mutagenizing an average-sized gene.
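The effect of a low per-base error rate can be simulated with a toy copying function. This sketch models substitutions only, chosen uniformly among the three alternative bases, and ignores insertions, deletions, and polymerase processivity. The 0.5% error rate, the 1 kb random "gene", and the function names are hypothetical illustrations rather than parameters of any published protocol.

```python
import random

BASES = "ACGT"


def error_prone_copy(template, error_rate, rng):
    """Copy a DNA template, substituting each base with probability error_rate.

    Toy model: substitutions only (no insertions/deletions), uniform over the
    three alternative bases, and no dependence on polymerase processivity.
    """
    copy = []
    for base in template:
        if rng.random() < error_rate:
            copy.append(rng.choice([b for b in BASES if b != base]))
        else:
            copy.append(base)
    return "".join(copy)


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


rng = random.Random(7)
gene = "".join(rng.choice(BASES) for _ in range(1000))  # hypothetical 1 kb gene
mutants = [error_prone_copy(gene, 0.005, rng) for _ in range(10)]
print([hamming(gene, m) for m in mutants])  # mean expected to be near 5 per copy
```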
In oligonucleotide-directed random mutagenesis, a short sequence is replaced with a synthetically mutagenized oligonucleotide. To generate combinations of distant mutations, different sites must be addressed simultaneously by different oligonucleotides. The limited library size that is obtained in this way, relative to the library size required to saturate all sites, means that many rounds of selection are required for optimization. Mutagenesis with synthetic oligonucleotides requires sequencing of individual clones after each selection round followed by grouping them into families, arbitrarily choosing a single family, and reducing it to a consensus motif. Such a motif is resynthesized and reinserted into a single gene followed by additional selection. This step creates a statistical bottleneck, is labor intensive, and is not practical for many rounds of mutagenesis.
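The gap between attainable and required library sizes can be estimated with simple arithmetic. The sketch assumes each randomized codon uses an NNK degenerate codon (4 x 4 x 2 = 32 DNA combinations) and approximates the fraction of variants sampled at least once by 1 - exp(-N/V) for uniform sampling. The library size of 10**8 clones and the function names are hypothetical.

```python
import math

NNK_CODONS = 32  # NNK degenerate codon: 4 * 4 * 2 DNA combinations per site


def variants_for_sites(n_sites, codons_per_site=NNK_CODONS):
    """Number of distinct DNA variants when n_sites are randomized at once."""
    return codons_per_site ** n_sites


def expected_coverage(library_size, n_variants):
    """Approximate fraction of variants sampled at least once (uniform draws)."""
    return 1 - math.exp(-library_size / n_variants)


LIBRARY_SIZE = 10**8  # hypothetical number of clones that can be obtained
for n in range(1, 8):
    v = variants_for_sites(n)
    print(n, v, round(expected_coverage(LIBRARY_SIZE, v), 4))
```

Because the number of variants grows exponentially with the number of sites addressed simultaneously, coverage collapses once only a handful of distant sites are randomized together, which is why many rounds of selection become necessary.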
For these reasons, error-prone PCR and oligonucleotide-directed mutagenesis can be used for mutagenesis protocols that require relatively few cycles of sequence alteration, such as for sequence fine-tuning, but are limited in their usefulness for procedures requiring numerous mutagenesis and selection cycles, especially on large gene sequences.
As discussed above, prior methods for producing improved gene products from randomly mutated genes are of limited utility. One recognized method for producing a wide variety of randomly reassorted gene sequences uses enzymes to cleave a long nucleotide chain into shorter pieces. The cleaving agents are then separated from the genetic material, and the material is amplified in such a manner that the fragments reassemble as chains of polynucleotides, either randomly or according to a specific order (Stemmer, 1994a; Stemmer, 1994b; U.S. Pat. Nos. 5,605,793, 5,811,238, 5,830,721, 5,928,905, 6,096,548, 6,117,679, 6,165,793 and 6,153,410). A variation of this method uses primers and limited polymerase extensions to generate the fragments prior to reassembly (U.S. Pat. Nos. 5,965,408 and 6,159,687).
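Fragmentation and reassembly can be sketched as a toy simulation. Several copies of each aligned parent are cut at random positions, and a strand is then extended only by fragments whose overlap with its growing end matches exactly, a crude stand-in for homology-dependent annealing and extension. The fragment lengths, the 6-base minimum overlap, the second parent carrying a substitution every 15 bases, and all function names are hypothetical; the sketch is not the cited patented procedure.

```python
import random


def fragment_pool(parents, copies, min_len, max_len, rng):
    """Fragment several copies of each aligned parent at random positions.

    Each fragment is stored as (start, sequence), where start is its position in
    the common alignment. Cutting several copies independently yields fragments
    whose ends overlap, which is what allows reassembly.
    """
    pool = []
    for parent in parents:
        for _ in range(copies):
            pos = 0
            while pos < len(parent):
                size = rng.randint(min_len, max_len)
                pool.append((pos, parent[pos:pos + size]))
                pos += size
    return pool


def reassemble(pool, length, min_overlap, rng):
    """Extend a strand with fragments whose overlap with its growing end matches.

    Returns the full-length product, or None if extension stalls.
    """
    strand = rng.choice([piece for start, piece in pool if start == 0])
    while len(strand) < length:
        end = len(strand)
        candidates = [
            (start, piece) for start, piece in pool
            if start <= end - min_overlap                  # long enough overlap
            and start + len(piece) > end                   # extends the strand
            and strand[start:end] == piece[:end - start]   # overlap matches
        ]
        if not candidates:
            return None
        start, piece = rng.choice(candidates)
        strand += piece[end - start:]
    return strand


rng = random.Random(3)
parent_a = "".join(rng.choice("ACGT") for _ in range(120))
# Hypothetical second parent: parent_a with a substitution every 15 bases.
parent_b = "".join(
    ("A" if b != "A" else "C") if i % 15 == 7 else b for i, b in enumerate(parent_a)
)
pool = fragment_pool([parent_a, parent_b], copies=6, min_len=10, max_len=25, rng=rng)
products = [reassemble(pool, 120, min_overlap=6, rng=rng) for _ in range(20)]
full = [p for p in products if p is not None]
chimeras = [p for p in full if p not in (parent_a, parent_b)]
print(len(full), "full-length products,", len(chimeras), "chimeric")
```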
However, both methods have limitations. They are technically complex, which limits their applicability to facilities with sufficiently experienced staff. In addition, complications arise from reassembling molecules from fragments, including unintended mutagenesis and the increasing difficulty of reassembly as target molecules grow longer, which limits the utility of these methods for reassembling long polynucleotide strands.
Another limitation of these fragmentation- and reassembly-based gene shuffling methods is encountered when the parental template polynucleotides are increasingly heterogeneous. In the annealing step of those processes, the small polynucleotide fragments depend upon stabilizing forces that result from base-pairing interactions to anneal properly. Because the short annealing regions provide only limited stabilizing forces, annealing of highly complementary sequences is favored over annealing of more divergent sequences. In such instances these methods have a strong tendency to regenerate the parental template polynucleotides through annealing of complementary single strands derived from the same parental template. The parental templates therefore essentially reassemble themselves, creating a background of unchanged polynucleotides in the library that makes recombinant molecules more difficult to detect. This problem becomes increasingly severe as the parental templates become more heterogeneous, that is, as the percentage of sequence identity between the parental templates decreases. This outcome was demonstrated by Kikuchi et al. (Gene 243:133-137, 2000), who attempted to generate recombinants between xylE and nahH using previously reported methods of family shuffling (Patten et al., 1997; Crameri et al., 1998; Harayama, 1998; Kumamaru et al., 1998; Chang et al., 1999; Hansson et al., 1999). Kikuchi et al. found that essentially no recombinants (<1%) were generated. They also disclosed a method to improve the formation of chimeric genes by fragmentation and reassembly of single-stranded DNAs. Using this method, they obtained chimeric genes at a rate of 14 percent, with the other 86 percent being parental sequences.
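The sensitivity of cross-parent annealing to sequence identity can be illustrated with a deliberately strict model. It assumes that aligned positions differ independently, each identical with probability `identity`, and that a fragment anneals to the other parent's strand only if its entire overlap matches perfectly. Real annealing tolerates some mismatches, so this exaggerates the effect; the 15-base overlap and the function name are hypothetical.

```python
def identical_window_probability(identity, window):
    """Chance that `window` consecutive aligned positions are all identical.

    Toy assumption: positions differ independently, each identical with
    probability `identity`, and a fragment can anneal to the other parent's
    strand only if its overlap region matches perfectly.
    """
    return identity ** window


OVERLAP = 15  # hypothetical annealing overlap, in bases
for identity in (0.95, 0.90, 0.80, 0.70):
    p = identical_window_probability(identity, OVERLAP)
    print(f"{identity:.0%} identity: {p:.4f}")
```

Under these assumptions, lowering identity from 90% to 70% reduces the chance of a perfectly matched cross-parent overlap more than fortyfold, consistent with the tendency of divergent parents to reassemble into parental sequences.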
The low efficiency with which recombinants are recovered limits the utility of these methods for generating novel polynucleotides from parental templates that share a lower percentage of sequence identity, that is, parental templates that are more diverse. Accordingly, there is a need for a method of generating gene sequences that overcomes these limitations.
The present invention provides a method that satisfies the aforementioned needs and provides related advantages as well.