1. Field of the Invention
This invention relates to the field of molecular biology and recombinant nucleic acid technology. Specifically, this invention pertains to a method for generating a pool of nucleic acid fragments useful for in vitro recombination and the creation of novel nucleic acid sequences that encode potentially desirable proteins or enzymes.
2. Description of Related Art
DNA sequence databases are growing exponentially with the submission of entire genome sequences. By inference, gene sequences create protein databases consisting of the amino acid sequences deduced from all the sequenced genes. DNA and protein sequence searches have provided information about the function and structure of proteins from novel gene sequences. Proteins that share sequence similarity are classified in “families”. The basis of these analyses is that sequence similarity can imply homology and common function.
Enzymes within a protein family may have similar catalytic activities, although the characteristics of individual enzymes may vary widely. Although several proteins in a family may have the same general function, conditions for optimal activity can be very different for each individual protein. The fundamental differences between enzymes are due to variations in their naturally evolved three-dimensional structure, which is ultimately determined by the linear amino acid sequence. Therefore, embedded within the sequences of proteins are functional folds, which in theory could exist in numerous, as yet undiscovered combinations, producing enzymes with different activities.
Many biotechnological processes exist for which there is a need for enzymes with increased stability, enhanced activity and new catalytic functions. Even with the knowledge gained from previous structural and functional determinations, the current state of biotechnology does not allow one to design an enzyme de novo. To overcome this limitation, enzymes with new traits can be engineered by altering the structure of defined domains in natural proteins via specific mutagenesis or by random mutagenesis methods such as “gene shuffling”. An entire spectrum of mutational types is available through protein engineering, including single amino acid changes, multiple amino acid changes, segment replacements, whole domain swapping, and entire protein fusion.
The ability to make predetermined amino acid changes (i.e., site-directed mutagenesis) that will alter an enzyme's catalysis in a predictable manner requires extensive information about the enzymatic mechanism and those structural features of the protein which impart catalysis. The difficulties associated with rational mutagenesis for altering an enzyme's activity are in large part due to the unpredictable, balanced interactions among hundreds of amino acid side chains with each other, cofactors, water, substrate, and product. Therefore, significant changes in an enzyme's stability or activity are difficult to design through single mutations. Furthermore, when multiple substitutions are required, the number of possibilities is enormous, determined by the formula 20^N, where N is the number of amino acids in the protein, assuming that only the 20 commonly occurring amino acids are used.
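The scale of this search space can be illustrated with a short calculation. The following is a minimal sketch only; the function names are hypothetical, and the count of variants carrying exactly k substitutions (each changed position taking one of the 19 other amino acids) is standard combinatorics added for illustration rather than a formula stated in the text above.

```python
from math import comb


def sequence_space(n_residues: int, alphabet_size: int = 20) -> int:
    """Total number of sequences of length N: alphabet_size ** N (20^N by default)."""
    return alphabet_size ** n_residues


def variants_with_k_substitutions(n_residues: int, k: int, alphabet_size: int = 20) -> int:
    """Variants of one parent that differ at exactly k positions.

    Choose k of the N positions, then one of the (alphabet_size - 1)
    alternative residues at each chosen position.
    """
    return comb(n_residues, k) * (alphabet_size - 1) ** k


if __name__ == "__main__":
    # A protein of 100 residues, chosen arbitrarily for the example.
    total = sequence_space(100)
    print(f"20^100 has {len(str(total))} decimal digits")
    # Single and double mutants of a 300-residue parent, for comparison.
    print(variants_with_k_substitutions(300, 1))
    print(variants_with_k_substitutions(300, 2))
```

Even for a modest protein the full space cannot be enumerated, whereas the set of single and double mutants remains small enough to count, which is one way to see why exhaustive design by multiple substitution is impractical.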
Today, in vitro evolution methodologies may be used to alter an enzyme's structure. Using these types of methods, numerous groups have engineered enzymes with altered or enhanced activities (Stemmer, 1994; Crameri et al., 1998; Moore & Arnold, 1996; Moore et al., 1997). Random alteration of gene sequences can be a powerful method for creating pools of proteins with different enzymatic capabilities. With appropriate assays, one can screen or select those enzymes with the desired activity. Current methods for creating such pools include error-prone PCR (Leung et al., 1989; Caldwell & Joyce, 1992), cassette mutagenesis (Arkin & Youvan, 1992; Oliphant et al., 1986), hybrid enzyme generation (Ostermeier et al., 1999a), in vivo recombination (Pompon & Nicolas, 1989), gene shuffling (Stemmer, 1994), and the Staggered Extension Process (StEP) (Zhao et al., 1998).
In error-prone PCR, altering the reaction conditions reduces the fidelity of the polymerase reaction. Typically this is accomplished by increasing the concentration of magnesium chloride, adding manganese chloride, increasing and unbalancing the dNTP concentrations, increasing the concentration of Taq polymerase, and/or increasing the extension time. The most error-prone conditions produce a mutation rate of about 2% per position; a rate of about 0.7% per position is more typical. An advantage of error-prone PCR is that any gene fragment can be mutagenized. However, point mutations alone are thought to be too gradual for significant gene alterations and frequently result in neutral substitutions.
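To give a feel for what these per-position rates mean for a whole gene, the sketch below computes the expected number of point mutations and the fraction of unmutated copies, and applies random substitutions to a sequence. It assumes that positions mutate independently and that each substitution is equally likely to yield any of the other three bases; both are simplifications introduced here, and the function names and the 1,000-base gene length are hypothetical.

```python
import random

BASES = "ACGT"


def expected_mutations(gene_length: int, rate_per_position: float) -> float:
    """Mean number of point mutations per gene copy (length x rate)."""
    return gene_length * rate_per_position


def fraction_unmutated(gene_length: int, rate_per_position: float) -> float:
    """Probability that a copy carries no mutation, assuming independent positions."""
    return (1.0 - rate_per_position) ** gene_length


def mutate(sequence: str, rate_per_position: float, rng: random.Random) -> str:
    """Substitute each base with probability rate_per_position.

    A substituted base is always replaced by one of the three other bases.
    """
    out = []
    for base in sequence:
        if rng.random() < rate_per_position:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    gene = "".join(rng.choice(BASES) for _ in range(1000))
    for rate in (0.007, 0.02):
        mutant = mutate(gene, rate, rng)
        observed = sum(a != b for a, b in zip(gene, mutant))
        print(rate, expected_mutations(len(gene), rate), observed)
```

Under these assumptions, a 1,000-base gene mutagenized at 0.7% per position carries about seven point mutations on average, and very few copies escape mutation entirely.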
Cassette mutagenesis and domain swapping target defined regions of a protein. A cassette may be synthesized with a predetermined amount of degeneracy, from completely random to a single amino acid change, in a defined length of the protein. Domain swapping refers to the creation of hybrid proteins having one or more domains from different proteins. If a protein domain is defined in a linear segment of amino acids, the domain may easily be inserted or substituted in other homologous proteins. Structural information is usually necessary for defining and swapping domains. A domain may be the active site of an enzyme. Transfer of active sites to homologous proteins may also lead to enzymes with new activities (Vita, 1997). Domain mutagenesis is not restricted to swapping of homologous domains, but also includes domain insertion to create multifunctional activities or control enzymes (Nixon et al., 1989). However, cassette mutagenesis is limited by the need to know sequence or domain boundaries and is restricted to the mutagenesis of a specific region, such as a region encoding a contiguous sequence of amino acids.
Domain swapping performed without prior knowledge of domain boundaries has also been described (Ostermeier et al., 1999a; Ostermeier et al., 1999b; Schulga et al., 1994). This approach has been termed incremental truncation for the creation of hybrid enzymes (ITCHY) (Ostermeier et al., 1999b). In this method, a library of 5′ fragments of random length from one gene is fused with a library of 3′ fragments of random length from another gene. The fragments are created by limited Exonuclease III digestion. Aliquots of an exonuclease digestion mixture are removed at short intervals to create a series of fragments of different lengths. The fragments are joined in a plasmid vector, which can then be used to express the fusion protein. Hybrid enzymes may utilize established functions or properties from a wild-type enzyme and incorporate them into a novel enzyme (Nixon et al., 1989). However, ITCHY libraries are limited to one crossover point per hybrid.
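The single-crossover limitation can be seen in a toy enumeration of such a library. This is a minimal sketch, not a model of the exonuclease chemistry: the function name is hypothetical, the two "genes" are placeholder strings, and every truncation length is assumed to be equally available.

```python
def itchy_library(gene_a: str, gene_b: str) -> list[str]:
    """Fuse every 5' fragment of gene_a to every 3' fragment of gene_b.

    Fragment lengths vary independently, so the two truncation points need
    not align and the hybrids differ in total length. Each hybrid contains
    exactly one junction (crossover) between the two parents.
    """
    five_prime = [gene_a[:i] for i in range(1, len(gene_a))]   # prefixes of A
    three_prime = [gene_b[j:] for j in range(1, len(gene_b))]  # suffixes of B
    return [head + tail for head in five_prime for tail in three_prime]


if __name__ == "__main__":
    # Placeholder parents: runs of one letter make the junction easy to see.
    library = itchy_library("AAAA", "CCCC")
    for hybrid in library:
        print(hybrid)
```

Every member of the printed library is a run of the first parent followed by a run of the second, which is the one-crossover structure noted above.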
In vivo recombination may also facilitate genetic modifications and mutations. Different vector systems and host strains have been described (WO 99/29902; Weber et al., 1983; Pompon & Nicolas, 1989).
The procedure referred to as “gene shuffling” or “sexual PCR” closely approximates the evolutionary process. In this method, parental genes are fragmented and reassembled by PCR™ to create full-length genes (U.S. Pat. Nos. 5,605,793; 5,811,238; 5,830,721). The shuffling procedure typically starts with double-stranded nucleic acid fragments such as PCR™ products from homologous genes. The genes are cleaved, for example with DNase I, to produce random fragments. The fragments are purified and reassembled in PCR™ without primers. As the random fragments and their PCR™ products prime each other, the average size of the fragments increases with the number of PCR™ cycles. Recombination or crossover occurs by template switching, such as when a DNA fragment derived from one template primes on the homologous position of a related but different template. Products of the PCR™ undergo a second amplification reaction using primers from the original reaction. Full-length fragments are cloned into an expression vector for selection and screening. Reiterative rounds of this process are continued until the desired protein is found or no further improvements are achieved. Examples of enzyme improvements following gene shuffling have been reported (Crameri et al., 1996; Chang et al., 1999; Crameri et al., 1998). However, these methods generally involve optimization of nucleic acid fragmentation, size fractionation, or purification of gene fragments.
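The fragmentation and reassembly steps can be caricatured in a few lines of code. The sketch below is a deliberately simplified model and not the laboratory procedure: it assumes the parental sequences are already aligned and of equal length, treats random cleavage as a set of random breakpoints, and models template switching by drawing each segment from a randomly chosen parent. The function name and parameters are hypothetical.

```python
import random


def shuffle_parents(parents: list[str], n_breaks: int, rng: random.Random) -> str:
    """Build one chimeric sequence from aligned, equal-length parents.

    Random breakpoints stand in for random cleavage; choosing a parent for
    each segment stands in for template switching during reassembly.
    """
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("this sketch assumes aligned parents of equal length")
    # Pick distinct interior breakpoints, then add the two sequence ends.
    breaks = sorted(rng.sample(range(1, length), n_breaks))
    bounds = [0] + breaks + [length]
    segments = []
    for start, end in zip(bounds, bounds[1:]):
        template = rng.choice(parents)  # possible crossover at each boundary
        segments.append(template[start:end])
    return "".join(segments)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is repeatable
    # Placeholder parents: runs of one letter make crossovers visible.
    parents = ["A" * 30, "C" * 30, "G" * 30]
    for _ in range(5):
        print(shuffle_parents(parents, n_breaks=4, rng=rng))
```

Unlike the single-junction hybrids produced by fusing two truncated genes, each chimera here may contain several crossovers, up to the number of breakpoints.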
A modification of gene shuffling, the Staggered Extension Process (StEP), has been described (WO 98/42832; Shao et al., 1998; Zhao et al., 1997; Zhao et al., 1998). StEP involves priming template polynucleotides with random or flanking primers. Extended primers are reassembled in extremely fast cycles of PCR™, generating successively longer extension products. In each cycle the primers/extension products can anneal to different templates based on sequence complementarity. The template switching between different sequences creates “recombination cassettes”. The process is continued until full-length genes are created. However, StEP requires careful monitoring of polymerase extension by precisely controlling time and temperature of the reaction.
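The cycle-by-cycle growth described here can be sketched as a simple loop. This is an illustrative toy under stated assumptions: templates are aligned and of equal length, the growing strand anneals to a randomly chosen template in each cycle, and the amount of extension per cycle is a random number of bases up to a hypothetical limit, max_extension, standing in for the abbreviated extension time.

```python
import random


def step_extend(templates: list[str], max_extension: int, rng: random.Random) -> tuple[str, int]:
    """Grow one product by short extensions, re-annealing every cycle.

    Returns the full-length product and the number of cycles it took.
    """
    length = len(templates[0])
    product = ""
    cycles = 0
    while len(product) < length:
        template = rng.choice(templates)        # anneal to any template
        step = rng.randint(1, max_extension)    # brief, variable extension
        start = len(product)
        product += template[start:start + step]  # slicing stops at the template end
        cycles += 1
    return product, cycles


if __name__ == "__main__":
    rng = random.Random(3)  # fixed seed so the run is repeatable
    # Placeholder templates: runs of one letter make template switches visible.
    product, cycles = step_extend(["A" * 40, "C" * 40], max_extension=6, rng=rng)
    print(product, cycles)
```

In this toy, shorter extensions mean more cycles and more opportunities for template switching, which mirrors why the extension step must be tightly controlled.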
A modification of the StEP technology has also been described (U.S. Pat. No. 5,965,408). As in StEP, random primers are annealed to the target(s) to be shuffled. The random primers are extended until stopped by “roadblocks” such as purine dimers. The premature termination is facilitated by blocking the polymerase with adducts associated with the template. Fragments are isolated and used in a separate PCR™ reaction to create longer overlapping fragments. However, the use of DNA adducts to create “roadblocks” may result in the halting of DNA synthesis at preferred locations of adduct binding. Therefore, halting of DNA synthesis may not occur randomly along the length of the nucleic acid and may not occur at every nucleotide in a sequence.
Despite these techniques for mutating nucleic acids and encoded polypeptides, there still exists a need for improved mutagenesis techniques. Methods that are easy, rapid, and result in thorough mutation of one or more sequences would be desirable.