A variety of approaches, including rational design and directed evolution, have been used to optimize protein functions (1, 2). The choice of approach for a given optimization problem depends, in part, on how well the relationships between sequence, structure and function are understood. Rational redesign typically requires extensive knowledge of a structure-function relationship. Directed evolution requires little or no specific knowledge about the structure-function relationship; rather, the essential feature is a means to evaluate the function to be optimized. Directed evolution involves the generation of libraries of mutant molecules, followed by selection or screening to identify gene products that show improvement with respect to the desired property or set of properties. The gene(s) encoding those products can be subjected to further cycles of the process in order to accumulate beneficial mutations. This evolution can involve few or many generations, depending on how far one wishes to progress and on the effects of mutations typically observed in each generation. Such approaches have been used to create novel functional nucleic acids (3, 4), peptides and other small molecules (3), antibodies (3), as well as enzymes and other proteins (5, 6, 7). These procedures are fairly tolerant of inaccuracies and noise in the function evaluation (7).
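The cycle described above (generate a library of mutants, screen it, carry the best variants into the next generation) can be illustrated with a minimal in silico sketch. Everything in it is an assumption made for illustration and is not taken from the cited work: the names `mutate`, `toy_score` and `directed_evolution`, the hypothetical `TARGET` sequence that stands in for a measurable function, and all parameter values.

```python
import random

ALPHABET = "ACGT"
# Hypothetical optimum, used only so the toy screen has something to measure.
TARGET = "ACGTTGCAACGTTGCAACGTTGCAACGTTG"


def mutate(seq, rate, rng):
    """Copy seq, redrawing each position at random with probability `rate`.

    A redrawn position may receive the same base again (a silent change).
    """
    return "".join(
        rng.choice(ALPHABET) if rng.random() < rate else base
        for base in seq
    )


def toy_score(seq):
    """Stand-in for a screen: number of positions matching TARGET."""
    return sum(a == b for a, b in zip(seq, TARGET))


def directed_evolution(parent, score, generations=10, library_size=200,
                       rate=0.02, keep=5, seed=0):
    """Run mutate -> screen -> select cycles; return best variant and history."""
    rng = random.Random(seed)
    parents = [parent]
    history = [score(parent)]  # best score seen before each generation
    for _ in range(generations):
        # Library generation: mutants derived from the current parents.
        library = [mutate(rng.choice(parents), rate, rng)
                   for _ in range(library_size)]
        # Screening: rank the library together with its parents, so the
        # best score can never decrease from one generation to the next.
        pool = sorted(library + parents, key=score, reverse=True)
        # Selection: the top variants seed the next generation.
        parents = pool[:keep]
        history.append(score(parents[0]))
    return parents[0], history


best, history = directed_evolution("A" * len(TARGET), toy_score)
print(best, history)
```

The only knowledge of the "function" that the loop uses is the `score` callable, which mirrors the point that directed evolution needs a means of evaluation rather than a structural model.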
Several publications have discussed the role of gene recombination in directed evolution (see WO 97/07205, WO 98/42727, U.S. Pat. Nos. 5,807,723, 5,721,367 and 5,776,744, WO 98/41645, U.S. Pat. No. 5,811,238, WO 98/41622, WO 98/41623, and U.S. Pat. No. 5,093,257).
One group of recombination methods is PCR-based and consists of DNA shuffling [5,6], the staggered extension process [89, 90] and random-priming recombination [87]. Such methods typically involve synthesis of significant amounts of DNA during the assembly/recombination step and subsequent amplification of the final products, and the efficiency of amplification decreases as gene size increases.
Yeast cells, which possess an active system for homologous recombination, have been used for in vivo recombination. Cells transformed with a vector and partially overlapping inserts efficiently join the inserts together in the regions of homology and restore a functional, covalently-closed plasmid [91]. This method does not require PCR amplification at any stage of recombination and is therefore free from the size considerations inherent in PCR-based methods. However, the number of crossovers introduced in one recombination event is limited by the efficiency of transformation of one cell with multiple inserts. Other in vivo recombination methods entail recombination between two parental genes cloned on the same plasmid in a tandem orientation. One method relies on the homologous recombination machinery of bacterial cells to produce chimeric genes [92]. A first gene in the tandem provides the N-terminal part of the target protein, and a second provides the C-terminal part. However, only one crossover can be generated by this approach. Another in vivo recombination method uses the same tandem organization of substrates in a vector [93]. Before transformation into E. coli cells, plasmids are linearized by endonuclease digestion between the parental sequences. Recombination is performed in vivo by the enzymes responsible for double-strand break repair. The ends of linear molecules are degraded by a 5′->3′ exonuclease activity, followed by annealing of complementary single-strand 3′ ends and restoration of the double-strand plasmid [94]. This method has advantages and disadvantages similar to those of tandem recombination on a circular plasmid.
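The single-crossover limitation of the tandem methods can be made concrete with a short sketch that enumerates the chimeras obtainable when the first gene contributes the 5′ (N-terminal) part and the second gene the 3′ (C-terminal) part. It rests on assumptions that are not in the source: the two parental sequences are taken to be pre-aligned and of equal length, a crossover is allowed only at the end of a run of at least `min_homology` identical positions, and the function name and example sequences are hypothetical.

```python
def single_crossover_chimeras(first, second, min_homology=4):
    """Enumerate distinct single-crossover products of two aligned genes.

    Each product takes its 5' part from `first` and its 3' part from
    `second`. Products identical to either parent are excluded.
    """
    if len(first) != len(second):
        raise ValueError("sketch assumes pre-aligned, equal-length sequences")

    chimeras = set()
    run = 0  # length of the current stretch of identical positions
    for i, (a, b) in enumerate(zip(first, second)):
        run = run + 1 if a == b else 0
        if run >= min_homology:
            k = i + 1  # crossover point just after position i
            chimeras.add(first[:k] + second[k:])

    # A crossover in homology at either end only regenerates a parent.
    chimeras.discard(first)
    chimeras.discard(second)
    return chimeras


# Hypothetical parents that differ at position 1 and at the last position.
print(single_crossover_chimeras("ACCAAAAAGT", "ATCAAAAAGG"))
```

Crossovers at different points inside one stretch of identity give the same product, so the number of distinct chimeras is bounded by the number of homology blocks flanked by differences on both sides, not by the number of possible crossover positions.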