Protein mutagenesis has long been used as a tool for structure/function studies of proteins. With the advent of modern DNA manipulation techniques and advancements in protein structure determination, large numbers of protein sequences and structures are available that can be sorted into groups or superfamilies based on structural similarity. Such groupings demonstrate that proteins that are structurally similar often catalyze similar reactions and have active sites with shared amino acid residues. Further, these groupings facilitate identification of side chain residues that are important in binding and catalysis, and allow for their modification so as to yield proteins with altered properties.
Such structure-based rational approaches to protein engineering, through introduction of point mutations, exchange of secondary structural elements, and exchange of whole domains or subunits, have given rise to enzymes that have altered substrate specificities, catalytic properties and oligomeric states. Although few protein-engineering failures have been published, the difficulty in rationally engineering an enzyme to have a specific function is widely appreciated. Any alteration introduced into a wildtype protein can disrupt the fine balance that nature has achieved, often in unpredictable ways, and consequently give rise to proteins that are unstable, fail to fold properly and lack catalytic activity. As a result of the difficulties encountered using strict rational design approaches, there is an increasing trend towards the use of molecular biology strategies that mimic evolutionary processes. These strategies are known as “directed evolution.”
Most directed evolution strategies incorporate some method of introducing random mutations into a gene followed by screening or selection for a desired property. The cycle is then repeated several times until the desired property is achieved or until further cycling produces no improvement in the desired property. Early methodologies utilized point mutations generated by error-prone PCR, chemical mutagenesis or mutator strains of E. coli. This type of approach is something akin to an asexual evolutionary process with non-beneficial and beneficial mutations becoming fixed. Such strategies have been particularly successful in achieving improvements in thermostability, altering substrate specificity, and improving activity in organic solvents. However, because directed evolution is a stepwise process, only relatively small steps in sequence space can occur. Thus, the utility of current directed evolution methodologies to evolve novel catalytic sites, which presumably require large excursions in sequence space, is limited.
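The mutate, screen, and repeat cycle described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a laboratory protocol: the names `mutate`, `score`, and `evolve` are hypothetical, the "screen" is a toy count of positions matching an arbitrary target sequence, and only the single best variant is carried forward each round. Cycling stops when the target is reached or when several rounds in a row produce no improvement.

```python
import random

BASES = "ACGT"

def mutate(seq, rate, rng):
    """Redraw each base at random with probability `rate` (may redraw the same base)."""
    return "".join(rng.choice(BASES) if rng.random() < rate else b for b in seq)

def score(seq, target):
    """Toy stand-in for a screen: number of positions matching a hypothetical target."""
    return sum(a == b for a, b in zip(seq, target))

def evolve(parent, target, library_size=50, rate=0.05, patience=5, max_rounds=200, seed=0):
    """Repeat mutate-and-screen until the target is reached or progress stalls."""
    rng = random.Random(seed)
    best, best_score = parent, score(parent, target)
    history = [best_score]
    stalled = 0
    for _ in range(max_rounds):
        if best_score == len(target) or stalled >= patience:
            break
        # Build a library of random point mutants of the current best variant.
        library = [mutate(best, rate, rng) for _ in range(library_size)]
        top = max(library, key=lambda s: score(s, target))
        top_score = score(top, target)
        if top_score > best_score:
            best, best_score, stalled = top, top_score, 0
        else:
            stalled += 1
        history.append(best_score)
    return best, history

best, history = evolve("A" * 30, "ACGT" * 7 + "AC")
```

Because each round starts from the previous best variant and changes only a few bases, `history` climbs in small increments, which mirrors the stepwise character of the process noted above.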
The advent of methods for recombination, which more closely approximates the natural evolutionary process, has had an enormous impact on directed evolution. In various methods for recombination, such as DNA shuffling, parental genes are fragmented and subsequently reassembled by PCR to reconstitute the full-length genes. During this reassembly process, novel combinations of the parental genes arise along with new point mutations. This recombination or shuffling approach generates a large library of mutant genes wherein genes that exhibit a desired function can be obtained by using an appropriate selection or screening system.
Although shuffling of families of genes with DNA homology can create hybrid proteins with new properties, such molecular breeding is only feasible for genes with sufficient genetic homology and, for this reason, is unlikely to evolve entirely novel functions. It is important to realize that the primary rationale for success in the shuffling of families of genes is the similarity of the three-dimensional structures of the proteins they encode, not the degree of DNA homology. Successful directed evolution on homologous families might be equally or better served by the creation of genes with crossovers between family members at regions of little or no genetic homology. However, current DNA shuffling methodologies only produce crossovers within regions of sufficient homology and within significant stretches of identity. Furthermore, crossovers are biased towards those regions of highest identity.
The increasing numbers of protein structures available and the study of enzyme structural families have shown that many proteins with little or no DNA homology can have high protein structural homology. Constructing hybrids of such structural homologues may well be an important strategy for engineering novel activities; however, no combinatorial approach for the construction of such hybrids has been reported.
Work by some of the inventors focused on the interconversion of formyltetrahydrofolate-utilizing enzymes. A functional hybrid enzyme was engineered by fusing domains from two enzymes, expressed on separate vectors, that overall had very little genetic homology. Discrete domain fusions were made between the glycinamide ribonucleotide (GAR) binding domain of the E. coli purN gene (GAR transformylase) and the formyltetrahydrofolate binding and catalytic domain of the E. coli purU gene (formyltetrahydrofolate hydrolase). Although a hybrid enzyme was created that had the desired property (GAR transformylase activity), this activity was low. Ostermeier, Nixon, Shim, and Benkovic, Proc. Natl. Acad. Sci., USA, 96: 3562-3567 (1999), incorporated herein by reference in its entirety.
There is therefore a need for a method of making hybrid genes without regard to sequence homology. There is also a demand for simple, straightforward generation of single-base truncations of nucleic acids, and for a controllable method of creating hybrid genes that span most, if not all, possible truncated portions. There is likewise a great demand for using such hybrid gene formation to develop new methods of creating novel hybrid proteins with modified characteristics or functionalities.
The present invention provides such methods. The present invention permits the creation of nucleic acid hybrids without regard for sequence homology. The present invention also provides a straightforward, controllable method of creating individual and pluralities of hybrid truncated nucleic acids, and concomitant individual and pluralities of hybrid polypeptides, in which the hybrids cover most, if not substantially all, possible combinations of bases.
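To give a sense of the size of such a hybrid library, the following sketch enumerates, on paper only, every fusion of a 5' portion of one sequence with a 3' portion of another at single-base resolution. It is an illustrative assumption rather than the procedure of the invention: the function name `truncation_hybrids` and the two short example sequences are hypothetical, and no sequence comparison between the two parents is performed at any point.

```python
from itertools import product

def truncation_hybrids(gene_a, gene_b):
    """Yield (i, j, hybrid) for every fusion of gene_a[:i] with gene_b[j:]."""
    # i runs over every non-empty 5' portion of gene_a;
    # j runs over every non-empty 3' portion of gene_b.
    for i, j in product(range(1, len(gene_a) + 1), range(len(gene_b))):
        yield i, j, gene_a[:i] + gene_b[j:]

# Hypothetical toy sequences of 6 and 9 bases.
hybrids = list(truncation_hybrids("ATGGCC", "ATGAAATGA"))
print(len(hybrids))  # 6 * 9 = 54 possible fusions
```

For parents of m and n bases the count is m × n, so the number of possible hybrids grows with the product of the gene lengths and is independent of any homology between them.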
Still further benefits and advantages will be apparent to the skilled worker from the disclosures that follow.