This invention relates to the field of protein engineering. More specifically, it relates to a directed evolution method for preparing polynucleotides encoding a polypeptide, which method comprises the step of performing site-directed mutagenesis, optionally in combination with the step of polynucleotide chimerization; the step of selecting for potentially desirable progeny molecules, including by a process termed end-selection (after which the selected molecules may be screened further); and the step of screening the polynucleotides for the production of polypeptide(s) having a useful property.
In a particular aspect, the present invention is relevant to enzymes, particularly to thermostable enzymes, and to their generation by directed evolution. More particularly, the present invention relates to thermostable enzymes which are stable at high temperature and which have improved activity at lower temperatures.
Harvesting the full potential of nature's diversity can include both the step of discovery and the step of optimizing what is discovered. For example, the step of discovery allows one to mine biological molecules that have industrial utility. However, for certain industrial needs, it is advantageous to further modify such molecules (for example, enzymes) experimentally to achieve properties beyond what natural evolution has provided and is likely to provide in the near future.
The process of experimentally modifying a biological molecule toward a desirable property, termed directed evolution, can be achieved by mutagenizing one or more parental molecular templates and identifying any desirable molecules among the progeny molecules. However, currently available directed evolution technologies have several shortfalls. Among these shortfalls are:
1) Existing mutagenesis technologies, such as sloppy or low-fidelity PCR, are ineffective for systematically achieving the full (saturated) range of possible mutations (i.e. all possible amino acid substitutions) at each position (site) along a polypeptide sequence.
2) There is no relatively easy systematic means for rapidly analyzing the large amount of information that can be contained in a molecular sequence and in the potentially colossal number of progeny molecules that could conceivably be obtained by the directed evolution of one or more molecular templates.
3) There is no relatively easy systematic means for providing comprehensive empirical information relating structure to function for molecular positions.
4) There is no easy systematic means for incorporating internal controls in certain mutagenesis (e.g. chimerization) procedures.
5) There is no easy systematic means to select for specific progeny molecules, such as full-length chimeras, from among smaller partial sequences.
Molecular mutagenesis occurs in nature and has resulted in the generation of a wealth of biological compounds that have shown utility in certain industrial applications. However, evolution in nature often selects for molecular properties that are discordant with many unmet industrial needs. Additionally, even when industrially useful mutations would otherwise be favored at the molecular level, natural evolution often overrides the positive selection of such mutations when there is a concurrent detriment to the organism as a whole (such as when a favorable mutation is accompanied by a detrimental mutation). Natural evolution is also slow and places high emphasis on fidelity in replication. Finally, natural evolution prefers a path paved mainly by beneficial mutations while tending to avoid a plurality of successive negative mutations, even though such negative mutations may prove beneficial when combined, or may lead, through a circuitous route, to a final state that is beneficial.
Directed evolution, on the other hand, can be performed much more rapidly and aimed directly at evolving a molecular property that is industrially desirable where nature does not provide one.
An exceedingly large number of possibilities exist for purposeful and random combinations of amino acids within a protein to produce useful hybrid proteins and the corresponding nucleic acids encoding them (i.e., DNA and RNA). Accordingly, there is a need to produce and screen a wide variety of such hybrid proteins, particularly widely varying random proteins, for a desirable utility.
The complexity of an active sequence of a biological macromolecule (e.g., polynucleotides, polypeptides, and molecules that are comprised of both polynucleotide and polypeptide sequences) has been called its information content ("IC"), which has been defined as the resistance of the active protein to amino acid sequence variation (calculated from the minimum number of invariable amino acids (bits) required to describe a family of related sequences with the same function). Proteins that are more sensitive to random mutagenesis have a high information content.
Molecular biology developments, such as molecular libraries, have allowed the identification of quite a large number of variable bases, and even provide ways to select functional sequences from random libraries. In such libraries, most residues can be varied (although typically not all at the same time), depending on compensating changes in the context. Thus, while a 100-amino-acid protein has only 2,000 possible single-position variants (100 positions × 20 amino acids), 20^100 sequence combinations are possible.
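The arithmetic behind these figures can be checked directly. The following Python sketch (function names are illustrative, not from the source) counts single-position variants, distinct single substitutions, and the full sequence space for a protein of a given length. Note that the 2,000 figure counts the original residue once at each position; excluding it leaves 1,900 true substitutions.

```python
def single_position_variants(length: int, alphabet: int = 20) -> int:
    """Variants made by setting one position to any of `alphabet` residues.

    This counts the unchanged (wild-type) residue once per position, which is
    how the 2,000 figure for a 100-residue protein arises (100 x 20).
    """
    return length * alphabet


def single_substitutions(length: int, alphabet: int = 20) -> int:
    """Distinct single amino acid substitutions, wild type excluded (100 x 19)."""
    return length * (alphabet - 1)


def full_sequence_space(length: int, alphabet: int = 20) -> int:
    """Every possible sequence of the given length (alphabet ** length)."""
    return alphabet ** length


n = 100
print(single_position_variants(n))        # 2000
print(single_substitutions(n))            # 1900
print(len(str(full_sequence_space(n))))   # 131 digits, roughly 1.27e130
```

Python's arbitrary-precision integers make the exact value of 20^100 available, which shows why exhaustive enumeration of the combinatorial space is out of reach while the single-substitution space is small.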
Information density is the IC per unit length of a sequence. Active sites of enzymes tend to have a high information density. By contrast, flexible linkers in enzymes have a low information density.
Current methods in widespread use for creating alternative proteins in a library format are error-prone polymerase chain reaction (PCR) and cassette mutagenesis, in which the specific region to be optimized is replaced with a synthetically mutagenized oligonucleotide. In both cases, a substantial number of mutant sites are generated around certain sites in the original sequence.
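As a rough illustration of information density, the sketch below scores a segment from a per-position invariance mask. It assumes, for illustration only, that each invariant position contributes log2(20) bits and that variable positions contribute nothing; the source defines IC only qualitatively, and the example masks are hypothetical.

```python
import math

# Assumption for illustration: an invariant position carries log2(20) bits
# (enough to specify one of 20 amino acids) and a freely variable position
# carries none. The source does not give this scoring rule.
BITS_PER_INVARIANT_POSITION = math.log2(20)


def information_content(invariant_mask: list[bool]) -> float:
    """Bits contributed by the invariant positions of a segment."""
    return BITS_PER_INVARIANT_POSITION * sum(invariant_mask)


def information_density(invariant_mask: list[bool]) -> float:
    """Information content per residue of the segment."""
    return information_content(invariant_mask) / len(invariant_mask)


# Hypothetical segments: True marks a position that tolerates no substitution.
active_site = [True, True, False, True, True, True]
flexible_linker = [False, False, False, True, False, False]

print(round(information_density(active_site), 2))      # 3.6
print(round(information_density(flexible_linker), 2))  # 0.72
```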
Error-prone PCR uses low-fidelity polymerization conditions to introduce a low level of point mutations randomly over a long sequence. In a mixture of fragments of unknown sequence, error-prone PCR can be used to mutagenize the mixture. However, the published error-prone PCR protocols suffer from low polymerase processivity, so they cannot readily achieve random mutagenesis of an average-sized gene. This inability limits the practical application of error-prone PCR. Some computer simulations have suggested that point mutagenesis alone may often be too gradual to allow the large-scale block changes that are required for continued and dramatic sequence evolution. Further, the published error-prone PCR protocols do not allow for amplification of DNA fragments greater than 0.5 to 1.0 kb, a further practical limitation. In addition, repeated cycles of error-prone PCR can lead to an accumulation of neutral mutations with undesired results, such as affecting a protein's immunogenicity but not its binding affinity.
In oligonucleotide-directed mutagenesis, a short sequence is replaced with a synthetically mutagenized oligonucleotide. This approach does not generate combinations of distant mutations and is thus not combinatorial. Because the library size is limited relative to the vast sequence space, many rounds of selection are unavoidable for protein optimization. Mutagenesis with synthetic oligonucleotides requires sequencing individual clones after each selection round, grouping them into families, arbitrarily choosing a single family, and reducing it to a consensus motif. That motif is resynthesized and reinserted into a single gene, followed by additional selection. This stepwise process constitutes a statistical bottleneck, is labor intensive, and is not practical for many rounds of mutagenesis.
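The error-prone PCR idea of scattering a low level of random point mutations over a sequence can be mimicked in silico. This Python sketch assumes independent, unbiased per-base substitutions at a fixed, hypothetical error rate for a single copying step; it does not model polymerase bias, processivity limits, or accumulation over cycles, and the template is made up.

```python
import random

BASES = "ACGT"


def error_prone_copy(template: str, error_rate: float, rng: random.Random) -> str:
    """Copy `template`, substituting each base with one of the three other
    bases with probability `error_rate` (independent per-base errors)."""
    copy = []
    for base in template:
        if rng.random() < error_rate:
            copy.append(rng.choice([b for b in BASES if b != base]))
        else:
            copy.append(base)
    return "".join(copy)


def point_mutations(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


rng = random.Random(0)
template = "ATG" + "GCT" * 300          # hypothetical 903-bp coding sequence
copies = [error_prone_copy(template, 0.002, rng) for _ in range(1000)]
mean = sum(point_mutations(template, c) for c in copies) / len(copies)
# Expected to be close to error_rate x length = 0.002 x 903, about 1.8.
print(f"mean point mutations per copy: {mean:.2f}")
```

Even at this rate, each copy carries only one or two scattered point mutations, which illustrates why point mutagenesis alone is described above as a gradual process.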
Error-prone PCR and oligonucleotide-directed mutagenesis are thus useful for single cycles of sequence fine tuning, but rapidly become too limiting when they are applied for multiple cycles.
Another limitation of error-prone PCR is that the rate of down-mutations grows with the information content of the sequence. As the information content, library size, and mutagenesis rate increase, the balance of down-mutations to up-mutations will statistically prevent the selection of further improvements (statistical ceiling).
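A toy model makes the statistical ceiling concrete. Assuming mutations land uniformly and independently, and that a fraction of positions (used here as a stand-in for information content) cannot tolerate substitution, the share of variants free of any down-mutation falls exponentially with the number of mutations per variant. The model ignores up-mutations and is an illustration, not a calculation taken from the source.

```python
def fraction_free_of_down_mutations(intolerant_fraction: float, mutations: int) -> float:
    """Probability that `mutations` independent, uniformly placed point mutations
    all land at positions that tolerate substitution."""
    return (1.0 - intolerant_fraction) ** mutations


# Higher intolerant_fraction stands in for higher information content;
# more mutations per variant stands in for a higher mutagenesis rate.
for intolerant_fraction in (0.2, 0.5, 0.8):
    row = [round(fraction_free_of_down_mutations(intolerant_fraction, k), 3)
           for k in (1, 2, 4, 8)]
    print(intolerant_fraction, row)
```

Any rare up-mutation must appear in a variant that also escapes every down-mutation, so as both parameters grow, the pool in which improvements can be detected shrinks rapidly.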
In cassette mutagenesis, a sequence block of a single template is typically replaced by a (partially) randomized sequence. Therefore, the maximum information content that can be obtained is statistically limited by the number of random sequences (i.e., library size). This eliminates other sequence families which are not currently best, but which may have greater long term potential.
Also, mutagenesis with synthetic oligonucleotides requires sequencing of individual clones after each selection round. Thus, such an approach is tedious and impractical for many rounds of mutagenesis.
Thus, error-prone PCR and cassette mutagenesis are best suited, and have been widely used, for fine-tuning areas of comparatively low information content. One apparent exception is the selection of an RNA ligase ribozyme from a random library using many rounds of amplification by error-prone PCR and selection.
In nature, the evolution of most organisms occurs by natural selection and sexual reproduction. Sexual reproduction ensures mixing and combining of the genes in the offspring of the selected individuals. During meiosis, homologous chromosomes from the parents line up with one another and cross over partway along their length, thus randomly swapping genetic material. Such swapping or shuffling of the DNA allows organisms to evolve more rapidly.
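The crossover behavior described above can be sketched for two homologous DNA sequences. This example assumes the parents are already aligned to equal length and picks cut points uniformly at random; real recombination depends on local sequence homology, which is not modeled here. The parent sequences are hypothetical.

```python
import random


def crossover(parent_a: str, parent_b: str, rng: random.Random, n_points: int = 1) -> str:
    """Recombine two aligned, equal-length homologous sequences at `n_points`
    distinct random cut points, alternating the parent that supplies each segment."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must be aligned to equal length")
    cuts = sorted(rng.sample(range(1, len(parent_a)), n_points))
    segments, start, take_a = [], 0, True
    for cut in cuts + [len(parent_a)]:
        source = parent_a if take_a else parent_b
        segments.append(source[start:cut])
        start, take_a = cut, not take_a
    return "".join(segments)


rng = random.Random(42)
a = "ATGAAACGTCTGGCTTAA"   # hypothetical parent 1
b = "ATGAAGCGCCTAGCATAA"   # hypothetical parent 2 (homologous)
print(crossover(a, b, rng, n_points=2))
```

Every position of the child comes from one parent or the other, so the child mixes the parents' variation without introducing new point mutations.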
In recombination, because the inserted sequences were of proven utility in a homologous environment, the inserted sequences are likely to still have substantial information content once they are inserted into the new sequence.
The term Applied Molecular Evolution ("AME") means the application of an evolutionary design algorithm to a specific, useful goal. While many different library formats for AME have been reported for polynucleotides, peptides and proteins (phage, lacI and polysomes), none of these formats has provided for recombination by random crossovers to deliberately create a combinatorial library.
Theoretically there are 2,000 different single mutants of a 100 amino acid protein. However, a protein of 100 amino acids has 20^100 possible sequence combinations, a number which is too large to exhaustively explore by conventional methods. It would be advantageous to develop a system which would allow generation and screening of all of these possible combination mutations.
Some workers in the art have utilized an in vivo site-specific recombination system to combine light chain antibody genes with heavy chain antibody genes for expression in a phage system. However, their system relies on specific sites of recombination and is limited accordingly. Simultaneous mutagenesis of antibody CDR regions in single chain antibodies (scFv) by overlap extension and PCR has been reported.
Others have described a method for generating a large population of multiple hybrids using random in vivo recombination. This method requires the recombination of two different libraries of plasmids, each library having a different selectable marker. The method is limited to a finite number of recombinations equal to the number of available selectable markers, and produces a concomitant linear increase in the number of marker genes linked to the selected sequence(s).
In vivo recombination between two homologous, but truncated, insect-toxin genes on a plasmid has been reported as a method of producing a hybrid gene. In vivo recombination of substantially mismatched DNA sequences in a host cell having defective mismatch repair enzymes, resulting in hybrid molecule formation, has also been reported.
This invention relates generally to the field of nucleic acid engineering and correspondingly encoded recombinant protein engineering. More particularly, the invention relates to the directed evolution of nucleic acids and screening of clones containing the evolved nucleic acids for resultant activity(ies) of interest, such as nucleic acid activity(ies) and/or specified protein, particularly enzyme, activity(ies) of interest.
This invention relates generally to a method of: 1) preparing a progeny generation molecule (including a molecule that is comprised of a polynucleotide sequence, a molecule that is comprised of a polypeptide sequence, and a molecule that is comprised in part of a polynucleotide sequence and in part of a polypeptide sequence) that is mutagenized to achieve at least one point mutation, addition, deletion, and/or chimerization, from one or more ancestral or parental generation template(s); 2) screening the progeny generation molecule, preferably using a high-throughput method, for at least one property of interest (such as an improvement in an enzyme activity, an increase in stability, or a novel chemotherapeutic effect); 3) optionally obtaining and/or cataloguing structural and/or functional information regarding the parental and/or progeny generation molecules; and 4) optionally repeating any of steps 1) to 3).
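Steps 1) through 4) can be read as an iterative loop. The Python sketch below is a toy model, not the claimed method: mutagenesis is a single random amino acid substitution, the screen is a hypothetical score against an arbitrary target sequence, and every scored variant is catalogued in a dictionary.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutagenize(parent: str, n_progeny: int, rng: random.Random) -> list[str]:
    """Step 1: make progeny, each carrying one random amino acid substitution."""
    progeny = []
    for _ in range(n_progeny):
        residues = list(parent)
        pos = rng.randrange(len(residues))
        residues[pos] = rng.choice([aa for aa in AMINO_ACIDS if aa != residues[pos]])
        progeny.append("".join(residues))
    return progeny


def evolve(parent, fitness, rounds, n_progeny, rng):
    """Steps 2 to 4: screen progeny, catalogue every score, and iterate from
    the best molecule found so far."""
    catalogue = {parent: fitness(parent)}
    best = parent
    for _ in range(rounds):
        for seq in mutagenize(best, n_progeny, rng):
            catalogue[seq] = fitness(seq)          # screen and catalogue
            if catalogue[seq] > catalogue[best]:
                best = seq                         # keep the improved molecule
    return best, catalogue


# Hypothetical screen: score = number of residues matching an arbitrary target.
TARGET = "MKTAYIAKQR"


def score(seq: str) -> int:
    return sum(x == y for x, y in zip(seq, TARGET))


start = "MAAAAAAAAA"
best, catalogue = evolve(start, score, rounds=30, n_progeny=50, rng=random.Random(1))
print(best, score(best), len(catalogue))
```

Because a variant replaces the current best only on a strict improvement, the score never decreases between rounds, and the catalogue retains every screened variant for later structure-function analysis.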
In a preferred embodiment, a progeny generation of polynucleotides is generated (e.g. from a parent polynucleotide template) in what is termed "codon site-saturation mutagenesis". Each progeny polynucleotide has at least one set of up to three contiguous point mutations (i.e. different bases comprising a new codon), such that every codon (or every family of degenerate codons encoding the same amino acid) is represented at each codon position. Corresponding to, and encoded by, this progeny generation of polynucleotides, a set of progeny polypeptides is also generated, each having at least one single amino acid point mutation. In a preferred aspect, in what is termed "amino acid site-saturation mutagenesis", one such mutant polypeptide is generated for each of the 19 naturally encoded polypeptide-forming alpha-amino acid substitutions at each and every amino acid position along the polypeptide. This yields, for each and every amino acid position along the parental polypeptide, a total of 20 distinct progeny polypeptides including the original amino acid, or potentially more than 20 distinct progeny polypeptides if additional amino acids are used either instead of or in addition to the 20 naturally encoded amino acids.
Thus, in another aspect, this approach is also serviceable for generating mutants containing other rare and/or not naturally encoded amino acids and amino acid derivatives, in addition to and/or in combination with the 20 naturally encoded polypeptide-forming alpha-amino acids. In yet another aspect, this approach is also serviceable for generating mutants by the use of altered, mutagenized, and/or designer codon recognition systems (such as in a host cell with one or more altered tRNA molecules), in addition to and/or in combination with the natural or unaltered codon recognition systems of suitable hosts.
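At the amino acid level, site-saturation mutagenesis amounts to enumerating every single substitution at every position. This sketch assumes a parent written in the 20 standard one-letter codes and does not model codon-level design (for example, degenerate primers) or non-natural amino acids; the function name is illustrative.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 naturally encoded amino acids


def site_saturation_library(parent: str):
    """Yield (position, wild_type, substitution, variant) for each of the 19
    substitutions at every position (positions are 1-based)."""
    for i, wt in enumerate(parent):
        for aa in AMINO_ACIDS:
            if aa != wt:
                yield i + 1, wt, aa, parent[:i] + aa + parent[i + 1:]


lib = list(site_saturation_library("MKT"))
print(len(lib))  # 57, i.e. 3 positions x 19 substitutions
print(lib[0])    # (1, 'M', 'A', 'AKT')
```

Adding the parent itself back at each position gives the 20 distinct polypeptides per position described above.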
In yet another aspect, this invention relates to recombination and more specifically to a method for preparing polynucleotides encoding a polypeptide by a method of in vivo re-assortment of polynucleotide sequences containing regions of partial homology, assembling the polynucleotides to form at least one polynucleotide and screening the polynucleotides for the production of polypeptide(s) having a useful property.
In yet another preferred embodiment, this invention is serviceable for analyzing and cataloguing the effects of any mutational change achieved (including particularly saturation mutagenesis) with respect to any molecular property (e.g. an enzymatic activity) or combination of properties allowed by current technology. Thus, a comprehensive method is provided for determining the effect of changing each amino acid in a parental polypeptide into each of at least 19 possible substitutions. This allows each amino acid in a parental polypeptide to be characterized and catalogued according to its spectrum of potential effects on a measurable property of the polypeptide.
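Cataloguing each position by its spectrum of effects can be sketched as a simple classification over measured activities. The activity values, the 0.8 retention threshold, and the tolerant/mixed/intolerant labels are all hypothetical choices for illustration, not values or categories from the source.

```python
from collections import defaultdict


def catalogue_positions(measurements, wild_type_activity, retain=0.8):
    """measurements maps (position, substitution) -> measured activity.

    Returns {position: (n_retaining, n_tested, label)}, where a substitution
    "retains" activity if it keeps at least `retain` of the wild-type value.
    """
    by_position = defaultdict(list)
    for (position, _substitution), activity in measurements.items():
        by_position[position].append(activity / wild_type_activity)
    summary = {}
    for position, ratios in sorted(by_position.items()):
        kept = sum(r >= retain for r in ratios)
        if kept == len(ratios):
            label = "tolerant"
        elif kept == 0:
            label = "intolerant"
        else:
            label = "mixed"
        summary[position] = (kept, len(ratios), label)
    return summary


# Hypothetical relative activities for two positions, three substitutions each.
data = {(1, "A"): 0.95, (1, "G"): 0.90, (1, "W"): 0.85,
        (2, "A"): 0.10, (2, "G"): 0.05, (2, "W"): 0.60}
print(catalogue_positions(data, wild_type_activity=1.0))
# {1: (3, 3, 'tolerant'), 2: (0, 3, 'intolerant')}
```

With full saturation data, each position would have 19 entries, and the same summary would rank positions by how much substitution they tolerate.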
In another aspect, the method of the present invention utilizes the natural property of cells to recombine molecules and/or to mediate reductive processes that reduce the complexity of sequences and extent of repeated or consecutive sequences possessing regions of homology.
It is an object of the present invention to provide a method for generating hybrid polynucleotides encoding biologically active hybrid polypeptides with enhanced activities. In accomplishing these and other objects, there has been provided, in accordance with one aspect of the invention, a method for introducing polynucleotides into a suitable host cell and growing the host cell under conditions that produce a hybrid polynucleotide.
In another aspect of the invention, the invention provides a method for screening for biologically active hybrid polypeptides encoded by hybrid polynucleotides. The present method allows for the identification of biologically active hybrid polypeptides with enhanced biological activities.
Other objects, features and advantages of the present invention will become apparent from the following detailed description. It should be understood, however, that the detailed description and the specific examples, while indicating preferred embodiments of the invention, are given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will become apparent to those skilled in the art from this detailed description.
In a specific embodiment, this invention provides a method for producing and isolating a library of progeny polynucleotides having at least one desirable property comprised of the steps of:
(a) subjecting a starting or parental polynucleotide set to a mutagenesis process so as to produce a progeny polynucleotide set; and
(b) subjecting the progeny polynucleotide set to an end selection-based screening and enrichment process, so as to select for a desirable subset of the progeny polynucleotide set;
whereby the above steps can be performed iteratively and in any order and in combination,
whereby the end selection-based process creates ligation-compatible ends,
whereby the creation of ligation-compatible ends is optionally used to facilitate one or more intermolecular ligations, that are preferably directional ligations, within members of the progeny polynucleotide set so as to achieve assembly and/or reassembly mutagenesis,
whereby the creation of ligation-compatible ends serves to facilitate ligation of the progeny polynucleotide set into an expression vector system and expression cloning,
whereby the end selection-based screening and enrichment process allows one to produce a library of progeny polynucleotides generated by a mutagenesis process, including non-stochastic polynucleotide site-saturation mutagenesis (Gene Site Saturation Mutagenesis™) and non-stochastic polynucleotide reassembly (GeneReassembly™),
whereby the expression cloning of the progeny polynucleotide set serves to generate a full-length polypeptide set,
whereby the generated polypeptide set can be subjected to an expression screening process, and
whereby expression screening of the progeny polypeptide set provides a means to identify a desirable species, e.g. a mutant polypeptide or alternatively a polypeptide fragment, that has a desirable property, such as a specific enzymatic activity.
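End selection can be illustrated with plain string matching: keep only progeny that retain both intended termini. This sketch assumes EcoRI and BamHI recognition sites mark the 5' and 3' ends (the source does not name enzymes) and uses a hypothetical `window` parameter; in practice end selection is a wet-lab process, and internal occurrences of the sites would need separate handling.

```python
# Illustrative choice of enzymes (not specified in the source): EcoRI marks the
# 5' end and BamHI the 3' end. EcoRI cuts G^AATTC and BamHI cuts G^GATCC, so
# each leaves a different 4-nt 5' overhang (AATT vs GATC), which is what makes
# a subsequent ligation into a matching vector directional.
ECORI_SITE = "GAATTC"
BAMHI_SITE = "GGATCC"


def end_select(molecules, left_site=ECORI_SITE, right_site=BAMHI_SITE, window=10):
    """Keep only molecules that carry `left_site` within `window` bases of the
    5' end AND `right_site` within `window` bases of the 3' end, i.e. molecules
    that retain both intended termini (a proxy for full length)."""
    return [m for m in molecules
            if left_site in m[:window] and right_site in m[-window:]]


full_length = "AA" + ECORI_SITE + "ATGAAACCC" + BAMHI_SITE + "TT"   # hypothetical
missing_3_end = "AA" + ECORI_SITE + "ATGAAACCC"                     # truncated
missing_5_end = "ATGAAACCC" + BAMHI_SITE + "TT"                     # truncated
print(end_select([full_length, missing_3_end, missing_5_end]))
# ['AAGAATTCATGAAACCCGGATCCTT']
```

Requiring two different sites, rather than one site at both ends, is what lets the selected molecules be ligated into a vector in a single orientation.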
In another specific embodiment, this invention provides a method for producing and isolating a polypeptide having at least one desirable property comprised of the steps of:
(a) subjecting a starting or parental polynucleotide set to a mutagenesis process so as to produce a progeny polynucleotide set; and
(b) subjecting the progeny polynucleotide set to an end selection-based screening and enrichment process, so as to select for a desirable subset of the progeny polynucleotide set;
whereby the above steps can be performed iteratively and in any order and in combination,
whereby the end selection-based process creates ligation-compatible ends,
whereby the creation of ligation-compatible ends is optionally used to facilitate one or more intermolecular ligations, that are preferably directional ligations, within members of the progeny polynucleotide set so as to achieve assembly and/or reassembly mutagenesis,
whereby the end selection-based screening and enrichment process allows one to produce a library of progeny polynucleotides generated by a mutagenesis process, including non-stochastic polynucleotide site-saturation mutagenesis (Gene Site Saturation Mutagenesis™) and non-stochastic polynucleotide reassembly (GeneReassembly™),
whereby the creation of ligation-compatible ends serves to facilitate ligation of the progeny polynucleotide set into an expression vector system and expression cloning,
whereby the expression cloning of the progeny polynucleotide set serves to generate a full-length polypeptide set,
whereby the generated polypeptide set can be subjected to an expression screening process, and
whereby expression screening of the progeny polypeptide set provides a means to identify a desirable species, e.g. a mutant polypeptide or alternatively a polypeptide fragment, that has a desirable property, such as a specific enzymatic activity.
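Expression screening can be sketched as translating each progeny open reading frame, discarding truncated products, and ranking the rest by an assay score. The standard genetic code table below is real; the `assay` callable, `expected_length`, and `threshold` are hypothetical placeholders for a laboratory screen.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order (first base slowest).
BASES = "TCAG"
CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), CODE)}


def translate(orf: str) -> str:
    """Translate an in-frame ORF up to (not including) the first stop codon."""
    protein = []
    for i in range(0, len(orf) - 2, 3):
        aa = CODON_TABLE[orf[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def expression_screen(orfs, assay, expected_length, threshold):
    """Return (score, polypeptide) pairs for full-length products whose assay
    score meets the threshold, best first."""
    hits = []
    for orf in orfs:
        polypeptide = translate(orf)
        if len(polypeptide) != expected_length:   # truncated product: not full length
            continue
        score = assay(polypeptide)
        if score >= threshold:
            hits.append((score, polypeptide))
    return sorted(hits, reverse=True)


# Hypothetical assay: count lysines as a stand-in for a measured activity.
print(expression_screen(["ATGAAATGG", "ATGTAGTGG", "ATGGCTTGG"],
                        assay=lambda p: p.count("K"), expected_length=3, threshold=1))
```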
In a specific aspect of this embodiment, this invention provides the immediately preceding methods, wherein the mutagenesis process of step (a) is comprised of a process, termed saturation mutagenesis, for generating, from a codon-containing parental polynucleotide template, a progeny polypeptide set in which a full range of single amino acid substitutions is represented at each amino acid position, comprising the steps of:
(a) subjecting a working codon-containing template polynucleotide to polymerase-based amplification using a degenerate oligonucleotide for each codon to be mutagenized, where each of said degenerate oligonucleotides is comprised of a first homologous sequence and a degenerate triplet sequence, so as to generate a set of progeny polynucleotides;
wherein said degenerate triplet sequence is selected from the group consisting of i) N,N,N; ii) N,N,G/T; iii) N,N,G/C; iv) N,N,C/G/T; v) N,N,A/G/T; vi) N,N,A/C/T; vii) N,N,A/C/G; and viii) any degenerate codon that encodes all 20 amino acids; and
(b) subjecting said set of progeny polynucleotides to recombinant expression such that polypeptides encoded by the progeny polynucleotides are produced;
whereby the above steps can be performed iteratively and in any order and in combination, and
whereby said method provides a means for generating all 20 amino acid changes at each amino acid site along a parental polypeptide template, because the degeneracy of the triplet sequence includes codons for all 20 amino acids.
In a specific aspect of this embodiment, this invention further provides the immediately preceding methods, wherein the mutagenesis process of step (a) is comprised of a process termed synthetic ligation gene reassembly.
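Whether a degenerate triplet covers all 20 amino acids can be checked against the standard genetic code. This sketch expands each listed triplet into its codons and reports coverage and stop codons. The IUPAC-style labels (NNK, NNS, and so on) are conventional abbreviations added here, not terms used in the source.

```python
from itertools import product

BASES = "TCAG"
CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), CODE)}
ALL_20 = set("ACDEFGHIKLMNPQRSTVWY")

ANY = "ACGT"
# The degenerate triplets listed in the text, with conventional IUPAC labels.
DEGENERATE_TRIPLETS = {
    "NNN": (ANY, ANY, "ACGT"),
    "NNK (N,N,G/T)": (ANY, ANY, "GT"),
    "NNS (N,N,G/C)": (ANY, ANY, "GC"),
    "NNB (N,N,C/G/T)": (ANY, ANY, "CGT"),
    "NND (N,N,A/G/T)": (ANY, ANY, "AGT"),
    "NNH (N,N,A/C/T)": (ANY, ANY, "ACT"),
    "NNV (N,N,A/C/G)": (ANY, ANY, "ACG"),
}


def encoded_residues(triplet):
    """Return (number of codons, set of encoded amino acids, number of stop codons)."""
    codons = ["".join(c) for c in product(*triplet)]
    residues = {CODON_TABLE[c] for c in codons} - {"*"}
    stops = sum(CODON_TABLE[c] == "*" for c in codons)
    return len(codons), residues, stops


for name, triplet in DEGENERATE_TRIPLETS.items():
    n, residues, stops = encoded_residues(triplet)
    missing = sorted(ALL_20 - residues)
    print(f"{name}: {n} codons, {len(residues)} amino acids, {stops} stop(s), missing {missing}")
```

Under the standard code, this check reports that N,N,A/C/T (NNH) lacks Met and Trp, whose only codons (ATG and TGG) end in G; the other listed triplets encode all 20 amino acids, with NNK and NNS each carrying a single stop codon (TAG) among 32 codons.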