This invention relates to the field of preparing polynucleotides encoding a useful polypeptide by generating polynucleotides via a procedure involving blocking or interrupting a synthesis or amplification process with an adduct, agent, molecule or other inhibitor, assembling the polynucleotides to form at least one mutant polynucleotide and screening the mutant polynucleotides for the production of a mutant polypeptide(s) having a useful property.
An exceedingly large number of possibilities exist for purposeful and random combinations of amino acids within a protein to produce useful mutant proteins and the corresponding biological molecules encoding the mutant proteins, i.e., DNA, RNA, etc. Accordingly, there is a need to produce and screen a wide variety of such mutant proteins for a useful property, particularly widely varying random proteins.
The following general discussion of protein and polynucleotide fields may be helpful in further understanding the background for the present invention.
The complexity of an active sequence of a biological macromolecule, e.g., proteins, DNA etc., has been called its information content ("IC"; 5-9), which has been defined as the resistance of the active protein to amino acid sequence variation (calculated from the minimum number of invariable amino acids (bits) required to describe a family of related sequences with the same function). Proteins that are more sensitive to random mutagenesis have a high information content.
Molecular biology developments such as molecular libraries have allowed the identification of quite a large number of variable bases, and even provide ways to select functional sequences from random libraries. In such libraries, most residues can be varied (although typically not all at the same time) depending on compensating changes in the context. Thus, while a 100 amino acid protein can contain only 2,000 different mutations, 20^100 combinations of mutations are possible.
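The counts in the preceding paragraph can be checked with a few lines of arithmetic. The sketch below assumes the standard 20-letter amino acid alphabet; the function names are illustrative and do not come from the source. Counting all 20 residues at each of 100 positions reproduces the 2,000 figure quoted above (counting only the 19 residues that differ from the original gives 1,900), while the number of distinct full-length sequences is 20^100.

```python
ALPHABET_SIZE = 20  # assumption: the 20 standard amino acids


def single_site_variants(length, include_original=True):
    """Number of residue choices summed over every position of the protein.

    With include_original=True each position contributes 20 choices, which
    matches the rounded figure of 2,000 for a 100-residue protein. With
    include_original=False only the 19 true substitutions are counted.
    """
    per_site = ALPHABET_SIZE if include_original else ALPHABET_SIZE - 1
    return length * per_site


def total_sequences(length):
    """Number of distinct sequences of the given length (20 ** length)."""
    return ALPHABET_SIZE ** length


if __name__ == "__main__":
    print(single_site_variants(100))                          # 20 per position
    print(single_site_variants(100, include_original=False))  # 19 per position
    print(len(str(total_sequences(100))), "decimal digits in 20**100")
```

The gap between the two numbers is the point of the passage: single-site variants can be enumerated, whereas the combinatorial space cannot be searched exhaustively.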
Information density is the Information Content per unit length of a sequence. Active sites of enzymes tend to have a high information density. By contrast, flexible linkers of information in enzymes have a low information density.
Current methods in widespread use for creating mutant proteins in a library format are error-prone polymerase chain reactions and cassette mutagenesis, in which the specific region to be optimized is replaced with a synthetically mutagenized oligonucleotide. In both cases, a cloud of mutant sites is generated around certain sites in the original sequence.
Error-prone PCR uses low-fidelity polymerization conditions to introduce a low level of point mutations randomly over a long sequence. In a mixture of fragments of unknown sequence, error-prone PCR can be used to mutagenize the mixture. The published error-prone PCR protocols suffer from a low processivity of the polymerase. Therefore, the protocol is unable to result in the random mutagenesis of an average-sized gene. This inability limits the practical application of error-prone PCR. Some computer simulations have suggested that point mutagenesis alone may often be too gradual to allow the large-scale block changes that are required for continued and dramatic sequence evolution. Further, the published error-prone PCR protocols do not allow for amplification of DNA fragments greater than 0.5 to 1.0 kb, limiting their practical application. In addition, repeated cycles of error-prone PCR can lead to an accumulation of neutral mutations with undesired results, such as affecting a protein's immunogenicity but not its binding affinity.
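As a rough illustration of what "a low level of point mutations randomly over a long sequence" means, the following toy model copies a DNA string and substitutes each base independently with a fixed probability. The per-base error rate, the function names, and the uniform choice of replacement base are assumptions made for the sketch; real polymerases have biased error spectra that this does not capture.

```python
import random

BASES = "ACGT"


def error_prone_copy(template, error_rate, rng):
    """Return a copy of `template` with independent point substitutions.

    Each base is replaced, with probability `error_rate`, by one of the
    three other bases chosen uniformly (a simplifying assumption).
    """
    copy = []
    for base in template:
        if rng.random() < error_rate:
            copy.append(rng.choice([b for b in BASES if b != base]))
        else:
            copy.append(base)
    return "".join(copy)


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    rng = random.Random(0)
    gene = "ACGT" * 250  # hypothetical 1 kb template
    mutant = error_prone_copy(gene, 0.005, rng)
    print("point mutations introduced:", hamming(gene, mutant))
```

Because every change is a single-base substitution, no run of this model can produce the block changes that the paragraph describes as necessary for larger evolutionary steps.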
In oligonucleotide-directed mutagenesis, a short sequence is replaced with a synthetically mutagenized oligonucleotide. This approach does not generate combinations of distant mutations and is thus not combinatorial. The limited library size relative to the vast sequence length means that many rounds of selection are unavoidable for protein optimization. Mutagenesis with synthetic oligonucleotides requires sequencing of individual clones after each selection round followed by grouping them into families, arbitrarily choosing a single family, and reducing it to a consensus motif. Such motif is resynthesized and reinserted into a single gene followed by additional selection. This step process constitutes a statistical bottleneck, is labor intensive, and is not practical for many rounds of mutagenesis.
Error-prone PCR and oligonucleotide-directed mutagenesis are thus useful for single cycles of sequence fine tuning, but rapidly become too limiting when they are applied for multiple cycles.
Another serious limitation of error-prone PCR is that the rate of down-mutations grows with the information content of the sequence. As the information content, library size, and mutagenesis rate increase, the balance of down-mutations to up-mutations will statistically prevent the selection of further improvements (statistical ceiling).
In cassette mutagenesis, a sequence block of a single template is typically replaced by a (partially) randomized sequence. Therefore, the maximum information content that can be obtained is statistically limited by the number of random sequences (i.e., library size). This eliminates other sequence families which are not currently best, but which may have greater long term potential.
Also, mutagenesis with synthetic oligonucleotides requires sequencing of individual clones after each selection round. Thus, such an approach is tedious and impractical for many rounds of mutagenesis.
Thus, error-prone PCR and cassette mutagenesis are best suited, and have been widely used, for fine-tuning areas of comparatively low information content. One apparent exception is the selection of an RNA ligase ribozyme from a random library using many rounds of amplification by error-prone PCR and selection.
It is becoming increasingly clear that the tools for the design of recombinant linear biological sequences such as protein, RNA and DNA are not as powerful as the tools nature has developed. Finding better and better mutants depends on searching more and more sequences within larger and larger libraries, and requiring increased numbers of cycles of mutagenic amplification and selection. However as discussed above, the existing mutagenesis methods that are in widespread use have distinct limitations when used for repeated cycles.
In nature the evolution of most organisms occurs by natural selection and sexual reproduction. Sexual reproduction ensures mixing and combining of the genes in the offspring of the selected individuals. During meiosis, homologous chromosomes from the parents line up with one another and cross-over part way along their length, thus randomly swapping genetic material. Such swapping or shuffling of the DNA allows organisms to evolve more rapidly.
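The swapping of material between aligned homologous chromosomes can be pictured with a single-point crossover on two equal-length strings. This is a deliberately simplified sketch: the function name is hypothetical, the crossover point is drawn uniformly, and only one crossover is modeled per pair.

```python
import random


def crossover(parent_a, parent_b, rng):
    """Single-point crossover between two aligned, equal-length sequences.

    Returns two recombinant children. The crossover point is chosen so that
    each child carries at least one position from each parent.
    """
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must be aligned and of equal length")
    if len(parent_a) < 2:
        raise ValueError("need at least two positions to cross over")
    point = rng.randrange(1, len(parent_a))
    child_1 = parent_a[:point] + parent_b[point:]
    child_2 = parent_b[:point] + parent_a[point:]
    return child_1, child_2


if __name__ == "__main__":
    rng = random.Random(42)
    a = "AAAAAAAAAA"
    b = "TTTTTTTTTT"
    print(crossover(a, b, rng))
```

Each child is built only from blocks that already existed in a parent, which is why recombined material tends to retain function, as the next paragraph notes.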
In sexual recombination, because the inserted sequences were of proven utility in a homologous environment, the inserted sequences are likely to still have substantial information content once they are inserted into the new sequence.
Marton et al. (Nucleic Acids Res (1991) May 19:2423-6) describes the use of PCR in vitro to monitor recombination in a plasmid having directly repeated sequences. Marton et al. disclose that recombination will occur during PCR as a result of breaking or nicking of the DNA. This will give rise to recombinant molecules. Meyerhans et al. (Nucleic Acids Res (1990) Apr 18:1687-91) also disclose the existence of DNA recombination during in vitro PCR.
The term Applied Molecular Evolution ("AME") means the application of an evolutionary design algorithm to a specific, useful goal. While many different library formats for AME have been reported for polynucleotides, peptides and proteins (phage, lacI and polysomes), none of these formats has provided for recombination by random crossovers to deliberately create a combinatorial library.
Theoretically there are 2,000 different single mutants of a 100 amino acid protein. However, a protein of 100 amino acids has 20^100 possible combinations of mutations, a number which is too large to exhaustively explore by conventional methods. It would be advantageous to develop a system which would allow generation and screening of all of these possible combination mutations.
Some workers in the art have utilized an in vivo site specific recombination system to combine light chain antibody genes with heavy chain antibody genes for expression in a phage system. However, their system relies on specific sites of recombination and is limited accordingly. Simultaneous mutagenesis of antibody CDR regions in single chain antibodies (scFv) by overlapping extension and PCR has been reported.
Others have described a method for generating a large population of multiple mutants using random in vivo recombination. However, their method requires the recombination of two different libraries of plasmids, each library having a different selectable marker. Thus, their method is limited to a finite number of recombinations equal to the number of selectable markers existing, and produces a concomitant linear increase in the number of marker genes linked to the selected sequence(s).
In vivo recombination between two homologous but truncated insect-toxin genes on a plasmid has been reported as also being capable of producing a hybrid gene. The in vivo recombination of substantially mismatched DNA sequences in a host cell having defective mismatch repair enzymes, resulting in hybrid molecule formation, has been reported.
As discussed above, prior methods for producing random proteins from randomized genetic material have met with limited success. Perhaps the best method, thus far, for producing and screening a wide variety of random proteins is a method which utilizes enzymes to cleave (chop) a long nucleotide chain into shorter pieces followed by procedures to separate the chopping agents from the genetic material and procedures to amplify (multiply the copies of) the remaining genetic material in a manner that allows the annealing of the polynucleotides back into chains (either purposefully or randomly put them back together).
A drawback to this method is the expense and inconvenience of utilizing biological enzymes to chop up the genetic material, which are then separated from the genetic material prior to the amplification step. Further, depending upon the particular genetic material, different concentrations of the chopping agents are required to produce the desired fragments. Moreover, the control mechanisms required for biological enzymes are not trivial.
Accordingly, there is a need in the art for producing an improved method of obtaining truly random pieces of genetic material for reassembly to produce random proteins which may be screened for a particular use. The need to produce large libraries of widely varying mutant nucleic acid sequences is an important goal. Hence, it would be advantageous to develop such a method for the production of mutant proteins which allows for the development of large libraries of mutant nucleic acid sequences which are easily searched. There is a need to develop such a method which allows for the production of large libraries of mutant DNA, RNA or proteins and the selection of particular mutants for a desired goal.
The invention described herein is directed to the use of repeated cycles of mutagenesis, recombination and selection which allow for the directed molecular evolution of highly complex linear sequences, such as DNA, RNA or proteins, through recombination. It uses repeated cycles of random point mutagenesis, nucleic acid shuffling and selection which allow for the directed molecular evolution in vitro of highly complex linear sequences, such as proteins, through random recombination.
The present invention is directed to a method for generating a selected mutant polynucleotide sequence (or a population of selected polynucleotide sequences) typically in the form of amplified and/or cloned polynucleotides, whereby the selected polynucleotide sequence(s) possess at least one desired phenotypic characteristic (e.g., encodes a polypeptide, promotes transcription of linked polynucleotides, binds a protein, and the like) which can be selected for. One method for identifying mutant polypeptides that possess a desired structure or functional property, such as binding to a predetermined biological macromolecule (e.g., a receptor), involves the screening of a large library of polypeptides for individual library members which possess the desired structure or functional property conferred by the amino acid sequence of the polypeptide.
In one embodiment, the present invention provides a method for generating libraries of displayed polypeptides or displayed antibodies suitable for affinity interaction screening or phenotypic screening. The method comprises (1) obtaining a first plurality of selected library members comprising a displayed polypeptide or displayed antibody and an associated polynucleotide encoding said displayed polypeptide or displayed antibody, and obtaining said associated polynucleotides or copies thereof wherein said associated polynucleotides comprise a region of substantially identical sequences, optionally introducing mutations into said polynucleotides or copies, (2) pooling the polynucleotides or copies, (3) producing smaller or shorter polynucleotides by interrupting a random or particularized priming and synthesis process or an amplification process, and (4) performing amplification, preferably PCR amplification, and optionally mutagenesis to homologously recombine the newly synthesized polynucleotides.
It is a particularly preferred object of the invention to provide a process for producing mutant polynucleotides which express a useful mutant polypeptide by a series of steps comprising:
(a) producing polynucleotides by interrupting a polynucleotide amplification or synthesis process with a means for blocking or interrupting the amplification or synthesis process and thus providing a plurality of smaller or shorter polynucleotides due to the replication of the polynucleotide being in various stages of completion;
(b) adding to the resultant population of single- or double-stranded polynucleotides one or more single- or double-stranded oligonucleotides, wherein said added oligonucleotides comprise an area of identity and an area of heterology to one or more of the single- or double-stranded polynucleotides of the population;
(c) denaturing the resulting single- or double-stranded oligonucleotides to produce a mixture of single-stranded polynucleotides, optionally separating the shorter or smaller polynucleotides into pools of polynucleotides having various lengths and further optionally subjecting said polynucleotides to a PCR procedure to amplify one or more oligonucleotides comprised by at least one of said polynucleotide pools;
(d) incubating a plurality of said polynucleotides or at least one pool of said polynucleotides with a polymerase under conditions which result in annealing of said single-stranded polynucleotides at regions of identity between the single-stranded polynucleotides and thus forming of a mutagenized double-stranded polynucleotide chain;
(e) optionally repeating steps (c) and (d);
(f) expressing at least one mutant polypeptide from said polynucleotide chain, or chains; and
(g) screening said at least one mutant polypeptide for a useful activity.
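Steps (a), (c) and (d) above can be sketched as a toy simulation. This is a minimal model under stated assumptions, not the claimed procedure: parents are pre-aligned strings of equal length, each fragment remembers its start coordinate, "interruption" is a random stop position, and "annealing at regions of identity" is approximated by requiring an exact match over a minimum overlap. The names `interrupted_copies`, `reassemble` and `min_overlap` are hypothetical. Step (b) (added oligonucleotides) and steps (f) and (g) (expression and screening) are not modeled.

```python
import random
from typing import List, Optional, Tuple

Fragment = Tuple[int, str]  # (start position on the aligned parent, sequence)


def interrupted_copies(parent: str, n: int, min_len: int,
                       rng: random.Random) -> List[Fragment]:
    """Step (a), toy version: n incomplete copies of `parent`.

    Synthesis starts at a random position (random priming) and is
    interrupted at a random later position, giving fragments of varied length.
    """
    fragments = []
    for _ in range(n):
        start = rng.randrange(0, len(parent) - min_len + 1)
        stop = rng.randrange(start + min_len, len(parent) + 1)
        fragments.append((start, parent[start:stop]))
    return fragments


def reassemble(fragments: List[Fragment], length: int, min_overlap: int,
               rng: random.Random) -> Optional[str]:
    """Steps (c)-(d), toy version: grow one chain by overlap extension.

    A fragment may extend the chain only if it overlaps the chain's 3' end by
    at least `min_overlap` positions and is identical to it over that overlap.
    Returns None when no full-length product can be built.
    """
    seeds = [seq for start, seq in fragments if start == 0]
    if not seeds:
        return None
    chain = rng.choice(seeds)
    while len(chain) < length:
        pos = len(chain)
        candidates = [
            (start, seq) for start, seq in fragments
            if start <= pos - min_overlap          # enough overlap to anneal
            and start + len(seq) > pos             # actually extends the chain
            and chain[start:pos] == seq[:pos - start]  # identity in the overlap
        ]
        if not candidates:
            return None
        start, seq = rng.choice(candidates)
        chain += seq[pos - start:]
    return chain


if __name__ == "__main__":
    rng = random.Random(7)
    parent_a = "A" * 10 + "C" * 10 + "G" * 10
    parent_b = "AATAAAAAAA" + "C" * 10 + "GGGGGGGTGG"  # differs at two sites
    pool = (interrupted_copies(parent_a, 100, 8, rng)
            + interrupted_copies(parent_b, 100, 8, rng))
    print(reassemble(pool, len(parent_a), 5, rng))
```

In this model a template switch can happen only where the two parents are identical across the overlap, so the products are chimeras of parental blocks joined within shared regions.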
In a preferred aspect of the invention, the means for blocking or interrupting the amplification or synthesis process is the utilization of UV light, DNA adducts, or DNA binding proteins. Preferably, the DNA adduct is a member selected from the group consisting of: UV light; (+)-CC-1065; (+)-CC-1065-(N3-Adenine); a N-acetylated or deacetylated 4′-fluoro-4-aminobiphenyl adduct capable of inhibiting DNA synthesis, or a N-acetylated or deacetylated 4-aminobiphenyl adduct capable of inhibiting DNA synthesis; trivalent chromium; a trivalent chromium salt, a polycyclic aromatic hydrocarbon ("PAH") DNA adduct capable of inhibiting DNA replication, such as 7-bromomethyl-benz[α]anthracene ("BMA"); tris(2,3-dibromopropyl)phosphate ("Tris-BP"), 1,2-dibromo-3-chloropropane ("DBCP"); 2-bromoacrolein (2BA); benzo[α]pyrene-7,8-dihydrodiol-9-10-epoxide ("BPDE"); a platinum(II) halogen salt; N-hydroxy-2-amino-3-methylimidazo[4,5-f]-quinoline ("N-hydroxy-IQ"); and N-hydroxy-2-amino-1-methyl-6-phenylimidazo[4,5-f]-pyridine ("N-hydroxy-PhIP").
Especially preferred members of the group are UV light, (+)-CC-1065 and (+)-CC-1065-(N3-Adenine).
In one embodiment of the invention, the DNA adducts, or polynucleotides comprising the DNA adducts, are removed from the polynucleotides or polynucleotide pool, such as by a process including heating the solution comprising the DNA fragments prior to further processing.
The present invention relates to an enhanced method of DNA "shuffling," which may be referred to as "Sexual PCR." In a preferred embodiment of the present invention, amplified or cloned polynucleotides possessing a desired characteristic (for example, encoding a polypeptide of interest, etc.) are selected (via screening of a library of polynucleotides, for example) and pooled. The pooled polynucleotides (or at least one polynucleotide) may be subjected to at least one of random primer extension reactions or PCR amplification using random primers to multiply portions of the polynucleotide or polynucleotides. At various stages along the completion of the PCR amplification or synthesis process, the process may be blocked or interrupted. Hence, a collection of incomplete copies of the polynucleotide or polynucleotides can be generated by random primer extension reactions, amplification using random primers, and/or by pausing or stopping the replication process.
These collections of shorter or smaller polynucleotides (pools) may be isolated or collectively amplified further by PCR, which may be interrupted again. Such "stacking" of the amplification and pausing or stopping steps has the advantage of producing a truly randomized sample of polynucleotides having widely varying lengths. For example, some of the smaller polynucleotides may hybridize with the longer polynucleotides and act as additional random primers to initiate self-priming amplification of polynucleotides within the pool.
Such a process provides an efficient means for producing widely-varying random polynucleotides and subsequent widely-varying mutant proteins corresponding to the same random selection as in the random polynucleotide pool. The reassembly of the shorter or smaller polynucleotides after such shuffling to produce the random polynucleotides may be provided by utilizing procedures standard in the art.
In one embodiment of the invention, the adduct or adducts which halt or slow the PCR process have been modified with a chemical group for which there exists (or can be obtained) a monoclonal antibody specific for the same. Such is an example permitting an efficient separation of polynucleotide chains comprising the DNA adducts (or for the removal of the adducts which have been released from the DNA polynucleotides which comprise them) from other polynucleotide chains. In some situations, it may be desirable to remove such DNA adducts before further processing of the amplified polynucleotides. In other situations it may be desirable to leave such DNA adducts in the solution with the intention of producing a further randomized pool of polynucleotides. Whether the DNA adduct is to be removed or left within the polynucleotide pool depends upon the composition of the adduct itself and the immediate goal of that amplification process step.
In a preferred embodiment, the polynucleotides produced by interrupting the PCR amplification (and optionally subsequent amplification of the said polynucleotides to produce further randomization under conditions suitable for PCR amplifications) are recombined to form a shuffled pool of recombined polynucleotides, whereby a substantial fraction (e.g., greater than 10 percent) of the recombined polynucleotides of said shuffled pool were not present in the first plurality of selected library members, said shuffled pool providing a library of displayed polypeptides or displayed antibodies suitable for affinity interaction screening.
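The "substantial fraction" criterion in this embodiment (for example, more than 10 percent of recombined polynucleotides absent from the first plurality of selected library members) is a simple set comparison. The helper below is an illustrative sketch; its name and the representation of library members as plain strings are assumptions.

```python
def novel_fraction(shuffled_pool, parental_pool):
    """Fraction of shuffled sequences that do not occur in the parental pool.

    Sequences are compared as exact strings. Duplicates in the shuffled pool
    are counted individually, so the result describes pool members rather
    than distinct sequence species.
    """
    if not shuffled_pool:
        return 0.0
    parents = set(parental_pool)
    novel = sum(1 for seq in shuffled_pool if seq not in parents)
    return novel / len(shuffled_pool)


if __name__ == "__main__":
    parents = ["AAAA", "TTTT"]
    shuffled = ["AAAA", "AATT", "TTAA", "TTTT"]
    fraction = novel_fraction(shuffled, parents)
    print(fraction, "meets the 10 percent example threshold:", fraction > 0.10)
```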
Optionally, the method comprises the additional step of screening the library members of the shuffled pool to identify individual shuffled library members having the ability to bind or otherwise interact (e.g., as catalytic antibodies) with a predetermined macromolecule, such as, for example, a proteinaceous receptor, peptide, oligosaccharide, virion, or other predetermined compound or structure.
The displayed polypeptides, antibodies, peptidomimetic antibodies, and variable region sequences that are identified from such libraries can be used for therapeutic, diagnostic, research and related purposes (e.g., catalysts, solutes for increasing osmolarity of an aqueous solution, and the like), and/or can be subjected to one or more additional cycles of shuffling and/or affinity selection. The method can be modified such that the step of selecting for a phenotypic characteristic can be other than binding affinity for a predetermined molecule (e.g., for catalytic activity, stability, oxidation resistance, drug resistance, or detectable phenotype conferred upon a host cell).
In one embodiment, the polynucleotides of the first plurality of selected library members are produced and homologously recombined by PCR in vitro, and the resultant polynucleotides are transferred into a host cell or organism via a transferring means and homologously recombined to form shuffled library members in vivo.
In one embodiment, the first plurality of selected library members is cloned or amplified on episomally replicable vectors, a multiplicity of said vectors is transferred into a cell and homologously recombined to form shuffled library members in vivo.
In one embodiment, the first plurality of selected library members is not produced as shorter or smaller polynucleotides, but is cloned or amplified on an episomally replicable vector as a direct repeat, with each repeat comprising a distinct species of selected library member sequence; said vector is transferred into a cell and homologously recombined by intra-vector recombination to form shuffled library members in vivo.
In an embodiment, combinations of in vitro and in vivo shuffling are provided to enhance combinatorial diversity.
The present invention provides a method for generating libraries of displayed antibodies suitable for affinity interaction screening. The method comprises (1) obtaining a first plurality of selected library members comprising a displayed antibody and an associated polynucleotide encoding said displayed antibody, and obtaining said associated polynucleotides or copies thereof, wherein said associated polynucleotides comprise a region of substantially identical variable region framework sequence, and (2) pooling said associated polynucleotides or copies and producing shorter or smaller polynucleotides from them under conditions suitable for PCR amplification by slowing or halting the PCR amplification, and thereby homologously recombining said shorter or smaller polynucleotides to form a shuffled pool of recombined polynucleotides. CDR combinations comprised by the shuffled pool are not present in the first plurality of selected library members, said shuffled pool composing a library of displayed antibodies comprising CDR permutations and suitable for affinity interaction screening. Optionally, the shuffled pool is subjected to affinity screening to select shuffled library members which bind to a predetermined epitope (antigen), thereby selecting a plurality of selected shuffled library members. Further, the plurality of selected shuffled library members can be shuffled and screened iteratively, from 1 to about 1000 cycles or as desired, until library members having a desired binding affinity are obtained.
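The iterative shuffle-and-screen cycle described in this paragraph has the shape of a simple loop with a stopping condition. The sketch below is generic and hedged: `shuffle` and `score` are hypothetical callables supplied by the caller (standing in for the recombination step and the affinity screen), `keep` is an assumed selection size, and the default cap of 1000 cycles mirrors the "from 1 to about 1000 cycles" wording.

```python
def iterate_shuffling(pool, shuffle, score, target, max_cycles=1000, keep=10):
    """Repeat shuffle -> screen -> select until a member reaches `target`.

    shuffle(pool) must return a list of candidate library members.
    score(member) must return a number, with higher meaning better binding.
    Returns (selected_pool, cycles_run).
    """
    cycles_run = 0
    for cycle in range(1, max_cycles + 1):
        cycles_run = cycle
        candidates = shuffle(pool)
        # Screening: rank candidates and keep only the best `keep` members.
        pool = sorted(candidates, key=score, reverse=True)[:keep]
        if not pool:
            break  # nothing survived selection; stop early
        if score(pool[0]) >= target:
            break  # desired binding affinity reached
    return pool, cycles_run


if __name__ == "__main__":
    # Toy stand-ins: members are integers, shuffling proposes member + 1,
    # and the score is the integer itself.
    def toy_shuffle(members):
        return [m + 1 for m in members] + list(members)

    print(iterate_shuffling([0], toy_shuffle, lambda m: m, target=3, keep=2))
```

Keeping several members per cycle, rather than only the single best, reflects the document's point that families which are not currently best may still carry long-term potential.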
One aspect of the present invention provides a method for introducing one or more mutations into a template double-stranded polynucleotide, wherein the template double-stranded polynucleotide has produced polynucleotides of a desired size by the above slowed or halted PCR process, by adding to the resultant population of double-stranded polynucleotides one or more single- or double-stranded oligonucleotides, wherein said oligonucleotides comprise an area of identity and an area of heterology to the template polynucleotide; denaturing the resultant mixture of double-stranded random polynucleotides and oligonucleotides into single-stranded polynucleotides; incubating the resultant population of single-stranded polynucleotides with a polymerase under conditions which result in the annealing of said single-stranded polynucleotides and formation of a mutagenized double-stranded polynucleotide; and repeating the above steps as desired.
In another aspect the present invention is directed to a method of producing recombinant proteins having biological activity by treating a sample comprising double-stranded template polynucleotides encoding a wild-type protein under sexual PCR conditions according to the present invention which provide for the production of polynucleotides which include random double-stranded polynucleotides having a desired size and adding to the resultant population of random polynucleotides one or more single or double-stranded oligonucleotides, wherein said oligonucleotides comprise areas of identity and areas of heterology to the template polynucleotide; denaturing the resulting mixture of double-stranded polynucleotides and oligonucleotides into single-stranded polynucleotides; incubating the resultant population of single-stranded polynucleotides with a polymerase under conditions which cause annealing of said single-stranded polynucleotides at the areas of identity to occur and thus to form at least one mutagenized double-stranded polynucleotide; repeating the above steps as desired; and then expressing the recombinant protein from the mutagenized double-stranded polynucleotide.
A third aspect of the present invention is directed to a method for obtaining a chimeric polynucleotide by treating a sample comprising different double-stranded template polynucleotides, wherein said different template polynucleotides contain areas of identity and areas of heterology, under sexual PCR conditions which provide random double-stranded polynucleotides of a desired size from the template polynucleotide; denaturing the resulting random double-stranded polynucleotides to provide single-stranded polynucleotides; incubating the resulting single-stranded polynucleotides with a polymerase under conditions which provide for the annealing of the single-stranded polynucleotides at the areas of identity and the formation of a chimeric double-stranded polynucleotide sequence comprising template polynucleotide sequences; and repeating the above steps as desired.
A fourth aspect of the present invention is directed to a method of replicating a template polynucleotide by combining in vitro single-stranded template polynucleotides with small random single-stranded polynucleotides resulting from the sexual PCR process according to the present invention and denaturation of the template polynucleotide, and incubating said mixture of nucleic acid polynucleotides in the presence of a nucleic acid polymerase under conditions wherein a population of double-stranded template polynucleotides is formed.
The invention also provides the use of polynucleotide shuffling, in vitro and/or in vivo, to shuffle polynucleotides encoding polypeptides and/or polynucleotides comprising transcriptional regulatory sequences.
The invention also provides the use of polynucleotide shuffling to shuffle a population of viral genes (e.g., capsid proteins, spike glycoproteins, polymerases, proteases, etc.) or viral genomes (e.g., paramyxoviridae, orthomyxoviridae, herpesviruses, retroviruses, reoviruses, rhinoviruses, etc.). In an embodiment, the invention provides a method for shuffling sequences encoding all or portions of immunogenic viral proteins to generate novel combinations of epitopes as well as novel epitopes created by recombination; such shuffled viral proteins may comprise epitopes or combinations of epitopes which are likely to arise in the natural environment as a consequence of viral evolution (e.g., recombination of influenza virus strains).
The invention also provides a method suitable for shuffling polynucleotide sequences for generating gene therapy vectors and replication-defective gene therapy constructs, such as may be used for human gene therapy, including but not limited to vaccination vectors for DNA-based vaccination, as well as anti-neoplastic gene therapy and other general therapy formats.