The present invention relates to a method for the production of polynucleotides conferring a desired phenotype and/or encoding a protein having an advantageous predetermined property which is selectable or can be screened for. In an aspect, the method is used for generating and selecting or screening for desired nucleic acid fragments encoding mutant proteins.
The complexity of an active sequence of a biological macromolecule, e.g. proteins, DNA etc., has been called its information content ("IC"; 5-9). The information content of a protein has been defined as the resistance of the active protein to amino acid sequence variation, calculated from the minimum number of invariable amino acids (bits) required to describe a family of related sequences with the same function (9, 10). Proteins that are sensitive to random mutagenesis have a high information content. In 1974, when this definition was coined, protein diversity existed only as taxonomic diversity.
Molecular biology developments such as molecular libraries have allowed the identification of a much larger number of variable bases, and even the selection of functional sequences from random libraries. Most residues can be varied, although typically not all at the same time, depending on compensating changes in the context. Thus a 100 amino acid protein can contain only 2,000 different mutations, but 20^100 possible combinations of mutations.
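The counts quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming the 20 standard amino acids and reading the second figure as 20 raised to the 100th power; the variable names are illustrative only.

```python
# Toy arithmetic for a 100-residue protein, assuming 20 standard amino acids.
LENGTH = 100
ALPHABET = 20

# Position/residue choices, counting the wild-type residue at each position.
position_residue_pairs = LENGTH * ALPHABET

# Strict single-substitution mutants (wild-type residue excluded).
single_substitutions = LENGTH * (ALPHABET - 1)

# Every possible full-length sequence of 100 residues.
all_sequences = ALPHABET ** LENGTH

print(position_residue_pairs, single_substitutions)
print("decimal digits in 20^100:", len(str(all_sequences)))
```

The round figure of 2,000 corresponds to 100 positions times 20 residues; excluding the wild-type residue at each position gives 1,900 true substitutions. Either way, the number of single mutants is negligible next to the number of possible combinations.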
Information density is the Information Content/unit length of a sequence. Active sites of enzymes tend to have a high information density. By contrast, flexible linkers in enzymes have a low information density (8).
Current methods in widespread use for creating mutant proteins in a library format are error-prone polymerase chain reaction (11, 12, 19) and cassette mutagenesis (8, 20, 21, 22, 40, 41, 42), in which the specific region to be optimized is replaced with a synthetically mutagenized oligonucleotide. Alternatively, mutator strains of host cells have been employed to increase mutational frequency (Greener and Callahan (1995) Strategies in Mol. Biol. 7: 32). In each case, a 'mutant cloud' (4) is generated around certain sites in the original sequence.
Error-prone PCR uses low-fidelity polymerization conditions to introduce a low level of point mutations randomly over a long sequence. Error-prone PCR can be used to mutagenize a mixture of fragments of unknown sequence. However, computer simulations have suggested that point mutagenesis alone may often be too gradual to allow the block changes that are required for continued sequence evolution. The published error-prone PCR protocols are generally unsuited for reliable amplification of DNA fragments greater than 0.5 to 1.0 kb, limiting their practical application. Further, repeated cycles of error-prone PCR lead to an accumulation of neutral mutations, which, for example, may make a protein immunogenic.
In oligonucleotide-directed mutagenesis, a short sequence is replaced with a synthetically mutagenized oligonucleotide. This approach does not generate combinations of distant mutations and is thus not significantly combinatorial. The limited library size relative to the vast sequence length means that many rounds of selection are unavoidable for protein optimization. Mutagenesis with synthetic oligonucleotides requires sequencing of individual clones after each selection round followed by grouping into families, arbitrarily choosing a single family, and reducing it to a consensus motif, which is resynthesized and reinserted into a single gene followed by additional selection. This process constitutes a statistical bottleneck; it is labor intensive and not practical for many rounds of mutagenesis.
Error-prone PCR and oligonucleotide-directed mutagenesis are thus useful for single cycles of sequence fine tuning but rapidly become limiting when applied for multiple cycles.
Error-prone PCR can be used to mutagenize a mixture of fragments of unknown sequence (11, 12). However, the published error-prone PCR protocols (11, 12) suffer from a low processivity of the polymerase. Therefore, the protocol is very difficult to employ for the random mutagenesis of an average-sized gene. This inability limits the practical application of error-prone PCR.
Another serious limitation of error-prone PCR is that the rate of down-mutations grows with the information content of the sequence. At a certain information content, library size, and mutagenesis rate, the balance of down-mutations to up-mutations will statistically prevent the selection of further improvements (statistical ceiling).
Finally, repeated cycles of error-prone PCR will also lead to the accumulation of neutral mutations, which can affect, for example, immunogenicity but not binding affinity.
Thus error-prone PCR was found to be too gradual to allow the block changes that are required for continued sequence evolution (1, 2).
In cassette mutagenesis, a sequence block of a single template is typically replaced by a (partially) randomized sequence. Therefore, the maximum information content that can be obtained is statistically limited by the number of random sequences (i.e., library size). This constitutes a statistical bottleneck, eliminating other sequence families which are not currently best, but which may have greater long term potential.
Further, mutagenesis with synthetic oligonucleotides requires sequencing of individual clones after each selection round (20). Therefore, this approach is tedious and is not practical for many rounds of mutagenesis.
Error-prone PCR and cassette mutagenesis are thus best suited and have been widely used for fine-tuning areas of comparatively low information content. An example is the selection of an RNA ligase ribozyme from a random library using many rounds of amplification by error-prone PCR and selection (13).
It is becoming increasingly clear that our scientific tools for the design of recombinant linear biological sequences such as protein, RNA and DNA are not suitable for generating the necessary sequence diversity needed to optimize many desired properties of a macromolecule or organism. Finding better and better mutants depends on searching more and more sequences within larger and larger libraries, and increasing numbers of cycles of mutagenic amplification and selection are necessary. However, as discussed above, the existing mutagenesis methods that are in widespread use have distinct limitations when used for repeated cycles.
Evolution of most organisms occurs by natural selection and sexual reproduction. Sexual reproduction ensures mixing and combining of the genes of the offspring of the selected individuals. During meiosis, homologous chromosomes from the parents line up with one another and cross-over part way along their length, thus swapping genetic material. Such swapping or shuffling of the DNA allows organisms to evolve more rapidly (1, 2). In sexual recombination, because the inserted sequences were of proven utility in a homologous environment, the inserted sequences are likely to still have substantial information content once they are inserted into the new sequence.
Marton et al. (27) describe the use of PCR in vitro to monitor recombination in a plasmid having directly repeated sequences. Marton et al. disclose that recombination will occur during PCR as a result of breaking or nicking of the DNA. This will give rise to recombinant molecules. Meyerhans et al. (23) also disclose the existence of DNA recombination during in vitro PCR.
The term Applied Molecular Evolution ("AME") means the application of an evolutionary design algorithm to a specific, useful goal. While many different library formats for AME have been reported for polynucleotides (3, 11-14), peptides and proteins (phage (15-17), lacI (18) and polysomes), in none of these formats has recombination by random cross-overs been used to deliberately create a combinatorial library.
Theoretically there are 2,000 different single mutants of a 100 amino acid protein. A protein of 100 amino acids has 20^100 possible combinations of mutations, a number which is too large to exhaustively explore by conventional methods. It would be advantageous to develop a system which would allow the generation and screening of all of these possible combination mutations.
Winter and coworkers (43, 44) have utilized an in vivo site specific recombination system to combine light chain antibody genes with heavy chain antibody genes for expression in a phage system. However, their system relies on specific sites of recombination and thus is limited. Hayashi et al. (48) report simultaneous mutagenesis of antibody CDR regions in single chain antibodies (scFv) by overlap extension and PCR.
Caren et al. (45) describe a method for generating a large population of multiple mutants using random in vivo recombination. However, their method requires the recombination of two different libraries of plasmids, each library having a different selectable marker. Thus the method is limited to a finite number of recombinations equal to the number of selectable markers existing, and produces a concomitant linear increase in the number of marker genes linked to the selected sequence(s). Caren et al. do not describe the use of multiple selection cycles; recombination is used solely to construct larger libraries.
Calogero et al. (46) and Galizzi et al. (47) report that in vivo recombination between two homologous but truncated insect-toxin genes on a plasmid can produce a hybrid gene. Radman et al. (49) report in vivo recombination of substantially mismatched DNA sequences in a host cell having defective mismatch repair enzymes, resulting in hybrid molecule formation.
It would be advantageous to develop a method for the production of mutant proteins which method allowed for the development of large libraries of mutant nucleic acid sequences which were easily searched. The invention described herein is directed to the use of repeated cycles of point mutagenesis, nucleic acid shuffling and selection which allow for the directed molecular evolution in vitro of highly complex linear sequences, such as proteins through random recombination.
Accordingly, it would be advantageous to develop a method which allows for the production of large libraries of mutant DNA, RNA or proteins and the selection of particular mutants for a desired goal. The invention described herein is directed to the use of repeated cycles of mutagenesis, in vivo recombination and selection which allow for the directed molecular evolution in vivo and in vitro of highly complex linear sequences, such as DNA, RNA or proteins through recombination.
Further advantages of the present invention will become apparent from the following description of the invention with reference to the attached drawings.
The present invention is directed to a method for generating a selected polynucleotide sequence or population of selected polynucleotide sequences, typically in the form of amplified and/or cloned polynucleotides, whereby the selected polynucleotide sequence(s) possess a desired phenotypic characteristic (e.g., encode a polypeptide, promote transcription of linked polynucleotides, bind a protein, and the like) which can be selected for. One method of identifying polypeptides that possess a desired structure or functional property, such as binding to a predetermined biological macromolecule (e.g., a receptor), involves the screening of a large library of polypeptides for individual library members which possess the desired structure or functional property conferred by the amino acid sequence of the polypeptide.
In a general aspect, the invention provides a method, termed "sequence shuffling", for generating libraries of recombinant polynucleotides having a desired characteristic which can be selected or screened for. Libraries of recombinant polynucleotides are generated from a population of related-sequence polynucleotides which comprise sequence regions which have substantial sequence identity and can be homologously recombined in vitro or in vivo. In the method, at least two species of the related-sequence polynucleotides are combined in a recombination system suitable for generating sequence-recombined polynucleotides, wherein said sequence-recombined polynucleotides comprise a portion of at least one first species of a related-sequence polynucleotide with at least one adjacent portion of at least one second species of a related-sequence polynucleotide. Recombination systems suitable for generating sequence-recombined polynucleotides can be either: (1) in vitro systems for homologous recombination or sequence shuffling via amplification or other formats described herein, or (2) in vivo systems for homologous recombination or site-specific recombination as described herein. The population of sequence-recombined polynucleotides comprises a subpopulation of polynucleotides which possess desired or advantageous characteristics and which can be selected by a suitable selection or screening method.
The selected sequence-recombined polynucleotides, which are typically related-sequence polynucleotides, can then be subjected to at least one recursive cycle wherein at least one selected sequence-recombined polynucleotide is combined with at least one distinct species of related-sequence polynucleotide (which may itself be a selected sequence-recombined polynucleotide) in a recombination system suitable for generating sequence-recombined polynucleotides, such that additional generations of sequence-recombined polynucleotide sequences are generated from the selected sequence-recombined polynucleotides obtained by the selection or screening method employed. In this manner, recursive sequence recombination generates library members which are sequence-recombined polynucleotides possessing desired characteristics. Such characteristics can be any property or attribute capable of being selected for or detected in a screening system, and may include properties of: an encoded protein, a transcriptional element, a sequence controlling transcription, RNA processing, RNA stability, chromatin conformation, translation, or other expression property of a gene or transgene, a replicative element, a protein-binding element, or the like, such as any feature which confers a selectable or detectable property.
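The recursive cycle of recombination followed by selection can be illustrated in silico. The following is a toy sketch only, not the laboratory procedure: sequences are equal-length strings, recombination is modeled as a single random crossover between two aligned parents, and the scoring function (here, the count of the letter A) is a hypothetical stand-in for any selectable or screenable property. All function and parameter names are illustrative assumptions.

```python
import random

def crossover(a, b, rng):
    """Join a prefix of `a` to the adjacent suffix of `b` at a random point."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def recursive_recombination(pool, score, cycles, keep, library_size, rng):
    """Alternate recombination and selection for a fixed number of cycles."""
    selected = list(pool)
    for _ in range(cycles):
        # The selected parents stay in the library alongside their recombinants.
        library = list(selected)
        while len(library) < library_size:
            a, b = rng.sample(selected, 2)
            library.append(crossover(a, b, rng))
        # Selection step: keep the best-scoring library members.
        selected = sorted(library, key=score, reverse=True)[:keep]
    return selected

rng = random.Random(0)  # seeded so the run is repeatable
parents = ["AAAATTTT", "TTTTAAAA", "ATATATAT", "TATATATA"]
best = recursive_recombination(parents, score=lambda s: s.count("A"),
                               cycles=5, keep=4, library_size=20, rng=rng)
print(best[0], best[0].count("A"))
```

Because the selected parents are carried into each new library, the best score in this sketch cannot decrease from one cycle to the next.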
The present invention provides a method for generating libraries of displayed polypeptides or displayed antibodies suitable for affinity interaction screening or phenotypic screening. The method comprises (1) obtaining a first plurality of selected library members comprising a displayed polypeptide or displayed antibody and an associated polynucleotide encoding said displayed polypeptide or displayed antibody, and obtaining said associated polynucleotides or copies thereof wherein said associated polynucleotides comprise a region of substantially identical sequence, optionally introducing mutations into said polynucleotides or copies, and (2) pooling and fragmenting, by nuclease digestion, partial extension PCR amplification, PCR stuttering, or other suitable fragmenting means, typically producing random fragments or fragment equivalents, said associated polynucleotides or copies to form fragments thereof under conditions suitable for PCR amplification, performing PCR amplification and optionally mutagenesis, and thereby homologously recombining said fragments to form a shuffled pool of recombined polynucleotides, whereby a substantial fraction (e.g., greater than 10 percent) of the recombined polynucleotides of said shuffled pool are not present in the first plurality of selected library members, said shuffled pool composing a library of displayed polypeptides or displayed antibodies suitable for affinity interaction screening. Optionally, the method comprises the additional step of screening the library members of the shuffled pool to identify individual shuffled library members having the ability to bind or otherwise interact (e.g., such as catalytic antibodies) with a predetermined macromolecule, such as for example a proteinaceous receptor, peptide, oligosaccharide, virion, or other predetermined compound or structure. 
The displayed polypeptides, antibodies, peptidomimetic antibodies, and variable region sequences that are identified from such libraries can be used for therapeutic, diagnostic, research, and related purposes (e.g., catalysts, solutes for increasing osmolarity of an aqueous solution, and the like), and/or can be subjected to one or more additional cycles of shuffling and/or affinity selection. The method can be modified such that the step of selecting is for a phenotypic characteristic other than binding affinity for a predetermined molecule (e.g., for catalytic activity, stability, oxidation resistance, drug resistance, or detectable phenotype conferred on a host cell).
In one embodiment, the first plurality of selected library members is fragmented and homologously recombined by PCR in vitro. Fragment generation is by nuclease digestion, partial extension PCR amplification, PCR stuttering, or other suitable fragmenting means, such as described herein. Stuttering is fragmentation by incomplete polymerase extension of templates. A recombination format based on very short PCR extension times was employed to create partial PCR products, which continue to extend off a different template in the next (and subsequent) cycle(s).
In one embodiment, the first plurality of selected library members is fragmented in vitro, the resultant fragments transferred into a host cell or organism and homologously recombined to form shuffled library members in vivo.
In one embodiment, the first plurality of selected library members is cloned or amplified on episomally replicable vectors, a multiplicity of said vectors is transferred into a cell and homologously recombined to form shuffled library members in vivo.
In one embodiment, the first plurality of selected library members is not fragmented, but is cloned or amplified on an episomally replicable vector as a direct repeat or indirect (or inverted) repeat, with each repeat comprising a distinct species of selected library member sequence, said vector is transferred into a cell and homologously recombined by intra-vector or inter-vector recombination to form shuffled library members in vivo.
In an embodiment, combinations of in vitro and in vivo shuffling are provided to enhance combinatorial diversity.
The present invention provides a method for generating libraries of displayed antibodies suitable for affinity interaction screening. The method comprises (1) obtaining a first plurality of selected library members comprising a displayed antibody and an associated polynucleotide encoding said displayed antibody, and obtaining said associated polynucleotides or copies thereof, wherein said associated polynucleotides comprise a region of substantially identical variable region framework sequence, and (2) pooling and fragmenting said associated polynucleotides or copies to form fragments thereof under conditions suitable for PCR amplification and thereby homologously recombining said fragments to form a shuffled pool of recombined polynucleotides comprising novel combinations of CDRs, whereby a substantial fraction (e.g., greater than 10 percent) of the recombined polynucleotides of said shuffled pool comprise CDR combinations which are not present in the first plurality of selected library members, said shuffled pool composing a library of displayed antibodies comprising CDR permutations and suitable for affinity interaction screening. Optionally, the shuffled pool is subjected to affinity screening to select shuffled library members which bind to a predetermined epitope (antigen) and thereby selecting a plurality of selected shuffled library members. Optionally, the plurality of selected shuffled library members can be shuffled and screened iteratively, from 1 to about 1000 cycles or as desired until library members having a desired binding affinity are obtained.
Accordingly, one aspect of the present invention provides a method for introducing one or more mutations into a template double-stranded polynucleotide, wherein the template double-stranded polynucleotide has been cleaved or PCR amplified (via partial extension or stuttering) into random fragments of a desired size, by adding to the resultant population of double-stranded fragments one or more single or double-stranded oligonucleotides, wherein said oligonucleotides comprise an area of identity and an area of heterology to the template polynucleotide; denaturing the resultant mixture of double-stranded random fragments and oligonucleotides into single-stranded fragments; incubating the resultant population of single-stranded fragments with a polymerase under conditions which result in the annealing of said single-stranded fragments at regions of identity between the single-stranded fragments and formation of a mutagenized double-stranded polynucleotide; and repeating the above steps as desired.
In another aspect the present invention is directed to a method of producing recombinant proteins having biological activity by treating a sample comprising double-stranded template polynucleotides encoding a wild-type protein under conditions which provide for the cleavage of said template polynucleotides into random double-stranded fragments having a desired size; adding to the resultant population of random fragments one or more single or double-stranded oligonucleotides, wherein said oligonucleotides comprise areas of identity and areas of heterology to the template polynucleotide; denaturing the resultant mixture of double-stranded fragments and oligonucleotides into single-stranded fragments; incubating the resultant population of single-stranded fragments with a polymerase under conditions which result in the annealing of said single-stranded fragments at the areas of identity and formation of a mutagenized double-stranded polynucleotide; repeating the above steps as desired; and then expressing the recombinant protein from the mutagenized double-stranded polynucleotide.
A third aspect of the present invention is directed to a method for obtaining a chimeric polynucleotide by treating a sample comprising different double-stranded template polynucleotides wherein said different template polynucleotides contain areas of identity and areas of heterology under conditions which provide for the cleavage of said template polynucleotides into random double-stranded fragments of a desired size; denaturing the resultant random double-stranded fragments contained in the treated sample into single-stranded fragments; incubating the resultant single-stranded fragments with polymerase under conditions which provide for the annealing of the single-stranded fragments at the areas of identity and the formation of a chimeric double-stranded polynucleotide sequence comprising template polynucleotide sequences; and repeating the above steps as desired.
A fourth aspect of the present invention is directed to a method of replicating a template polynucleotide by combining in vitro single-stranded template polynucleotides with small random single-stranded fragments resulting from the cleavage and denaturation of the template polynucleotide, and incubating said mixture of nucleic acid fragments in the presence of a nucleic acid polymerase under conditions wherein a population of double-stranded template polynucleotides is formed.
The invention also provides the use of polynucleotide shuffling, in vitro and/or in vivo to shuffle polynucleotides encoding polypeptides and/or polynucleotides comprising transcriptional regulatory sequences.
The invention also provides the use of polynucleotide shuffling to shuffle a population of viral genes (e.g., capsid proteins, spike glycoproteins, polymerases, proteases, etc.) or viral genomes (e.g., paramyxoviridae, orthomyxoviridae, herpesviruses, retroviruses, reoviruses, rhinoviruses, etc.). In an embodiment, the invention provides a method for shuffling sequences encoding all or portions of immunogenic viral proteins to generate novel combinations of epitopes as well as novel epitopes created by recombination; such shuffled viral proteins may comprise epitopes or combinations of epitopes which are likely to arise in the natural environment as a consequence of viral evolution (e.g., such as recombination of influenza virus strains).
The invention also provides the use of polynucleotide shuffling to shuffle a population of protein variants, such as taxonomically-related, structurally-related, and/or functionally-related enzymes and/or mutated variants thereof to create and identify advantageous novel polypeptides, such as enzymes having altered properties of catalysis, temperature profile, stability, oxidation resistance, or other desired feature which can be selected for. Methods suitable for molecular evolution and directed molecular evolution are provided. Methods to focus selection pressure(s) upon specific portions of polynucleotides (such as a segment of a coding region) are provided.
The invention also provides a method suitable for shuffling polynucleotide sequences for generating gene therapy vectors and replication-defective gene therapy constructs, such as may be used for human gene therapy, including but not limited to vaccination vectors for DNA-based vaccination, as well as anti-neoplastic gene therapy and other gene therapy formats.
The invention provides a method for generating an enhanced green fluorescent protein (GFP) and polynucleotides encoding same, comprising performing DNA shuffling on a GFP encoding expression vector and selecting or screening for variants having an enhanced desired property, such as enhanced fluorescence. In a variation, an embodiment comprises a step of error-prone or mutagenic amplification, propagation in a mutator strain (e.g., a host cell having a hypermutational phenotype; mutL, etc.; yeast strains such as those described in Klein (1995) Progr. Nucl. Acid Res. Mol. Biol. 51: 217, incorporated herein by reference), chemical mutagenesis, or site-directed mutagenesis. In an embodiment, the enhanced GFP protein comprises a point mutation outside the chromophore region (amino acids 64-69), preferably in the region from amino acid 100 to amino acid 173, with specific preferred embodiments at residue 100, 154, and 164; typically, the mutation is a substitution mutation, such as F100S, M154T or V164A. In an embodiment, the mutation substitutes a hydrophilic residue for a hydrophobic residue. In an embodiment, multiple mutations are present in the enhanced GFP protein and its encoding polynucleotide. The invention also provides the use of such an enhanced GFP protein, such as for a diagnostic reporter for assays and high throughput screening assays and the like.
The invention also provides for improved embodiments for performing in vitro sequence shuffling. In one aspect, the improved shuffling method includes the addition of at least one additive which enhances the rate or extent of reannealing or recombination of related-sequence polynucleotides. In an embodiment, the additive is polyethylene glycol (PEG), typically added to a shuffling reaction to a final concentration of 0.1 to 25 percent, often to a final concentration of 2.5 to 15 percent, or to a final concentration of about 10 percent. In an embodiment, the additive is dextran sulfate, typically added to a shuffling reaction to a final concentration of 0.1 to 25 percent, often at about 10 percent. In an embodiment, the additive is an agent which reduces sequence specificity of reannealing and promotes promiscuous hybridization and/or recombination in vitro. In an alternative embodiment, the additive is an agent which increases sequence specificity of reannealing and promotes high fidelity hybridization and/or recombination in vitro. Other long-chain polymers which do not interfere with the reaction may also be used (e.g., polyvinylpyrrolidone, etc.).
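The final concentrations given above translate into pipetting volumes by simple dilution arithmetic. The following is a minimal sketch; the 50 percent stock concentration and the 100 microliter reaction volume are assumptions chosen for illustration and are not taken from the description, and the function name is hypothetical.

```python
def stock_volume_ul(final_percent, reaction_volume_ul, stock_percent):
    """Volume of additive stock (in microliters) needed to reach
    `final_percent` in a reaction of total volume `reaction_volume_ul`."""
    if not 0 < final_percent <= stock_percent:
        raise ValueError("final concentration must be positive and no higher than the stock")
    return final_percent * reaction_volume_ul / stock_percent

# Assumed example: a 50 percent PEG stock and a 100 microliter reaction.
for target in (2.5, 10, 15):
    print(target, "percent ->", stock_volume_ul(target, 100, 50), "microliters of stock")
```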
In one aspect, the improved shuffling method includes the addition of at least one additive which is a cationic detergent. Examples of suitable cationic detergents include but are not limited to: cetyltrimethylammonium bromide (CTAB), dodecyltrimethylammonium bromide (DTAB), and tetramethylammonium chloride (TMAC), and the like.
In one aspect, the improved shuffling method includes the addition of at least one additive which is a recombinogenic protein that catalyzes or non-catalytically enhances homologous pairing and/or strand exchange in vitro. Examples of suitable recombinogenic proteins include but are not limited to: E. coli recA protein, the T4 uvsX protein, the rec1 protein from Ustilago maydis, other recA family recombinases from other species, single strand binding protein (SSB), ribonucleoprotein A1, and the like. Shuffling can be used to improve one or more properties of a recombinogenic protein; for example, mutant sequences encoding recA can be shuffled and improved heat-stable variants selected by recursive sequence recombination.
Non-specific (general recombination) recombinases such as Topoisomerase I, Topoisomerase II (Tse et al. (1980) J. Biol. Chem. 255: 5560; Trask et al. (1984) EMBO J. 3: 671, incorporated herein by reference) and the like can be used to catalyze in vitro recombination reactions to shuffle a plurality of related sequence polynucleotide species by the recursive methods of the invention.
In one aspect, the improved shuffling method includes the addition of at least one additive which is an enzyme having an exonuclease activity which is active at removing non-templated nucleotides introduced at 3′ ends of product polynucleotides in shuffling amplification reactions catalyzed by a non-proofreading polymerase. An example of a suitable enzyme having an exonuclease activity includes but is not limited to Pfu polymerase. Other suitable polymerases include, but are not limited to: Thermus flavus DNA polymerase (Tfl); Thermus thermophilus DNA polymerase (Tth); Thermococcus litoralis DNA polymerase (Tli, Vent); Pyrococcus woesei DNA polymerase (Pwo); Thermotoga maritima DNA polymerase (UltMa); Thermus brockianus DNA polymerase (Thermozyme); Pyrococcus furiosus DNA polymerase (Pfu); Thermococcus sp. DNA polymerase (9°Nm); Pyrococcus sp. DNA polymerase ('Deep Vent'); Bacteriophage T4 DNA polymerase; Bacteriophage T7 DNA polymerase; E. coli DNA polymerase I (native and Klenow); and E. coli DNA polymerase III.
In an aspect, the improved shuffling method comprises the modification wherein at least one cycle of amplification (i.e., extension with a polymerase) of reannealed fragmented library member polynucleotides is conducted under conditions which produce a substantial fraction, typically at least 20 percent or more, of incompletely extended amplification products. The amplification products, including the incompletely extended amplification products, are denatured and subjected to at least one additional cycle of reannealing and amplification. This variation, wherein at least one cycle of reannealing and amplification provides a substantial fraction of incompletely extended products, is termed "stuttering"; in the subsequent amplification round the incompletely extended products reanneal to and prime extension on different sequence-related template species.
In an aspect, the improved shuffling method comprises the modification wherein at least one cycle of amplification is conducted using a collection of overlapping single-stranded DNA fragments of varying lengths corresponding to a first polynucleotide species or set of related-sequence polynucleotide species, wherein each overlapping fragment can hybridize to and prime polynucleotide chain extension from a second polynucleotide species serving as a template, thus forming sequence-recombined polynucleotides, wherein said sequence-recombined polynucleotides comprise a portion of at least one first polynucleotide species with an adjacent portion of the second polynucleotide species which serves as a template. In a variation, the second polynucleotide species serving as a template contains uracil (i.e., a Kunkel-type template) and is substantially non-replicable in cells. This aspect of the invention can also comprise at least two recursive cycles of this variation.
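How incomplete extension leads to recombination can be pictured with a toy model. The sketch below is a simplified illustration under stated assumptions, not a simulation of real PCR: the templates are equal-length, pre-aligned strings; each cycle extends the growing product by a fixed number of bases (the hypothetical `step` parameter); and the partial product reanneals to a randomly chosen template at the same position in every cycle.

```python
import random

def stutter(templates, step, rng):
    """Build one full-length product by extending `step` bases per cycle,
    re-annealing to a randomly chosen template before each extension."""
    length = len(templates[0])
    product = ""
    while len(product) < length:
        template = rng.choice(templates)
        # Extension continues from where the partial product stopped.
        product += template[len(product):len(product) + step]
    return product

rng = random.Random(1)  # seeded so the run is repeatable
templates = ["AAAAAAAAAAAA", "CCCCCCCCCCCC", "GGGGGGGGGGGG"]
product = stutter(templates, step=3, rng=rng)
print(product)
```

Shorter extension per cycle (a smaller `step`) gives more opportunities for template switching and therefore more crossovers per product in this model.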
In an embodiment, PCR can be conducted wherein the nucleotide mix comprises a nucleotide species having uracil as the base. The PCR product(s) can then be fragmented by digestion with uracil-DNA glycosylase (UDG), which produces strand breaks. The fragment size can be controlled by the fraction of uracil-containing nucleotide in the PCR mix.
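How the uracil fraction sets the fragment size can be estimated with simple arithmetic. The following is a rough sketch under stated assumptions that do not come from the source: the uracil-containing nucleotide is incorporated in place of thymine with a probability equal to its fraction of the combined pool, thymine sites are spread evenly along the strand, and every incorporated uracil becomes a strand break. The function name and both parameters are hypothetical.

```python
def expected_fragment_length(uracil_fraction, thymine_frequency=0.25):
    """Estimate the mean spacing, in nucleotides, between breaks on one strand.

    Assumptions (illustrative only): each thymine position carries uracil
    with probability `uracil_fraction`, and each uracil yields one break.
    """
    if not 0.0 < uracil_fraction <= 1.0:
        raise ValueError("uracil_fraction must be in (0, 1]")
    if not 0.0 < thymine_frequency <= 1.0:
        raise ValueError("thymine_frequency must be in (0, 1]")
    # Probability that any given position is a break site.
    break_probability = uracil_fraction * thymine_frequency
    # Mean spacing between events that each occur with probability p is 1/p.
    return 1.0 / break_probability
```

Under these assumptions, a 10 percent uracil fraction with a thymine frequency of 0.25 gives a mean fragment length of about 40 nucleotides on one strand, and halving the uracil fraction doubles the expected length.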
In an aspect, the improved shuffling method comprises the modification wherein at least one cycle of amplification is conducted with an additive or polymerase in suitable conditions which promote template switching. In an embodiment where Taq polymerase is employed for amplification, addition of recA or other polymerases (e.g., viral polymerases, reverse transcriptase) enhances template switching. Template-switching can also be increased by increasing the DNA template concentration, among other means known by those skilled in the art.
In an embodiment of the general method, libraries of sequence-recombined polynucleotides are generated from sequence-related polynucleotides which are naturally-occurring genes or alleles of a gene. In this aspect, at least two naturally-occurring genes and/or alleles which comprise regions of at least 50 consecutive nucleotides which have at least 70 percent sequence identity, preferably at least 90 percent sequence identity, are selected from a pool of gene sequences, such as by hybrid selection or via computerized sequence analysis using sequence data from a database. In an aspect, at least three naturally-occurring genes and/or alleles which comprise regions of at least 50 consecutive nucleotides which have at least 70 percent sequence identity, preferably at least 90 percent sequence identity, are selected from a pool of gene sequences, such as by hybrid selection or via computerized sequence analysis using sequence data from a database. The selected sequences are obtained as polynucleotides, either by cloning or via DNA synthesis, and shuffled by any of the various embodiments of the invention.
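The computerized selection criterion above (a region of at least 50 consecutive nucleotides with at least 70 percent identity) can be sketched as a sliding-window check. This is a minimal illustration, not the analysis actually used: it assumes the two sequences are already aligned position by position without gaps, and the function name `has_shared_region` and its parameters are hypothetical.

```python
def has_shared_region(seq_a, seq_b, window=50, min_identity=0.70):
    """Return True if some ungapped window of `window` aligned positions
    has at least `min_identity` fractional identity.

    Assumption (illustrative only): seq_a and seq_b are pre-aligned, so
    position i of one corresponds to position i of the other.
    """
    a, b = seq_a.upper(), seq_b.upper()
    length = min(len(a), len(b))
    if window < 1 or length < window:
        return False

    matches = [1 if a[i] == b[i] else 0 for i in range(length)]
    needed = min_identity * window

    # Count matches in the first window, then slide one position at a time.
    running = sum(matches[:window])
    if running >= needed:
        return True
    for end in range(window, length):
        running += matches[end] - matches[end - window]
        if running >= needed:
            return True
    return False
```

Candidate genes or alleles drawn from a database could be compared pairwise with such a check and retained for shuffling when it returns True. Raising `min_identity` to 0.90 corresponds to the preferred threshold stated above. Real sequence data would normally need a proper alignment step first.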
In an embodiment of the invention, multi-pool shuffling is performed. Shuffling of multiple pools of polynucleotide sequences allows each separate pool to generate a different combinatorial solution to produce the desired property. In this variation, the pool of parental polynucleotide sequences (or any subsequent shuffled library or selected pool of library members) is subdivided (or segregated) into two or more discrete pools of sequences, and the pools are separately subjected to one or more rounds of recursive sequence recombination and selection (or screening). Optionally, selected library members from each separate pool may be recombined (integrated) in later rounds of shuffling. Alternatively, multiple separate parental pools may be used. Inbreeding, wherein selected (or screened) library members within a pool are crossed with each other by the recursive sequence recombination methods of the invention, can be performed, alone or in combination with outbreeding, wherein library members of different pools are crossed with each other by the recursive sequence recombination methods of the invention.
In an embodiment of the invention, the method comprises the further step of removing non-shuffled products (e.g., parental sequences) from sequence-recombined polynucleotides produced by any of the disclosed shuffling methods. Non-shuffled products can be removed or avoided by performing amplification with: (1) a first PCR primer which hybridizes to a first parental polynucleotide species but does not substantially hybridize to a second parental polynucleotide species, and (2) a second PCR primer which hybridizes to a second parental polynucleotide species but does not substantially hybridize to the first parental polynucleotide species. Amplification then occurs only from templates comprising both the portion of the first parental sequence which hybridizes to the first PCR primer and the portion of the second parental sequence which hybridizes to the second PCR primer, so that only sequence-recombined polynucleotides are amplified.
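The logic of this primer-based selection can be illustrated with a small model in which a template is amplified only if it carries both parent-specific primer sites. The sketch below is a simplification and not the disclosed protocol: hybridization is reduced to exact string matching, and the function names `reverse_complement` and `is_amplified` are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def is_amplified(template, forward_primer, reverse_primer):
    """Return True if both primers can prime on this template.

    Assumption (illustrative only): a primer hybridizes only at an exact
    match. The forward primer must occur in the template strand, and the
    reverse primer's binding site (its reverse complement) must lie
    downstream of it.
    """
    start = template.find(forward_primer)
    if start < 0:
        return False
    reverse_site = reverse_complement(reverse_primer)
    return template.find(reverse_site, start + len(forward_primer)) >= 0
```

With a forward primer matching only the first parent and a reverse primer matching only the second parent, neither parental sequence satisfies both conditions in this model. A recombinant carrying the 5′ portion of the first parent and the 3′ portion of the second does.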
The invention also provides for alternative embodiments for performing in vivo sequence shuffling. In one aspect, the alternative shuffling method includes the use of inter-plasmidic recombination, wherein libraries of sequence-recombined polynucleotide sequences are obtained by genetic recombination in vivo of compatible or non-compatible multicopy plasmids inside suitable host cells. When non-compatible plasmids are used, typically each plasmid type has a distinct selectable marker and selection for retention of each desired plasmid type is applied. The related-sequence polynucleotide sequences to be recombined are separately incorporated into separately replicable multicopy vectors, typically bacterial plasmids each having a distinct and separately selectable marker gene (e.g., a drug resistance gene). Suitable host cells are transformed with both species of plasmid, and cells expressing both selectable marker genes are selected. Sequence-recombined sequences are then recovered and can be subjected to additional rounds of shuffling by any of the means described herein.
In one aspect, the alternative shuffling method includes the use of intra-plasmidic recombination, wherein libraries of sequence-recombined polynucleotide sequences are obtained by genetic recombination in vivo of direct or inverted sequence repeats located on the same plasmid. In a variation, the sequences to be recombined are flanked by site-specific recombination sequences and the polynucleotides are present in a site-specific recombination system, such as an integron (Hall and Collins (1995) Mol. Microbiol. 15: 593, incorporated herein by reference) and can include insertion sequences, transposons (e.g., IS1), and the like. Introns have a low rate of natural mobility and can be used as mobile genetic elements both in prokaryotes and eukaryotes. Shuffling can be used to improve the performance of mobile genetic elements. These high frequency recombination vehicles can be used for the rapid optimization of large sequences via transfer of large sequence blocks. Recombination between repeated, interspersed, and diverged DNA sequences, also called “homeologous” sequences, is typically suppressed in normal cells. However, in MutL and MutS cells, this suppression is relieved and the rate of intrachromosomal recombination is increased (Petit et al. (1996) Genetics 129: 327, incorporated herein by reference).
In an aspect of the invention, mutator strains of host cells are used to enhance recombination of more highly mismatched sequence-related polynucleotides. Bacterial strains such as MutL, MutS, MutT, or MutH or other cells expressing the Mut proteins (XL-1red; Stratagene, San Diego, Calif.) can be used as host cells for shuffling of sequence-related polynucleotides by in vivo recombination. Other mutation-prone host cell types can also be used, such as those having a proofreading-defective polymerase (Foster et al. (1995) Proc. Natl. Acad. Sci. (U.S.A.) 92: 7951, incorporated herein by reference). Mutator strains of yeast can be used, as can hypermutational mammalian cells, including ataxia telangiectasia cells, such as described in Luo et al. (1996) J. Biol. Chem. 271: 4497, incorporated herein by reference.
Other in vivo and in vitro mutagenic formats can be employed, including administering chemical or radiological mutagens to host cells. Examples of such mutagens include but are not limited to: MNU, ENU, MNNG, nitrosourea, BuDR, and the like. Ultraviolet light can also be used to generate mutations and/or to enhance the rate of recombination, such as by irradiation of cells used to enhance in vivo recombination. Ionizing radiation and clastogenic agents can also be used to enhance mutational frequency and/or to enhance recombination and/or to effect polynucleotide fragmentation.