Methods of directed evolution employing DNA or polynucleotide repair to mediate the creation of genetic diversity are known in the art. For example, such a method is described in WO 99/29902, which is expressly incorporated herein in its entirety. These types of methods consist of exposing a heteroduplex polynucleotide to a cellular DNA repair system to either fully convert one strand of the heteroduplex to the perfect complement of the opposing strand or to partially convert one or both strands to the more perfect complement of the other, thereby forming in the latter case a recombinant heteroduplex.
In many ways, however, the particular use of DNA repair described above does not lend itself to shuffling-based methods of directed evolution, especially those methods that are template-mediated. First, in template-mediated methods of shuffling, the recombinant strand that forms on the template strand should not, of course, be repaired to the point where it becomes identical to the template strand. Second, depending on the reaction conditions and starting materials, the template strand might be inadvertently repaired to match the recombinant strand, which is especially disadvantageous if the template is to be recycled or used again. Thus, using DNA repair as described above would require close monitoring of the starting materials, the conditions of the repair reactions and the amounts of repair enzymes. Third, using DNA repair as described above places the experimenter's means of control at a point toward the end of the process, that is, not until after the shuffling and the annealing of the recombinant strand to the template. As explained below, there are advantages to moving the means of control to a point before the shuffling or creation of the fragments.
Moreover, the use of DNA repair described in WO 99/29902 does not suggest an alternative use for DNA repair, perhaps because the alternative use only makes sense in the context of shuffling-based methods of directed evolution. The alternative is the use of DNA repair proteins not to repair polynucleotides per se but to fragment polynucleotides.
Cellular DNA is constantly exposed to a wide spectrum of exogenous factors (e.g., ultraviolet light, ionizing radiation, or environmental chemicals) or endogenous factors (e.g., oxidative damage, or structural instability at physiological pH) that generate DNA lesions. To counteract these factors, a variety of DNA repair pathways have evolved to protect the cell against the genotoxic and lethal effects of DNA damage. DNA excision repair pathways include, for example, base excision repair (BER), nucleotide excision repair (NER), and mismatch repair (MMR).
In E. coli, for example, several proteins are involved in these three repair pathways. In the BER pathway, the involved proteins are DNA glycosylase, AP endonuclease, DNA polymerase I and DNA ligase. The DNA glycosylase recognizes the damaged base and removes it, leaving a position without a base that is called the abasic site or AP site. Then, the AP endonuclease removes the AP site and neighboring nucleotides, thereby creating an induced gap. Finally, the induced gap is filled by the DNA polymerase I and the DNA ligase. See FIG. 1.
In the NER pathway, the involved proteins in E. coli are Uvr-A, Uvr-B, Uvr-C, DNA polymerase I and DNA ligase. Uvr-A, Uvr-B, and Uvr-C are involved in removing damaged nucleotides (e.g., dimers induced by ultraviolet light) to create an induced gap. The induced gap is filled by DNA polymerase I and DNA ligase. See FIG. 2. In yeast, proteins similar to those in E. coli are involved. For example, in yeast, RADxx (e.g., RAD3 and RAD10) proteins are similar to Uvr in E. coli.
In the MMR pathway, the involved proteins in E. coli are Dam methylase, MutS, MutL, MutH, exonuclease, DNA helicase II, SSB protein, DNA polymerase III and DNA ligase. To repair mismatched bases, the pathway involves a determination of which base is the correct one. In E. coli, this determination is achieved by a special methylase called Dam methylase, which can methylate all adenines that occur within (5′)GATC sequences. Immediately after DNA replication, the template strand has been methylated, but the newly synthesized strand has not yet been methylated. Thus, the template strand and the new strand can be distinguished. Mismatch repair in eukaryotes may be similar to that in E. coli. Homologs of MutS and MutL have been identified in yeast, mammals, and other eukaryotes. MSH1 to MSH5 are homologous to MutS. MLH1, PMS1 and PMS2 are homologous to MutL. In eukaryotes, the mechanism to distinguish the template strand from the new strand is still unclear.
The MMR pathway continues with MutS binding to mismatched base pairs. See FIG. 3. MutL is then recruited to the complex and activates MutH, which binds to GATC sequences. Activated MutH cleaves the unmethylated strand at a GATC site. Subsequently, the segment from the cleavage site to the mismatch is removed by exonuclease. This step simultaneously involves helicase II and single strand DNA binding proteins, which are called SSB proteins. If the cleavage occurs on the 3′ side of the mismatch, the step is carried out by exonuclease I (which degrades a single strand only in the 3′ to 5′ direction). If the cleavage occurs on the 5′ side of the mismatch, exonuclease VII or RecJ is used to degrade the single stranded DNA. The gap is filled by DNA polymerase III and DNA ligase. The distance between the GATC site and the mismatch can be as long as 1000 base pairs. Therefore, mismatch repair is very expensive and inefficient.
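The strand discrimination and excision steps described above can be illustrated with a minimal Python sketch. It is not part of the described method. The function names (find_gatc_sites, plan_excision) and the returned dictionary are hypothetical, the new (unmethylated) strand is given as a plain 5′ to 3′ string, and the cut is assumed to fall at the index where the GATC motif starts, which ignores the exact nick position within the motif.

```python
def find_gatc_sites(strand: str) -> list[int]:
    """Return 0-based start indices of every GATC motif in a 5'->3' strand."""
    return [i for i in range(len(strand) - 3) if strand[i:i + 4] == "GATC"]


def plan_excision(new_strand: str, mismatch_pos: int) -> dict:
    """Pick the GATC site nearest the mismatch and describe the excision.

    Assumption: the unmethylated strand is cut at the index where the
    GATC motif starts; the real nick position inside the motif is ignored.
    """
    sites = find_gatc_sites(new_strand)
    if not sites:
        raise ValueError("no GATC site on the unmethylated strand")
    # Nearest site; ties resolve to the site with the lower index.
    site = min(sites, key=lambda s: abs(s - mismatch_pos))
    if site > mismatch_pos:
        # Cleavage lies 3' of the mismatch: degraded 3'->5' by exonuclease I.
        side, nuclease = "3'", "exonuclease I"
    else:
        # Cleavage lies 5' of the mismatch: exonuclease VII or RecJ.
        side, nuclease = "5'", "exonuclease VII or RecJ"
    start, end = sorted((site, mismatch_pos))
    return {
        "gatc_site": site,
        "side": side,
        "nuclease": nuclease,
        "excised_span": (start, end),
        "distance": end - start,
    }
```

The "distance" value corresponds to the stretch between the GATC site and the mismatch, which the text notes can be as long as 1000 base pairs.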
In vitro and in vivo recombination of nucleic acids have useful applications (e.g., creating novel sequences which encode proteins having desired or improved properties). A variety of methods have been described in the art to enable this recombination (e.g., from rational design to directed evolution). These methods include those described in U.S. Pat. Nos. 5,605,793 and 5,965,408, which are herein incorporated by reference in their entirety. Generally, recombination methods depend on making fragments and recombining the fragments. With regard to recombining fragments, various methods have been described in the art. For example, U.S. Pat. Nos. 5,605,793 and 5,965,408 recite recombining fragments based upon polymerase chain reaction-like thermocycling of fragments in the presence of DNA polymerase. International Patent No. WO 00/09679, which is herein incorporated by reference in its entirety, describes thermocycling ligation to recombine fragments of more specific and increased gene size. These methods rely on a multistep process involving a fragmentation step to generate fragments of parental genes that are further assembled to create recombined polynucleotides. Fragmentation is obtained by random treatments (e.g., DNase I, sonication, mechanical disruption), or by controlled treatments (e.g., restriction endonucleases). These fragmentation processes do not take into account the level of homology of the parental genes.
Further general information regarding use of DNA repair systems in vitro appears in Dianov et al., Curr. Biol. 1994 (1069-1078), and in WO 97/21537, both of which are expressly incorporated herein in their entireties.
Definitions
In vitro, as used herein, refers to any location outside a living organism.
In vivo refers to any phenomenon that occurs in a living organism, typically a cell.
DNA repair or polynucleotide repair refers to any processes that, in cells, protect against the genotoxic effects of DNA damage. Yet, in the present invention the repair preferably occurs in vitro.
ds means double stranded DNA.
ss means single stranded DNA.
Polynucleotide and polynucleotide sequence refer to any nucleic or ribonucleic acid sequence, including mRNA. A polynucleotide may be a gene or a portion of a gene. Gene refers to a polynucleotide or portion thereof associated with a known or unknown biological function or activity. Thus genes include coding sequences, regulatory sequences and recognition sequences. A gene can be obtained in different ways, including extraction from a nucleic acid source, chemical synthesis and synthesis by polymerization.
Homologous polynucleotides differ from each other at least at one corresponding residue position. Thus, as used herein, homologous encompasses what is sometimes referred to as partially heterologous. The homology, e.g., among the parental polynucleotides, may range from 20 to 99.99%, preferably 30 to 90%, more preferably 40 to 80%. In some embodiments the term homologous may describe sequences that are, for example, only about 20-45% identical at corresponding residue positions. Homologous sequences may or may not share with each other a common ancestry or evolutionary origin.
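As an illustration of the homology ranges given above, percent identity at corresponding residue positions can be computed as in the following minimal Python sketch. It assumes the two sequences have already been aligned to equal length and contain no alignment gaps. The function names and the default bounds (taken from the broadest 20 to 99.99% range) are hypothetical conveniences, not part of the described method.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of corresponding positions at which two aligned sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def is_homologous(a: str, b: str, low: float = 20.0, high: float = 99.99) -> bool:
    """True if identity falls inside the stated range (assumed inclusive)."""
    return low <= percent_identity(a, b) <= high
```

Under these assumptions, two identical sequences score 100% and fall outside the range, consistent with the requirement that homologous polynucleotides differ at least at one corresponding residue position.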
Heteroduplex polynucleotides are double-stranded polynucleotides in which the two strands are not perfectly complementary to each other.
The phrase “at least two homologous heteroduplex polynucleotides” refers to a plurality of double-stranded polynucleotides, wherein a strand of each double-stranded polynucleotide is not only imperfectly complementary to its opposed strand, but also differs from the corresponding strand of one of the other double-stranded polynucleotides at least at one corresponding residue position. In other words, the heteroduplex polynucleotides are homologous to each other.
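The definition of a heteroduplex can be made concrete with a short Python sketch that tests whether two strands are perfectly complementary. It assumes both strands are written 5′ to 3′, are of equal length, and contain only the DNA letters A, C, G and T. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Return the reverse complement of a 5'->3' DNA strand."""
    return strand.translate(COMPLEMENT)[::-1]


def mismatch_positions(top: str, bottom: str) -> list[int]:
    """Positions on the top strand that are not base-paired with the bottom strand.

    Both strands are given 5'->3'; the bottom strand is reverse-complemented
    so that it can be compared with the top strand position by position.
    """
    if len(top) != len(bottom):
        raise ValueError("strands must have the same length")
    paired = reverse_complement(bottom)
    return [i for i, (x, y) in enumerate(zip(top, paired)) if x != y]


def is_heteroduplex(top: str, bottom: str) -> bool:
    """True if the two strands are not perfectly complementary."""
    return bool(mismatch_positions(top, bottom))
```

The positions returned by mismatch_positions are the sites at which a repair system could act on the duplex.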
Parental polynucleotide and parent are interchangeable synonyms that refer to the polynucleotides that are fragmented to create donor fragments. In the present invention, the parental polynucleotides are generally homologous heteroduplex polynucleotides. Parental polynucleotides are often derived from genes. Recombined polynucleotide, mutant polynucleotide, chimeric polynucleotide and chimera generally refer to the polynucleotides that are generated by the method. However, these terms may refer to other chimeric polynucleotides, such as chimeric polynucleotides in the initial library. Reference sequence refers to a polynucleotide, often from a gene, having desired properties or properties close to those desired, and which is used as a target or benchmark for creating or evaluating other polynucleotides.
Polynucleotide library and DNA library refer to a group, pool or bank of polynucleotides containing at least two homologous polynucleotides, particularly homologous heteroduplex polynucleotides. A polynucleotide library may comprise either an initial library or a screening library. Initial library, initial polynucleotide library, initial DNA library, parental library and start library refer to a group, pool or bank of polynucleotides or fragments thereof containing at least two homologous parental polynucleotides or fragments thereof. The initial library may comprise genomic or complex DNA and include introns. It may also comprise sequences generated by prior rounds of shuffling. Similarly, a screening library or other limited library of recombinant polynucleotides or fragments may serve as and be referred to as an initial library. Screening library refers to the polynucleotide library that contains chimeras generated by the inventive process or another recombinant process.
Residue refers to an individual base, nucleotide or ribonucleotide, rather than to multiple bases, nucleotides or ribonucleotides. Residue may refer to a free residue that is not part of a polynucleotide or fragment, or to a single residue that forms a part of a polynucleotide or fragment.
Donor fragments and fragments generally refer to the fragmented portions of parental polynucleotides. Fragments may also refer to supplemental or substitute fragments that are added to the reaction mixture and/or that derive from a source other than fragmentation of the parental polynucleotides. Most or all of the fragments should be shorter than the parental polynucleotides. Most or all of the fragments are shorter than the assembly templates. As used herein, the donor fragments preferably do not initiate polymerase extension, i.e., they are not primers.
Nonrandom and controlled, as used herein, refer broadly to the control or predictability, e.g., over the rate or location of recombination, achieved via the template and/or ligation-orientation of the invention. Nonrandom and controlled may also refer more specifically to techniques of fragmenting polynucleotides that enable some control or predictability over the size or sequence of the resulting fragments. For example, using restriction enzymes to cut the polynucleotides provides some control over the characteristics of the fragments. Note that the invention may still be considered nonrandom when it employs random fragmentation (typically by DNase I digestion). In such cases, the assembly template, repair mechanisms and other features of the invention still provide a degree of control. In preferred embodiments, however, the fragmentation is nonrandom or controlled.
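To illustrate why restriction enzymes give some control over fragment characteristics, the following Python sketch cuts a single strand at every occurrence of a recognition site. It is a simplification that ignores the opposite strand and the resulting overhangs. The defaults model EcoRI, which recognizes GAATTC and cuts after the G. The function name and parameters are hypothetical.

```python
def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Split one strand at each occurrence of a recognition site.

    cut_offset is the number of residues of the site that stay on the
    upstream fragment (1 for EcoRI, which cuts G^AATTC).
    """
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments
```

Because the cut positions depend only on the sequence, the same parental polynucleotide always yields the same fragments, in contrast to random fragmentation such as DNase I digestion.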
Assembly template and template refer to a polynucleotide used as a scaffold or matrix upon which fragments may anneal or hybridize to form a partially or fully double-stranded polynucleotide. The assembly templates of the invention are to be distinguished from various sequences in the art that have been referred to as templates. For example, the templates of the present invention do not include overlapping donor fragments that facilitate the extension of complementary donor fragments hybridized thereto. As such, the assembly template is distinct from the donor fragments at some point in the process. The assembly templates of the present invention also do not include those sequences used in processes that rely heavily on polymerase extension to generate all or most of the opposing strand. In other words, the shuffling embodiment of the invention relies on hybridization of donor fragments to form the bulk of the recombinant strand. Preferably, the template strand of the recombinant polynucleotide formed by the process, although it may itself be recombinant, does not undergo recombination during the process. In other words, preferably no donor fragments are incorporated into the template strand during a cycle of the process. The template may be synthetic, result from shuffling or other artificial processes, or it may exist in nature. Transient template refers to a template that is not itself incorporated into the final recombinant polynucleotides. This transience is caused by separation or disintegration of the template strand of the nonfinal recombinant polynucleotide generated during the method. The template may derive from the reference sequence, the initial library, the screening library or elsewhere.
Although the template may comprise or derive from a parental polynucleotide of the initial library, in a preferred embodiment the template is devised, and a polynucleotide does not qualify as a devised template if it enters the shuffling process accidentally, e.g., by somehow slipping into the hybridization step without being fragmented. In other words, a devised template is not entirely random or accidental. Rather, at least to some extent a devised template is directly or indirectly obtained for use as a template by a human being, or a computer operated thereby, via purposeful planning, conception, formulation, creation, derivation and/or selection of either a specific desired polynucleotide sequence(s) or a sequence(s) from a source(s) that is likely to contain a desired sequence(s).
Parental template refers to the strand of a parental polynucleotide or heteroduplex that is generally unaffected by the polynucleotide repair system, for example, because it is methylated. As such, parental template reflects the popular usage of the term template, in contrast to the more specialized meaning of assembly template as described above.
Solitary-stranded or non-identical is used to describe a population of single-stranded sequences that do not complement each other because they are all from the same strand, either sense or antisense, of one polynucleotide or multiple homologous polynucleotides. In other words, sequences from the opposing complementary strands are absent, so the population contains no sequences that are complementary to each other. For example, the population of non-identical fragments may consist of fragments of the top strands of the parental polynucleotides, whereas the population of non-identical templates may consist of bottom strands of one or more of the parental polynucleotides.
Ligation refers to creation of a phosphodiester bond between two residues.
Nick refers to the absence of a phosphodiester bond between two residues that are hybridized to the same strand of a polynucleotide. Nick includes the absence of phosphodiester bonds caused by DNases or other enzymes, as well as the absence of bonds between adjacently hybridized fragments that have simply not been ligated. As used herein, nick does not encompass residue gaps.
Gap and residue gap, as used herein, refer to the absence of one or more residues on a strand of a partially double-stranded polynucleotide. In some embodiments of the invention, short gaps (less than approximately 15-50 residues) are filled in by polymerases and/or flap trimming. Long gaps are conventionally filled in by polymerases.
Hybridization has its common meaning except that it may encompass any necessary cycles of denaturing and re-hybridization.
Adjacent fragments refer to hybridized fragments whose ends are flush against each other and separated only by nicks, not by gaps.
Ligation-only refers to embodiments of the invention that do not utilize or require any gap filling, polymerase extension or flap trimming. In ligation-only embodiments, all of the fragments hybridize adjacently. Note that embodiments that are not ligation-only embodiments still use ligation.
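The distinction between nicks, gaps and adjacent fragments defined above can be sketched in Python by representing each hybridized fragment as a half-open interval of template positions. The Fragment type, the function names and the "overlap" label for fragments that cover the same template positions are hypothetical. The sketch examines only the junctions between fragments and does not check whether the ends of the template are covered.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    start: int  # first template position covered (0-based, inclusive)
    end: int    # one past the last template position covered


def classify_junctions(fragments: list[Fragment]) -> list[str]:
    """Label each junction between neighboring hybridized fragments."""
    ordered = sorted(fragments)
    labels = []
    for left, right in zip(ordered, ordered[1:]):
        distance = right.start - left.end
        if distance == 0:
            labels.append("nick")     # ends flush: only a bond is missing
        elif distance > 0:
            labels.append("gap")      # one or more residues are missing
        else:
            labels.append("overlap")  # assumption: fragments share positions
    return labels


def is_ligation_only(fragments: list[Fragment]) -> bool:
    """True if every junction is a nick, so ligation alone can join them."""
    return all(label == "nick" for label in classify_junctions(fragments))
```

For example, fragments covering positions 0-10, 10-25 and 30-40 give one nick and one gap, so that arrangement would not qualify as ligation-only without gap filling.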
As used herein, ligation-oriented, oriented ligation and ligation-compatible generally represent or refer to a template-mediated process that enables ligation of fragments or residues in a relatively set or relatively predictable order. In ligation-only embodiments, the method employs no gap filling techniques and instead relies on ligation of adjacent fragments, often achieved after multiple hybridization events.
As used herein, exonuclease-mediated generally refers to a template-mediated process that employs flap trimming to enable ligation of fragments or residues in a relatively set or relatively predictable order.