Pursuant to 37 C.F.R. 1.71(e), Applicants note that a portion of this disclosure contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
The present invention relates to methods of providing shuffling libraries that include codon-varied oligonucleotide sequences. Codon-varied oligonucleotides can be synthesized using trinucleotide or mononucleotide phosphoramidite sequences, and can be derived from homologous or non-homologous nucleic acid sequences, or combinations of such sequences. In turn, codon-varied oligonucleotide sequences can be utilized for recombination in various methods of artificial evolution.
The use of trinucleotide phosphoramidites in solid-phase DNA synthesis was previously thought to be unfeasible, as only marginal yields could be achieved. Sondek, J. and Shortle, D. (1992) J. Immunol., 149, 3903-3913. These poor results were attributed to the steric bulk of the trinucleotide molecules. Id. However, it has since been shown that trinucleotide phosphoramidites representing codons for all 20 amino acids can be successfully used to introduce entire codons into oligonucleotides in automated, solid-phase DNA synthesis and thus can function as excellent reagents for the synthesis of mixed oligonucleotides for random mutagenesis. Virnekäs, B., et al., (1994) Nucleic Acids Res., 22, 5600-5607. Other references involving the synthesis of trinucleotide phosphoramidites, their subsequent use in oligonucleotide synthesis, and related issues are described in, e.g., Kayushin, A. L. et al., (1996) Nucleic Acids Res., 24, 3748-3755, Huse, U.S. Pat. No. 5,264,563 "PROCESS FOR SYNTHESIZING OLIGONUCLEOTIDES WITH RANDOM CODONS," Lyttle et al., U.S. Pat. No. 5,717,085 "PROCESS FOR PREPARING CODON AMIDITES," Shortle et al., U.S. Pat. No. 5,869,644 "SYNTHESIS OF DIVERSE AND USEFUL COLLECTIONS OF OLIGONUCLEOTIDES," Greyson, U.S. Pat. No. 5,789,577 "METHOD FOR THE CONTROLLED SYNTHESIS OF POLYNUCLEOTIDE MIXTURES WHICH ENCODE DESIRED MIXTURES OF PEPTIDES," and Huse, WO 92/06176 "SURFACE EXPRESSION LIBRARIES OF RANDOMIZED PEPTIDES."
The inventors and their co-workers have developed various rapid artificial evolution techniques for creating improved industrial, agricultural, and therapeutic genes and encoded proteins, including via oligonucleotide-mediated recombination. These methodologies and related aspects are described in a variety of sources, e.g., Stemmer et al., (1994) "Rapid Evolution of a Protein" Nature 370:389-391, Stemmer (1994) "DNA Shuffling by Random Fragmentation and Reassembly: in vitro Recombination for Molecular Evolution," Proc. Natl. Acad. Sci. USA 91:10747-10751, Crameri et al., (1996), "Construction And Evolution Of Antibody-Phage Libraries By DNA Shuffling" Nature Medicine 2(1):100-103, Stemmer U.S. Pat. No. 5,603,793 "METHODS FOR IN VITRO RECOMBINATION," Stemmer et al., U.S. Pat. No. 5,830,721 "DNA MUTAGENESIS BY RANDOM FRAGMENTATION AND REASSEMBLY," Stemmer et al., U.S. Pat. No. 5,811,238 "METHODS FOR GENERATING POLYNUCLEOTIDES HAVING DESIRED CHARACTERISTICS BY ITERATIVE SELECTION AND RECOMBINATION," Stemmer et al., (1998) U.S. Pat. No. 5,834,252 "End Complementary Polymerase Reaction," Minshull et al., U.S. Pat. No. 5,837,458 "Methods and Compositions for Cellular and Metabolic Engineering," and U.S. Provisional Patent Applications, Ser. Nos. 60/118,813 and 60/141,049 "Oligonucleotide Mediated Nucleic Acid Recombination," filed Feb. 5, 1999 and Jun. 24, 1999, respectively, each of which is incorporated by reference in its entirety for all purposes. Additional details regarding DNA shuffling can also be found in WO95/22625, WO97/20078, WO96/33207, WO97/33957, WO98/27230, WO97/35966, WO98/31837, WO98/13487, WO98/13485 and WO989/42832, each of which is also incorporated by reference in its entirety for all purposes.
Recently, the use of oligonucleotides for "family" shuffling was described by the inventors and their co-workers in U.S. Provisional Patent Applications, Ser. Nos. 60/118,813 and 60/141,049, supra. Additional oligonucleotide shuffling methods would be desirable. The present invention provides new codon-based oligonucleotide mediated shuffling methods and related compositions, as well as a variety of additional features which will become apparent upon review of the following description.
The present invention provides recombination methodologies in which codon-varied oligonucleotides are shuffled to provide recombined nucleic acid populations. Codon-varied oligonucleotides are synthesized, e.g., utilizing codon- or trinucleotide-based phosphoramidite coupling chemistry. This approach affords extensive flexibility to shuffling processes, as codon-varied oligonucleotides can be based upon homologous or non-homologous nucleotide sequences, or even combinations of such sequences.
In a first aspect, the present invention is directed to a method of recombining codon-varied oligonucleotides. It includes synthesizing, hybridizing, and elongating a set of overlapping codon-varied oligonucleotides to provide a population of recombined nucleic acids. In one embodiment, this method can include selecting at least first and second nucleic acids to be recombined, where the set of codon-varied oligonucleotides includes a plurality of codon-varied nucleic acids which correspond to the first and second nucleic acids. The first and second nucleic acids can be homologous or non-homologous.
In one embodiment, the synthesizing step of this method is a trinucleotide synthesis format that includes providing a substrate sequence having a 5' terminus and at least one base, both of which have protecting groups thereon. The 5' protecting group of the substrate sequence is then removed to provide a 5' deprotected substrate sequence, which is then coupled with a selected trinucleotide phosphoramidite sequence. The trinucleotide has a 3' terminus, a 5' terminus, and three bases, each of which has protecting groups thereon. The coupling step yields an extended oligonucleotide sequence. Thereafter, the removing and coupling steps are optionally repeated. When these steps are repeated, the extended oligonucleotide sequence yielded by each repeated coupling step becomes the substrate sequence of the next repeated removing step until a desired codon-varied oligonucleotide is obtained. This trinucleotide synthesis format can optionally include coupling together one or more of: mononucleotides, trinucleotide phosphoramidite sequences, and oligonucleotides.
The synthesizing step is optionally a "split-pool" synthesis format that includes providing substrate sequences, each having a 5' terminus and at least one base, both of which have protecting groups thereon. The 5' protecting groups of the substrate sequences are removed to provide 5' deprotected substrate sequences, which are then coupled with selected trinucleotide phosphoramidite sequences. Each trinucleotide has a 3' terminus, a 5' terminus, and three bases, all of which have protecting groups thereon. The coupling step yields extended oligonucleotide sequences. Thereafter, the removing and coupling steps are optionally repeated. When these steps are repeated, the extended oligonucleotide sequences yielded by each repeated coupling step become the substrate sequences of the next repeated removing step until extended intermediate oligonucleotide sequences are produced.
Additional steps of the split-pool format optionally include splitting the extended intermediate oligonucleotide sequences into two or more separate pools. After this is done, the 5' protecting groups of the extended intermediate oligonucleotide sequences are removed to provide 5' deprotected extended intermediate oligonucleotide sequences in the two or more separate pools. Following this, these 5' deprotected intermediates are coupled with one or more selected mononucleotides, trinucleotide phosphoramidite sequences, or oligonucleotides in the two or more separate pools to yield further extended intermediate oligonucleotide sequences. In turn, these further extended sequences are pooled into a single pool. Thereafter, the steps beginning with the removal of the 5' protecting groups of the substrate sequences to provide 5' deprotected substrate sequences are optionally repeated. When these steps are repeated, the further extended oligonucleotide sequences, yielded by each repeated coupling step that generates those specific sequences, become the substrate sequences of the next repeated removing step that includes those specific sequences until desired codon-varied oligonucleotides are obtained.
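By way of illustration only, the combinatorial outcome of the split-pool format described above can be sketched in Python. The sketch is not part of the disclosed chemistry: the function name `split_pool`, its arguments, and the example codons are hypothetical, and protection, deprotection, and coupling yields are not modeled. Because the substrate is attached to the support at its 3' end, each newly coupled codon is written at the 5' (left) end of the growing chain.

```python
def split_pool(shared_codons, variable_positions):
    """Enumerate the sequences (written 5'->3') produced by split-pool coupling.

    shared_codons: codons coupled to every chain before the first split,
        listed in coupling order (the first entry is nearest the 3' end).
    variable_positions: one list of codons per split step; each separate
        pool receives one codon, after which the pools are recombined.
    Both names are hypothetical and used for illustration only.
    """
    chain = ""
    for codon in shared_codons:
        chain = codon + chain  # coupling extends the 5' end
    pool = [chain]
    for codons in variable_positions:
        # Split into one sub-pool per codon, couple, then pool again.
        pool = [codon + chain for chain in pool for codon in codons]
    return pool


library = split_pool(["TGG"], [["GCT", "GGT"], ["AAA", "GAA"]])
for seq in library:
    print(seq)
```

With two codons at each of two split steps, the pooled product contains 2 x 2 = 4 distinct codon-varied sequences, which is the point of the format: diversity grows multiplicatively while each coupling introduces a whole codon.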
Both synthetic protocols described, supra, can optionally be performed in an automated synthesizer that automatically performs the steps. This aspect includes inputting character string information into the automated synthesizer corresponding to the desired codon-varied oligonucleotides to be obtained, e.g., information corresponding to two or more nucleic acids to be recombined. Additionally, the protected substrate sequences of both synthetic formats can include 3' ends that are covalently attached to a solid support.
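As a minimal sketch of how character string information might be prepared for such a synthesizer, the following Python function breaks a desired oligonucleotide, written 5' to 3', into the order in which trinucleotide units would be coupled. The function name `coupling_order` and the input format are assumptions made for illustration; no actual synthesizer interface is described or implied.

```python
def coupling_order(oligo):
    """Return the codons of `oligo` (written 5'->3') in coupling order.

    Solid-phase synthesis proceeds from the support-bound 3' end toward
    the 5' end, so the 3'-most codon is handled first. Hypothetical helper.
    """
    oligo = oligo.upper()
    if len(oligo) % 3 != 0:
        raise ValueError("length must be a multiple of three for codon-wise coupling")
    if set(oligo) - set("ACGT"):
        raise ValueError("sequence may contain only A, C, G, and T")
    codons = [oligo[i:i + 3] for i in range(0, len(oligo), 3)]
    return codons[::-1]


print(coupling_order("ATGGCTAAA"))
```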
The hybridization step of the method described herein can occur in vitro or in vivo. The elongation step of this method optionally includes providing a hybridized set of overlapping codon-varied oligonucleotides and extending one or more members of that hybridized set with a polymerase, e.g., a thermostable polymerase.
In one embodiment, the method of recombining codon-varied oligonucleotides optionally includes denaturing the population of recombined nucleic acids to provide denatured recombined nucleic acids. These denatured nucleic acids are then re-hybridized and in turn, elongated. In another embodiment of this method, the denaturing, re-hybridizing, and elongating steps are repeated at least once and optionally twice, three times, four times, or more. Finally, the resulting elongated re-hybridized recombined nucleic acids, from either embodiment, are selected for at least one desired trait or property.
In an additional embodiment of the method in which the denaturing, re-hybridizing, and elongating steps are repeated at least once, a plurality of members of the population of recombined nucleic acids is optionally selected for a desired trait or property to provide first round selected nucleic acids. This method optionally includes hybridizing a second set of overlapping codon-varied oligonucleotides to provide a population of further recombined nucleic acids. This method also optionally includes sequencing the first round selected nucleic acids, where the second set of overlapping codon-varied oligonucleotides is derived from the first round selected nucleic acids by aligning sequences of the first round selected nucleic acids to identify regions of identity and regions of diversity. The second set of overlapping codon-varied oligonucleotides is then synthesized to include a plurality of oligonucleotides, each of which includes subsequences corresponding to at least one region of diversity. The first round selected nucleic acids encode, e.g., polypeptides of about 50 amino acids or less, or larger peptides, e.g., 60, 70, 80, 90 amino acids or more. Furthermore, the second set of overlapping codon-varied oligonucleotides optionally includes a plurality of oligonucleotide member types which correspond to consensus region subsequences derived from a plurality of the first round selected nucleic acids.
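The step of aligning first round selected sequences to identify regions of identity and regions of diversity can be illustrated with a short Python sketch. It assumes the sequences have already been aligned to equal length by some other means (the alignment itself is not shown), and the function name and the (label, start, end) output format are hypothetical.

```python
from itertools import groupby


def identity_and_diversity_regions(aligned):
    """Partition alignment columns into runs of identity and diversity.

    aligned: equal-length, pre-aligned sequences (an assumption of this sketch).
    Returns (label, start, end) tuples with 0-based, end-exclusive coordinates.
    """
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    # A column is an identity column when every sequence has the same base.
    flags = [len(set(column)) == 1 for column in zip(*aligned)]
    regions, position = [], 0
    for is_identical, run in groupby(flags):
        length = len(list(run))
        label = "identity" if is_identical else "diversity"
        regions.append((label, position, position + length))
        position += length
    return regions


selected = ["ATGGCTAAA", "ATGGGTAAA", "ATGGCAAAA"]
print(identity_and_diversity_regions(selected))
```

Oligonucleotides for a second round could then be designed so that each spans at least one of the reported diversity regions, flanked by identity regions that permit hybridization.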
In another aspect, the method of recombining codon-varied oligonucleotides optionally includes selecting at least one member of the population of recombined nucleic acids for at least one desired trait or property. Also, the set of overlapping codon-varied oligonucleotides optionally includes a plurality of oligonucleotide member types that include consensus region subsequences derived from a plurality of homologous target nucleic acids. Further, the set of overlapping codon-varied oligonucleotides, including a plurality of oligonucleotide member types, includes, alternatively, at least about 3, 5, 10, 100, 1,000 or more member types. Finally, the set of overlapping codon-varied oligonucleotides optionally includes a plurality of homologous oligonucleotide member types that are present in either approximately equimolar amounts or approximately non-equimolar amounts.
In a second aspect, the invention provides a method of recombining at least two parental nucleic acids to provide at least one recombinant nucleic acid. This method includes providing a composition comprising at least one set of fragmented parental nucleic acids corresponding to the at least two parental nucleic acids. The set of fragmented parental nucleic acids includes a plurality of overlapping codon-varied oligonucleotides. Next, the composition is hybridized to provide at least one hybridized nucleic acid. The at least one hybridized nucleic acid is then elongated to provide at least one recombinant nucleic acid that comprises at least one subsequence from each of the at least two parental nucleic acids.
The set of fragmented parental nucleic acids recombined in this method is optionally partially produced by cleaving the two parental nucleic acids with a DNase enzyme. As another alternative, at least a portion of the set of fragmented parental nucleic acids is optionally produced by partial chain elongation using a polymerase, with one or both of the parental nucleic acids used as templates for elongation of one or more hybridized polymerase primer nucleic acids. Additionally, at least a portion of the set of fragmented parental nucleic acids is optionally produced by synthesizing oligonucleotides which correspond to one or more of the at least two parental nucleic acids, which oligonucleotides include a plurality of codon-varied oligonucleotides. The at least two parental nucleic acids to be recombined by this method are optionally homologous or non-homologous.
The hybridization step of this method of recombining at least two parental nucleic acids optionally includes hybridizing at least one codon-varied oligonucleotide with at least one additional overlapping codon-varied oligonucleotide to provide the at least one hybridized nucleic acid. The hybridizing step, alternatively, includes hybridizing at least one codon-varied oligonucleotide with at least one DNase fragmented parental nucleic acid to provide the at least one hybridized nucleic acid. As a further option, the hybridizing step can include hybridizing at least one DNase fragmented parental nucleic acid with at least one additional DNase fragmented parental nucleic acid to provide the at least one hybridized nucleic acid.
In a third aspect, the present invention provides a method of recombining homologous or non-homologous nucleic acid sequences having low sequence similarity. The method includes recombining at least one set of fragmented nucleic acids with a set of cross-over codon-varied oligonucleotides, which oligonucleotides individually comprise a plurality of sequence diversity domains corresponding to a plurality of sequence diversity domains from homologous or non-homologous nucleic acids with low sequence similarity to produce a recombinant nucleic acid. The resulting recombinant nucleic acid is optionally selected for at least one desired trait or property.
This method of recombining sequences having low sequence similarity optionally includes fragmenting at least one of the homologous or non-homologous nucleic acids to provide the set of fragmented nucleic acids. The homologous or non-homologous nucleic acids are optionally fragmented with a DNase enzyme. The set of fragmented nucleic acids is also optionally provided by synthesizing a plurality of oligonucleotide fragments corresponding to at least one homologous or non-homologous nucleic acid.
A fourth aspect of this invention is a method of recombining a plurality of parental nucleic acids. This method includes ligating a set of a plurality of codon-varied oligonucleotides with a set comprising a plurality of nucleic acid sequences corresponding to a plurality of the parental nucleic acids to produce at least one recombinant nucleic acid encoding a full-length protein. The set includes at least a first oligonucleotide that is complementary to at least a first of the parental nucleic acids at a first region of sequence diversity and at least a second oligonucleotide which is complementary to at least a second of the parental nucleic acids at a second region of diversity.
Other features of this method include optionally ligating the set of a plurality of oligonucleotides with a ligase. The set of a plurality of oligonucleotides is optionally hybridized to a first parental nucleic acid and ligated with a ligase. Also, the plurality of parental nucleic acids is optionally homologous. Furthermore, the set of a plurality of oligonucleotides optionally comprises a set of overlapping codon-varied oligonucleotides. Finally, the method optionally includes hybridizing the set of a plurality of codon-varied oligonucleotides to at least one of the plurality of parental nucleic acids, elongating the oligonucleotides with a polymerase and ligating the resulting elongated oligonucleotides to produce a nucleic acid encoding a substantially full-length protein.
A fifth aspect of the invention relates to various compositions relevant to the methods described, supra, such as libraries produced by the methods, shuffling mixture compositions, and the like.
A sixth aspect of the present invention is an integrated system that optionally includes a computer or computer readable medium and character strings in a data set that represent a set of overlapping codon-varied oligonucleotides. This system optionally integrates a standard automatic synthesizer that is coupled to an output of the computer or computer readable medium. The automatic synthesizer accepts instructions from the computer or computer readable medium and those instructions, in turn, direct the synthesis of a desired set of codon-varied oligonucleotides. Additionally, the automated synthesizer system optionally integrates one or more robotic control elements for, e.g., incubating, denaturing, hybridizing, and elongating the set of oligonucleotides. This version of the integrated system optionally further includes a detector for, e.g., detecting an elongated nucleic acid.
Definitions
Unless otherwise indicated, the following definitions supplement those in the art.
A set of "codon-varied oligonucleotides" is a set of oligonucleotides, similar in sequence but with one or more base variations, where the variations correspond to at least one encoded amino acid difference. The oligonucleotides are synthesized utilizing trinucleotide, i.e., codon-based coupling chemistry. Codon-varied oligonucleotide sequences can be based upon sequences of a selected set of homologous nucleic acids, where the oligonucleotide sequences can include regions of sequence identity and regions of sequence diversity with one or more of those homologous nucleic acids. Aside from being based upon homologous nucleic acid sequences, codon-varied oligonucleotide sequences can also be derived from non-homologous nucleic acids, or a combination of homologous and non-homologous sequences. "Sets" include a plurality of different members, e.g., 2, 3, 4, 5, 10, 20, 50, 100, 1,000 or more different members.
A "consensus region" sequence or subsequence is a region of a polynucleotide having a generalized sequence in which each nucleotide position represents the base most often found in actual sequence comparisons between homologous nucleic acids.
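The definition of a consensus region can be made concrete with a minimal Python sketch that takes the most frequent base at each position of a set of pre-aligned sequences. The function name `consensus` is hypothetical, the input is assumed to be aligned to equal length, and the handling of ties (the base encountered first among the tied bases is kept) is an arbitrary choice of this sketch rather than anything specified above.

```python
from collections import Counter


def consensus(aligned):
    """Return the most frequent base at each position of pre-aligned sequences."""
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    # Counter.most_common(1) yields the single most frequent base per column.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned)
    )


homologs = ["ATGGCA", "ATGGCT", "ATGACA"]
print(consensus(homologs))
```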
Two nucleic acids "correspond" when they have identical or complementary sequences, when one nucleic acid is a subsequence of the other, or when one sequence is derived naturally or artificially from the other.
A "cross-over" codon-varied oligonucleotide has regions of sequence identity with at least two members of a selected set of nucleic acids that are either homologous or non-homologous.
A "DNase enzyme" is an enzyme that catalyzes the cleavage of DNA, in vitro or in vivo. Many varieties of DNase enzymes are well characterized, e.g., in Berger and Kimmel, Guide to Molecular Cloning Techniques, Methods in Enzymology volume 152 Academic Press, Inc., San Diego, Calif.; Sambrook et al., Molecular Cloning: A Laboratory Manual (2nd Ed.), Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y., 1989 and Current Protocols in Molecular Biology, F. M. Ausubel et al., eds., Current Protocols, a joint venture between Greene Publishing Associates, Inc. and John Wiley and Sons, Inc., (supplemented through 1998), and many are commercially available.
Nucleic acids are "elongated" in a reaction that incorporates additional nucleotides, or analogs thereof, into the nucleic acid sequence. The reaction is typically catalyzed by a polymerase, e.g., a DNA polymerase.
A set of "fragmented" nucleic acids results from the cleavage of at least one parental nucleic acid, e.g., enzymatically or chemically, or by providing subsequences of parental sequences in any other manner, including partially elongating a complementary sequence with a polymerase or utilizing any synthetic format.
A "full-length protein" is a protein with substantially the same sequence domains as a corresponding protein encoded by a natural gene. Such a protein can have altered sequences relative to the corresponding naturally encoded gene, e.g., due to recombination and selection, but unless specified to the contrary, is typically at least about 95% the length of the naturally encoded gene.
Two nucleotide regions have high "sequence similarity" when one region is 90% or more identical to a second selected region when aligned for optimal correspondence. In contrast, regions of low "sequence similarity" are those regions that are at most 60% identical, more preferably, 40% or less identical, when aligned for maximal correspondence. Alignment may be accomplished manually or using a common alignment algorithm, such as, e.g., BLAST (set to default parameters).
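The 90% and 60% thresholds in this definition can be applied as in the following Python sketch. It assumes the two regions have already been aligned to equal length (for example by a separate alignment tool, not shown), with "-" marking gaps. The function names, the choice to count gapped columns in the denominator, and the "intermediate" label for values between the two thresholds are assumptions of the sketch, not terms defined in this disclosure.

```python
def percent_identity(a, b):
    """Percent of aligned columns in which both regions carry the same base."""
    if len(a) != len(b) or not a:
        raise ValueError("regions must be non-empty and aligned to equal length")
    # Gap characters never count as matches; every column counts in the total.
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def classify_similarity(a, b):
    """Label a pair of aligned regions using the 90% and 60% thresholds."""
    identity = percent_identity(a, b)
    if identity >= 90.0:
        return "high"
    if identity <= 60.0:
        return "low"
    return "intermediate"  # label assumed for illustration only


print(classify_similarity("ATGGCATTGA", "ATGGCATTGC"))
print(classify_similarity("AAAAAAAAAA", "AAAAAATTTT"))
```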
Nucleic acids are "homologous" when they share sequence similarity that is derived, naturally or artificially, from a common ancestral sequence. This occurs naturally as two or more descendent sequences deviate from a common ancestral sequence over time as the result of mutation and natural selection. Artificially homologous sequences may be generated in various ways. For example, a nucleic acid sequence can be synthesized de novo to yield a nucleic acid that differs in sequence from a selected parental nucleic acid sequence. Artificial homology can also be created by artificially recombining one nucleic acid sequence with another, as occurs, e.g., during cloning or chemical mutagenesis, to produce a homologous descendent nucleic acid.
It is generally assumed that the two nucleic acids have common ancestry when they demonstrate sequence similarity. However, the exact level of sequence similarity necessary to establish homology varies in the art. In general, for purposes of this disclosure, two nucleic acid sequences are deemed to be homologous when they share enough sequence identity to permit direct recombination to occur between the two sequences.
It should be noted, however, that a specific advantage of this invention is the capacity to recombine nucleic acids that are more distantly related than other methods of recombination permit. In this aspect of the invention, nucleic acid sequences that are only distantly related, or not even detectably related, can be recombined by means of cross-over codon-varied oligonucleotides which are described, supra.
Nucleic acids "hybridize" when complementary single strands of nucleic acid pair to give a double-stranded nucleic acid sequence. Hybridization occurs due to a variety of well-characterized forces, including hydrogen bonding, solvent exclusion, and base stacking. An extensive guide to nucleic hybridization may be found in Tijssen (1993) Laboratory Techniques in Biochemistry and Molecular Biology: Hybridization with Nucleic Acid Probes, part I, chapter 2, "Overview of principles of hybridization and the strategy of nucleic acid probe assays," Elsevier, N.Y.
A "library" is a set of oligonucleotides. The set can be pooled, or can be individually accessible. The oligonucleotides may comprise DNA, RNA or combinations thereof.
Nucleic acid sequences are "overlapping" when they possess at least one complementary subsequence.
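A simple computational reading of this definition is sketched below in Python: two DNA sequences, each written 5' to 3', are treated as overlapping when some stretch of one is exactly complementary to a stretch of the other. The function names and the `min_len` parameter (a minimum complementary length of 6 bases) are assumptions introduced for illustration; the definition above does not specify a minimum length, and actual hybridization also depends on conditions that are not modeled here.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def overlaps(a, b, min_len=6):
    """True if some min_len-base stretch of `a` is complementary to part of `b`.

    min_len is a hypothetical cutoff chosen for this sketch.
    """
    a = a.upper()
    target = reverse_complement(b)
    return any(
        a[i:i + min_len] in target for i in range(len(a) - min_len + 1)
    )


top = "ATGGCTAAAGGT"
bottom = reverse_complement("AAAGGTCCCGAT")  # shares AAAGGT with `top`
print(overlaps(top, bottom))
```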
Nucleic acids are "non-homologous" when they lack shared sequence similarity with a common ancestral sequence, or when they can only be indirectly recombined utilizing oligonucleotide intermediates.
A nucleic acid "domain" is a discrete nucleic acid region or subsequence. It may be conserved or not conserved between a plurality of homologous nucleic acids. Generally, a domain is specified by comparing two or more sequences, where a region of sequence diversity between sequences constitutes a "sequence diversity domain," while a region of similarity is a "sequence similarity domain."
Two nucleic acids "recombine" when sequences from each of the two nucleic acids are combined in a progeny nucleic acid. Two sequences are "directly" recombined when both are substrates for recombination. Two sequences are "indirectly" recombined when the sequences are recombined by means of an intermediate such as a cross-over codon-varied oligonucleotide. When two nucleic acid sequences indirectly recombine, no more than one of those sequences is an actual substrate for recombination, and in some cases, neither sequence is a substrate for recombination.
A "substrate sequence" is at least one nucleotide covalently attached at its 3' end to a solid support.
The term "trinucleotide phosphoramidite sequence" refers to any codon sequence of nucleotides synthesized using standard phosphoramidite chemistry. Many sources have described such synthesis, e.g., Virnekäs, B. et al., (1994) Nucleic Acids Res., 22, 5600-5607 and Kayushin, A. L. et al., (1996) Nucleic Acids Res., 24, 3748-3755.