The present invention relates to molecular shuffling, and to splicing of nucleic acids and proteins.
Nucleic acid shuffling provides for the rapid evolution of nucleic acids, in vitro and in vivo. Rapid evolution provides for the commercial production of encoded molecules (e.g., nucleic acids and proteins) with new and/or improved properties. Proteins and nucleic acids of industrial, agricultural and therapeutic value can be created or improved through shuffling procedures. A number of publications by the inventors and their co-workers describe nucleic acid shuffling and applications of this technology. For example, Stemmer et al. (1994) “Rapid Evolution of a Protein” Nature 370:389-391; Stemmer (1994) “DNA Shuffling by Random Fragmentation and Reassembly: in vitro Recombination for Molecular Evolution,” Proc. Natl. Acad. Sci. USA 91:10747-10751; Stemmer U.S. Pat. No. 5,603,793 METHODS FOR IN VITRO RECOMBINATION; Stemmer et al. U.S. Pat. No. 5,830,721 DNA MUTAGENESIS BY RANDOM FRAGMENTATION AND REASSEMBLY; Stemmer et al., U.S. Pat. No. 5,811,238 METHODS FOR GENERATING POLYNUCLEOTIDES HAVING DESIRED CHARACTERISTICS BY ITERATIVE SELECTION AND RECOMBINATION describe, e.g., in vivo and in vitro nucleic acid, DNA and protein shuffling in a variety of formats, e.g., by repeated cycles of mutagenesis, shuffling and selection, as well as methods of generating libraries of displayed peptides and antibodies.
Applications of DNA shuffling technology have also been developed by the inventors and their co-workers. In addition to the publications noted above, Minshull et al., U.S. Pat. No. 5,837,458 METHODS AND COMPOSITIONS FOR CELLULAR AND METABOLIC ENGINEERING provides for the evolution of metabolic pathways and the enhancement of bioprocessing through recursive shuffling techniques. Crameri et al. (1996), “Construction And Evolution Of Antibody-Phage Libraries By DNA Shuffling” Nature Medicine 2(1): 100-103 describe, e.g., antibody shuffling for antibody phage libraries. Additional details regarding DNA Shuffling can also be found in WO95/22625, WO97/20078, WO96/33207, WO97/33957, WO98/27230, WO97/35966, WO98/31837, WO98/13487, WO98/13485 and WO98/42832.
Physical nucleic acid shuffling techniques (as opposed, e.g., to “in silico” methods which are performed, at least in part, by manipulation of character strings in a computer) rely upon actual recombination between physical nucleic acids, whether the format is an in vitro or an in vivo format. Recombination occurs at a relatively high frequency, e.g., where there are complementary nucleic acids between strands to be recombined. Thus, nucleic acids to be recombined are typically, e.g., about 70% identical/complementary in sequence over regions of, e.g., about 30-40 nucleotides. It would be desirable to be able to recombine low homology, or even non-homologous sequences, thereby increasing access to the potential sequence space encoded by recombinant nucleic acids resulting from shuffling methods. For example, for proteins which are commercially valuable, it would be desirable to be able to gain access to a recombination/mutation spectrum which is different than that of the native protein to provide for greater diversity in products produced by the various available shuffling strategies.
Similarly, nucleic acid recombination generally can be difficult to modulate, resulting in regions of high or low crossover frequency between two different targets for recombination. The crossover frequency for a particular pairing of sequences on two different targets is one feature that mediates the recombinant nucleic acids that result from recombination methods. Improved methods of modulating the recombination frequency at potential recombination sites would be desirable to weight/bias recombination product outcomes.
In general, new techniques which facilitate, improve or add levels of control to recombination methods are highly desirable. In particular, techniques which permit shuffling of divergent nucleic acids, or which provide for modulation and tuning of shuffling rates are desirable. The present invention provides such significant new recombination protocols, as well as other features which will be apparent upon complete review of this disclosure.
The present invention provides a number of new nucleic acid recombination formats for nucleic acid shuffling. In the methods, a number of insertion sequences are inserted into one or more parental nucleic acids to provide a modified target nucleic acid substrate for recombination and subsequent mutation. The number, type and placement of such insertion sequences provide for the ability to shuffle nucleic acids with little or no homology other than the insertion sequences. In addition, these insertion sequences provide for the ability to modulate or “tune” recombination frequencies between target nucleic acids. The methods typically take advantage of self-splicing, trans-splicing or use cellular machinery to remove the insertion sequences from final coded nucleic acids or proteins, e.g., where the insertion sequences are introns, inteins, proteolyzed polypeptide sequences or the like. The insertion sequences can also comprise markers, molecular tags, or the like, e.g., for purification of encoded molecules, or can serve to allow for expression of otherwise toxic proteins (e.g., RNases, DNases, restriction enzymes, proteases, lipases, recombinases, ligases, polymerases, etc.), e.g., in a form where an intein is excised in vivo. Similarly, in vitro expression of insertion modified sequences can result in the production of these and other proteins in vitro, e.g., using in vitro expression systems.
Methods of shuffling two target nucleic acids (i.e., a first and a second target nucleic acid) are provided. In the methods, a first and a second target nucleic acid are provided, e.g., by cloning, PCR amplification, synthesis, isolation from an environmental source (soil, air, water, etc.), or other methods. At least one of the first and second target nucleic acids (and typically both) have a plurality of homologous or non-homologous insertion nucleic acid sequences, such as one or more intron (e.g., self-splicing bacterial, eukaryotic or trans-splicing intron), intein, subsequence removed by site specific recombination (e.g., similar to V-D-J recombination for antibody production), or the like, optionally including intron splicing enhancers or the like. The target nucleic acids are recombined, producing a shuffled recombinant nucleic acid.
In addition to providing for new recombination methods per se, the invention also provides methods of producing selected proteins and RNAs, for any of the purposes that such proteins and RNAs are ordinarily produced. For example, in one aspect, a first shuffled nucleic acid subsequence encoding a first portion of the selected protein and a second nucleic acid subsequence encoding a second portion of the selected protein are provided. The nucleic acids can be on the same strand (as in cis-mediated reactions) or on different strands (as in trans-mediated reactions). The first and second subsequences are expressed to produce a first protein subsequence and a second protein subsequence, which are spliced to produce the selected protein. Commonly, more than two subsequences are spliced, e.g., 3, 4, 5, 6, 7, 8, 9, 10 or more sequences, as set forth herein. The splicing reaction can be in cis or in trans (or both) and can be in vitro or in vivo (or both). Splicing can occur by spontaneous or controlled mechanisms.
Similarly, in RNA production methods, a first shuffled nucleic acid subsequence encoding a first portion of the selected RNA is provided and a second nucleic acid subsequence encoding a second portion of the selected RNA is also provided. Again, these subsequences can be on the same or on different molecules (depending on whether cis or trans splicing is employed). The first and second nucleic acid subsequences, or RNA copies thereof, are spliced to produce the selected RNA, which can encode a useful RNA (e.g., an antisense, or sense molecule or ribozyme) or the RNA can encode a protein. The intein and RNA shuffling/production methods are combinable, i.e., the spliced RNA molecules can encode intein-extein sequences which are spliced at the protein level to produce a useful protein.
In general, a parental nucleic acid can be broken into several exons or exteins by incorporation of a number of introns or inteins into the sequence of the parental nucleic acid. For example, the target nucleic acid resulting from incorporation of insertion sequences into the parental nucleic acid can have, e.g., about 5, 10, 15, 20, 30, 50, 100 or more “mini exons,” or “mini exteins” separated by a corresponding number of insertion sequences.
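The division of a parental sequence into mini exons can be illustrated at the character-string level. The following is a minimal sketch only: the function name `insert_sequences`, the toy 21-nucleotide parent, and the lowercase placeholder insertion are all hypothetical and are not taken from this disclosure, and the equal-length split is an assumption made for simplicity rather than a requirement of the methods.

```python
from typing import List, Tuple


def insert_sequences(parent: str, insertion: str, n_exons: int) -> Tuple[str, List[str]]:
    """Split `parent` into `n_exons` near-equal mini exons joined by `insertion`."""
    if not 1 <= n_exons <= len(parent):
        raise ValueError("n_exons must be between 1 and len(parent)")
    base, extra = divmod(len(parent), n_exons)
    exons: List[str] = []
    start = 0
    for i in range(n_exons):
        # The first `extra` exons each receive one additional nucleotide.
        end = start + base + (1 if i < extra else 0)
        exons.append(parent[start:end])
        start = end
    return insertion.join(exons), exons


# Hypothetical inputs: a toy parental sequence and a placeholder insertion.
PARENT = "ATGGCTAAAGGTCCTGAACTG"
INSERTION = "gtnnnnnnag"  # lowercase placeholder, not a functional intron
target, mini_exons = insert_sequences(PARENT, INSERTION, 3)
```

In this sketch, three mini exons are separated by two copies of the insertion; removing every copy of the insertion from `target` restores the parental string.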
In shuffling reactions, first and second target nucleic acids are optionally derived from a first and second parental nucleic acid which are sufficiently different in sequence that they do not substantially hybridize in solution. For example, the first and second target nucleic acids can be derived by integration of a plurality of insertion sequences into the first and second parental nucleic acid. The first and second parental nucleic acid can be, e.g., less than 50%, or less than e.g., 40%, or less than e.g., 30%, or less than e.g., 25%, or less than e.g., 15% identical over the full length of the first and second parental nucleic acid, when the first and second nucleic acids are aligned for maximum identity.
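As a rough illustration of the identity thresholds above, percent identity between two sequences aligned for maximum identity can be estimated with a simple dynamic program. This is a sketch under stated assumptions: gaps are treated as free, so the number of matched positions equals the length of the longest common subsequence, and identity is reported relative to the longer sequence. Both choices, and the function names, are illustrative and are not defined in this disclosure; practical alignment tools use scoring matrices and gap penalties.

```python
def max_identity_matches(a: str, b: str) -> int:
    """Maximum number of identical aligned positions (longest common subsequence)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, 1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def percent_identity(a: str, b: str) -> float:
    """Percent identity over the full length of the longer sequence (an assumption)."""
    if not a or not b:
        return 0.0
    return 100.0 * max_identity_matches(a, b) / max(len(a), len(b))


def below_threshold(a: str, b: str, threshold_percent: float) -> bool:
    """True when the two parental sequences are less than `threshold_percent` identical."""
    return percent_identity(a, b) < threshold_percent
```

For example, `below_threshold(parent_1, parent_2, 50.0)` would flag a pair of parental sequences in the "less than 50%" category under this simplified measure.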
The insertion nucleic acid sequences can modulate a recombination frequency between the first and second target nucleic acid. For example, by placing an intron into a parental sequence, the recombination efficiency of nucleic acid subsequences to either side of the intron can be decreased. Similarly, placing homologous mini introns within the parental sequences provides sites for recombination within the resulting targets, e.g., where the targets display regions of low similarity in non-intronic sequences.
Insertion sequences can also modulate expression in one or more cell type, e.g., where the insertion sequences comprise one or more enhancer or other regulatory sequence. Similarly, insertion sequences optionally comprise splicing enhancer sequences (e.g., ISEs, such as the chicken cardiac troponin T (cTNT) ISE) to facilitate splicing.
Essentially any nucleic acid can be a parental nucleic acid with which insertion sequences can be combined to produce a target nucleic acid for splicing. Example sequences include parental nucleic acids corresponding to a gene or cDNA encoding EPO, a gene or cDNA encoding an insulin protein, a gene or cDNA encoding a peptide hormone, a gene or cDNA encoding a cytokine, a gene or cDNA encoding an epidermal growth factor, a gene or cDNA encoding a fibroblast growth factor, a gene or cDNA encoding a hepatocyte growth factor, a gene or cDNA encoding insulin-like growth factor, a gene or cDNA encoding an interferon, a gene or cDNA encoding an interleukin, a gene or cDNA encoding a keratinocyte growth factor, a gene or cDNA encoding a leukemia inhibitory factor, a gene or cDNA encoding oncostatin M, a gene or cDNA encoding PD-ECSF, a gene or cDNA encoding PDGF, a gene or cDNA encoding pleiotropin, a gene or cDNA encoding SCF, a gene or cDNA encoding c-kit ligand, a gene or cDNA encoding VEGF, a gene or cDNA encoding G-CSF, a gene or cDNA encoding an oncogene, a gene or cDNA encoding a tumor suppressor, a gene or cDNA encoding a steroid hormone receptor, a gene or cDNA encoding a plant hormone, a gene or cDNA encoding a disease resistance gene, a gene or cDNA encoding an herbicide resistance gene, a gene or cDNA encoding a bacterial gene, a gene or cDNA encoding a monooxygenase, a gene or cDNA encoding a protease, a gene or cDNA encoding a nuclease, an antibody, a peptide ligand, an angiogenesis inhibitor, a gene or cDNA encoding a lipase, a gene or cDNA encoding a C-X-C chemokine, a gene or cDNA encoding a C-C chemokine, a gene or cDNA encoding an antibody V gene, a gene or cDNA encoding a cysteine knot protein such as TGFβ, NGF, PDGFβ or the like, a gene or cDNA encoding a TNKor family member, a gene or cDNA encoding CNTF, a gene or cDNA encoding 4F, and/or a gene or cDNA encoding an RNase.
The methods herein are amenable to both physical recombination of nucleic acids and to virtual or “in silico” recombination of character strings representing nucleic acids, e.g., in a computer. Following complete or partial sequence recombination in silico, target nucleic acids, or nucleic acids derived from the target nucleic acids, can be synthesized. Such synthetic nucleic acids can be recombined, cloned, selected or otherwise manipulated in the same manner as any other nucleic acid.
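In silico recombination of character strings can be sketched as follows. This is a simplified illustration, not the procedure of this disclosure: it assumes that both target strings carry the same insertion string at corresponding positions and that crossovers occur only at those insertions, so that each flanking segment of the recombinant is drawn from one target or the other. The function names, toy sequences, and placeholder insertion are hypothetical.

```python
import random
from itertools import product
from typing import List


def recombine_at_insertions(target_a: str, target_b: str, insertion: str,
                            rng: random.Random) -> str:
    """Build one chimera by choosing each flanking segment from either target."""
    segs_a = target_a.split(insertion)
    segs_b = target_b.split(insertion)
    if len(segs_a) != len(segs_b):
        raise ValueError("targets must carry the same number of insertions")
    chosen = [rng.choice(pair) for pair in zip(segs_a, segs_b)]
    return insertion.join(chosen)


def all_chimeras(target_a: str, target_b: str, insertion: str) -> List[str]:
    """Enumerate every chimera reachable by crossovers at the shared insertions."""
    segs_a = target_a.split(insertion)
    segs_b = target_b.split(insertion)
    if len(segs_a) != len(segs_b):
        raise ValueError("targets must carry the same number of insertions")
    return [insertion.join(combo) for combo in product(*zip(segs_a, segs_b))]


# Hypothetical targets: unrelated flanking segments, identical placeholder insertions.
INS = "gtnnnnnnag"
TARGET_A = INS.join(["ATGGCTA", "AAGGTCC", "TGAACTG"])
TARGET_B = INS.join(["ATGCCCT", "TTGACGA", "CGTTTAA"])
library = all_chimeras(TARGET_A, TARGET_B, INS)
child = recombine_at_insertions(TARGET_A, TARGET_B, INS, random.Random(0))
```

With three segments per target, the enumerated library holds eight strings, including both unchanged targets. Any of these strings could, in principle, then be synthesized and handled like any other nucleic acid.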
A variety of techniques can be used to produce target nucleic acids comprising insertion sequences. Such methods include chemical synthesis, PCR concatemerization, in silico character string formation or generation, and the like. For example, in one embodiment, insertion of the plurality of insertion nucleic acid sequences into one or more of the first and second parental nucleic acid sequences is performed by physically joining a plurality of subsequences of the first or second parental nucleic acid sequences to the plurality of insertion nucleic acid sequences.
As noted, the addition of insertion sequences to parental nucleic acids can modify or modulate the recombination of resulting target nucleic acids. Similarly, the addition of insertion sequences can alter the hybridization properties of resulting target sequences. For example, even non-homologous parental nucleic acids can be made to hybridize by the addition of a sufficient number and appropriate arrangement of insertion sequences. Similarly, a target nucleic acid derived from a parental sequence can be made which does not hybridize under a selected set of conditions (e.g., stringent hybridization conditions) to the parental nucleic acid. As noted above, such insertion sequences can be used to tune recombination rates between selected regions of a target nucleic acid, e.g., where a particular region is targeted for an increased or decreased recombination rate.
The target and parental nucleic acids can have dramatically different hybridization properties as a result of the insertion sequences being present in the target nucleic acids. The target nucleic acids can be prevented from hybridizing to the parents by inclusion of the insertion sequences, or, conversely, one or more target sequence can even be made to hybridize to one or more parent, thereby controlling the recombination properties of resulting nucleic acid shuffling reactions. Thus, in one embodiment, the first and second parental nucleic acid sequences hybridize under stringent conditions, and the first and second target nucleic acids do not hybridize under stringent conditions. Similarly, in another embodiment, the first and second parental nucleic acid sequences do not hybridize under stringent conditions, while the first and second target nucleic acids hybridize under stringent conditions. In yet another embodiment, the first and second target nucleic acids hybridize under stringent conditions, while the first target nucleic acid does not hybridize under stringent conditions to the second parental nucleic acid, or the second target nucleic acid does not hybridize under stringent conditions to the first parental nucleic acid. Similarly, in one embodiment, the first or second parental nucleic acid hybridizes to a third nucleic acid under stringent conditions, where the first and second target nucleic acids do not hybridize under stringent conditions to the third nucleic acid. A variety of other modifications in hybridization due to the number and arrangement of insertion sequences will be apparent upon complete review.
Recombinant nucleic acids generated by recombining nucleic acid sequences comprising insertion subsequences can, of course, be recombined or shuffled, cloned, amplified, expressed in vivo or in vitro, synthesized, or otherwise modified using any available naturally mediated or laboratory-mediated technique. For example, in one embodiment, a shuffled recombinant nucleic acid made by recombining one or more target nucleic acid comprising a plurality of insertion sequences with one or more additional nucleic acid(s) is recombined with a third nucleic acid. The resulting secondary shuffled recombinant nucleic acid can be selected for a desired trait or property using any available selection method. In general, any recombinant nucleic acid can be selected for a desired trait or property.
Recombinant nucleic acids are also optionally expressed in a cell or in vitro, thereby producing a nucleic acid or protein. In one embodiment, the expressed protein can comprise intein and extein sequences. Typically, the intein (sometimes referred to as an “intervening protein sequence”) is excised from an expressed protein sequence. Concomitantly, the ligation of the flanking sequences (exteins) forms a mature “extein protein” which is, optionally, active in one or more cell or in one or more in vitro reaction or system. Thus, expressed proteins can be proteolytically cleaved and ligated to produce an active protein, and/or to remove an intein from an expressed protein. This ligation reaction can occur in both cis- and trans-splicing reaction formats. Reactions occur in vitro or in vivo for cis or trans splicing inteins. For additional details regarding trans splicing of introns and inteins, see, Patten et al. “ENCRYPTION OF TRAITS USING SPLIT GENE SEQUENCES AND ENGINEERED GENETIC ELEMENTS” U.S. Ser. No. 60/164,618 Filed Nov. 10, 1999.
The presence of insertion sequences can be used to modulate recombination rates between regions of nucleic acids. For example, the crossover frequency between two points on first and second target nucleic acids can typically be increased by placing insertion sequences between the two points. This is desirable, e.g., where low linkage rates between regions of nucleic acids to be recombined are desired, e.g., where one wishes to separately evolve different functional domains or elements of the nucleic acid.
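One simple way to see why added sequence between two points can loosen their linkage is a uniform-rate crossover model. The sketch below is illustrative only and rests on assumptions that are not part of this disclosure: crossovers are taken to occur independently at a fixed rate per nucleotide of shared sequence (including insertion sequences that are homologous between the two targets), and the chance that the two points end up on different parental backgrounds follows Haldane's mapping function. The rate and lengths are hypothetical.

```python
import math


def recombination_fraction(distance_nt: int, rate_per_nt: float) -> float:
    """Probability of an odd number of crossovers between two points (Haldane model)."""
    expected_crossovers = distance_nt * rate_per_nt
    return 0.5 * (1.0 - math.exp(-2.0 * expected_crossovers))


# Hypothetical numbers: two points 100 nt apart, then separated by a further
# 400 nt of insertion sequence shared by both targets.
RATE = 0.001  # assumed crossovers per nucleotide per recombination cycle
without_insertions = recombination_fraction(100, RATE)
with_insertions = recombination_fraction(100 + 400, RATE)
```

Under these assumptions the recombination fraction rises as insertion sequence is added between the points and approaches, but never exceeds, 0.5, which corresponds to fully independent assortment of the two regions.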
Recombinant nucleic acids can be modified by removal of insertion sequences to improve expression or facilitate cloning of any final product. For example, where a nucleic acid encodes a plurality of intronic insertion sequences, the encoded mRNA can be reverse transcribed and the resulting cDNA cloned or otherwise manipulated. It should be noted that this process can result in a cDNA which does not hybridize to the recombinant nucleic acid comprising the introns. Indeed, the cDNA can be the result of several rounds of selection and recombination, resulting in a cDNA with a unique sequence which does not hybridize under, e.g., stringent conditions, to any previously known sequence. Thus, sequence space which is inaccessible between two known nucleic acids is accessible by this procedure, resulting in recombinant products that could not otherwise be obtained.
The final product produced by any of the procedures herein can be a DNA (e.g., a genomic DNA, an artificial DNA, a cDNA, or the like), an RNA, an mRNA, a viral RNA, an snRNA, a tRNA, an rRNA, a gRNA, a protein, a proteolytically cleaved protein, a protein fragment, a spliced protein or any other molecule that can be encoded by a nucleic acid, including, e.g., metabolic products and the like. As noted, target sequences can comprise homologous or non-homologous nucleic acid subsequences which can be separated by homologous or non-homologous insertion sequences. The target nucleic acids to be recombined can be homologous relative to each other, or comprise homologous and non-homologous sequences relative to each other. The nucleic acids can be present in vectors such as expression vectors, or can be free in solution.
The nucleic acids to be recombined can be present in recombination mixtures. For example, one recombination mixture of the invention includes a first target nucleic acid comprising a plurality of insertion subsequences. Typically, the mixture also includes a second target nucleic acid having at least one region of sequence similarity to the first nucleic acid. The second target nucleic acid typically also includes a plurality of insertion subsequences.
In one format, a recombination mixture resulting from fragmenting a first target nucleic acid comprising a plurality of insertion subsequences, and a second target nucleic acid comprising at least one region of sequence similarity to the first target nucleic acid is provided. For example, the first and second target nucleic acids can be fragmented with a DNase, or, e.g., cleaved chemically to produce nucleic acid fragments. Similarly, the first and second target nucleic acids can be “fragmented” by chemically synthesizing fragments of the first and second target nucleic acid.
Recombinant nucleic acids produced by recombining the recombination mixtures of the invention are also provided. For example, the first or second nucleic acid can include one or more subsequence corresponding to one or more subsequence from one or more gene or cDNA such as a gene or cDNA encoding EPO, a gene or cDNA encoding an insulin protein, a gene or cDNA encoding a peptide hormone, a gene or cDNA encoding a cytokine, a gene or cDNA encoding an epidermal growth factor, a gene or cDNA encoding a fibroblast growth factor, a gene or cDNA encoding a hepatocyte growth factor, a gene or cDNA encoding insulin-like growth factor, a gene or cDNA encoding an interferon, a gene or cDNA encoding an interleukin, a gene or cDNA encoding a keratinocyte growth factor, a gene or cDNA encoding a leukemia inhibitory factor, a gene or cDNA encoding oncostatin M, a gene or cDNA encoding PD-ECSF, a gene or cDNA encoding PDGF, a gene or cDNA encoding pleiotropin, a gene or cDNA encoding SCF, a gene or cDNA encoding c-kit ligand, a gene or cDNA encoding VEGF, a gene or cDNA encoding G-CSF, a gene or cDNA encoding an oncogene, a gene or cDNA encoding a tumor suppressor, a gene or cDNA encoding a steroid hormone receptor, a gene or cDNA encoding a plant hormone, a gene or cDNA encoding a disease resistance gene, a gene or cDNA encoding an herbicide resistance gene, a gene or cDNA encoding a bacterial gene, a gene or cDNA encoding a monooxygenase, a gene or cDNA encoding a protease, a gene or cDNA encoding a nuclease, a gene or cDNA encoding an RNase, and/or a gene or cDNA encoding a lipase. Of course, many other nucleic acids/proteins can be made or modified by the methods herein. The resulting recombinant nucleic acid can also comprise activities and subsequences which correspond to these nucleic acids.
In one aspect, the invention provides methods of recombining a plurality of sequence domains from a plurality of homologous or non-homologous nucleic acid sequences. In the methods, a pre-mRNA comprising a plurality of sequence domains is provided which correspond to a plurality of different parental nucleic acid sequences. The pre-mRNA is alternatively spliced to produce a plurality of different mRNAs comprising a plurality of different sets of sequence domains. Typically, the pre-mRNA has between about 6 and about 20 exons or exteins, e.g., where the pre-mRNA has a plurality of mini exons or exteins. Most typically, the plurality of different mRNAs are selected for a desired trait or property. Optionally, the methods include cloning one or more of the plurality of different mRNAs.
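The diversity available from alternative splicing of a single pre-mRNA can be illustrated by enumerating exon-skipping products. This is a minimal sketch under simplifying assumptions that are not drawn from this disclosure: the first and last exons are always retained, exon order is preserved, and any subset of internal exons may be skipped. The function name and the toy exons are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence


def alternative_splice_products(exons: Sequence[str]) -> List[str]:
    """List every mRNA obtainable by skipping any subset of internal exons."""
    if len(exons) < 2:
        raise ValueError("need at least a first and a last exon")
    first, internal, last = exons[0], list(exons[1:-1]), exons[-1]
    products: List[str] = []
    for kept_count in range(len(internal) + 1):
        # combinations() yields internal exons in their original order.
        for kept in combinations(internal, kept_count):
            products.append(first + "".join(kept) + last)
    return products


# Hypothetical pre-mRNA with six toy mini exons (sequence domains).
EXONS = ["AUGGCU", "AAAGGU", "CCUGAA", "CUGGAU", "UUCACC", "GGCUAA"]
mrnas = alternative_splice_products(EXONS)
```

Under these assumptions a pre-mRNA with n exons yields up to 2^(n-2) distinct mRNAs, so the six-exon example gives sixteen candidates that could then be selected for a desired trait or property.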
In this alternative splicing/recombination strategy, the methods typically include recombining one or more of: the plurality of different mRNAs, the pre-mRNA, a DNA encoding the mRNA, and a DNA encoding the pre-mRNA, with one or more additional nucleic acid.
In one embodiment, the pre-mRNA is provided to a cell by transducing or transfecting the cell with a vector comprising a DNA encoding the pre-mRNA. As discussed throughout, in vitro formats are also available.
The present invention also provides methods of making a nucleic acid with a desired splicing phenotype. In the methods, a plurality of homologous nucleic acids are provided, each comprising a plurality of insertion nucleic acid sequences. The plurality of homologous nucleic acids are recombined to produce a library of recombinant nucleic acids, which are selected for production of a desired or selected mRNA or protein (or product thereof) when the selected recombinant nucleic acid is expressed in vitro or in a cell. As with any nucleic acid noted above, this selected nucleic acid is optionally recombined with an additional nucleic acid and the resulting secondary recombinant nucleic acid selected for production of a desired mRNA or protein (or product thereof).
The nucleic acids noted above which include insertion sequences will typically comprise as many as 10 insertion sequences and as many as 10 flanking sequences (e.g., exons or exteins) or more. Insertion nucleic acid sequences include those derived from bacterial introns, eukaryotic introns and archaebacterial introns, as well as bacterial inteins, eukaryotic inteins and archaebacterial inteins. The nucleic acids are recombined in vitro or in vivo.
The present invention also provides apparatus, integrated systems and kits for practicing the methods herein, e.g., comprising use of the recombination mixtures herein, containers, instruction sets for practicing the methods herein, and the like.