The present invention provides methods of encrypting traits, including, e.g., splitting genes between two parental organisms or between a host organism and a vector. The invention also relates to methods of unencrypting trait encrypted gene sequences to provide unencrypted RNAs or polypeptides. Gene sequences are unencrypted when the two parental organisms are mated, or when the vector infects the host organism. In either case, the split RNAs or the split polypeptides produced upon expression of the split gene sequences are then trans-spliced. The invention also includes methods of providing multiple levels of trait encryption and reliable methods of producing hybrid organisms. Additional methods include those directed at unencrypting engineered genetic elements to provide unencrypted polypeptide functions and those related to recombining non-overlapping gene sequences. Furthermore, the present invention includes integrated systems and various compositions related to the methods disclosed herein.
Intermolecular splicing is termed trans-splicing. The mechanism of splicing two independently transcribed pre-mRNAs was discovered in trypanosomes. Murphy, W. J. et al. (1986) Cell 47, 517-525 and Sutton, R. and Boothroyd, J. C. (1986) Cell 47, 527-535. Thereafter, trans-splicing was also described in other organisms, e.g., C. elegans (Krause, M. and Hirsch, D. (1987) Cell 49, 753-761, Huang, X. Y. and Hirsch, D. (1989) Proc. Nat. Acad. Sci. USA 86, 8640-8644, and Hannon, G. J. et al. (1990) Cell 61, 1247-1255), Schistosoma mansoni (Rajkovic, A., et al. (1990) Proc. Nat. Acad. Sci. USA 87, 8879-8883 and Davis, R. E. et al. (1995) J. Biol. Chem. 270, 21813-21819), and plant mitochondria (Malek, O. et al. (1997) Proc. Nat. Acad. Sci. USA 94, 553-558). Targeted trans-splicing has been demonstrated in HeLa nuclear extracts, in cultured H1299 human lung cancer cells, and in H1299 tumor-bearing athymic mice. Puttaraju, M. et al. (1999) Nat. Biotech. 17, 246-252. Suggested practical applications of targeted trans-splicing include, e.g., gene therapy. Id.
Various ribozymes capable of precisely trans-splicing, either in vitro or in vivo, exon sequences into target RNA sequences have been described in, e.g., Haseloff et al., U.S. Pat. No. 5,882,907 "CELL ABLATION USING TRANS-SPLICING RIBOZYMES," Haseloff et al., U.S. Pat. No. 5,874,414 "TRANS-SPLICING RIBOZYMES," Haseloff et al., U.S. Pat. No. 5,866,384 "CELL ABLATION USING TRANS-SPLICING RIBOZYMES," Haseloff et al., U.S. Pat. No. 5,863,774 "CELL ABLATION USING TRANS-SPLICING RIBOZYMES," Haseloff et al., U.S. Pat. No. 5,849,548 "CELL ABLATION USING TRANS-SPLICING RIBOZYMES," and Haseloff et al., U.S. Pat. No. 5,641,673 "CELL ABLATION USING TRANS-SPLICING RIBOZYMES." Methods of ablating cells in vivo involving targeted trans-splicing to provide toxic products that generate sterile plants have also been described in, e.g., Haseloff et al., U.S. Pat. No. 5,866,384, supra. The techniques referenced above generally involve trans-splicing RNA sequences into native target RNAs.
Genetically male-sterile plants can be desirable for the production of hybrid seeds, because they avoid the need for expensive and laborious removal of, e.g., anthers from flowers to prevent self-fertilization. Transgenic methods of regenerating functionally male-sterile plants have included the development of pollen cells that are ablated specifically by the expression of fungal or bacterial ribonuclease transgenes fused to a pollen-specific promoter from the particular plant. Mariani, C. et al. (1992) Nature 357, 384-387. See also, Haseloff et al., U.S. Pat. No. 5,866,384, supra.
In addition to trans-splicing RNAs, protein trans-splicing is also known. For example, certain modified proteins have been described that include "controllable intervening protein sequences" inserted into or adjacent to target proteins. Comb, et al. U.S. Pat. No. 5,834,247 "MODIFIED PROTEINS COMPRISING CONTROLLABLE INTERVENING PROTEIN SEQUENCES OR THEIR ELEMENTS METHODS OF PRODUCING SAME AND METHODS FOR PURIFICATION OF A TARGET PROTEIN COMPRISED BY A MODIFIED PROTEIN." The inserted intervening sequences are capable of cleaving the modified protein in trans under controllable conditions, e.g., increased temperature, exposure to light, treatment with chemical reagents, etc. Furthermore, these intervening protein sequences can also be inserted into a target protein sequence so as to render the target inactive. Id. See also, Comb, et al. U.S. Pat. No. 5,496,714 "MODIFICATION OF PROTEIN BY USE OF A CONTROLLABLE INTERVENING PROTEIN SEQUENCE" and Belfort, U.S. Pat. No. 5,795,731 "INTEINS AS ANTIMICROBIAL TARGETS: GENETIC SCREENS FOR INTEIN FUNCTION." Spontaneous (native) trans-splicing of both inteins and RNAs is also known.
More generally, relevant features of inteins and intein splicing, as well as certain forms of chemical ligation of polypeptides, are described in the abundant literature on these topics, including the references noted above and, e.g.: Clarke (1994) "A proposed mechanism for the self-splicing of proteins" Proc. Natl. Acad. Sci. USA 91:11084-11088; Clyman (1995) "Some Microbes have splicing proteins" ASM News 61:344-347; Colston and Davis (1994) "The ins and outs of protein splicing elements" Molecular Microbiology 12:359-363; Cooper et al. (1993) "Protein splicing of the yeast TFP1 intervening protein sequence: a model for self-excision" EMBO J. 12:2575-2583; Cooper and Stevens (1993) "Protein splicing: Excision of intervening sequences at the protein level" BioEssays 15:667-673; Cooper and Stevens (1995) "Protein splicing: Self-splicing of genetically mobile elements at the protein level" TIBS 20:351-357; Cook et al. (1995) "Photochemically initiated protein splicing" Angew. Chem. Int. Ed. Engl. 34:1620-1630; Dalgaard (1994) "Mobile introns and inteins: friend or foe?" Trends Genet. 10:306-307; Davis et al. (1992) "Protein Splicing in the Maturation of M. Tuberculosis RecA Protein: A Mechanism for Tolerating a Novel Class of Intervening Sequence" Cell 71:201-210; Davis et al. (1991) "Novel Structure of the recA Locus of Mycobacterium tuberculosis Implies Processing of the Gene Product" J. Bacteriol. 173:5653-5662; Davis et al. (1994) "Evidence of selection for protein introns in the RecAs of pathogenic Mycobacteria" EMBO J. 13:699-703; Davis et al. (1995) "Protein splicing - the lengths some proteins will go to" Antonie Van Leeuwenhoek 67:131-137; Doolittle (1993) "The comings and goings of homing endonucleases and mobile introns" Proc. Natl. Acad. Sci. USA 90:5379-5381; Doolittle and Stoltzfus (1993) "Genes-in-pieces revisited" Nature 361:403; Hirata and Anraku (1992) "Mutations at the Putative Junction Sites of the Yeast VMA1 Protein, the Catalytic Subunit of the Vacuolar Membrane H+-ATPase, Inhibit its Processing by Protein Splicing" Biochem. Biophys. Res. Comm. 188:40-47; Hirata et al. (1990) "Molecular Structure of a Gene, VMA1, Encoding the Catalytic Subunit of H+-Translocating Adenosine Triphosphatase from Vacuolar Membranes of Saccharomyces cerevisiae" J. Biol. Chem. 265:6726-6733; Hodges et al. (1992) "Protein splicing removes intervening sequences in an archaea DNA polymerase" Nucleic Acids Res. 20:6153-6157; Kane et al. (1990) "Protein Splicing Converts the Yeast TFP1 Gene Product to the 69-kD Subunit of the Vacuolar H+-Adenosine Triphosphatase" Science 250:651-657; Koonin (1995) "A protein splice-junction motif in hedgehog family proteins" Trends Biochem. Sci. 20:141-142; Kumar et al. (1996) "Functional characterization of the precursor and spliced forms of recA protein of Mycobacterium tuberculosis" Biochemistry 35:1793-1802; Kawasaki et al. (1996) "Folding-dependent in vitro protein splicing of the Saccharomyces cerevisiae VMA1 protozyme" Biochem. Biophys. Res. Comm. 222:827-832; Gimble and Thorner (1992) Nature 357:301-306; Gimble and Thorner (1993) J. Biol. Chem. 268:21844-21853; Pietrovski (1996) "A new intein in cyanobacteria and its significance for the spread of inteins" Trends in Genetics 12:287-288; Shao et al. (1996) "Protein splicing: Evidence for an N-O acyl rearrangement as the initial step in the splicing process" Biochemistry 35:3810-3815; Shub and Goodrich-Blair (1992) Cell 71:183-186; WO 98/49274; WO 98/49275; WO 98/40394; WO 99/11655; WO 96/34878; WO 98/28434; Kent et al. U.S. Pat. No. 5,910,437; Dawson et al. U.S. Pat. No. 5,891,993; and Jocbs et al., U.S. Pat. No. 5,981,182. Additional details on protein splicing generally can be found at the Intein Databases web site (www.neb.com/neb/inteins/intein_intro.html) and in, e.g., Nucleic Acids Research 26(7):1741-1758.
Methods of encrypting gene sequences and engineered genetic elements, as well as additional recombination methods, would be desirable. The present invention provides new methods of encrypting traits, including by trans-splicing at the RNA and/or protein levels, and new methods of recombining non-overlapping gene sequences, as well as a variety of additional features that will become apparent upon complete review of the following description.
The present invention provides methods of unencrypting trait encrypted gene sequences, e.g., cDNAs, to provide unencrypted RNAs or polypeptides, e.g., full-length proteins. The methods include providing a first plurality of split gene sequences, in which each split gene sequence includes a subsequence of a genetic element, and transcribing the first plurality of split gene sequences to provide a plurality of RNA segments that can include trans-splicing introns. The steps of this aspect of the invention can occur either in vitro or in vivo. Two or more of the plurality of RNA segments can be trans-spliced together to provide an unencrypted RNA. The unencrypted RNA is optionally selected for a desired trait or property, or translated to provide a second unencrypted polypeptide. The second unencrypted polypeptide is also optionally selected for a desired trait or property.
Alternatively, the plurality of RNA segments can be translated to provide a plurality of polypeptide segments that can include trans-splicing inteins, and two or more of these polypeptide segments can be trans-spliced together to provide a first unencrypted polypeptide. The first unencrypted polypeptide is optionally selected for at least one desired trait or property.
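The joining logic shared by the RNA-level and protein-level routes can be illustrated with a toy string model. This is only a sketch, not a laboratory protocol. It assumes each segment carries one exon, and that the trans-splicing intron (or intein) halves at each junction share a hypothetical integer id. Under this model, trans-splicing succeeds only when every junction between consecutive segments has both matching halves. The names `RNASegment`, `split_gene`, and `trans_splice` are invented for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RNASegment:
    """Toy model of one expressed split gene sequence (hypothetical)."""
    index: int                          # position of the exon within the genetic element
    exon: str                           # coding subsequence carried by this segment
    left_intron: Optional[int] = None   # id of the intron half shared with the previous segment
    right_intron: Optional[int] = None  # id of the intron half shared with the next segment


def split_gene(element: str, n: int) -> List[RNASegment]:
    """Split a genetic element into n contiguous exons with paired intron ids."""
    size = -(-len(element) // n)  # ceiling division
    segments = []
    for i in range(n):
        exon = element[i * size:(i + 1) * size]
        left = i - 1 if i > 0 else None
        right = i if i < n - 1 else None
        segments.append(RNASegment(i, exon, left, right))
    return segments


def trans_splice(segments: List[RNASegment]) -> Optional[str]:
    """Return the unencrypted sequence, or None while any segment is missing."""
    ordered = sorted(segments, key=lambda s: s.index)
    if not ordered or ordered[0].left_intron is not None or ordered[-1].right_intron is not None:
        return None  # the first or last segment of the element is absent
    for a, b in zip(ordered, ordered[1:]):
        if a.right_intron is None or a.right_intron != b.left_intron:
            return None  # an intron half has no partner: trait stays encrypted
    return "".join(s.exon for s in ordered)


segments = split_gene("ATGGCCAAGTAA", 3)
print(trans_splice(segments))      # reconstructs the full element
print(trans_splice(segments[:2]))  # None: the last segment is missing
```

Any incomplete subset of segments returns `None`. In this model, that is what it means for the trait to remain encrypted until all segments are present in the same cell or reaction.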
The first plurality of split gene sequences is optionally provided by mating a first parental organism that includes a second plurality of split gene sequences with a second parental organism that includes a third plurality of split gene sequences to produce a progeny organism. The progeny organism includes one or more members of both the second and the third pluralities of split gene sequences. Thereafter, one or more of the second and third pluralities of split gene sequences can be transcribed to provide a plurality of RNA segments. Additionally, the progeny organism is optionally selected for a desired trait or property; in so doing, unencrypted RNAs are selected. The unencrypted RNAs are optionally translated to provide unencrypted polypeptides, which are optionally selected for a desired trait or property. The first and second parental organisms of this aspect of the present invention can be, e.g., animals, plants, fungi, or bacteria. In certain preferred embodiments they are plants, yeast or other fungi.
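The mating step, and the reason for selecting progeny, can be illustrated with a short Python calculation. This is an illustrative assumption, not part of the source. Each listed split gene sequence is treated as a single hemizygous insert at its own unlinked locus, so each insert passes to a given progeny organism independently with probability 1/2. The function name `fraction_unencrypted` is hypothetical.

```python
from fractions import Fraction
from itertools import product
from typing import Iterable


def fraction_unencrypted(parent_a: Iterable[int], parent_b: Iterable[int],
                         n_segments: int) -> Fraction:
    """Fraction of progeny that inherit every segment 0..n_segments-1.

    Assumption (not from the source): one hemizygous insert per listed
    segment, each at an unlinked locus and each transmitted independently
    with probability 1/2, so all transmission patterns are equally likely.
    """
    inserts = list(parent_a) + list(parent_b)
    needed = set(range(n_segments))
    patterns = list(product((False, True), repeat=len(inserts)))
    hits = 0
    for pattern in patterns:
        inherited = {seg for seg, passed in zip(inserts, pattern) if passed}
        if inherited >= needed:
            hits += 1
    return Fraction(hits, len(patterns))


# Parent A carries segment 0, parent B carries segment 1 of a two-part element.
print(fraction_unencrypted([0], [1], 2))
```

Under this assumption, the cross above yields one quarter of progeny carrying both segments, and neither parent alone yields any complete element. If the parents were instead homozygous for their inserts, every progeny organism would carry both segments.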
A first parental organism can include a first plurality of enhancer-linked split gene sequences. Each enhancer-linked split gene sequence includes a subsequence of a genetic element with a first enhancer sequence linked thereto. The first parental organism also includes one or more first trans-acting transcription factor sequences that are unlinked to the first plurality of enhancer-linked split gene sequences. This aspect of the invention also includes a second parental organism that includes a second plurality of enhancer-linked split gene sequences in which each enhancer-linked split gene sequence includes a subsequence of the genetic element with a second enhancer sequence linked thereto. The second parental organism also includes one or more second trans-acting transcription factor sequences that are unlinked to the second plurality of enhancer-linked split gene sequences.
The two parental organisms are optionally mated to produce a progeny organism that includes the first and the second plurality of enhancer-linked split gene sequences and the first and the second trans-acting transcription factor sequences. The first and the second plurality of enhancer-linked split gene sequences can be transcribed to provide a plurality of RNA segments in which the first plurality of enhancer-linked split gene sequences are regulated by a second trans-acting transcription factor and the second plurality of enhancer-linked split gene sequences are regulated by a first trans-acting transcription factor. The progeny organism is optionally selected for a desired trait or property. Unencrypted RNAs are optionally translated to provide unencrypted polypeptides that, in turn, can be selected for a desired trait or property. Furthermore, the first and second parental organisms can be, e.g., animals, plants, fungi, or bacteria. However, in certain preferred embodiments they are plants, yeast or other fungi.
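The cross-regulation can be sketched as a small Python model. In this model, each parent's trans-acting factor activates only the other parent's enhancer, so neither parent transcribes its own enhancer-linked split gene sequences, while the progeny transcribes both sets. The factor and enhancer labels (`TF1`, `E1`, and so on), the `BINDS` table, and the assumption that the progeny inherits everything from both parents are simplifications made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set, Tuple

# Hypothetical binding table: the first factor acts on the second enhancer
# and the second factor acts on the first enhancer.
BINDS = {"TF1": "E2", "TF2": "E1"}


@dataclass(frozen=True)
class Organism:
    segments: FrozenSet[Tuple[int, str]]  # (segment index, linked enhancer)
    factors: FrozenSet[str]               # trans-acting factors expressed


def transcribed(org: Organism) -> Set[int]:
    """Indices of segments whose linked enhancer is bound by a factor present."""
    active = {BINDS[f] for f in org.factors}
    return {idx for idx, enhancer in org.segments if enhancer in active}


def mate(a: Organism, b: Organism) -> Organism:
    """Progeny carrying all segments and factors of both parents (simplification)."""
    return Organism(a.segments | b.segments, a.factors | b.factors)


parent1 = Organism(frozenset({(0, "E1")}), frozenset({"TF1"}))
parent2 = Organism(frozenset({(1, "E2")}), frozenset({"TF2"}))

print(transcribed(parent1))                 # set(): nothing in parent 1 binds E1
print(transcribed(mate(parent1, parent2)))  # {0, 1}
```

This adds a second layer of encryption on top of the split itself. Even a parent that happened to carry every split gene sequence would not transcribe them without the other parent's factor.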
A first parental organism can include a second plurality of split gene sequences in which each split gene sequence includes a subsequence of a toxic genetic element, and a second parental organism can include a third plurality of split gene sequences in which each split gene sequence also includes a subsequence of the toxic genetic element. The first and second parental organisms of this aspect of the invention can be mated, and the second and third pluralities of split gene sequences can be expressed in a progeny organism to produce a second and a third plurality of polypeptide sequences. Thereafter, one or more of the second and third pluralities of polypeptide sequences can be trans-spliced together to provide a toxic polypeptide. The toxic polypeptide, in turn, renders the progeny organism incapable of reproducing when it is male. The progeny organism can, however, reproduce when it is female, and when it does, it produces hybrid progeny organisms in which the toxic genetic element is not expressed.
Alternatively, the toxic polypeptide can render the progeny organism incapable of reproducing when it is female. Such a progeny organism remains capable of reproducing when it is male, and when it does, it produces hybrid progeny organisms in which the toxic genetic element is not expressed.
In another embodiment of the present invention, a first plurality of split gene sequences is provided by infecting a host organism that includes a second plurality of split gene sequences with a vector, e.g., a virus, that includes a third plurality of split gene sequences to produce an infected organism. The infected organism includes the second and third plurality of split gene sequences. The second and third plurality of split gene sequences can be transcribed to provide a plurality of RNA segments. Additionally, an unencrypted RNA is optionally selected for a desired trait or property, or a second unencrypted RNA can be translated to provide a second unencrypted polypeptide. The first or second unencrypted polypeptides are optionally selected for a desired trait or property.
The present invention also provides methods, which can be performed in vitro or in vivo, of unencrypting engineered genetic elements to provide unencrypted polypeptide functions. These methods include providing a first engineered genetic element, e.g., a cDNA, that corresponds to an encoded first polypeptide, e.g., a functional engineered biotin ligase. They also include providing a second engineered genetic element that corresponds to an encoded second polypeptide, e.g., an engineered biotin-dependent glyphosate resistance polypeptide, that is nonfunctional in the absence of a modification performed by the first polypeptide. Thereafter, the first and second engineered genetic elements can be mixed and expressed to produce the encoded first and second polypeptides. The encoded first polypeptide then modifies the encoded second polypeptide to provide a functional encoded second polypeptide.
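The dependency between the two encoded polypeptides can be sketched as a boolean model. In this model, the second polypeptide reports itself functional only after a modifier has acted on it. The class, flag, and function names are hypothetical, and the chemistry of the modification (e.g., biotinylation) is abstracted away entirely.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Polypeptide:
    name: str
    is_modifier: bool = False         # e.g. the engineered biotin ligase
    needs_modification: bool = False  # e.g. the biotin-dependent target polypeptide
    modified: bool = False

    @property
    def functional(self) -> bool:
        return self.modified or not self.needs_modification


def co_express(products: List[Polypeptide]) -> List[Polypeptide]:
    """Return the products after any modifier present has acted on its targets."""
    if any(p.is_modifier for p in products):
        return [replace(p, modified=True) if p.needs_modification else p
                for p in products]
    return list(products)


ligase = Polypeptide("ligase", is_modifier=True)
target = Polypeptide("target", needs_modification=True)

print(co_express([target])[0].functional)                      # False: no modifier
print([p.functional for p in co_express([ligase, target])])    # [True, True]
```

The function is unencrypted only when both engineered genetic elements are expressed together, whether through mating or through infection.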
In an embodiment of the methods of unencrypting engineered genetic elements, the providing and mixing steps include mating a first parental organism that includes the first engineered genetic element and a second parental organism that includes the second engineered genetic element to produce a progeny organism that includes both engineered genetic elements. Thereafter, the genetic elements in the progeny organism can be expressed to produce the encoded first and second polypeptides. The first and second parental organisms of this aspect of the invention can be, e.g., animals, plants, fungi, or bacteria. In certain preferred embodiments they are plants, yeast or other fungi.
In the methods of unencrypting engineered genetic elements, the providing and mixing steps optionally include infecting a host organism that includes the first engineered genetic element with a vector that includes the second engineered genetic element to produce an infected organism. Alternatively, the vector can include the first engineered genetic element and the host organism can include the second engineered genetic element. In either case, the infected organism ultimately includes both the first and the second engineered genetic elements. Thereafter, both engineered genetic elements can be expressed in the infected organism to produce the encoded first and second polypeptides.
The present invention also provides a composition that includes libraries of two or more populations (e.g., homologous genetic elements) of split gene sequences. These libraries collectively include a plurality of split gene sequence member types in which combinations or subcombinations of those member types collectively correspond to one or more complete genetic elements.
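How member types from different libraries can combine into complete genetic elements is sketched below. This model assumes that each library supplies homologous variants of one region, given as a hypothetical `(name, start, end)` interval, and that a complete element takes exactly one member from each library, in order, with no gaps or overlaps. The text also allows subcombinations, which this sketch does not cover.

```python
from itertools import product
from typing import List, Sequence, Tuple

Member = Tuple[str, int, int]  # (member type name, start, end) within the element


def tiles(combo: Sequence[Member], length: int) -> bool:
    """True if the members cover [0, length) contiguously and without overlap."""
    pos = 0
    for _, start, end in combo:
        if start != pos:
            return False
        pos = end
    return pos == length


def complete_combinations(libraries: Sequence[Sequence[Member]],
                          length: int) -> List[Tuple[str, ...]]:
    """Pick one member type per library; keep picks that form a complete element."""
    return [tuple(name for name, _, _ in combo)
            for combo in product(*libraries)
            if tiles(combo, length)]


library_a = [("a1", 0, 50), ("a2", 0, 50), ("a3", 0, 40)]
library_b = [("b1", 50, 100), ("b2", 40, 100)]
print(complete_combinations([library_a, library_b], 100))
```

With these made-up intervals, the sketch keeps the pairs (a1, b1), (a2, b1), and (a3, b2). Only those member types meet at a shared boundary.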
The invention additionally provides a composition that includes libraries of two or more populations of enhancer-linked split gene sequences. These libraries collectively include a plurality of enhancer-linked split gene sequence member types, each regulated by a different trans-acting transcription factor in which combinations or subcombinations of the plurality of enhancer-linked split gene sequence member types collectively correspond to one or more complete genetic elements. This composition can include a trans-acting transcription factor corresponding to one of the two or more populations of enhancer-linked split gene sequences that can regulate the enhancer-linked split gene sequences of another population. This composition can also include a first trans-acting transcription factor that corresponds to a first population of enhancer-linked split gene sequences that regulates the enhancer-linked split gene sequences of a second population, and a second trans-acting transcription factor that corresponds to the second population of enhancer-linked split gene sequences that regulates the enhancer-linked split gene sequences of the first population.
The present invention also relates to methods of recombining non-overlapping gene sequences, which can be performed in vitro or in vivo. The methods include providing a plurality of non-overlapping gene sequences in which each non-overlapping gene sequence corresponds to a different subsequence of a genetic element. The methods also include providing a plurality of gap nucleic acid sequences in which each gap nucleic acid sequence overlaps two or more of the non-overlapping gene sequences. The non-overlapping gene sequences can be recombined with the gap nucleic acid sequences to provide recombined non-overlapping gene sequences. The recombined non-overlapping gene sequences are optionally selected for a desired trait or property and then recombined again. This process of selecting and recombining the recombined non-overlapping gene sequences can be repeated until a desired recombined genetic element is obtained. Furthermore, the plurality of non-overlapping gene sequences can be derived, e.g., from a cry3Bb gene, and the plurality of gap nucleic acid sequences can be derived, e.g., from a cry1Ba, a cry1Ca, and a cry1Ia gene.
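The role of the gap nucleic acids can be sketched at the level of character strings. This Python model is a deliberate simplification. It treats recombination as exact matching of `k` terminal bases, ignores annealing, extension, and strand orientation, and uses made-up sequences rather than the cry genes named above. The functions `make_gap` and `assemble` and the overlap length `k` are assumptions for illustration.

```python
from typing import List, Optional


def make_gap(left: str, right: str, filler: str, k: int) -> str:
    """Gap nucleic acid: k bases matching the end of `left`, a bridging
    region `filler` (hypothetically taken from a homologous gene), and
    k bases matching the start of `right`."""
    return left[-k:] + filler + right[:k]


def assemble(segments: List[str], gaps: List[str], k: int) -> Optional[str]:
    """Join consecutive non-overlapping segments through gaps whose ends
    match them; return None if any junction cannot be bridged."""
    if k < 1 or len(gaps) != len(segments) - 1:
        raise ValueError("need k >= 1 and exactly one gap per junction")
    result = segments[0]
    for gap, nxt in zip(gaps, segments[1:]):
        if len(gap) < 2 * k or not (result.endswith(gap[:k]) and nxt.startswith(gap[-k:])):
            return None
        result += gap[k:-k] + nxt  # keep only the bridging region of the gap
    return result


segments = ["ATGAAA", "CCCGGG", "TTTTAA"]
gaps = [make_gap("ATGAAA", "CCCGGG", "TCT", 3),
        make_gap("CCCGGG", "TTTTAA", "GAG", 3)]
print(assemble(segments, gaps, 3))
```

One way to picture how repeated rounds of recombination and selection can yield different recombined genetic elements is to swap in fillers drawn from different homologs while keeping the same non-overlapping segments.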
The present invention is also directed at compositions that include libraries of gap nucleic acids. The libraries of gap nucleic acids include a plurality of gap nucleic acid member types in which each gap nucleic acid member type includes subsequence identity or complementarity with at least two split gene sequence member types.
The invention additionally provides an integrated system that includes a computer or computer readable medium that includes a data set corresponding to a set of character strings. Those character strings can correspond to split gene sequences, enhancer-linked split gene sequences, trans-acting transcription factor sequences, engineered genetic elements, non-overlapping gene sequences and gap nucleic acids. The system can further include a sequence search and comparison instruction set for searching for specified nucleic acid sequences. The integrated system also optionally includes an automatic sequencer and/or synthesizer coupled to an output of the computer or computer readable medium, which can accept instructions from the computer or computer readable medium that direct the sequencing and/or synthesis of selected sequences.
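A minimal version of the "sequence search and comparison instruction set" might look like the Python below. It is a sketch under stated assumptions: the data set is a dictionary of named character strings, matching is exact (a practical system would more likely use an alignment tool), and the query is searched on both strands through its reverse complement. The names `search` and `reverse_complement` are hypothetical.

```python
from typing import Dict, List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def search(dataset: Dict[str, str], query: str) -> List[Tuple[str, str, int]]:
    """Return (name, strand, position) for every exact occurrence of `query`.

    Strand '+' means the query itself was found; '-' means its reverse
    complement was found. Positions are 0-based on the stored string, and
    overlapping occurrences are all reported (a query that equals its own
    reverse complement is reported on both strands).
    """
    rc = reverse_complement(query)
    hits = []
    for name, seq in dataset.items():
        for strand, pattern in (("+", query), ("-", rc)):
            start = seq.find(pattern)
            while start != -1:
                hits.append((name, strand, start))
                start = seq.find(pattern, start + 1)
    return hits


dataset = {"split_1": "ATGGCCAAG", "gap_1": "CTTGGCCAT"}
print(search(dataset, "ATGG"))
```

Results in this form could be passed to the coupled sequencer or synthesizer as a list of selected sequences. That interface is not specified in the text.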
The integrated system optionally includes robotic control elements for incubating, denaturing, hybridizing, and elongating a set of recombined non-overlapping gene sequences and gap nucleic acids. The system can also include a detector for detecting a nucleic acid produced by elongation of the set of recombined non-overlapping gene sequences and gap nucleic acids, or an encoded product thereof.
Unless otherwise indicated, the following definitions supplement those in the art.
A "set" as used herein refers to a collection of at least two molecule types.
Two nucleic acid sequences "correspond" when they have the same sequence, when one nucleic acid sequence is a subsequence of the other, or when one sequence is derived, by natural or artificial manipulation, from the other.
An "unencrypted RNA" is an RNA generated by trans-splicing at least two RNA segments together. An "unencrypted polypeptide" is a polypeptide generated by trans-splicing at least two polypeptide segments together. The term "polypeptide" includes inteins, exteins, polypeptides, proteins, polyproteins, and the like.
Traits are encrypted using "split gene sequences." Split gene sequences are subsequences of a genetic element. The subsequences can be distributed, e.g., between two parental organisms, but collectively they correspond to the entire genetic element. A "subsequence" of a genetic element is any polynucleotide sequence that is identical or substantially identical to a portion of that genetic element. A "genetic element" includes a segment of DNA involved in producing a polypeptide chain and/or RNA chain. It can include regions preceding (e.g., leader) and following (e.g., trailer) the coding region, in addition to intervening sequences (e.g., introns) between individual coding segments (e.g., exons). Genetic elements can include individual exons, introns, promoters, enhancers, genes, gene clusters, gene families, operons, and the like. An "engineered genetic element" is a designed or otherwise artificially constructed genetic element.
An "enhancer-linked split gene sequence" is a subsequence of a genetic element that is linked to an enhancer. An "enhancer" is a cis-acting regulatory nucleotide sequence involved in the transcriptional activation of certain genetic elements. Activation of an enhancer can elevate the rate of transcription. Studies have shown that enhancers can operate when located either 5′ or 3′ to the transcriptional start site or promoter. They have also been shown to function at distances greater than three kilobases from the start site. Enhancers generally operate as binding sites for transcriptional activating proteins and are often tissue specific. They can be incorporated into various expression vectors to optimize the expression of a chosen DNA sequence.
A xe2x80x9ctrans-acting transcription factorxe2x80x9d is a regulatory protein that controls transcription by binding to a specific enhancer, e.g., an enhancer that is linked to an enhancer-linked split gene sequence. The DNA sequence that encodes the transcription factor is not linked to the enhancer sequence upon which that transcription factor acts.
The term "trans-splicing" includes the joining of at least two distinct RNA molecules, or of at least two distinct polypeptide molecules, to produce at least one unencrypted RNA or at least one unencrypted polypeptide, respectively.
A "full-length protein" is a protein with substantially the same sequence domains as a corresponding protein encoded by a natural gene. Such a protein can have altered sequences relative to the corresponding naturally encoded protein, e.g., due to recombination and selection, but unless specified to the contrary, it is typically at least about 95% the length of a corresponding naturally encoded protein. The protein can include additional sequences, such as purification tags, that are not found in the corresponding naturally encoded protein.
A "toxic genetic element" includes a segment of DNA that encodes a polypeptide which, upon expression, produces sterility in certain organisms, e.g., male sterility in plants. A "toxic polypeptide" is a polypeptide encoded by a toxic genetic element.
The term "non-overlapping gene sequences" refers to polynucleotide sequences that can be homologous to subsequences of a genetic element, but which do not share sequence identity or complementarity amongst themselves. A "gap nucleic acid" is a nucleic acid sequence that includes regions that are identical or complementary to at least two non-overlapping gene sequences.
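The two definitions can be turned into rough computational checks, as sketched below. The definitions do not fix a threshold, so this sketch assumes one. "Sharing sequence identity or complementarity" is operationalized as sharing an exact k-mer, either directly or with the reverse complement, for a chosen `k`. The helper names are hypothetical.

```python
from itertools import combinations
from typing import List, Set

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shares_region(a: str, b: str, k: int) -> bool:
    """True if a and b share a k-mer directly (identity) or via the
    reverse complement of b (complementarity)."""
    ka = kmers(a, k)
    return bool(ka & kmers(b, k)) or bool(ka & kmers(reverse_complement(b), k))


def non_overlapping(seqs: List[str], k: int) -> bool:
    """True if no two sequences in the set share a k-mer region."""
    return not any(shares_region(a, b, k) for a, b in combinations(seqs, 2))


def is_gap(candidate: str, seqs: List[str], k: int) -> bool:
    """True if the candidate shares a k-mer region with at least two sequences."""
    return sum(shares_region(candidate, s, k) for s in seqs) >= 2


seqs = ["ATGCGTACA", "TTGGAGGCC"]
print(non_overlapping(seqs, 5), is_gap("GTACATTTTTGGA", seqs, 5))
```

Raising `k` makes both checks stricter. The appropriate value would depend on the recombination conditions, which this sketch does not model.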