This invention relates to methods for gene expression, mapping genes, mutagenesis, methods for introducing DNA into a host chromosome and to transposons and transposases.
Transposons or transposable elements include a short piece of nucleic acid bounded by repeat sequences. Active transposons encode enzymes that facilitate the insertion of the nucleic acid into DNA sequences.
In vertebrates, the discovery of DNA-transposons, mobile elements that move via a DNA intermediate, is relatively recent (Radice, A. D., et al., 1994. Mol. Gen. Genet. 244, 606-612). Since then, inactive, highly mutated members of the Tc1/mariner as well as the hAT (hobo/Ac/Tam) superfamilies of eukaryotic transposons have been isolated from different fish species, Xenopus and human genomes (Oosumi et al., 1995. Nature 378, 873; Izsvak et al., 1995. Mol. Gen. Genet. 247, 312-322; Koga et al., 1996. Nature 383, 30; Lam et al., 1996. J. Mol. Biol. 257, 359-366; and Lam, W. L., et al., 1996. Proc. Natl. Acad. Sci. USA 93, 10870-10875).
These transposable elements transpose through a cut-and-paste mechanism; the element-encoded transposase catalyzes the excision of the transposon from its original location and promotes its reintegration elsewhere in the genome (Plasterk, 1996 Curr. Top. Microbiol. Immunol. 204, 125-143). Autonomous members of a transposon family can express an active transposase, the trans-acting factor for transposition, and thus are capable of transposing on their own. Nonautonomous elements have mutated transposase genes but may retain cis-acting DNA sequences. These cis-acting sequences, also referred to as inverted terminal repeats, are usually embedded in the terminal inverted repeats (IRs) of the elements and are required for mobilization in the presence of a complementary transposase from another element. Some inverted repeat sequences include one or more direct repeat sequences.
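The structural idea of terminal inverted repeats can be illustrated with a minimal Python sketch. The sequences and function names below are hypothetical toy examples and are not taken from any element described here; the sketch only checks that the right end of an element is the reverse complement of its left end.

```python
# Toy illustration: an element is flanked by inverted repeats when its
# right end is the reverse complement of its left end.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def has_terminal_inverted_repeats(element: str, ir_length: int) -> bool:
    """Check whether the first ir_length bases are mirrored, as a reverse
    complement, by the last ir_length bases of the element."""
    if ir_length <= 0 or 2 * ir_length > len(element):
        raise ValueError("ir_length must be positive and fit twice in the element")
    left = element[:ir_length].upper()
    right = element[-ir_length:].upper()
    return right == reverse_complement(left)


# Hypothetical 8-base left repeat, made-up cargo, and the matching right repeat.
left_ir = "CAGTTGAA"
cargo = "ATGGCCATTGTAATGGGCC"
toy_element = left_ir + cargo + reverse_complement(left_ir)

print(has_terminal_inverted_repeats(toy_element, 8))
```

Real inverted repeats are often imperfect, so a practical check would tolerate mismatches; exact matching is used here only to keep the sketch short.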
Not a single autonomous element has been isolated from vertebrates; all transposon-like sequences are defective, apparently as a result of a process called "vertical inactivation" (Lohe et al., 1995 Mol. Biol. Evol. 12, 62-72). According to one phylogenetic model (Hartl et al., 1997 Trends Genet. 13, 197-201), the ratio of nonautonomous to autonomous elements in eukaryotic genomes increases as a result of the trans-complementary nature of transposition. This process leads to a state where the ultimate disappearance of active, transposase-producing copies in a genome is inevitable. Consequently, DNA-transposons can be viewed as transitory components of genomes which, in order to avoid extinction, must find ways to establish themselves in a new host. Indeed, horizontal gene transmission between species is thought to be one of the important processes in the evolution of transposons (Lohe et al., 1995 supra and Kidwell, 1992. Curr. Opin. Genet. Dev. 2, 868-873).
The natural process of horizontal gene transfer can be mimicked under laboratory conditions. In plants, transposable elements of the Ac/Ds and Spm families have been routinely introduced into heterologous species (Osborne and Baker, 1995 Curr. Opin. Cell Biol. 7, 406-413). In animals, however, a major obstacle to the transfer of an active transposon system from one species to another has been the species-specificity of transposition, which arises from the requirement for factors produced by the natural host. For this reason, attempts to use the P element transposon of Drosophila melanogaster for genetic transformation of non-drosophilid insects, zebrafish and mammalian cells have been unsuccessful (Gibbs et al., 1994 Mol. Mar. Biol. Biotech. 3, 317-326; Handler et al., 1993. Arch. Insect Biochem. Physiol. 22, 373-384; and Rio et al., 1988 J. Mol. Biol. 200, 411-415). In contrast to P elements, members of the Tc1/mariner superfamily of transposable elements may not be as demanding for species-specific factors for their transposition. These elements are widespread in nature, ranging from single-cellular organisms to humans (Plasterk, 1996, supra). In addition, recombinant Tc1 and mariner transposases expressed in E. coli are sufficient to catalyze transposition in vitro (Vos et al., 1996 Genes Dev. 10, 755-761; Lampe et al., 1996. EMBO J. 15, 5470-5479; and PCT International Publication No. WO 97/29202 to Plasterk et al.). Furthermore, gene vectors based on Minos, a Tc1-like element (TcE) endogenous to Drosophila hydei, were successfully used for germline transformation of the fly Ceratitis capitata (Loukeris et al., 1995 Science 270, 2002-2005).
Molecular phylogenetic analyses have shown that the majority of the fish TcEs can be classified into three major types: zebrafish-, salmonid- and Xenopus TXr-type elements, of which the salmonid subfamily is probably the youngest and thus most recently active (Ivics et al., 1996, Proc. Natl. Acad. Sci. USA 93, 5008-5013). In addition, examination of the phylogeny of salmonid TcEs and that of their host species provides important clues about the ability of this particular subfamily of elements to invade and establish permanent residences in naive genomes through horizontal transfer, even over relatively large evolutionary distances.
TcEs from teleost fish (Goodier and Davidson, 1994 J. Mol. Biol. 241, 26-34 and Izsvak et al., 1995. Mol. Gen. Genet. 247, 312-322), including Tdr1 in zebrafish (Izsvak et al., 1995, supra) and other closely related TcEs from nine additional fish species (Ivics et al., 1996. Proc. Natl. Acad. Sci. USA 93, 5008-5013) are by far the best characterized of all the DNA-transposons known in vertebrates. Fish elements, and other TcEs in general, are typified by a single gene encoding a transposase enzyme flanked by inverted repeat sequences. Unfortunately, all the fish elements isolated so far are inactive due to one or more mutations in the transposase genes.
Methods for introducing DNA into a cell are known. These include, but are not limited to, DNA condensing reagents (such as calcium phosphate, polyethylene glycol, and the like), lipid-containing reagents (such as liposomes, multi-lamellar vesicles, and the like), and virus-mediated strategies. These methods all have their limitations. For example, there are size constraints associated with DNA condensing reagents and virus-mediated strategies. Further, the amount of nucleic acid that can be introduced into a cell is limited in virus strategies. Not all methods facilitate integration of the delivered nucleic acid into cellular nucleic acid, and while DNA condensing methods and lipid-containing reagents are relatively easy to prepare, the incorporation of nucleic acid into viral vectors can be labor intensive. Moreover, virus-mediated strategies can be cell-type or tissue-type specific, and the use of virus-mediated strategies can create immunologic problems when used in vivo.
There remains a need for new methods for introducing DNA into a cell, particularly methods that promote the efficient integration of nucleic acid fragments of varying sizes into the nucleic acid of a cell, particularly the integration of DNA into the genome of a cell.
We have developed a DNA-based transposon system for genome manipulation in vertebrates. Members of the Tc1/mariner superfamily of transposons are prevalent components of the genomes of teleost fish as well as a variety of other vertebrates. However, all the elements isolated from nature appear to be transpositionally inactive. Molecular phylogenetic data were used to identify a family of synthetic, salmonid-type Tc1-like transposases (SB) with their recognition sites that facilitate transposition. A consensus sequence of a putative transposase gene was first derived from inactive elements of the salmonid subfamily of elements from eight species of fish and then engineered by eliminating the mutations that rendered these elements inactive. A transposase was created in which functional domains were identified and tested for biochemical functions individually as well as in the context of a full-length transposase. The transposase binds to two binding-sites within the inverted repeats of salmonid elements, and appears to be substrate-specific, which could prevent cross-mobilization between closely related subfamilies of fish elements. SB transposases significantly enhance chromosomal integration of engineered transposons not only in fish, but also in mouse and in human cells. The requirements for specific motifs in the transposase plus specific sequences in the target transposon, along with activity in fish and mammalian cells alike, establish SB transposase as the first active DNA-transposon system for germline transformation and insertional mutagenesis in vertebrates.
In one aspect of this invention, the invention relates to a nucleic acid fragment comprising: a nucleic acid sequence positioned between at least two inverted repeats wherein the inverted repeats can bind to an SB protein and wherein the nucleic acid fragment is capable of integrating into DNA in a cell.
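The consensus-derivation step mentioned above (a consensus sequence derived from several inactive copies, each carrying different mutations) can be sketched in Python. This section does not state how the consensus was computed, so the majority-rule approach below is an assumption, and the aligned toy sequences and function name are hypothetical.

```python
from collections import Counter


def majority_consensus(aligned: list[str]) -> str:
    """Majority-rule consensus of pre-aligned, equal-length sequences.

    Ties are broken alphabetically so the result is deterministic; that
    tie-break is an arbitrary choice made for this sketch.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    for column in zip(*aligned):
        counts = Counter(column)
        best = max(counts.values())
        consensus.append(min(sym for sym, n in counts.items() if n == best))
    return "".join(consensus)


# Hypothetical aligned copies; each defective copy carries a different change.
toy_copies = [
    "ATGGCTAAGTCA",
    "ATGGCTTAGTCA",  # A->T creates a premature TAG stop codon
    "ATGACTAAGTCA",  # G->A substitution
    "ATGGCTAAGTCG",  # A->G substitution
    "ATGGCTAAGTCA",
]

print(majority_consensus(toy_copies))
```

Because each toy copy is damaged at a different position, the majority vote at every column recovers the undamaged residue, which is the intuition behind rebuilding a functional gene from inactive relatives.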
In one embodiment the nucleic acid fragment is part of a plasmid, and preferably the nucleic acid sequence comprises at least a portion of an open reading frame and also preferably at least one expression control region of a gene. In one embodiment, the expression control region is selected from the group consisting of a promoter, an enhancer or a silencer. Preferably the nucleic acid sequence comprises a promoter operably linked to at least a portion of an open reading frame.
In one embodiment the cell is obtained from an animal such as an invertebrate or a vertebrate. Preferred invertebrates include a crustacean or a mollusk including, but not limited to, a shrimp, a scallop, a lobster, a clam or an oyster. Preferred vertebrate embodiments include fish, birds, and mammals such as those selected from the group consisting of mice, ungulates, sheep, swine, and humans. The DNA of the cell can be the cell genome or extrachromosomal DNA, including an episome or a plasmid.
In one embodiment of this aspect of the invention, at least one of the inverted repeats comprises SEQ ID NO:4 or SEQ ID NO:5 and preferably the amino acid sequence of the SB protein has at least 80% amino acid identity to SEQ ID NO:1. Also preferably, at least one of the inverted repeats comprises at least one direct repeat, wherein the at least one direct repeat sequence comprises SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8 or SEQ ID NO:9. A preferred direct repeat is SEQ ID NO:10. Also preferably the nucleic acid fragment includes a direct repeat that has at least 80% nucleic acid sequence identity to SEQ ID NO:10.
In another aspect of this invention, the invention relates to a gene transfer system to introduce DNA into the DNA of a cell comprising: a nucleic acid fragment comprising a nucleic acid sequence positioned between at least two inverted repeats wherein the inverted repeats can bind to an SB protein and wherein the nucleic acid fragment is capable of integrating into DNA of a cell; and a transposase or nucleic acid encoding a transposase, wherein the transposase is an SB protein with an amino acid sequence sharing at least an 80% identity to SEQ ID NO:1. In one embodiment, the SB protein comprises SEQ ID NO:1. Alternatively, the SB protein is encoded by DNA that can hybridize to SEQ ID NO:3 under stringent hybridization conditions. In one embodiment, the transposase is provided to the cell as a protein and in another the transposase is provided to the cell as nucleic acid. In one embodiment the nucleic acid is RNA and in another the nucleic acid is DNA. In yet another embodiment, the nucleic acid encoding the transposase is integrated into the genome of the cell. The nucleic acid fragment can be part of a plasmid or a recombinant viral vector. Preferably, the nucleic acid sequence comprises at least a portion of an open reading frame and also preferably, the nucleic acid sequence comprises at least a regulatory region of a gene. In one embodiment the regulatory region is a transcriptional regulatory region and the regulatory region is selected from the group consisting of a promoter, an enhancer, a silencer, a locus-control region, and a border element. In another embodiment, the nucleic acid sequence comprises a promoter operably linked to at least a portion of an open reading frame.
The cells used in this aspect of the invention can be obtained from a variety of sources including bacteria, fungi, plants and animals. In one embodiment, the cells are obtained from an animal, either a vertebrate or an invertebrate. Preferred invertebrate cells include those of crustaceans or mollusks. Preferred vertebrates include fish, birds, and mammals such as rodents, ungulates, sheep, swine and humans.
The DNA of the cell receiving the nucleic acid fragment can be a part of the cell genome or extrachromosomal DNA. Preferably, the inverted repeats of the gene transfer system comprise SEQ ID NO:4 or SEQ ID NO:5. Also preferably the amino acid sequence of the SB protein has at least 80% identity to SEQ ID NO:1, and preferably at least one of the inverted repeats comprises at least one direct repeat, wherein the at least one direct repeat sequence comprises SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8 or SEQ ID NO:9. In one embodiment, the direct repeat has a consensus sequence of SEQ ID NO:10. In a particularly preferred embodiment, the nucleic acid sequence is part of a library of recombinant sequences and the nucleic acid sequence is introduced into the cell using a method selected from the group consisting of: particle bombardment, electroporation, microinjection, combining the nucleic acid fragment with lipid-containing vesicles or DNA condensing reagents, and incorporating the nucleic acid fragment into a viral vector and contacting the viral vector with the cell.
In another aspect of this invention, the invention relates to nucleic acid encoding an SB protein, wherein the nucleic acid encodes a protein comprising SEQ ID NO:1 or a protein comprising an amino acid sequence with at least 80% identity to SEQ ID NO:1. The nucleic acid encoding the SB protein can be incorporated into a nucleic acid vector, such as a gene expression vector either as a viral vector or as a plasmid. The nucleic acid can be circular or linear. This invention also relates to cells expressing the SB protein.
In one embodiment the cells containing the SB protein are obtained from an animal, either a vertebrate or an invertebrate. Preferred vertebrates include fish, birds and mammals. The cells can be obtained from a variety of tissues including pluripotent and totipotent cells such as an oocyte, one or more cells of an embryo, or an egg. In one embodiment, the cell is part of a tissue or organ. In one embodiment, the nucleic acid encoding the SB protein is integrated in the genome of a cell.
The invention also relates to an SB protein comprising the amino acid sequence of SEQ ID NO:1.
In addition, the invention relates to a method for producing a transgenic animal comprising the steps of: introducing a nucleic acid fragment and a transposase into a pluripotent or totipotent cell wherein the nucleic acid fragment comprises a nucleic acid sequence positioned between at least two inverted repeats, wherein the inverted repeats can bind to an SB protein and wherein the nucleic acid fragment is capable of integrating into DNA in a cell and wherein the transposase is an SB protein having an amino acid sequence identity of at least 80% to SEQ ID NO:1; and growing the cell into an animal. Preferred pluripotent or totipotent cells include an oocyte, a cell of an embryo, an egg and a stem cell. In one embodiment, the introducing step comprises a method selected from the group consisting of: microinjection; combining the nucleic acid fragment with cationic lipid vesicles or DNA condensing reagents; incorporating the nucleic acid fragment into a viral vector and contacting the viral vector with the cell; particle bombardment; and electroporation. In another preferred embodiment the viral vector is selected from the group consisting of a retroviral vector, an adenovirus vector, a herpesvirus or an adeno-associated viral vector. Preferred animals used in this method include a mouse, a fish, an ungulate, a bird, or a sheep.
In yet another aspect of this invention, the invention relates to a method for introducing nucleic acid into DNA in a cell comprising the step of: introducing a nucleic acid fragment comprising a nucleic acid sequence positioned between at least two inverted repeats into a cell wherein the inverted repeats can bind to an SB protein and wherein the nucleic acid fragment is capable of integrating into DNA in a cell in the presence of an SB protein. In a preferred embodiment, the method further comprises introducing an SB protein into the cell. In one embodiment, the SB protein has an amino acid sequence comprising at least 80% identity to SEQ ID NO:1. The SB protein can be introduced into the cell as protein or as nucleic acid, including RNA or DNA. The cell receiving the nucleic acid fragment can already include nucleic acid encoding an SB protein and already express the protein. In one embodiment, the nucleic acid encoding the SB protein is integrated into the cell genome. The SB protein can be stably expressed in the cell or transiently expressed, and nucleic acid encoding the SB protein can be under the control of an inducible promoter or under the control of a constitutive promoter. In one aspect of this method, the introducing step comprises a method for introducing nucleic acid into a cell selected from the group consisting of: microinjection; combining the nucleic acid fragment with cationic lipid vesicles or DNA condensing reagents; and incorporating the nucleic acid fragment into a viral vector and contacting the viral vector with the cell. Preferred viral vectors are selected from the group consisting of a retroviral vector, an adenovirus vector or an adeno-associated viral vector. In another aspect of this method, the method includes the step of introducing an SB protein or RNA encoding an SB protein into the cell. The cells used for this method can be pluripotent or totipotent cells, and this invention also relates to transgenic animals produced by this method.
Where transgenic animals are produced, the nucleic acid sequence preferably encodes a protein and preferably a protein to be collected from the transgenic animal or a marker protein. The invention also relates to those cells of the transgenic animal expressing the protein encoded by the nucleic acid sequence.
The invention also relates to an SB protein. In one embodiment the protein has the following characteristics: an ability to catalyze the integration of nucleic acid into DNA of a cell; an ability to bind to the inverted repeat sequence of SEQ ID NO:4 or SEQ ID NO:5; and at least 80% amino acid sequence identity to SEQ ID NO:1. In another embodiment, the protein has the following characteristics: transposase activity; a molecular weight range of about 35 kD to about 40 kD on about a 10% SDS-polyacrylamide gel; and an NLS sequence, a DNA binding domain and a catalytic domain, and wherein the protein provides at least about a five-fold improvement in the rate for introducing a nucleic acid fragment into the nucleic acid of a cell as compared to the level obtained by non-homologous recombination. Preferred methods for testing the rate of nucleic acid fragment incorporation are provided in the examples.
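The 80% sequence identity threshold recited above can be illustrated with a minimal Python sketch. This section does not specify the alignment method or how gaps are scored, so the definition below (identical positions divided by alignment length, with gap columns counted as mismatches) is an assumption; the sequences are made-up toy strings and are not SEQ ID NO:1.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumed definition for this sketch: identical, non-gap positions divided
    by the total number of alignment columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap)
    return 100.0 * matches / len(aligned_a)


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 80.0) -> bool:
    """True when the two aligned sequences share at least `threshold` percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical 10-residue reference and two hypothetical variants.
reference = "MKTAYIAKQR"
variant_two_changes = "MKTAYLAKQK"    # differs at 2 of 10 positions
variant_three_changes = "MKSAYLAKQK"  # differs at 3 of 10 positions

print(percent_identity(reference, variant_two_changes))
print(meets_identity_threshold(reference, variant_three_changes))
```

In practice the two sequences would first be aligned with a standard alignment tool; the choice of tool and scoring parameters affects the reported identity.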
In yet another aspect, the invention relates to a method for mobilizing a nucleic acid sequence in a cell comprising the steps of: introducing the protein of this invention into a cell housing DNA containing the nucleic acid fragment of this invention, wherein the protein mobilizes the nucleic acid fragment from a first position within the DNA of a cell to a second position within the DNA of the cell. In one embodiment, the DNA of a cell is genomic DNA. In another, the first position within the DNA of a cell is extrachromosomal DNA and in yet another, the second position within the DNA of a cell is extrachromosomal DNA. In a preferred embodiment, the protein is introduced into the cell as RNA.
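The movement of a fragment from a first position to a second position can be pictured with a deliberately simplified Python sketch of a cut-and-paste step on a string. The function name and sequences are hypothetical, and the model ignores biological details such as target-site preferences and any sequence changes left at the excision or insertion sites.

```python
def mobilize(dna: str, element: str, new_position: int) -> str:
    """Cut `element` out of `dna` and paste it back at `new_position`.

    `new_position` is an index into the DNA as it exists after excision.
    Only the first occurrence of `element` is moved.
    """
    start = dna.find(element)
    if start == -1:
        raise ValueError("element not found in dna")
    excised = dna[:start] + dna[start + len(element):]
    if not 0 <= new_position <= len(excised):
        raise ValueError("new_position is outside the excised DNA")
    return excised[:new_position] + element + excised[new_position:]


# Hypothetical element: a 4-base repeat, made-up cargo, and the inverted repeat.
toy_element = "CAGT" + "GGTT" + "ACTG"
toy_dna = "AAAA" + toy_element + "CCCCGGGG"

moved = mobilize(toy_dna, toy_element, 8)
print(moved)
```

The total length of the DNA is unchanged and the element appears exactly once after the move, which is what distinguishes this cut-and-paste picture from a copy-and-paste one.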
The invention also relates to a method for identifying a gene in a genome of a cell comprising the steps of: introducing a nucleic acid fragment and an SB protein into a cell, wherein the nucleic acid fragment comprises a nucleic acid sequence positioned between at least two inverted repeats, wherein the inverted repeats can bind to the SB protein and wherein the nucleic acid fragment is capable of integrating into DNA in a cell in the presence of the SB protein; digesting the DNA of the cell with a restriction endonuclease capable of cleaving the nucleic acid sequence; identifying the inverted repeat sequences; sequencing the nucleic acid close to the inverted repeat sequences to obtain DNA sequence from an open reading frame; and comparing the DNA sequence with sequence information in a computer database. In one embodiment, the restriction endonuclease recognizes a 6-base recognition sequence. In another embodiment, the digesting step further comprises cloning the digested fragments or PCR amplifying the digested fragments.
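The digestion step with a 6-base cutter can be sketched in Python. The text does not name an enzyme, so EcoRI (recognition sequence GAATTC, cutting after the first G) is used purely as an illustrative choice; the toy DNA and function names are hypothetical, and the spacing estimate assumes equal, independent base frequencies.

```python
def expected_site_spacing(site_length: int) -> int:
    """Average distance in bases between occurrences of one specific site,
    assuming all four bases are equally likely and independent."""
    return 4 ** site_length


def digest(dna: str, site: str, cut_offset: int) -> list[str]:
    """Split `dna` at every occurrence of `site`.

    The cut is placed `cut_offset` bases after the start of each site, and
    the fragments are returned in their original order.
    """
    dna = dna.upper()
    site = site.upper()
    fragments = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(dna[start:cut])
        start = cut
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])
    return fragments


# Illustrative enzyme choice (assumption): EcoRI, G^AATTC.
ECORI_SITE = "GAATTC"
ECORI_CUT_OFFSET = 1

toy_dna = "AAGAATTCGGGGAATTCTT"  # made-up sequence with two sites
toy_fragments = digest(toy_dna, ECORI_SITE, ECORI_CUT_OFFSET)

print(toy_fragments)
print(expected_site_spacing(6))
```

Under the stated assumption a 6-base site occurs about once every 4096 bases, so fragments carrying an inverted repeat plus flanking genomic sequence would typically be a few kilobases long, a size that is convenient for cloning or PCR amplification.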
The invention also relates to a stable transgenic vertebrate line comprising a gene operably linked to a promoter, wherein the gene and promoter are flanked by inverted repeats, wherein the inverted repeats can bind to an SB protein. In one embodiment, the SB protein comprises SEQ ID NO:1 or an amino acid sequence with at least 80% homology to SEQ ID NO:1. In one embodiment, the vertebrate is a fish, including a zebrafish and in another the vertebrate is a mouse.
In addition, the invention also relates to a protein with transposase activity that can bind to one or more of the following sequences: SEQ ID NO: 4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, and SEQ ID NO:10.