Transposons
Transposons, or transposable elements, include a short piece of nucleic acid bounded by inverted repeat sequences. Active transposons encode enzymes that facilitate the insertion of the nucleic acid into DNA sequences.
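The defining structural feature, a segment bounded by inverted repeats, can be checked computationally: the left end of the element should equal the reverse complement of its right end. The following Python sketch assumes a known repeat length and exact, mismatch-free repeats, which real elements need not have. The function names and the example sequence are hypothetical.

```python
# Complement table for upper- and lower-case DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_terminal_inverted_repeats(seq: str, ir_len: int) -> bool:
    """True if the first ir_len bases are the reverse complement of the last ir_len.

    Assumes exact repeats of a known length; real elements may carry mismatches.
    """
    if ir_len <= 0 or 2 * ir_len > len(seq):
        raise ValueError("ir_len must be positive and fit twice within seq")
    left, right = seq[:ir_len], seq[-ir_len:]
    return left.upper() == reverse_complement(right).upper()


# Hypothetical element: left IR, arbitrary cargo, then the reverse complement of the left IR.
left_ir = "CAGTGCAA"
element = left_ir + "ACGTACGTAC" + reverse_complement(left_ir)
print(has_terminal_inverted_repeats(element, 8))  # True
```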
In vertebrates, the discovery of DNA-transposons, mobile elements that move via a DNA intermediate, is relatively recent (Radice, A. D., et al., 1994. Mol. Gen. Genet. 244, 606–612). Since then, only inactive, highly mutated members of the Tc1/mariner and hAT (hobo/Ac/Tam) superfamilies of eukaryotic transposons have been isolated from various fish species and from the Xenopus and human genomes (Oosumi et al., 1995. Nature 378, 873; Ivics et al., 1995. Mol. Gen. Genet. 247, 312–322; Koga et al., 1996. Nature 383, 30; Lam et al., 1996. J. Mol. Biol. 257, 359–366; and Lam, W. L., et al. Proc. Natl. Acad. Sci. USA 93, 10870–10875).
DNA transposable elements transpose through a cut-and-paste mechanism: the element-encoded transposase catalyzes the excision of the transposon from its original location and promotes its reintegration elsewhere in the genome (Plasterk, 1996. Curr. Top. Microbiol. Immunol. 204, 125–143). Autonomous members of a transposon family can express an active transposase, the trans-acting factor for transposition, and are therefore capable of transposing on their own. Nonautonomous elements have mutated transposase genes but may retain the cis-acting DNA sequences required for transposition. These cis-acting sequences are usually embedded in the terminal inverted repeats (IRs) of the element and are themselves also referred to as inverted terminal repeats; some IRs include one or more direct repeat sequences. The IRs are required for mobilization in the presence of a complementary transposase supplied either by another element or by the element itself.
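The cis/trans logic above can be summarized in a toy model: an element can move only if its own inverted repeats are intact (cis) and some element in the same genome supplies an active transposase (trans). The Python sketch below is an illustration under simplifying assumptions, not a model of any particular element. The `Element` record, `can_mobilize` and `cut_and_paste` are hypothetical names, and target-site preferences and duplications are ignored.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Element:
    """Hypothetical record for one transposon copy in a genome."""
    name: str
    irs_intact: bool           # cis-acting inverted terminal repeats present
    encodes_transposase: bool  # True for an autonomous element


def can_mobilize(element: Element, genome: List[Element]) -> bool:
    # Cis requirement: the element's own IRs. Trans requirement: an active
    # transposase from any copy in the genome, including the element itself.
    return element.irs_intact and any(e.encodes_transposase for e in genome)


def cut_and_paste(seq: str, start: int, end: int, target: int) -> str:
    """Excise seq[start:end] and reinsert it at `target`, an index into the
    sequence left after excision. Target-site duplication is not modeled."""
    element = seq[start:end]
    remainder = seq[:start] + seq[end:]
    if not 0 <= target <= len(remainder):
        raise ValueError("target lies outside the remaining sequence")
    return remainder[:target] + element + remainder[target:]


auto = Element("A", irs_intact=True, encodes_transposase=True)
nonauto = Element("B", irs_intact=True, encodes_transposase=False)
dead = Element("C", irs_intact=False, encodes_transposase=False)

print(can_mobilize(nonauto, [auto, nonauto, dead]))    # True: transposase from A
print(can_mobilize(nonauto, [nonauto, dead]))          # False: no transposase source
print(can_mobilize(dead, [auto, nonauto, dead]))       # False: IRs lost
print(cut_and_paste("aaaTRANSPOSONbbbccc", 3, 13, 6))  # aaabbbTRANSPOSONccc
```

In this model the nonautonomous element moves only when an autonomous partner is present, which is the trans-complementation the text describes.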
Not a single autonomous transposable element has been isolated from vertebrates; all transposon-like sequences isolated to date are defective, apparently as a result of a process called “vertical inactivation” (Lohe et al., 1995. Mol. Biol. Evol. 12, 62–72). According to one phylogenetic model (Hartl et al., 1997. Trends Genet. 13, 197–201), the ratio of nonautonomous to autonomous elements in eukaryotic genomes increases over time because transposition can be complemented in trans: nonautonomous copies are mobilized by transposase made by autonomous ones. This process may eventually make the disappearance of active, transposase-producing copies from a genome inevitable. Consequently, DNA-transposons can be viewed as transitory components of genomes which, to avoid extinction, must find ways to establish themselves in new hosts. Indeed, horizontal transfer between species is thought to be one of the important processes in the evolution of transposons (Lohe et al., 1995. Mol. Biol. Evol. 12, 62–72; Kidwell, 1992. Curr. Opin. Genet. Dev. 2, 868–873).
The natural process of horizontal gene transfer can be mimicked under laboratory conditions. In plants, transposable elements of the Ac/Ds and Spm families have been routinely introduced into heterologous species (Osborne and Baker, 1995. Curr. Opin. Cell Biol. 7, 406–413). In animals, however, a major obstacle to transferring an active transposon system from one species to another has been the apparent species-specificity of transposition, which stems from a requirement for factors produced by the natural host. For this reason, attempts to use the P element transposon of Drosophila melanogaster for genetic transformation of non-drosophilid insects, zebrafish and mammalian cells have been unsuccessful (Gibbs et al., 1994. Mol. Mar. Biol. Biotech. 3, 317–326; Handler et al., 1993. Arch. Insect Biochem. Physiol. 22, 373–384; Rio et al., 1988. J. Mol. Biol. 200, 411–415). In contrast to P elements, members of the Tc1/mariner superfamily of transposable elements may depend less on species-specific host factors for their transposition. These elements are widespread in nature, ranging from single-celled organisms to humans (Plasterk, 1996. Curr. Top. Microbiol. Immunol. 204, 125–143). In addition, recombinant Tc1 and mariner transposases expressed in E. coli are sufficient to catalyze transposition in vitro (Vos et al., 1996. Genes Dev. 10, 755–761; Lampe et al., 1996. EMBO J. 15, 5470–5479; PCT International Publication No. WO 97/29202 to Plasterk et al.). Furthermore, gene vectors based on Minos, a Tc1-like element (TcE) endogenous to Drosophila hydei, have been used successfully for germline transformation of the fly Ceratitis capitata (Loukeris et al., 1995. Science 270, 2002–2005).
Molecular phylogenetic analyses have shown that the majority of fish TcEs can be classified into three major types: zebrafish-, salmonid- and Xenopus TXr-type elements, of which the salmonid subfamily is probably the youngest and thus the most recently active (Ivics et al., 1996. Proc. Natl. Acad. Sci. USA 93, 5008–5013). In addition, comparing the phylogeny of salmonid TcEs with that of their host species provides important clues about the ability of this subfamily to invade naive genomes through horizontal transfer and establish permanent residence there, even over relatively large evolutionary distances.
TcEs from teleost fish (Goodier and Davidson, 1994. J. Mol. Biol. 241, 26–34), including Tdr1 in zebrafish (Izsvak et al., 1995. Mol. Gen. Genet. 247, 312–322) and closely related TcEs from nine additional fish species (Ivics et al., 1996. Proc. Natl. Acad. Sci. USA 93, 5008–5013), are by far the best characterized of all the DNA-transposons known in vertebrates. Fish elements, like TcEs in general, are typified by a single gene encoding a transposase enzyme, flanked by inverted repeat sequences. Unfortunately, all the fish elements isolated so far are inactive because of one or more mutations in their transposase genes.
Functional Genomics
There are estimated to be between 50,000 and 100,000 genes in a vertebrate genome. Their expression is carefully orchestrated, such that most genes are not expressed most of the time in most tissues. The roles of most genes in vertebrate genomes are unknown, yet most diseases have a genetic basis. Accordingly, finding the sites and roles of expression of the genes in a vertebrate genome, especially the human genome, is an important but exceedingly difficult task.
Most genomics studies to date have concentrated on identifying the sequences of mRNAs expressed in various cell types. However, this approach often provides little insight into the functions of the genes or their importance.
An alternative method of finding genes and their functions is to interrupt (mutate) genes with a molecular tag. Then, the interrupted genetic locus can be isolated based on the inserted genetic tag and the gene can be correlated with a phenotype, i.e., a physical result due to the loss of function of the interrupted gene. Genetic tags called gene-traps have been devised wherein a marker gene is inserted randomly into a genome (reviewed in Mountford, P. S., et al. Trends Genet., 11, 179–84 (1995)). When a critical gene is interrupted, and the marker gene is inserted in just the right way (in the correct direction, in-frame, and in an exon of the interrupted gene), the marker gene is expressed in the tissue in which the interrupted gene normally is expressed.
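The three conditions under which a classic gene trap reports its insertion can be written as a simple predicate. This Python sketch is a hypothetical illustration: it treats orientation, exon placement and frame as independent yes/no flags and enumerates their combinations rather than estimating real insertion frequencies. `Insertion` and `classic_trap_expressed` are illustrative names.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Insertion:
    """Hypothetical description of where a gene-trap marker landed."""
    same_orientation: bool  # marker reads in the same direction as the gene
    in_exon: bool           # inserted inside an exon of the gene
    in_frame: bool          # marker reading frame matches the gene's frame


def classic_trap_expressed(ins: Insertion) -> bool:
    # A plain gene trap reports only if all three conditions hold.
    return ins.same_orientation and ins.in_exon and ins.in_frame


# Enumerate every combination of the three conditions.
outcomes = [classic_trap_expressed(Insertion(*flags))
            for flags in product((True, False), repeat=3)]
print(sum(outcomes), "of", len(outcomes))  # 1 of 8
```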
A variation of the gene trap places a splice acceptor site followed by an internal ribosome entry site (IRES) in front of a marker gene. A splice acceptor site signals that the sequences following it are to be included in the mRNA, provided there is an intron upstream of the splice acceptor site (Padgett, T., et al., Ann. Rev. Biochem. J., 55, 1119–1150 (1988)). An IRES allows ribosomal access to mRNA without a requirement for cap recognition and subsequent scanning to the initiator AUG (Pelletier, J. A., et al., Nature, 334, 320–325 (1988)). Together, these elements increase the probability that the marker gene will be expressed when inserted into a gene. When a construct containing a splice acceptor site followed by an IRES is placed in front of a marker gene, the marker can be expressed even if the construct integrates in an intron or out of frame with respect to the interrupted gene: the splice acceptor increases the likelihood that the inserted sequences will be present in the resulting mRNA, and the IRES increases the likelihood that they will be translated. This approach, known in the art as a “gene-trap,” still requires that the molecular tag insert within the coding sequence, where it will be expressed at approximately the same level as the disrupted gene. However, the expression level of the disrupted gene may be low, and the “target size” (the length of the coding sequence in base pairs) may be small.
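The contrast between a classic gene trap and the splice acceptor plus IRES design can be sketched by enumerating insertion sites and conditions. This is a toy Python model under stated assumptions: any same-orientation insertion into an intron is assumed to be spliced into the mRNA via the splice acceptor, the IRES is assumed to make translation independent of the gene's frame, and the counts are over combinations, not weighted by how often each occurs in a genome. The names `Site`, `simple_trap`, `sa_ires_trap` and `count_hits` are hypothetical.

```python
from enum import Enum
from itertools import product
from typing import Callable


class Site(Enum):
    """Hypothetical classes of insertion site relative to a gene."""
    EXON = "exon"
    INTRON = "intron"
    INTERGENIC = "intergenic"


Trap = Callable[[Site, bool, bool], bool]


def simple_trap(site: Site, same_orientation: bool, in_frame: bool) -> bool:
    # No splice acceptor or IRES: needs an exon, correct orientation and frame.
    return site is Site.EXON and same_orientation and in_frame


def sa_ires_trap(site: Site, same_orientation: bool, in_frame: bool) -> bool:
    # Splice acceptor: intron insertions are spliced into the mRNA.
    # IRES: translation starts at the marker, so in_frame is ignored.
    return same_orientation and site in (Site.EXON, Site.INTRON)


def count_hits(trap: Trap) -> int:
    """Number of (site, orientation, frame) combinations that give expression."""
    return sum(trap(site, orient, frame)
               for site, orient, frame in product(Site, (True, False), (True, False)))


print(count_hits(simple_trap), "vs", count_hits(sa_ires_trap), "of 12")  # 1 vs 4 of 12
```

Orientation remains a requirement in both designs, which is why the splice acceptor plus IRES trap still misses two thirds of the enumerated cases.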
The encephalomyocarditis virus (EMCV) IRES has been used for gene trapping (von Melchner et al., J. Virol., 63, 3227–3233 (1989)), is well characterized (Jang, S. K., et al., Genes Dev 4, 1560–1572 (1990); Kaminski, A., et al., EMBO J 13, 1673–1681 (1994); Hellen, C. U., et al., Curr. Top. Microbiol. Immunol. 203, 31–63 (1995)) and has been shown to function efficiently in mammalian (Borman, A. M., et al., Nucleic Acids Res. 25, 925–32 (1997); Borman, A. M., et al., Nucleic Acids Res. 23, 3656–63 (1995)) and chicken cells (Ghattas, I. R., et al., Mol. Cell. Biol. 11, 5848–59 (1991)). Placing an IRES between the splice acceptor and the reporter has been shown to yield up to 10-fold more G418-resistant colonies in mouse embryonic stem cells than a non-IRES vector (see Mountford P. S., et al. Trends Genet., 11, 179–84 (1995)). Even so, this rate is still unacceptably low, which is why the approach is not used for mass screening of genes.
IRESs have also been adapted into dicistronic vectors for the expression of two open reading frames. For instance, using an IRES in a dicistronic vector can result in more than 90% of transfected cells producing both the product of the gene of interest and the selectable marker (Ghattas et al. Mol. Cell. Biol., 11, 5848–59 (1991)).
Another strategy “traps” sequences 3′ of the inserted marker gene. It uses a retrovirus to deliver a marker gene placed between a promoter and a splice donor site (Zambrowicz, B. P., et al., Nature, 392, 608–611 (1998)). A splice donor site signals that the RNA encoding the marker gene is to be spliced to the next downstream splice acceptor site. When the marker gene is expressed and a downstream splice acceptor site is present, the spliced mRNA may acquire a poly(A) tail from the endogenous transcript and therefore be more stable and more efficiently translated. This increases the probability that marker expression will be detected only when the construct has inserted into a gene.
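The 3′ trapping logic reduces to a question about the gene model: is there a splice acceptor downstream of the insertion, on the same strand, for the marker's splice donor to join? The Python sketch below is a hypothetical illustration on a forward-strand gene given as exon intervals. It assumes that every exon after the first begins with a splice acceptor and ignores alternative splicing; `polya_trap_acceptor` and the three-exon gene are illustrative.

```python
from typing import List, Optional, Tuple

Exon = Tuple[int, int]  # (start, end) on the forward strand, half-open


def polya_trap_acceptor(exons: List[Exon], insert_pos: int,
                        same_orientation: bool) -> Optional[int]:
    """Position of the splice acceptor the marker's splice donor would join.

    Returns None when no downstream acceptor exists on the same strand, i.e.
    when (in this toy model) the marker transcript would not be stabilized.
    """
    if not same_orientation:
        return None
    # Assumption: every exon after the first starts with a splice acceptor.
    acceptors = [start for start, _ in sorted(exons)[1:] if start > insert_pos]
    return min(acceptors) if acceptors else None


gene = [(0, 100), (500, 600), (1000, 1200)]  # hypothetical three-exon gene
print(polya_trap_acceptor(gene, 300, True))    # 500: spliced to exon 2
print(polya_trap_acceptor(gene, 1100, True))   # None: inside the last exon
print(polya_trap_acceptor(gene, 300, False))   # None: wrong orientation
```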
An alternative strategy is to use an enhancer-trap (Weber, F., et al., Cell, 36, 983–992 (1984)). In this strategy, the marker gene is placed downstream of a weak promoter to give a minimal promoter-marker gene construct. The minimal promoter by itself cannot direct high expression of the marker gene. However, when the minimal promoter lies in the vicinity of regulatory sequences called enhancers, it can direct expression of the marker gene at the levels, and in the tissues, in which those enhancers are active. Thus, the enhancer-trap tag need not insert within a coding sequence; it can be activated by insertion outside the transcription unit. An enhancer-trap may also direct higher levels of expression than a gene-trap vector, which may make insertion of the molecular tag easier for a researcher to detect.
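The enhancer-trap behavior can be sketched as a proximity rule: the minimal promoter is switched on in every tissue where a nearby enhancer is active, regardless of whether the insertion lies in a coding sequence. In the Python sketch below, the `Enhancer` record, the `enhancer_trap_tissues` function and the 50 kb `window` are hypothetical simplifications; real enhancer action is not a fixed distance cutoff.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set


@dataclass(frozen=True)
class Enhancer:
    """Hypothetical enhancer: a genomic position and the tissues where it is active."""
    position: int
    active_in: FrozenSet[str]


def enhancer_trap_tissues(insert_pos: int, enhancers: Iterable[Enhancer],
                          window: int = 50_000) -> Set[str]:
    """Tissues in which the minimal-promoter marker would be expressed.

    Toy rule: the minimal promoter alone gives no useful expression; it is
    switched on in every tissue where some enhancer within `window` bp is active.
    Whether the insertion lies in a coding sequence is irrelevant here.
    """
    tissues: Set[str] = set()
    for e in enhancers:
        if abs(e.position - insert_pos) <= window:
            tissues |= e.active_in
    return tissues


enhancers = [Enhancer(10_000, frozenset({"heart"})),
             Enhancer(200_000, frozenset({"liver", "gut"}))]
print(enhancer_trap_tissues(30_000, enhancers))   # {'heart'}
print(enhancer_trap_tissues(500_000, enhancers))  # set(): no enhancer nearby
```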
Many methods are known for introducing DNA into a cell to perform mutational analyses such as those described above. These include, but are not limited to, DNA condensing reagents such as calcium phosphate, polyethylene glycol and the like; lipid-containing reagents such as liposomes, multi-lamellar vesicles and the like; virus-mediated strategies; ballistic methods; and microinjection. All of these methods have limitations. For example, DNA condensing reagents and virus-mediated strategies have size constraints, and in virus-mediated strategies the amount of nucleic acid that can be introduced into a cell is limited. Not all methods facilitate integration of the delivered nucleic acid into cellular nucleic acid, and while DNA condensing methods and lipid-containing reagents are relatively easy to prepare, incorporating nucleic acid into viral vectors can be labor intensive. Moreover, virus-mediated strategies can be cell-type or tissue-type specific and can create immunologic problems when used in vivo. Most non-viral methods result in concatemerization of the input DNA as well as random break points within the delivered DNA. Consequently, currently available vectors are limited in their ability to insert gene-traps or enhancer-traps into genomes at the high rates needed for high-throughput screening for mutations and for identifying the tissues in which the marker gene is expressed. Thus, there remains a need for new methods of introducing into a cell constructs that contain molecular tags capable of providing information about the sites and roles of gene expression.