Ongoing genomic sequencing projects on a number of organisms have resulted in an enormous amount of sequence data being deposited in public databases (Schuler, et al., Science 274:540-546 (1996)). Analyzing these data with a variety of bioinformatics tools can assign function or protein identity to a number of these genes. However, true biological function cannot be determined without biological data. In animals and in plants the most successful strategy has been to knock out gene function, either randomly through saturation mutagenesis or through the use of antisense technology to study phenotype one gene at a time. In these functional screens, mutagenic agents are used to produce a large number of organisms that are analyzed for a specific phenotype or metabolic profile. Matching phenotype with genetic lesion has identified many genes involved in development and metabolism. This approach has been carried out successfully in the fruit fly Drosophila melanogaster (Nusslein-Volhard et al., Nature 287:795-801 (1980)), the nematode C. elegans (Brenner, Genetics 77:71-94 (1974)), and Arabidopsis thaliana (Mayer, et al., Nature 353:402-407 (1991)).
In the mouse, gene trapping has provided a powerful approach to recover and identify novel phenotypes (Brown, J Inherit Metab Dis 21:532-539 (1998)). Ideally, in the process of gene discovery, no assumption should be made about which genes or pathways should be disrupted or examined. In practice, however, this approach has not proven successful over time. With mice, the situation has changed dramatically with the advent of embryonic stem (ES) cell lines and the means to generate and select genetic alterations (Evans et al., Nature 292:154-156 (1981)). ES cells can be maintained in culture as totipotent cells, that is, cells that can give rise to all types of differentiated cells under proper growth conditions. These cells can also be genetically altered with relative ease (Thomas et al., Cell 51:503-512 (1987)). Like the ES cells from mice, plant cells from many plants are totipotent and can be used in similar studies.
Assigning gene function by observing the phenotype caused by disruption of a gene in the transformed plant is not always straightforward. When there are multiple copies of a gene in a gene family, the phenotype might not be immediately evident. Determining the spatial and temporal expression of the disrupted gene provides further evidence for assigning gene function. This is especially valuable when a simple phenotype is not evident or when relating more complex phenotypes to functions and development of the whole organism. In some instances no obvious phenotype may be discerned, but spatial and temporal expression of the reporter may provide critical information for defining the function of that genetic locus. The reporter gene is able to provide much higher resolution of tissue-specific expression than gene chips or Northern analysis.
Adding further functions to the gene-trapping vector can provide novel tools for gene expression. With recombination sites incorporated into the vector it is possible to insert a gene of interest at this defined location. This may be done so as to insert a gene of interest next to the reporter gene, to replace the reporter gene, or to permit multiple or tandem insertions and replacements. Analysis of expression patterns in phenotypically normal plants will provide “landing sites” for inserting a gene of interest to obtain a highly specific and well-defined pattern of expression. As there are numerous drawbacks to the current random nature of gene insertion during plant transformation, this approach offers significant advantages.
Gene Trapping
Alternative strategies for identifying gene function were explored in the early 1990s. The approach of “gene trapping” was investigated to screen libraries of random mutants. The principle of gene trapping is essentially the random insertion of a DNA vector and the ensuing disruption of endogenous structural genes. A further improvement to the approach was to include a reporter gene that could readily signal the presence of the vector DNA. The reporter gene mimics the expression of the endogenous gene while mutating the same locus (Evans et al., Trends Genet. 13:370-374 (1997)). Large libraries of clones with random integrations can be isolated and stored indefinitely for future analysis. By using PCR (polymerase chain reaction) the sequence of the “trapped” gene can be identified. This technique allows the identification of genes regardless of their level of expression in vivo (Frohman et al., Proc. Natl. Acad. Sci. USA 85:8998-9002 (1988)). The ability to mutate, identify phenotype, and analyze expression of a specific gene makes gene trapping a very attractive tool for functional genomics. Gene trapping has been used for disruption and identification of genes in mouse ES cells (Skarnes et al., Genes Dev. 6:903-918 (1992); Zambrowicz, et al., Nature 392:608-611 (1998)), genes encoding membrane and secreted proteins (Skarnes et al., Proc. Natl. Acad. Sci. USA 92:6592-6596 (1995)), genes activated in differentiated mouse ES cells (Salminen et al., Dev. Dyn. 212:326-333 (1998)), genes that respond to retinoic acid (Forrester et al., Proc. Natl. Acad. Sci. USA 93:1677-1682 (1996)), and genes that are important in the development of the mammalian nervous system (Stoykova et al., Dev. Dyn. 212:198-213 (1998)).
Design of Gene Trap Vectors
Trapping vectors fall into essentially two different categories. The “enhancer-trap” vectors must integrate near an enhancer that activates the reporter gene, which is fused to a minimal promoter (Bellen et al., Genes Dev. 3:1288-1300 (1989)). “Promoter trap” vectors have no 5′ expression element in front of the reporter. Gene-trap vectors may contain a splice acceptor (SA) at the 5′ end of the reporter gene, resulting in the generation of fusion transcripts following integration into the intron of an actively transcribed gene (Skarnes et al., Genes Dev. 6:903-918 (1992), Forrester et al., Proc. Natl. Acad. Sci. USA 93:1677-1682 (1996), Brenner et al., Proc. Natl. Acad. Sci. USA 86:5517-5521 (1989), von Melchner et al., Genes Dev. 6:919-927 (1992), Wurst et al., Genetics 139:889-899 (1995)). For functional genomics a gene-trap vector must provide three minimal functions: it must have a suitable reporter gene for the analysis of gene expression, the “trap event” must mutate the endogenous gene, and the sequence of the trapped cDNA and the genomic site of integration must be determinable. For use as a landing site, the gene-trap vector must have a suitable reporter that can be measured in all cell types and all stages of development, the insertion of the gene trap must not result in impairment of the plant, and the recombination system must still be functional following integration. Landing pads may also be used for functional genomics. In this respect, the landing pad sites are used to test the effects of the expression of a novel gene, whether that gene comes from the same source or a heterologous source. The function of an encoded gene product can be determined from the effect of ectopic expression of the gene.
In mouse ES cells, the DNA can be introduced by electroporation or by retroviral vectors, which provide a higher transfection frequency and integrate as an intact single copy. Likewise, in plants, electroporation or particle bombardment can be used, while Agrobacterium transformation can be used to introduce genes at low or single copy number.
The earliest vectors were used in undifferentiated ES cells (Skarnes et al., Genes Dev. 6:903-918 (1992), Friedrich et al., Genes Dev. 5:1513-1523 (1991)). The first gene-trap vectors contained an SA site in front of a promoterless reporter gene such as lacZ (which encodes the enzyme beta-galactosidase; Skarnes et al., Genes Dev. 6:903-918 (1992)) or beta-geo (which is formed from the beta-galactosidase gene (beta-gal) and the neomycin-resistance gene (neo) and encodes a fusion protein (Friedrich et al., Genes Dev. 5:1513-1523 (1991))). The integration of the vector into the intron of an expressed gene in the correct orientation results in a fusion messenger RNA (mRNA) transcript. Subsequently an internal ribosome entry site (IRES) from the encephalomyocarditis virus was inserted between the SA site and reporter gene sequence (Chowdhury et al., Nucleic Acids Res. 25:1531-1536 (1997)). The IRES allows di-cistronic translation so the reporter gene can be translated independent of being fused in-frame to the trapped gene. With this vector it is important to realize that the level of expression of the reporter gene is dependent on the rate of transcription from the trapped gene.
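By way of illustration only, the conditions described above for reporter expression from a splice-acceptor gene trap (integration into an intron of an expressed gene, correct orientation, reading frame, and the optional IRES) can be sketched as a small Python model. The names `Insertion` and `reporter_signal` are hypothetical, and setting the reporter level equal to the relative transcription rate of the trapped gene is a deliberate simplification, not a description of any cited vector.

```python
from dataclasses import dataclass


@dataclass
class Insertion:
    """One hypothetical gene-trap integration event (illustrative only)."""
    in_intron: bool          # vector landed in an intron of a gene
    gene_transcribed: bool   # that gene is actively transcribed
    same_orientation: bool   # reporter reads in the same direction as the gene
    in_frame: bool           # reporter is in frame with the upstream exon
    host_transcription: float = 0.0  # relative transcription rate of the trapped gene


def reporter_signal(ins: Insertion, has_ires: bool) -> float:
    """Return a relative reporter level for a promoterless SA-reporter vector."""
    # A fusion mRNA forms only in an intron of an expressed gene,
    # and only when the vector is in the correct orientation.
    if not (ins.in_intron and ins.gene_transcribed and ins.same_orientation):
        return 0.0
    # Without an IRES the reporter must be fused in frame to be translated;
    # with an IRES it is translated independently of the reading frame.
    if not has_ires and not ins.in_frame:
        return 0.0
    # Assumed simplification: signal tracks transcription of the trapped gene.
    return ins.host_transcription


if __name__ == "__main__":
    out_of_frame = Insertion(True, True, True, False, host_transcription=2.0)
    print(reporter_signal(out_of_frame, has_ires=False))
    print(reporter_signal(out_of_frame, has_ires=True))
```

In this sketch an out-of-frame insertion is silent without the IRES and reports at the level of the trapped gene with it, which mirrors the reason given above for adding the IRES between the SA site and the reporter.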
The next generation vectors did not incorporate a poly-A site to direct the addition of a poly-A tail at the end of the introduced marker gene. The signal was provided by the endogenous gene to produce a stable mRNA (Zambrowicz et al., Nature 392:608-611 (1998), Salminen et al., Dev. Dyn. 212:326-333 (1998)). Rather than trapping at the promoter, these vectors incorporated a promoter but relied on trapping at the 3′ end. The advantage of this vector was that the 3′ end of the gene was sometimes more useful for gene identification.
Gene Traps in Plants
T-DNA
Since T-DNA has not been shown to insert with any specificity, it is possible to saturate the genome with T-DNA insertions (Azpiroz-Leehan et al., Trends Genet. 13:152-156 (1997)). Large collections of T-DNA insertions have been generated in Arabidopsis (Feldmann et al., Mol. Gen. Genet. 208:1-9 (1987); Bouchez et al., Acad. Sci. Ser. III Sci. Vie 316:1188-1193 (1993); Campisi et al., Plant J. 17:699-707 (1999); Krysan et al., Plant Cell 11:2283-2290 (1999); Weigel et al., Plant Physiol. 122:1003-1014 (2000)) and systematic efforts have been ongoing to use these collections for “reverse genetic” screens (McKinney et al., Plant J. 8:613-622 (1995); Winkler et al., Plant Physiol. 118:743-750 (1998); Krysan et al., Plant Cell 11:2283-2290 (1999)). This approach is limited to those plant species that can be transformed by Agrobacterium. Although Agrobacterium generally delivers low- or single-copy gene insertions into the genome, multiple T-DNA insertions can often occur in a single plant (Bechtold et al., Acad. Sci. Ser. III Sci. Vie 316:1194-1199 (1993); Lindsey et al., Transgenic Res. 2:33-247 (1993)). Multiple enhancer or gene trap reporter gene insertions can complicate interpretation of expression patterns. The generation of complex insertions, including T-DNA repeats (in direct or inverted orientation) as well as rearrangements of adjacent chromosomal DNA, can also make gene expression patterns difficult to interpret (Ohba et al., Plant J. 7:157-164 (1995); Nacry et al., Genetics 149:641-650 (1998); Laufs et al., Plant J. 18:131-139 (1999)). In addition to the complex gene expression patterns, the subsequent molecular analyses are also complicated, making it difficult to isolate the genes of interest. Enhancer, promoter, and gene trap reporter genes have been used in plants by a number of different groups.
The expression of the reporter gene has been efficient whether the reporter gene was positioned at either the left or the right T-DNA border (Lindsey et al., Transgenic Res. 2:33-247 (1993), Campisi et al., Plant J. 17:699-707 (1999)).
Transposable Elements
Insertional mutagenesis is routinely performed using transposable elements. Heterologous elements have been utilized in species that do not have active or well-characterized transposable element systems (see Osborne et al., Genetics 129:833-844 (1991) for review). The elements in the system are introduced by T-DNA-mediated transformation and mobilization occurs subsequently. In the absence of a transposase the inserted transposable elements are stable. However, the transposable elements can be selectively de-stabilized upon expression of a transposase. The selective re-mobilization can lead to revertants, which can then be used to verify that the phenotype was indeed caused by insertion of the transposon.
Behavior of the maize Ac/Ds and En/Spm transposable elements has been extensively studied in heterologous species. They have also been modified for efficient transposition in tobacco, tomato, and Arabidopsis (see Osborne et al., Curr. Opin. Cell Biol. 7:406-413 (1995) for review). The Ac/Ds system has been used for enhancer or gene trap systems to date. Its low copy number is an advantage over the En/Spm system, which has a tendency to amplify (Aarts et al., Mol. Gen. Genet. 247:555-564 (1995)). The maize Mu element is being exploited for functional genomic studies in maize. Plant retrotransposons can also be used in this invention. Retrotransposons are widely distributed among eukaryotes, including plants (Langdon et al., Genetics 156:313-325 (2000)). Some of them, like tobacco Tnt1 (Grandbastien et al., Nature 337:376-380 (1989); Feuerbach et al., J. Virology 71:4005-4015 (1997)) and Tto1 (Hirochika et al., Gene 165:229-232 (1995); Takeda et al., Plant J. 28:307-317 (2001)), are well studied and can be used for the engineering technology described in this invention.
IRES Elements in Plants
According to the ribosome-scanning model, traditional for most eukaryotic mRNAs, the 40S ribosomal subunit binds to the 5′-cap and moves along the nontranslated 5′-sequence until it reaches an AUG codon (Kozak, Adv. Virus Res. 31:229-292 (1986); Kozak, J. Mol. Biol. 108:229-241 (1989)). Although for the majority of eukaryotic mRNAs only the first open reading frame (ORF) is translationally active, there are different mechanisms by which mRNA may function polycistronically (Kozak, Adv. Virus Res. 31:229-292 (1986)).
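The scanning model described above can be sketched, purely as an illustration, in a few lines of Python. The function name `scan_first_orf` and the example sequence are hypothetical, and the sketch ignores sequence context around the AUG, leaky scanning, and reinitiation: it simply takes the first AUG from the 5′ end and reads in frame to the first stop codon.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def scan_first_orf(mrna: str):
    """Return (start, end) of the ORF opened by the 5'-most AUG, or None.

    Simplified scanning model: the ribosome starts at the 5' end,
    initiates at the first AUG it meets, and terminates at the first
    in-frame stop codon. `end` is exclusive and includes the stop codon.
    """
    start = mrna.find("AUG")
    if start == -1:
        return None  # no start codon, so nothing is translated
    for pos in range(start, len(mrna) - 2, 3):
        if mrna[pos:pos + 3] in STOP_CODONS:
            return start, pos + 3
    return start, len(mrna)  # no in-frame stop codon was found


if __name__ == "__main__":
    # Hypothetical bicistronic message with two short ORFs.
    message = "GGAUGAAAUAAGGAUGCCCUGA"
    orf = scan_first_orf(message)
    print(orf, message[orf[0]:orf[1]])
```

Under this simplified model only the 5′-proximal ORF of the example message is reported; the second ORF, which begins at a downstream AUG, is never reached, which is the behavior that internal ribosome entry circumvents.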
In contrast to the majority of eukaryotic mRNAs, the initiation of translation of picornavirus RNAs takes place by an alternative mechanism of internal ribosome entry. A picornaviral 5′-nontranslated region (5′NTR) contains a so-called internal ribosome entry site (IRES) or ribosome landing pad (Pelletier et al., Nature 334:320-325 (1988); Molla et al., Nature 356:255-257 (1992)). Internal ribosome entry has also been reported for other viral (Le et al., Virology 198:405-411 (1994); Gramstat et al., Nucleic Acids Res. 22:3911-3917 (1994)) and cellular (Oh et al., Genes Dev. 6:1643-1653 (1992)) RNAs. It is important to emphasize that the picornavirus and other known IRESes are not active in plant cell systems.
Recently a new tobamovirus, crTMV, has been isolated from Oleracia officinalis L. plants and the crTMV genome (6312 nucleotides) has been sequenced (Dorokhov et al., Doklady of Russian Academy of Sciences 332:518-522 (1993); Dorokhov et al., FEBS Lett. 350:5-8 (1994)). A peculiar feature of crTMV is its ability to systemically infect members of the Cruciferae family. The crTMV RNA contains four ORFs encoding proteins of 122K (ORF1), 178K (ORF2, the read-through product of 122K), 30K MP (ORF3), and 17K CP (ORF4). Unlike in other tobamoviruses, the coding regions of the MP and CP genes of crTMV overlap by 25 codons; that is, the 5′ portion of the CP coding region also encodes MP sequence.
It has been shown that, unlike the RNA of typical tobamoviruses, translation of the 3′-proximal CP gene of crTMV RNA occurs in vitro and in planta by the mechanism of internal ribosome entry that is mediated by a specific sequence element, IRESCP148 (Ivanov et al., Virology 232:32-43 (1997)). The results indicated that the 148-nt region upstream of the CP gene of crTMV RNA contained IRESCP148, which promotes internal initiation of translation in vitro and in vivo (protoplasts and transgenic plants).
Recently it has been shown (Skulachev et al., Virology 263:139-154 (1999)) that the genomic RNAs of tobamoviruses contain a region upstream of the MP gene that is able to promote expression of the 3′-proximal genes from chimeric mRNAs in a cap-independent manner in vitro. The 228-nt sequence upstream of the MP gene of crTMV RNA (IRESMP228CR) mediates translation of the 3′-proximal GUS gene from bicistronic transcripts. It has been shown that the 75-nt region upstream of the MP gene of crTMV RNA is as efficient as the 228-nt sequence. Therefore the 75-nt sequence contains an IRESMP element (IRESMP75CR). It has been found that, as in crTMV RNA, the 75-nt sequence upstream of the MP gene in the genomic RNA of a type member of the tobamovirus group (TMV UI) also contains an IRESMP75UI element capable of mediating cap-independent translation of the 3′-proximal genes in rabbit reticulocyte lysate (RRL) and wheat germ extract (WGE).
On the whole, the data prove unambiguously that the 228- and 75-nt sequences upstream of the MP gene derived from genomic RNAs of different tobamoviruses contain a new IRES element (IRESMP). The efficiency of IRESMP in internal translation was similar to that of IRESCP.
The tobamoviruses provide a new example of internal initiation of translation, which is markedly distinct from IRESes shown for picornaviruses and other viral and eukaryotic mRNAs.
In patent application PCT/FI98/00457 it has been shown that tobamovirus IRES elements provide an internal translational pathway for expression of the 3′-proximal gene from bicistronic chimeric RNA transcripts in plant, animal, human and yeast cells. These RNA sequence elements, situated upstream of the movement protein (MP) and coat protein (CP) genes, are designated as internal ribosome entry sites of the MP (IRESMP) and CP (IRESCP) genes, respectively. Both IRESes can be employed to produce chimeric bi- or multicistronic mRNAs for co-expression of heterologous (or multiple homologous) genes in plant, animal, human and yeast cells, and also in transgenic plants and animals. Efficient (more than 30% in comparison to monocistronic transcript) IRESMP- and IRESCP-mediated expression of the second (3′) foreign gene from a bicistronic transcript was demonstrated in plants transgenic for bicistronic constructs; in transient expression assays (on electroporated protoplasts or in particle bombardment experiments); in vitro in cell-free protein-synthesizing systems of plant (wheat germ extracts) or animal (rabbit reticulocyte lysates) origin; in human (HeLa) cells transformed with bicistronic IRESMP-containing constructs; and in yeast cells transformed with the same bicistronic constructs. The IRESMP element capable of mediating cap-independent translation is contained not only in crTMV RNA but also in the genome of a type member of the tobamovirus group, TMV UI, and another tobamovirus, cucumber green mottle mosaic virus. Consequently, different members of the tobamovirus group contain IRESMP.