In the past few years, a variety of gene trap vectors have proven to be useful tools for the identification and analysis of permanently or transiently expressed genes. Standard gene trap vectors are DNA or retroviral vectors that insert a promoterless reporter gene into a large number of chromosomal sites. A classic gene trap vector integrates into introns, which are the non-expressed regions of a gene. Introns are flanked by exons, which are the expressed regions of a gene. Transcription of a trapped mammalian gene yields a primary messenger RNA consisting of exon, intron and vector sequences. Primary mRNA processing removes the intron sequences and splices the exons together at specific sites (splice sites) located at the 5′ and 3′ ends of each exon. As a result, the gene trap vector sequences encoding the reporter gene become associated with the upstream exons in a processed fusion transcript from which a truncated cellular protein is translated together with the reporter protein.
With the completion of sequencing of the human and mouse genomes, the interest in tools suitable for performing genome-wide mutagenesis has significantly increased. Large scale insertional mutagenesis in mammalian cells has been most effectively induced with conventional gene trap vectors (Hansen, J. et al., Proc. Natl. Acad. Sci. USA 100:9918-22 (2003); Skarnes, W. C. et al., Nat. Genet. 36:543-4 (2004); Wiles, M. V. et al., Nat. Genet. 24:13-4 (2000); Zambrowicz, B. P. et al., Proc. Natl. Acad. Sci. USA 100:14109-14 (2003)). When selecting genes by means of their expression, recombinants will be obtained in which the reporter gene is fused to the regulatory elements of an endogenous gene. Transcripts generated by these gene fusions faithfully reflect the activity of individual cellular genes and serve as molecular tags to identify and/or clone any genes linked to specific functions. Thus, gene trap vectors simultaneously mutate and report on the expression of an endogenous gene at the site of insertion and provide a DNA tag for rapid identification of the disrupted gene. The application of this technique in a genome-wide manner should allow for the identification of most, if not all, active transcripts in a genome and is thus an important tool for genome annotation. More importantly, gene trapping in mouse embryonic stem (ES) cells enables the establishment of ES cell libraries with mutations in a substantial fraction of genes in the mouse genome, which can be used to produce transgenic mice. Thus, the gene trapping methodology enables the analysis of gene function in the context of an entire organism.
For some years, targeted mutagenesis in pluripotent mouse embryonic stem (ES) cells has been used to inactivate genes for which cloned sequences were available (Capecchi, M. R., Trends Genet. 5:70-6 (1989)). Since ES cells can pass mutations induced in vitro to transgenic offspring in vivo, it is possible to analyze the consequences of gene disruptions in the context of entire organisms. As a result, numerous mouse strains with functionally inactivated genes (“knock out mice”) have been created by this technology. However, targeted mutagenesis requires detailed knowledge of the structure and organization of a gene as well as its physical isolation in a cloning vector. Overall, the generation of mutant mouse strains by this procedure is still time-consuming, labor-intensive, expensive and inefficient because it can handle only one gene at a time.
The principal element of a standard gene trap vector is a gene disruption and selection cassette (GDSC) consisting of a promoterless reporter gene and/or selectable marker gene flanked by an upstream 3′ splice site (splice acceptor; SA) and a downstream transcriptional termination sequence (polyadenylation sequence; polyA; see FIG. 1). The GDSC is inserted into an intron of a target gene and transcription takes place from the upstream target gene promoter. Since the 3′ end of the exon upstream of the vector insertion is flanked by a splice donor (SD) site, it is spliced to the GDSC, resulting in a fusion transcript in which the upstream exons of the trapped gene are fused in frame to the reporter and/or selectable marker gene. Due to the presence of a polyA sequence in the GDSC, transcription is terminated prematurely and, as a result, any exon(s) downstream of the GDSC are no longer transcribed. Consequently, the processed fusion transcript encodes a truncated form of the target gene, consisting of the upstream exon(s), and the reporter/selectable marker gene.
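The splicing outcome described above can be illustrated with a minimal sketch. It is a toy model only, not part of the disclosed vectors: the names Gene and standard_trap_transcript, the exon labels and the on/off expression flag are hypothetical simplifications introduced for illustration, and reading frame, splicing efficiency and expression level are ignored.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Gene:
    """Toy gene: exon labels in 5'->3' order plus an on/off expression flag."""
    name: str
    exons: Tuple[str, ...]
    expressed: bool


def standard_trap_transcript(gene: Gene, intron: int,
                             reporter: str = "reporter") -> Optional[List[str]]:
    """Processed fusion transcript for a GDSC inserted into intron `intron`.

    Intron k lies between exon k and exon k+1 (1-based numbering).
    Returns None for a silent gene, because the promoterless reporter
    depends on the promoter of the trapped gene.
    """
    if not 1 <= intron < len(gene.exons):
        raise ValueError("intron index out of range for this gene")
    if not gene.expressed:
        return None
    # The SD of the upstream exon splices to the SA of the cassette; the
    # polyA of the cassette ends transcription, so downstream exons are lost.
    return list(gene.exons[:intron]) + [reporter]


active = Gene("geneA", ("E1", "E2", "E3", "E4"), expressed=True)
silent = Gene("geneB", ("E1", "E2", "E3"), expressed=False)

print(standard_trap_transcript(active, intron=2))  # ['E1', 'E2', 'reporter']
print(standard_trap_transcript(silent, intron=1))  # None
```

In this simplified model, an insertion into a silent gene yields no reporter transcript at all, so such an insertion would go undetected in a selection based on reporter expression.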
From the above it becomes apparent that standard gene trap vectors can only disrupt genes that are actively transcribed in the target cell. Genes that are not expressed, or that are expressed too weakly for detection, cannot be recovered by standard gene trapping. This poses a significant problem for genome-wide mutagenesis programs seeking a large scale and cost-effective functional analysis of the ~30,000 mammalian genes. In mouse embryonic stem (ES) cells, for example, only about one half of all genes are expressed, leaving ~15,000 genes inaccessible to standard gene trapping. The overall impact of a gene trap resource for elucidating gene function in vivo will thus rest on the fraction of the genome that is accessible with the standard gene trapping technology.
In order to trap genes that are not accessible to standard trapping, gene trap vectors that can be activated independently of gene expression have been developed previously. These vectors are based on a selectable marker gene flanked upstream by a constitutive promoter and downstream by a 5′ splice site (splice donor, SD) (Zambrowicz, B. P. et al., Nature 392:608-11 (1998)). These elements are inserted downstream of a standard GDSC such as described above.
An insertion of these vectors into an intron of a gene induces splicing of the selectable marker gene, which, in turn, becomes associated with the downstream exon(s) of that gene. As a result, the cells express a fusion transcript initiating at the constitutive promoter and terminating at the polyA site of the trapped gene (polyA trap). Since the selectable marker gene is expressed independently of the trapped gene's expression, polyA traps should, at least in principle, enable the recovery of mutations in any gene.
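The polyA trap transcript described above can be sketched in the same simplified way. The function name polya_trap_transcript and the exon labels are hypothetical and serve illustration only; the model deliberately takes no expression flag, reflecting the assumption stated above that the marker is driven by its own constitutive promoter.

```python
from typing import List, Sequence


def polya_trap_transcript(exons: Sequence[str], intron: int,
                          marker: str = "marker") -> List[str]:
    """Processed marker transcript for a polyA trap inserted into intron `intron`.

    Intron k lies between exon k and exon k+1 (1-based numbering).
    The marker starts at the constitutive promoter of the vector, its SD
    splices to the downstream exons, and the transcript ends at the polyA
    site of the trapped gene.
    """
    if not 1 <= intron < len(exons):
        raise ValueError("intron index out of range for this gene")
    return [marker] + list(exons[intron:]) + ["polyA"]


exons = ("E1", "E2", "E3", "E4")

print(polya_trap_transcript(exons, intron=1))  # ['marker', 'E2', 'E3', 'E4', 'polyA']
print(polya_trap_transcript(exons, intron=3))  # ['marker', 'E4', 'polyA']
```

Both insertions give a polyadenylated marker transcript in this model, whether or not the trapped gene itself is transcribed, which is why selection on the marker does not depend on expression of the trapped gene.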
However, there are some major drawbacks with these gene trap vectors and gene trapping methods. Several large scale screening efforts in ES cells with this technology have shown that polyA-containing gene trap vectors generate a high number of false-positive recombinants and, more importantly, are not considered to be highly mutagenic (Zambrowicz, B. P. et al., Proc. Natl. Acad. Sci. USA 100:14109-14 (2003)). So far, two main reasons have been cited for their poor performance: (i) the vectors frequently acquire cryptic polyA sites on the non-coding strands of genes, and (ii) selection is biased for gene trap insertions close to the 3′ ends of genes, which are frequently non-mutagenic.
From the above it follows that there exists a need for gene trap vectors and gene trapping methods that overcome the above drawbacks, and which are efficient in the identification and mutation of cellular genes that are either not expressed or expressed too weakly to be detected by standard detection methodology. Thus, the provision of a gene trap strategy making most, if not all, genes of a genome accessible to effective trapping in a target cell would be highly desirable.
The problem underlying the present invention can thus be regarded as the provision of a gene trap vector and a gene targeting cassette that allow for the identification of gene products that are normally not expressed or expressed at non-detectable expression levels in mammalian target cells. The solution provided by the present invention thus concerns a gene trap vector (eGTV) as defined in independent claim 1.