Technology for expressing recombinant proteins in both prokaryotic and eukaryotic organisms is well established. Mammalian cells offer significant advantages over bacteria or yeast for protein production because of their ability to correctly assemble, glycosylate and post-translationally modify recombinantly expressed proteins. After transfection into host cells, recombinant expression constructs can be maintained as extrachromosomal elements or may be integrated into the host cell genome. Generation of stably transfected mammalian cell lines usually involves the latter: a DNA construct encoding a gene of interest along with a drug resistance gene (dominant selectable marker) is introduced into the host cell, and subsequent growth in the presence of the drug allows selection of cells that have successfully integrated the exogenous DNA. In many instances, the gene of interest is linked to a drug resistance marker that can later be subjected to gene amplification. The gene encoding dihydrofolate reductase (DHFR) is most commonly used for this purpose. Growth of cells in the presence of methotrexate, a competitive inhibitor of DHFR, selects for cells that produce more DHFR as a result of amplification of the DHFR gene. Because flanking regions of DNA are amplified as well, a gene linked to DHFR is coamplified in the transfected cell line, which can increase production of the encoded protein and thereby yield high level expression of the gene of interest.
While this approach has proven successful, the random nature of the integration event causes a number of problems. Expression levels are greatly influenced by the local genetic environment at the gene locus, a phenomenon well documented in the literature and generally referred to as “position effects” (for example, see Al-Shawi et al, Mol. Cell. Biol., 10:1192-1198 (1990); Yoshimura et al, Mol. Cell. Biol., 7:1296-1299 (1987)). Because the vast majority of mammalian DNA is in a transcriptionally inactive state, random integration methods offer no control over the transcriptional fate of the integrated DNA. Consequently, the expression level of integrated genes can vary widely depending on the site of integration. For example, integration of exogenous DNA into inactive, or transcriptionally “silent,” regions of the genome will result in little or no expression. By contrast, integration into a transcriptionally active site may result in high expression.
Therefore, when the goal of the work is to obtain a high level of gene expression, as is typically the desired outcome of genetic engineering methods, it is generally necessary to screen large numbers of transfectants to find such a high producing clone. Additionally, random integration of exogenous DNA into the genome can in some instances disrupt important cellular genes, resulting in an altered phenotype. These factors can make the generation of high expressing stable mammalian cell lines a complicated and laborious process.
Recently, the use of DNA vectors containing translationally impaired dominant selectable markers in mammalian gene expression has been described. (This is disclosed in co-owned U.S. Ser. No. 08/147,696 filed Nov. 3, 1993, now U.S. Pat. No. 5,736,137).
These vectors contain a translationally impaired neomycin phosphotransferase (neo) gene as the dominant selectable marker, artificially engineered to contain an intron into which a DHFR gene, along with one or more genes of interest, is inserted. Use of these vectors as expression constructs has been found to significantly reduce the total number of drug resistant colonies produced, thereby simplifying screening compared with conventional mammalian expression vectors. Furthermore, a significant percentage of the clones obtained using this system are high expressing clones. These results are apparently attributable to the modifications made to the neo selectable marker. Because of the translational impairment of the neo gene, most transfected cells will not produce enough neo protein to survive drug selection, which decreases the overall number of drug resistant colonies. Additionally, a higher percentage of the surviving clones will contain the expression vector integrated at genomic sites where basal transcription levels are high; the resulting overproduction of neo allows these cells to overcome the impairment of the neo gene. Concomitantly, the genes of interest linked to neo will be subject to similarly elevated levels of transcription. The artificial intron created within neo confers the same advantage: survival depends on the synthesis of functional neo mRNA, which in turn depends on correct and efficient splicing of the neo intron. These criteria are more likely to be met if the vector DNA has integrated into a region that is already highly transcriptionally active.
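The selection logic described above can be illustrated with a toy model. The following Python sketch is purely illustrative and is not part of the described vectors: the site activity values, the survival threshold and the translation efficiency factors are hypothetical numbers chosen only to show the trend, and neo output is assumed to be simply the transcription level at the integration site multiplied by a translation efficiency.

```python
from statistics import mean

# Hypothetical relative transcription levels at ten random integration sites
# (arbitrary units; most sites are assumed to be weakly active).
SITE_ACTIVITY = [0.1, 0.2, 0.3, 0.5, 0.8, 1.0, 2.0, 4.0, 8.0, 16.0]

# Hypothetical amount of neo activity a cell needs to survive G418 selection.
SURVIVAL_THRESHOLD = 1.0


def surviving_sites(site_activity, translation_efficiency, threshold=SURVIVAL_THRESHOLD):
    """Return activities of sites whose modeled neo output meets the threshold.

    Assumption: neo output = site transcription level x translation efficiency.
    """
    return [a for a in site_activity if a * translation_efficiency >= threshold]


# Unimpaired marker vs. translationally impaired marker (hypothetical factor 0.25).
normal = surviving_sites(SITE_ACTIVITY, translation_efficiency=1.0)
impaired = surviving_sites(SITE_ACTIVITY, translation_efficiency=0.25)

print(f"unimpaired: {len(normal)} survivors, mean site activity {mean(normal):.2f}")
print(f"impaired:   {len(impaired)} survivors, mean site activity {mean(impaired):.2f}")
```

With these assumed numbers, lowering the translation efficiency from 1.0 to 0.25 cuts the number of surviving integration sites from five to three, and every remaining survivor sits at one of the three most active sites.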
Following integration of the vector into a transcriptionally active region, gene amplification is performed by selecting for the DHFR gene. Using this system, it has been possible to obtain, in a relatively short period of time, clones that were selected at a low methotrexate concentration (50 nM), contain few (<10) copies of the vector, and secrete high levels of protein (>55 pg/cell/day). However, the success of amplification is variable. Some transcriptionally active sites cannot be amplified, and therefore the frequency and extent of amplification from a particular site are not predictable.
Overall, the use of these translationally impaired vectors represents a significant improvement over other methods of random integration. However, as discussed, the problem of lack of control over the integration site remains a significant concern.
One approach to overcoming the problems of random integration is gene targeting, whereby the exogenous DNA is directed to a specific locus within the host genome. The exogenous DNA is inserted by homologous recombination between DNA sequences in the expression vector and the corresponding homologous sequence in the genome. However, while this type of recombination occurs naturally at high frequency in yeast and other fungi, it is an extremely rare event in higher eukaryotes. In mammalian cells, the ratio of homologous to non-homologous (random integration) recombination events is reported to range from 1/100 to 1/5000 (for example, see Capecchi, Science, 244:1288-1292 (1989); Morrow and Kucherlapati, Curr. Op. Biotech., 4:577-582 (1993)).
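To see why such ratios make screening laborious, consider a simplified calculation. The Python sketch below assumes that each drug resistant integrant is independently a homologous recombinant with probability p, and treats the reported ratios (1/100 and 1/5000) as p, which is a close approximation when the ratio is small. It computes the smallest number of clones n for which the chance of finding at least one homologous recombinant, 1 - (1 - p)^n, reaches a chosen confidence level (95% here, an arbitrary choice).

```python
import math


def clones_to_screen(p_homologous: float, confidence: float = 0.95) -> int:
    """Smallest n with P(at least one homologous recombinant among n) >= confidence.

    Simplifying assumption: each drug resistant integrant is independently a
    homologous recombinant with probability p_homologous.
    """
    if not 0.0 < p_homologous < 1.0:
        raise ValueError("p_homologous must be in (0, 1)")
    # 1 - (1 - p)^n >= c  <=>  n >= ln(1 - c) / ln(1 - p); log1p keeps small p accurate.
    return math.ceil(math.log1p(-confidence) / math.log1p(-p_homologous))


for p in (1 / 100, 1 / 5000):
    print(f"p = {p:g}: screen about {clones_to_screen(p)} clones")
```

Under these assumptions, 299 clones suffice at p = 1/100, whereas roughly 15,000 are needed at p = 1/5000.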
One of the earliest reports describing homologous recombination in mammalian cells comprised an artificial system created in mouse fibroblasts (Thomas et al, Cell, 44:419-428 (1986)). A cell line containing a mutated, non-functional version of the neo gene integrated into the host genome was created, and subsequently targeted with a second non-functional copy of neo containing a different mutation. Reconstruction of a functional neo gene could occur only by gene targeting. Homologous recombinants were identified by selecting for G418 resistant cells, and confirmed by analysis of genomic DNA isolated from the resistant clones.
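The logic of this experiment can be sketched with a simplified string model. In the Python example below, the sequence, the mutation positions and the restriction to one product of a single crossover are all hypothetical simplifications (real targeting events may instead involve gene conversion or other outcomes); the point is only that a functional copy can be rebuilt when the exchange falls between the two mutations.

```python
WILD_TYPE = "ATGAAACCCGGGTTTTAG"  # hypothetical stand-in for the neo coding sequence


def mutate(seq: str, pos: int, base: str) -> str:
    """Return seq with a point mutation (base) at index pos."""
    return seq[:pos] + base + seq[pos + 1:]


# Resident (chromosomal) copy: hypothetical mutation near the 5' end.
resident = mutate(WILD_TYPE, 4, "T")
# Incoming (targeting) copy: a different hypothetical mutation near the 3' end.
incoming = mutate(WILD_TYPE, 13, "A")


def single_crossover(left_donor: str, right_donor: str, point: int) -> str:
    """One product of a single crossover: left_donor[:point] + right_donor[point:]."""
    return left_donor[:point] + right_donor[point:]


def is_functional(seq: str) -> bool:
    """In this toy model, only the exact wild-type sequence is functional."""
    return seq == WILD_TYPE


# Crossover points at which the recombinant product is wild-type.
functional_points = [
    p for p in range(1, len(WILD_TYPE))
    if is_functional(single_crossover(incoming, resident, p))
]
print(functional_points)
```

In this model only crossover points 5 through 13, which lie between the two mutations, restore the wild-type sequence, while neither mutant copy is functional on its own; this mirrors why G418 resistance in that experiment marks cells in which recombination has occurred.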
Recently, the use of homologous recombination to replace the heavy and light immunoglobulin genes at endogenous loci in antibody secreting cells has been reported. (U.S. Pat. No. 5,202,238, Fell et al, (1993).) However, this particular approach is not widely applicable, because it is limited to the production of immunoglobulins in cells which endogenously express immunoglobulins, e.g., B cells and myeloma cells. Also, expression is limited to single copy gene levels because co-amplification after homologous recombination is not included. The method is further complicated by the fact that two separate integration events are required to produce a functional immunoglobulin: one for the light chain gene followed by one for the heavy chain gene.
An additional example of this type of system has been reported in NS/0 cells, where recombinant immunoglobulins are expressed by homologous recombination into the immunoglobulin gamma 2A locus (Hollis et al, international patent application # PCT/IB95 (00014).) Expression levels obtained from this site were extremely high: on the order of 20 pg/cell/day from a single copy integrant. However, as in the above example, expression is limited to this level because no amplifiable gene is cointegrated in this system. Also, other researchers have reported aberrant glycosylation of recombinant proteins expressed in NS/0 cells (for example, see Flesher et al, Biotech. and Bioeng., 48:399-407 (1995)), which further limits the applicability of this approach.
The cre-loxP recombination system from bacteriophage P1 has recently been adapted and used as a means of gene targeting in eukaryotic cells. Specifically, site-specific integration of exogenous DNA into the Chinese hamster ovary (CHO) cell genome using cre recombinase and a series of lox-containing vectors has been described (Fukushige and Sauer, Proc. Natl. Acad. Sci. USA, 89:7905-7909 (1992)). This system is attractive in that it provides for reproducible expression at the same chromosomal location. However, no effort was made to identify a chromosomal site from which gene expression is optimal, and, as in the above example, expression in this system is limited to single copy levels. The system is further complicated by the need to provide for expression of a functional recombinase enzyme in the mammalian cell.
The use of homologous recombination between an introduced DNA sequence and its endogenous chromosomal locus has also been reported to provide a useful means of genetic manipulation in mammalian cells, as well as in yeast cells. (See e.g., Bradley et al, Meth. Enzymol., 223:855-879 (1993); Capecchi, Science, 244:1288-1292 (1989); Rothstein et al, Meth. Enzymol., 194:281-301 (1991)). To date, most mammalian gene targeting studies have been directed toward gene disruption (“knockout”) or site-specific mutagenesis of selected target gene loci in mouse embryonic stem (ES) cells. The creation of these “knockout” mouse models has enabled scientists to address specific structure-function questions and to assess the biological importance of a myriad of mouse genes. This field of research also has important implications for potential gene therapy applications.
Also, Cell-tech (Kent, U.K.) has recently reported vectors that purportedly target transcriptionally active sites in NS/0 cells and do not require gene amplification (Peakman et al, Hum. Antibod. Hybridomas, 5:65-74 (1994)). However, levels of immunoglobulin secretion in these unamplified cells have not been reported to exceed 20 pg/cell/day, whereas in amplified CHO cells levels as high as 100 pg/cell/day can be obtained (Id.).
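Specific productivities such as those quoted above (20 pg/cell/day versus 100 pg/cell/day) can be converted to a volumetric output for a given culture. The Python sketch below assumes a constant density of 1e6 viable cells per mL, a hypothetical value chosen only for illustration; real cultures change density over time, so the resulting figures are indicative only.

```python
PG_PER_MG = 1e9  # picograms per milligram
ML_PER_L = 1e3   # millilitres per litre


def volumetric_output(qp_pg_per_cell_day: float, cells_per_ml: float) -> float:
    """Return product output in mg per litre of culture per day."""
    return qp_pg_per_cell_day * cells_per_ml * ML_PER_L / PG_PER_MG


ASSUMED_DENSITY = 1e6  # hypothetical viable cells per mL, held constant

for label, qp in [("unamplified NS/0 (reported ceiling)", 20.0),
                  ("amplified CHO (reported maximum)", 100.0)]:
    print(f"{label}: {volumetric_output(qp, ASSUMED_DENSITY):.0f} mg/L/day")
```

At this assumed density, 1 pg/cell/day corresponds to 1 mg/L/day, so the fivefold difference in specific productivity carries over directly to volumetric output.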
It would be highly desirable to develop a gene targeting system that reproducibly provides for the integration of exogenous DNA into a predetermined site in the genome known to be transcriptionally active. It would also be desirable for such a gene targeting system to facilitate co-amplification of the inserted DNA after integration. Such a system would allow reproducible, high level expression of any cloned gene of interest in a mammalian cell, and would undoubtedly be of significant interest to many researchers.
In this application, we provide a novel mammalian expression system, based on homologous recombination occurring between two artificial substrates contained in two different vectors. Specifically, this system uses a combination of two novel mammalian expression vectors, referred to as a “marking” vector and a “targeting” vector.
Essentially, the marking vector enables the identification and marking of a site in the mammalian genome that is transcriptionally active, i.e., a site at which gene expression levels are high. This site can be regarded as a “hot spot” in the genome. After integration of the marking vector, the subject expression system enables a second DNA, the targeting vector, to be integrated at this site by homologous recombination between DNA sequences common to both vectors. This system affords significant advantages over other homologous recombination systems.
Unlike most other homologous recombination systems employed in mammalian cells, this system exhibits no background: cells that have undergone only random integration of the vector do not survive the selection. Thus, any gene of interest cloned into the targeting plasmid is expressed at high levels from the marked hot spot. Accordingly, the subject method of gene expression substantially or completely eliminates the problems inherent to systems of random integration, discussed in detail above. Moreover, this system provides reproducible, high level expression of any recombinant protein at the same transcriptionally active site in the mammalian genome. In addition, gene amplification may be effected at this particular transcriptionally active site by including an amplifiable dominant selectable marker (e.g., DHFR) as part of the marking vector.