Functional analyses of genes in vivo frequently involve the introduction of modified genomic DNA into the germline to generate transgenic animals [Jaenisch et al., Science 240:1468 (1988); Brinster, Cell 41:343 (1985)]. Genomic DNA sequences containing introns and essential regulatory sequences have been shown to be expressed in vivo in cases where simple cDNA constructs cannot be expressed [Brinster et al., Proc. Natl. Acad. Sci. 85:836-840 (1988)]. Furthermore, the size of the genomic DNA that can be readily manipulated in vitro and introduced into the germline can be a critical determinant of the outcome of the functional analysis of a gene, since elements that are important for high-level, tissue-specific and position-independent expression of the transgene may be located at a long distance from the gene itself [Dillon et al., Trends Genet. 9:134 (1993); Kennison, Trends Genet. 9:75 (1993); Wilson et al., Annu. Rev. Cell. Biol. 6:679 (1990)].
On the other hand, the use of such large genomic transgenes presents several practical problems. For example, the size of the transgene is presently limited by constraints on the sequence length that can be cloned and stably maintained in a conventional plasmid or cosmid. Because of this size limitation, DNA sequences suspected of being nonessential are often omitted when designing the constructs to be transferred. In addition, in vitro manipulation of large DNAs often leads to mechanical shearing [Peterson et al., Trends Genet. 13:61-66 (1997)].
Yeast artificial chromosomes (YACs) allow large genomic DNA to be modified and used for generating transgenic animals [Burke et al., Science 236:806 (1987); Peterson et al., Trends Genet. 13:61 (1997); Choi et al., Nat. Genet. 4:117-123 (1993); Davies et al., Biotechnology 11:911-914 (1993); Matsuura et al., Hum. Mol. Genet. 5:451-459 (1996); Peterson et al., Proc. Natl. Acad. Sci. 93:6605-6609 (1996); and Schedl et al., Cell 86:71-82 (1996)]. Other vectors have also been developed for the cloning of large segments of mammalian DNA, including cosmids and bacteriophage P1 [Sternberg et al., Proc. Natl. Acad. Sci. U.S.A. 87:103-107 (1990)]. YACs have certain advantages over these alternative large-capacity cloning vectors [Burke et al., Science 236:806-812 (1987)]. The maximum insert size is 35-45 kb for cosmids and 100 kb for bacteriophage P1, both of which are much smaller than the maximal insert for a YAC. However, the YAC system has several critical limitations, including difficulties in manipulating YAC DNA, chimerism and clonal instability [Green et al., Genomics 11:658 (1991); Kouprina et al., Genomics 21:7 (1994); Larionov et al., Nature Genet. 6:84 (1994)]. As a result, generating transgenic mice with an intact YAC remains a challenging task [Burke et al., Science 236:806 (1987); Peterson et al., Trends Genet. 13:61 (1997)].
An alternative to YACs is provided by E. coli-based cloning systems that have been developed to construct large genomic DNA insert libraries. These are bacterial artificial chromosomes (BACs) and P1-derived artificial chromosomes (PACs) [Mejia et al., Genome Res. 7:179-186 (1997); Shizuya et al., Proc. Natl. Acad. Sci. 89:8794-8797 (1992); Ioannou et al., Nat. Genet. 6:84-89 (1994); Hosoda et al., Nucleic Acids Res. 18:3863 (1990)]. BACs are based on the E. coli fertility plasmid (F factor), and PACs are based on the bacteriophage P1. The advent of PACs and BACs has expanded the size of DNA fragments from eukaryotic genomes that can be stably cloned in Escherichia coli as plasmid molecules. These vectors propagate at a very low copy number (1-2 per cell), enabling genomic inserts up to 300 kb in size to be stably maintained in recombination-deficient hosts (most clones in human genomic libraries fall within the 100-200 kb size range). The host cell is required to be recombination deficient to ensure that non-specific and potentially deleterious recombination events are kept to a minimum. As a result, libraries of PACs and BACs are relatively free of the high proportion of chimeric or rearranged clones typical of YAC libraries [Monaco et al., Trends Biotechnol. 12:280-286 (1994); Boysen et al., Genome Research 7:330-338 (1997)]. In addition, isolating and sequencing DNA from PACs or BACs involves simpler procedures than for YACs, and PACs and BACs have a higher cloning efficiency than YACs [Shizuya et al., Proc. Natl. Acad. Sci. 89:8794-8797 (1992); Ioannou et al., Nat. Genet. 6:84-89 (1994); Hosoda et al., Nucleic Acids Res. 18:3863 (1990)]. Such advantages have made BACs and PACs important tools for physical mapping in many genomes [Woo et al., Nucleic Acids Res. 22:4922 (1994); Kim et al., Proc. Natl. Acad. Sci. 93:6297-6301 (1996); Wang et al., Genomics 24:527 (1994); Wooster et al., Nature 378:789 (1995)].
Furthermore, PACs and BACs are circular DNA molecules that are readily isolated from the host genomic background by classical alkaline lysis [Birnboim et al., Nucleic Acids Res. 7:1513-1523 (1979)].
Functional characterization of a gene of interest contained in a PAC or BAC clone generally entails transferring the DNA into a eukaryotic cell for transient or long-term expression. A transfection reporter gene, e.g., lacZ, together with a selectable marker, e.g., neo, can be inserted into a BAC [Mejia et al., Genome Res. 7:179-186 (1997)]. Transfected cells can then be detected by staining with X-Gal to verify DNA uptake. Stably transformed cells are selected with the antibiotic G418.
However, while PACs and BACs have cloning capacities up to 350 kb, performing homologous recombination to introduce mutations into a gene of interest has not been demonstrated [Peterson et al., Trends Genet. 13:61-66 (1997)]. Indeed, although BACs and PACs have become an important source of large genomic DNA in genome research, there are still no methods available to modify them. Furthermore, no germline transmission of intact BACs or PACs in transgenic mice has been reported. These, as well as other disadvantages of BACs and PACs, greatly limit their potential use for functional studies. Therefore, there is a need for an improved cloning vector for germline transmission of selected genes in transgenic animals. More particularly, there is a need for a cloning vector that has the capacity to contain greater than 100 kilobases of DNA and that can be readily manipulated and isolated, yet can still be stably stored in libraries relatively free of rearranged clones. In addition, there is a need to provide methodology for generating such cloning vectors. There is also a need to apply such vectors to improve current technologies such as gene targeting.
Gene targeting has been used in various systems, from yeast to mice, to make site-specific mutations in the genome. Gene targeting is useful not only for studying the function of proteins in vivo, but also for creating animal models of human diseases and in gene therapy. The technique involves homologous recombination between DNA introduced into a cell and the endogenous chromosomal DNA of the cell. However, in vertebrate systems, the rate of homologous recombination is very low compared to that of random integration. The only cell line that allows a relatively high homologous recombination rate and maintains the ability to populate the germline is the murine 129 embryonic stem cell line (ES cells). Using this specialized cell, mice can be generated with a targeted mutation (Gene Targeting, a practical approach, Ed. by A. Joyner, IRL Press: Oxford, New York, Tokyo). However, the rate of homologous recombination for some gene loci in ES cells is still extremely low (<1%), the procedure is labor intensive, and the cost of generating targeted mutant mice is very high. Moreover, since there are no ES cells available for vertebrates other than mice, gene targeting in the germline is still not possible for other vertebrates.
The major limitation for gene targeting in vertebrate cells remains the low targeting frequency. One critical factor affecting the targeting frequency is the total length of homology. Deng and Capecchi (MCB, 12:3365-3371) have shown that gene targeting frequency is linearly dependent on the logarithm of the total homology length over homology lengths of 2.8 kb to 14.6 kb. Since the curve did not plateau at 14.6 kb of homology, it is likely that incorporating greater homology lengths into the targeting vector will further increase the homologous recombination rate. Using a mathematical model developed by Fujitani et al. [Genetics, 140:797-809 (1995)], an estimate can be made that with a total homology of 100 kb of isogenous DNA (i.e., DNA from the same strain of mice), the gene targeting rate in ES cells would be 10%. This is a dramatic improvement over the conventional 14.6 kb targeting vector, which yields a corresponding rate of only 0.03%. Further support for the present strategy, i.e., using a large DNA construct for gene targeting, comes from an experiment with Mycobacterium tuberculosis, the causal agent of tuberculosis. As in vertebrate cells, gene targeting in M. tuberculosis has a very low rate, mainly due to the predominance of random integration over homologous recombination. It has been demonstrated that with a 40-50 kb linear targeting construct, a 6% targeting frequency could be obtained, whereas no targeting event at all was obtained with a smaller (<10 kb) targeting construct [Balasubramanian et al., J. of Bacteriology 178:273-279 (1996)]. Therefore, there is a need to construct large gene targeting constructs to allow efficient gene targeting in many biological systems.
The citation of any reference herein should not be construed as an admission that such reference is available as "Prior Art" to the instant application.