In recent years, the sequencing of the genomes of individual species, including humans, has become a major goal of biomedical research. The most prevalent procedure for sequencing the coding regions of a gene relies on RNA-based methods, such as direct screening of a cDNA library. However, such methods are inherently biased towards the identification of nucleic acids that are prevalent in the tissue sample being studied. Therefore, genes that are expressed solely in tissues that are difficult to obtain, and/or expressed under relatively rare circumstances, have a good chance of being missed. Particularly in the latter case, these genes are likely to play a unique role during a specific cellular challenge, and thus could be important in a specific disease state.
Exon trapping is one method of potentially overcoming the inherent bias of the mRNA-based procedures of genomic sequencing. Exon trapping was originally developed to efficiently isolate coding sequences from complex genomic sequences [Duyk et al., PNAS 87:8995-8999 (1990); Buckler et al., PNAS 88:4005-4009 (1991)]. This method is based on the selection of exons that are flanked by functional 5' and 3' splice sites. Conventional exon trapping vectors contain a driving promoter (e.g., the SV40 promoter or the metallothionein-1 promoter) which controls the expression of an exon having a 5' splice site; an intron with multiple cloning sites; and a 3' exon having a 3' splice site and a polyadenylation (poly A) signal sequence. Genomic fragments containing potential exons are first subcloned into the intron. The resulting plasmid DNA is then transfected into COS-7 cells, which transcribe and then process the RNA products. The mature RNAs containing the trapped exons can be amplified by reverse transcriptase PCR and subcloned. The trapped exons can be identified by sequencing the cloned cDNA products. In addition to its simplicity and efficiency, exon trapping is independent of the amount, location, and timing of the expression of a given gene, and therefore is preferable to mRNA-based methods. Consequently, exon trapping has become widely employed in transcription map construction for positional cloning and in general genomic sequencing.
Unfortunately, current exon trapping systems have a number of limitations. First, the size of the genomic insert in the exon-trapping vector is limited to 1-2 kilobases (kb), so the resulting trapped exon is usually a single small exon (80-150 basepairs (bp)). Such small exons are usually difficult to use in subsequent biological procedures, such as library screening, Northern blot analysis, or in situ hybridization. Second, different exons from a single gene will be dispersed among different trapping vectors. Therefore, reconstruction of the gene from these small pieces requires considerable additional work. Third, subcloning of small genomic fragments may disrupt the elements necessary for proper splicing, thereby increasing the chance of missing certain exons. Fourth, current exon trapping systems can only be used in combination with specific cell lines (e.g., COS cells), since they require specific cellular factors to support the SV40 origin of replication. As a result, exons that are spliced in a tissue-specific manner would be missed in the COS cells.
One recent advance towards solving some of these problems uses cosmid-based exon trapping vectors [Datson et al., NAR 24:1105-1111 (1996)]. A specially designed cosmid vector is used, with a promoter and a 5' splice site on one end, and a 3' splice site and a polyadenylation signal sequence on the other end. The genomic insert can then be as large as 40 kb, so that multiple exons can be trapped together. Such a trapped gene segment can be greater than 800 bp. The major disadvantage of this system is that it is necessary to use a specialized genomic cosmid library. Furthermore, cosmid clones are inherently unstable.
An alternative to a cosmid-based system is to use one of the E. coli-based cloning systems that have been developed to construct large-insert genomic DNA libraries: bacterial artificial chromosomes (BACs) and P1-derived artificial chromosomes (PACs) [Mejia et al., Genome Res. 7:179-186 (1997); Shizuya et al., Proc. Natl. Acad. Sci. 89:8794-8797 (1992); Ioannou et al., Nat. Genet. 6:84-89 (1994); Hosoda et al., Nucleic Acids Res. 18:3863 (1990)]. BACs are based on the E. coli fertility plasmid (F factor), and PACs are based on the bacteriophage P1. The size of DNA fragments from eukaryotic genomes that can be stably cloned in Escherichia coli as plasmid molecules has been expanded by the advent of PACs and BACs. These vectors propagate at a very low copy number (1-2 per cell), enabling genomic inserts up to 300 kb in size to be stably maintained in recombination-deficient hosts. The host cell is required to be recombination deficient to ensure that non-specific and potentially deleterious recombination events are kept to a minimum. As a result, libraries of PACs and BACs are relatively free of the high proportion of chimeric or rearranged clones typical of yeast artificial chromosomes (YACs) [Burke et al., Science 236:806 (1987); Peterson et al., Trends Genet. 13:61 (1997); Choi et al., Nat. Genet. 4:117-123 (1993); Davies et al., Biotechnology 11:911-914 (1993); Matsuura et al., Hum. Mol. Genet. 5:451-459 (1996); Peterson et al., Proc. Natl. Acad. Sci. 93:6605-6609 (1996); Schedl et al., Cell 86:71-82 (1996); Monaco et al., Trends Biotechnol. 12:280-286 (1994); Boysen et al., Genome Research 7:330-338 (1997)]. In addition, isolating and sequencing DNA from PACs or BACs involves simpler procedures than for YACs, and PACs and BACs have a higher cloning efficiency than YACs [Shizuya et al., Proc. Natl. Acad. Sci. 89:8794-8797 (1992); Ioannou et al., Nat. Genet. 6:84-89 (1994); Hosoda et al., Nucleic Acids Res. 18:3863 (1990)]. Such advantages have made BACs and PACs important tools for physical mapping in many genomes [Woo et al., Nucleic Acids Res. 22:4922 (1994); Kim et al., Proc. Natl. Acad. Sci. 93:6297-6301 (1996); Wang et al., Genomics 24:527 (1994); Wooster et al., Nature 378:789 (1995)]. Furthermore, PACs and BACs are circular DNA molecules that are readily isolated from the host genomic background by classical alkaline lysis [Birnboim et al., Nucleic Acids Res. 7:1513-1523 (1979)]. In addition, BACs have been found to be an important source of genomic DNA for the direct sequencing of the human genome [Rowen et al., Science 278:605-607 (1997)]. On the other hand, their use in gene identification is still extremely limited. Indeed, heretofore, BACs and PACs have not been shown to be useful in methods that directly isolate genes, such as exon trapping.
Therefore, there is a need to efficiently sequence coding regions of eukaryotic genes, and in particular human genes, which are expressed relatively rarely and/or only at specific times (such as the genes involved in circadian rhythms or those involved in body weight homeostasis), and/or are predominantly expressed in tissues that are difficult to obtain, such as the human organ of Corti. In addition, there is a need to produce new and improved gene maps for BAC or PAC contigs. Furthermore, there is a need to compile new cDNA libraries that are not biased by the expression pattern of the tissue that serves as the source of the mRNAs used to construct the cDNA library.
The citation of any reference herein should not be construed as an admission that such reference is available as "Prior Art" to the instant application.