Efforts to determine the nucleotide sequence of complete genomes, or a large portion thereof, have traditionally taken a so-called bottom-up approach, including the steps of mapping the genome, preparing a library of random large clones in yeast artificial chromosomes (YACs), bacterial artificial chromosomes (BACs), P1 or cosmids, followed by random subcloning in M13-like vectors. Among such systems, BACs are at present the preferred vector for maintaining large genomic DNA fragments. BACs are preferred because individual DNA fragments are maintained stably in a single copy vector in the host cells, even after 100 or more generations of serial growth. In contrast, the DNA fragments cloned into YACs tend to be unstable and can yield chimeric clones. It is difficult to recover DNA clones from YACs in a pure form.
BAC (or pBAC) vectors typically accommodate inserts in the range of approximately 30 to 300 kilobase pairs. A widely used BAC vector, pBeloBac11, uses α-complementation of the lacZ gene to distinguish, by color, colonies carrying insert-containing recombinant molecules from colonies carrying the BAC vector alone. When a DNA fragment is cloned into the lacZ gene of pBeloBac11, insertional inactivation results in a white colony on X-Gal/IPTG plates after transformation. Kim, U-J et al., "Construction and Characterization of a Human Bacterial Artificial Chromosome Library," Genomics 34:213-218 (1996). Thus, it is now possible to distinguish those colonies that contain BACs with inserts from those that lack inserts. A similar prior vector, pBAC108L, lacked the ability to distinguish insert-containing BACs. Shizuya, H., "Cloning and stable maintenance of 300-kilobase-pair fragments of human DNA in Escherichia coli using an F-factor-based vector," P.N.A.S. U.S.A. 89:8794-8797 (1992).
Although these single-copy vectors are advantageously used to clone large genomic DNA fragments for subsequent analysis, especially sequence analysis, the single-copy nature of these vectors is also a limitation, in that large numbers of cells containing a BAC clone of interest must be grown to produce a sufficient quantity of DNA for subsequent analysis. It is, of course, possible to amplify portions of a BAC clone of interest using, for example, PCR, but simple amplification of the entire insert from a BAC vector has not previously been possible.
In 1993, Szybalski proposed a system for fragment-by-fragment sequencing of entire bacterial genomes by in vivo excision and subsequent amplification of the excised genomic fragments. Szybalski, W., "From the Double-Helix to Novel Approaches to the Sequencing of Large Genomes," Gene 135:279-290 (1993). One aspect of the Szybalski (1993) system is that the excision from the bacterial chromosome and the amplification of the excised genomic fragment be controlled very stringently so that excision is induced only on command and amplification is initiated only upon induction.
According to the proposed system, excision-mediating sites (EMS) are placed at 50-100 kb intervals throughout a well-mapped genome in places where the EMS do not interfere with viability or genomic stability. The EMS are placed throughout the system either (1) by randomly inserting plasmids carrying transposons, EMS, and selective markers, or (2) by targeting insertions that inactivate specific, non-essential genes. EMS identified by Szybalski include λ att, 2 μ FRT, and P1 lox.
The net result of adding EMS to a genome in the Szybalski (1993) system is a library of strains, each of which carries one EMS at a physically mapped site. Using genetic crosses between strains having neighboring EMS at a suitable distance from one another, a set of strains is produced in which each strain possesses exactly two neighboring EMS. From such strains, the intervening segment (e.g., 50-100 kb) can be excised. According to the Szybalski (1993) system, if a cis-acting ori element is positioned together with and next to the excision elements, the ori element will be present on the excised DNA circle and can promote the amplification of the DNA circle.
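The pairing and excision logic described above can be illustrated with a minimal sketch. This is a toy model only, not part of the Szybalski (1993) proposal: the function names `neighboring_pairs` and `excise`, the example map positions, and the representation of the genome as a linear string with one character per kilobase are all assumptions made for illustration, and wrap-around on a circular chromosome is ignored.

```python
from typing import List, Tuple


def neighboring_pairs(
    ems_positions_kb: List[int], min_kb: int = 50, max_kb: int = 100
) -> List[Tuple[int, int]]:
    """Return adjacent EMS pairs whose spacing lies within [min_kb, max_kb].

    Each returned pair stands for one strain carrying exactly two
    neighboring EMS (hypothetical model; positions are in kb).
    """
    sites = sorted(ems_positions_kb)
    return [
        (left, right)
        for left, right in zip(sites, sites[1:])
        if min_kb <= right - left <= max_kb
    ]


def excise(genome: str, left_kb: int, right_kb: int) -> Tuple[str, str]:
    """Split the genome into the excised segment and the remaining genome.

    The genome is modeled as a linear string, one character per kb
    (an illustrative simplification).
    """
    segment = genome[left_kb:right_kb]
    remainder = genome[:left_kb] + genome[right_kb:]
    return segment, remainder


# Hypothetical map positions (kb) of EMS in a 400 kb toy genome.
ems_sites = [0, 60, 90, 250, 330]
pairs = neighboring_pairs(ems_sites)  # [(0, 60), (250, 330)]

toy_genome = "N" * 400
segment, remainder = excise(toy_genome, *pairs[1])
print(pairs, len(segment), len(remainder))  # [(0, 60), (250, 330)] 80 320
```

In this sketch, the 30 kb and 160 kb gaps are rejected because they fall outside the assumed 50-100 kb window, mirroring the requirement that neighboring EMS lie at a suitable distance from one another.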
Szybalski (1993) does not contemplate employing an amplification and excision system in conjunction with a bacterial artificial chromosome. Such a system would provide additional benefit in that it would not require the steps of interspersing EMS throughout the genome and crossing EMS-containing strains, but could instead rely upon the use of rare-cutting restriction enzymes to generate genomic fragments of suitable size for cloning into a BAC vector.