Molecular biology has progressed to the point where characterizing and sequencing the entire genome of an organism is feasible. At present, however, characterizing and sequencing large genomes is labor intensive and requires sequencing each region of the genome multiple times in order to obtain a complete, contiguous sequence.
Several strategies are currently available for sequencing large genomes. In the shotgun sequencing method, the genome is randomly fragmented and the fragments are cloned into sequencing vectors. The resulting clones are sequenced, and overlapping sequences are identified and ordered to generate a contiguous sequence. With this approach, high-quality sequence can be assembled only after a very large amount of raw sequence data, five to seven times the length of the region to be sequenced, has been accumulated. Even after such extensive over-sequencing, primer walking is required for final gap closure.
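The five- to seven-fold redundancy described above can be made concrete with a short calculation. The following is a minimal sketch only: the target size and the usable read length are hypothetical values chosen for illustration and do not come from the text.

```python
import math


def shotgun_raw_bases(target_bp, redundancy):
    """Raw bases that must be accumulated for a target at a given fold redundancy."""
    return target_bp * redundancy


def reads_needed(raw_bases, read_length_bp):
    """Number of fixed-length reads required to supply `raw_bases` (rounded up)."""
    return math.ceil(raw_bases / read_length_bp)


# Hypothetical values, assumed for illustration only.
TARGET_BP = 100_000      # size of the region to be sequenced
READ_LENGTH_BP = 500     # usable length of a single sequencing read

low = shotgun_raw_bases(TARGET_BP, 5)    # five-fold redundancy
high = shotgun_raw_bases(TARGET_BP, 7)   # seven-fold redundancy

print(low, high)
print(reads_needed(low, READ_LENGTH_BP), reads_needed(high, READ_LENGTH_BP))
```

Under these assumed values, a 100 kb region would call for 500 to 700 kb of raw sequence before primer walking for gap closure is even considered.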
As an alternative to both shotgun sequencing and primer walking, nested deletion strategies provide an economical means of determining the primary structure of DNA. Nested deletion strategies produce an array of clones with overlapping deletions that are anchored at one end (i.e., all the deletions share one undeleted end in common). Contig assembly with nested deletion methodology is much simpler than with a shotgun approach, and only one-half to one-third as much raw sequence information is needed.
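One way to see why anchored deletions simplify contig assembly is that each clone can be described by a single number, the position at which its deletion ends, so ordering the clones reduces to sorting. The sketch below illustrates that idea and is not a method described in the text. It assumes that each clone is read from the common anchored end, that every read has the same usable length, and that deletion endpoints are already known (for example, from clone sizing). The function name, the endpoint values, and the minimum overlap are all hypothetical.

```python
def select_tiling_clones(endpoints, read_length, insert_length, min_overlap=50):
    """Greedily pick a minimally overlapping set of nested-deletion clones.

    `endpoints` are deletion endpoints measured from the start of the insert;
    a read from a clone with endpoint p is assumed to cover p .. p + read_length.
    Endpoint 0 stands for the undeleted parent clone.
    """
    ordered = sorted(set(endpoints))
    if not ordered or ordered[0] != 0:
        raise ValueError("the undeleted parent clone (endpoint 0) is required")

    chosen = [0]
    covered_end = read_length
    while covered_end < insert_length:
        # Clones that start beyond the last chosen one but still overlap
        # the sequence covered so far by at least `min_overlap` bases.
        candidates = [p for p in ordered
                      if chosen[-1] < p <= covered_end - min_overlap]
        if not candidates:
            raise ValueError(f"no clone bridges the gap after position {covered_end}")
        chosen.append(max(candidates))  # advance as far as possible
        covered_end = chosen[-1] + read_length
    return chosen


# Hypothetical deletion endpoints for a 2,000 bp insert.
endpoints = [0, 120, 380, 410, 800, 850, 1230, 1300, 1700]
print(select_tiling_clones(endpoints, read_length=500, insert_length=2000))
```

In this toy case five of the nine clones suffice to span the insert, whereas a shotgun assembly would have to discover the order of its reads from pairwise overlaps.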
Enzymatic methods represent one approach to generating nested deletions. These methods include exonuclease treatment of double-stranded DNA, using enzymes such as the double-stranded exonuclease Bal31 [Guo, L. H., Yang, R. C., and Wu, R., Nucleic Acids Research 11 (16): 5521-5540 (1983)] or the more widely used exonuclease III [Henikoff, S., An improved strategy for rapid direct sequencing of both strands of long DNA molecules cloned in a plasmid, Gene 28: 351-359 (1984)]. These methods provide good templates for sequencing but require large quantities of high-quality (i.e., pure, non-nicked) DNA, since exonuclease extension of gaps from pre-existing nicks would give rise to aberrant sub-clones. In addition, the current enzymatic methods for producing nested deletions require numerous bacterial transformation steps in order to produce a set of minimally overlapping clones. The number of bacterial transformations needed for the exonuclease methods is directly proportional to the size of the insert to be sequenced, with one transformation required per 300-400 base pairs to be sequenced.
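The proportionality stated above (one transformation per 300-400 base pairs) translates directly into an estimate of the workload for a given insert. The following is a minimal sketch of that arithmetic; the function names and the example insert sizes are hypothetical and chosen only for illustration.

```python
import math


def transformations_needed(insert_bp, bp_per_transformation):
    """Transformations required when each one yields `bp_per_transformation` of sequence."""
    return math.ceil(insert_bp / bp_per_transformation)


def transformation_range(insert_bp, low_bp=300, high_bp=400):
    """Return (fewest, most) transformations for the 300-400 bp figure given in the text."""
    fewest = transformations_needed(insert_bp, high_bp)  # 400 bp per transformation
    most = transformations_needed(insert_bp, low_bp)     # 300 bp per transformation
    return fewest, most


# Hypothetical insert sizes, assumed for illustration only.
for insert_bp in (10_000, 40_000):
    print(insert_bp, transformation_range(insert_bp))
```

For an assumed 10 kb insert this gives 25 to 34 transformations, and the count grows linearly with insert size, which is the scaling limitation the paragraph describes.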
Alternatively, transposition-based methods may be used to generate nested deletions. These methods exploit the in vivo capacity of clones to undergo either intermolecular [Ahmed, A., Use of transposon-promoted deletions in DNA sequence analysis, J Mol Biol 178 (4): 941-948 (1984)] or intramolecular [Wang, G., Blakesley, R. W., Berg, D. E., and Berg, C. M., pDUAL: a transposon-based cosmid cloning vector for generating nested deletions and DNA sequencing templates in vivo, Proc Natl Acad Sci USA 90 (16): 7874-7878 (1993)] transposition, which produces a deletion joining one end of the transposon to a random site within the target DNA sequence. These methods greatly reduce the number of manipulations necessary to produce high-quality data. Nevertheless, the existing transposon-based vectors designed for sequencing require significant effort both for initial cloning and for generating subsequent sub-clones. Almost all such vectors are high copy number, relatively unstable plasmids, and therefore do not permit sequence determination of the numerous regions of eucaryotic genomes that are unclonable when present in multiple copies. After the initial cloning step, the resulting recombinant cells must be transformed with a transposase-containing plasmid. Furthermore, after transposase action, another cycle of retransformation is required in order to obtain pure sub-clones, each harboring a single transposon-mediated deletion. These factors restrict the number of regions that can be processed simultaneously.
The present invention concerns a new vector for sequencing and mapping large regions of eucaryotic DNA using enzymatic and/or transposition-based methods for generating nested deletions. New techniques for sequencing large regions of DNA and for mapping the locations of markers within large regions of DNA are also provided.