The present invention is in the field of genomic sequencing and assembly. The present invention specifically provides methods and kits that can be used to obtain DNA sequence information from the ends of large cloned inserts such as genomic DNA.
Recent developments have led to an increase in sequencing output. It is now possible to determine the entire genetic code of an organism. There are two general methods used for genomic sequencing: a BAC-by-BAC approach, and whole genome shot-gun sequencing.
High-copy pUC-based plasmids and medium-copy pBR-based plasmids, like other standard vectors used in cloning, can generally accommodate inserts in the range of 2 kb and 10 kb, respectively. Derivatives of pUC, such as pUC18, can accept inserts up to perhaps 4 kb to 5 kb without instability. Derivatives of pBR, such as pBR322, can take inserts up to about 15 kb without instability. Previous work has shown that such vectors are not readily usable for larger inserts, such as inserts of 25 kb, 40 kb, 50 kb and 60 kb. Inserts of these sizes are generally not stable, and the instability worsens as the insert size increases. In addition, colony size generally varies markedly, and plasmid preparations show a wide variation in insert size that is skewed toward lower molecular weight, as well as some vector without insert.
Shot-gun sequencing and assembly methods that have been developed rely on the use of end sequence reads of cloned inserts of approximately 2 kb and 10 kb as well as end sequence reads from BAC clones (150 kb). The use of these different size fragments provides sequence distance anchors that can be used to assemble a genome from the sequence reads. (Myers, et al., Science Mar 24;287(5461):2196-204 (2000); Weber and Myers, Genome Res. May;7(5):401-9 (1997))
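The way paired end reads act as distance anchors can be illustrated with a minimal sketch. It is not the assembly algorithm of the cited references; the function name `estimate_gap` and its parameters are hypothetical, and it assumes a single clone of known insert size whose two end reads fall in two adjacent contigs with the same orientation.

```python
def estimate_gap(insert_size, start_in_left, left_len, end_in_right):
    """Estimate the unsequenced gap between two adjacent contigs.

    insert_size   -- approximate clone insert length (e.g. 2 kb, 10 kb, 150 kb)
    start_in_left -- coordinate in the left contig where the insert begins
    left_len      -- total length of the left contig
    end_in_right  -- coordinate in the right contig where the insert ends

    Assumption (illustrative only): both contigs are in the same
    orientation and the insert size is known exactly.
    """
    covered_left = left_len - start_in_left   # insert bases lying in the left contig
    covered_right = end_in_right              # insert bases lying in the right contig
    return insert_size - covered_left - covered_right


# A 10 kb insert that starts 4 kb from the end of a 50 kb contig and
# ends 2.5 kb into the next contig implies a gap of about 3.5 kb.
gap = estimate_gap(10_000, 46_000, 50_000, 2_500)
```

In practice insert sizes are only approximate, so each clone gives a distance range rather than a single value.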
One of the limitations in the shot-gun approach is the need to produce a set of BAC clones that tile the genome of the organism. This process is both time consuming and expensive. There is therefore a need in the art to develop an alternative to BAC end sequencing as it is applied to genome sequencing and assembly.
The present invention provides a method of producing a cloned insert that is representative of the ends of a large segment of DNA from the genome of an organism. Specifically, the present invention provides a clone insert strategy comprising the steps of:
a) isolating DNA from an organism;
b) fragmenting the DNA to produce large sized fragment inserts, either randomly or in a directed fashion;
c) ligating the fragmented DNA insert into a vector;
d) digesting the vector with a restriction enzyme to cut at least twice within the insert and not in the vector; and
e) ligating the digested insert/vector to close the deletion.
This method produces a library of cloned DNA inserts in a plasmid where the insert contains the ends from a large segment of DNA and is anchored by a restriction endonuclease site.
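Steps d) and e) can be sketched in silico as follows. This is a simplified illustration under stated assumptions, not a laboratory protocol: the construct is modelled as a linear string (vector arm, insert, vector arm), digestion is assumed to be complete, and religation of the compatible ends is assumed to regenerate a single copy of the recognition site. The function name `end_deletion_clone` and the example sequences are hypothetical.

```python
def end_deletion_clone(vector_left, insert, vector_right, site):
    """Model digestion at sites internal to the insert, then religation.

    Everything between the first and last occurrence of `site` in the
    insert is deleted, leaving the two insert ends joined at one site.
    Assumptions (illustrative only): linear model of the plasmid,
    complete digestion, and one recognition site regenerated on religation.
    """
    # Step d) requires that the enzyme does not cut in the vector.
    if site in vector_left or site in vector_right:
        raise ValueError("restriction site must not occur in the vector")

    first = insert.find(site)
    last = insert.rfind(site)
    # Step d) requires at least two cuts within the insert.
    if first == -1 or first == last:
        raise ValueError("enzyme must cut at least twice within the insert")

    left_end = insert[:first]                # insert sequence before the first site
    right_end = insert[last + len(site):]    # insert sequence after the last site

    # Step e): religation closes the deletion around a single site.
    return vector_left + left_end + site + right_end + vector_right


# Hypothetical example with the EcoRI recognition sequence GAATTC.
insert = "ACGTAC" + "GAATTC" + "TTTTTTTT" + "GAATTC" + "CCCC" + "GAATTC" + "GGTACA"
clone = end_deletion_clone("AAGCTTGG", insert, "CCAAGCTT", "GAATTC")
```

The resulting clone keeps only the two ends of the original insert, joined at the restriction site, which is what makes it small enough for a standard plasmid while still reporting both ends of the large fragment.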