The enormous wealth of information that has been acquired from genomic and expressed sequence tag (EST) sequencing in the last 10 years has contributed significantly to efforts to clone full-length cDNA representatives. Although it is anticipated that the human and mouse genomic sequencing projects will be completed in the near future, the transcriptomes of these species will remain ambiguous for some time. The difficulty of predicting, with complete certainty, the splicing program of mRNAs from genomic sequences has compelled additional genomic research focused on obtaining the sequences of full-length cDNAs. In addition, full-length cDNA sequencing is required to confirm cDNA sequences after methods that involve amplification of the cDNA have been employed for cloning. This scenario is particularly prevalent in genomic centers that are focused on validating gene targets for drug discovery efforts. Clearly, after great expense and effort have been expended, it would be senseless for a putative target to fail the validation process simply because the coding sequence of the target gene was incorrect. Therefore, approaches are required at genomic centers to sequence large numbers of full-length clones quickly, inexpensively, and accurately. For this purpose, Applicants have created a new, integrated high-throughput process called transposon expedited multiplex sequencing (TEMS).
In the last 20 years, many methods have been developed for sequencing large inserts in plasmids. However, many of these methods are cumbersome and cannot be transferred to high-throughput, automated systems. Sequencing by primer walking is slow and expensive, and it often fails because each primer must be designed against sequence from a poorly characterized region. Similarly, sequencing by creating a collection of clones by exonuclease digestion from the ends of the target clone is slow, clone-specific, and extremely sensitive to the purity and integrity of the template DNA. In addition, the success rate using this approach is quite variable. Shotgun sequencing of clones is a higher-throughput method; however, it requires isolating the insert from each clone and then recloning smaller fragments generated by a wide variety of methods. Additionally, shotgun libraries that utilize restriction digests result in a cloning bias and, subsequently, a non-random distribution of DNA sequence data.
Transposon-mediated sequencing can be done by pooling a large number of vectors containing target DNA sequences and randomly inserting a transposon with sequencing primer sites on each end into the constructs. See Devine, S. E., Boeke, J. D. (1994) Efficient integration of artificial transposons into plasmid targets in vitro: a useful tool for DNA mapping, sequencing and genetic analysis, Nucleic Acids Research, pp. 3765-3772; or Kimmel, B., M. J. Palazzola, C. Martin, J. D. Boeke, and S. E. Devine, 1997, Transposon-mediated DNA sequencing. In Genomic Analysis: A laboratory manual (ed. E. Green, B. Birren, R. Myers, and P. Hieter), Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., for a description of the method. Traditionally, this method was cumbersome since it required moving plasmids through different host strains for the cloning and transposon-insertion steps. Recently, however, several commercial molecular biology vendors have developed in vitro transposition systems that take advantage of the random insertion of a modified transposon (e.g., Tn5). Unfortunately, some of these systems result in a high background of false positives and are difficult to use with methods that screen positive clones by polymerase chain reaction (“PCR”). Applicants have utilized several of these transposon-based sequencing methods and have not experienced any of these difficulties with a modified version of the in vitro GPS-1 transposition system from New England Biolabs. Nevertheless, transposon insertions cannot be directed exclusively to the target DNA of interest and appear in the vector with a high frequency. In an effort to solve this problem and increase the efficiency of transposon-facilitated sequencing, Applicants have developed a unique, high-throughput procedure called transposon expedited multiplex sequencing (TEMS).
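The scale of the vector-insertion problem noted above can be illustrated with a simple length-based estimate. The sketch below is not part of the described method; it assumes, purely for illustration, that insertions are uniformly random along the construct and that every insertion is recovered, so the expected share landing in the vector equals the vector's share of the total length. The function name and the example lengths are hypothetical.

```python
def expected_vector_fraction(vector_bp: int, insert_bp: int) -> float:
    """Expected fraction of transposon insertions that land in the vector.

    Assumption (illustrative only): insertion sites are uniformly random
    along the construct, with no site preference and no loss of clones,
    so the fraction is simply the vector's share of the total length.
    """
    if vector_bp < 0 or insert_bp < 0:
        raise ValueError("lengths must be non-negative")
    total_bp = vector_bp + insert_bp
    if total_bp == 0:
        raise ValueError("construct length must be greater than zero")
    return vector_bp / total_bp


# Hypothetical construct: a 3,000 bp vector carrying a 2,000 bp cDNA insert.
fraction = expected_vector_fraction(3000, 2000)
print(f"{fraction:.0%} of insertions expected in the vector")
```

Under these assumptions, more than half of the sequencing templates from such a construct would yield vector rather than target sequence unless they are screened out beforehand.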
Accordingly, it is an object of this invention to provide a high-throughput, efficient, and inexpensive process for the sequencing of DNA fragments.
It is a further object of this invention to provide a high-throughput, efficient and inexpensive process for transposon-mediated sequencing of target DNA fragments which minimizes the amount of non-target DNA sequence generated.
It is yet another object of this invention to provide a PCR-based screen to distinguish between the desired constructs with transposons inserted into the target DNA sequence and the undesired constructs with transposons inserted elsewhere.