Exon sequencing, also known as targeted exome capture, refers to a genome analysis method in which DNA containing the exon regions of a whole genome is captured by sequence capture technology, enriched, and then subjected to high-throughput sequencing. Exon sequencing is an efficient strategy for selecting coding sequences from the genome; it is less costly than whole-genome resequencing and has great advantages in studying single nucleotide polymorphisms, insertions and deletions of known genes. A commonly used exon library is a library containing double-stranded DNA for the Illumina platform or Proton platform, which is constructed roughly according to the following protocol: randomly fragmenting the genomic DNA into fragments ranging from 180 to 280 bp in length, then end-repairing the fragments, adding an adenine (A) tail, and ligating an adaptor to each end of each fragment, so as to construct a library. The library is subjected to a first enrichment by a first liquid-phase hybridization with biotin-labeled probes; the exons obtained after the first enrichment are captured by streptavidin-coated magnetic beads and then eluted from the magnetic beads to perform a second enrichment by a second liquid-phase hybridization. The library obtained after the two rounds of enrichment is linearly amplified by PCR and can be sequenced once it has passed quality control.
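The fragmentation step of the protocol above can be pictured with a small simulation. The following is only an illustrative sketch: the function names, the 1 Mb genome length, the uniform random cut model and the explicit size-selection step are assumptions made for illustration and are not part of the protocol described above; only the 180 to 280 bp window is taken from the text.

```python
import random


def fragment_genome(length, mean_size, rng):
    """Cut a linear genome of `length` bp at random positions.

    Returns the list of fragment lengths. The number of cuts is chosen so
    that the average fragment is roughly `mean_size` bp (hypothetical model).
    """
    n_cuts = max(length // mean_size - 1, 0)
    cuts = sorted(rng.sample(range(1, length), n_cuts))
    bounds = [0] + cuts + [length]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def size_select(fragments, lo=180, hi=280):
    """Keep only fragments whose length lies in the window [lo, hi] bp."""
    return [f for f in fragments if lo <= f <= hi]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
fragments = fragment_genome(1_000_000, 230, rng)
selected = size_select(fragments)
print(f"{len(selected)} of {len(fragments)} fragments fall within 180-280 bp")
```

Under this simple model only a fraction of the random fragments falls inside the target window, which is one way to see why the fragmentation conditions have to be controlled before end-repair and adaptor ligation.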
However, there is no reliable process for constructing an exon library for sequencing on the Complete Genomics (CG) platform. A library construction method in the related art, which is based on CG, constructs a library with a single adaptor according to the protocol basically shown in FIG. 1: randomly fragmenting the genomic DNA by physical means into fragments of lengths within a certain range, end-repairing the fragments and directionally ligating an adaptor A, performing liquid-phase hybridization with biotin-labeled probes, capturing exons with streptavidin-coated magnetic beads, performing PCR amplification and isolating single-stranded nucleic acids; and finally cyclizing the single-stranded nucleic acids to obtain a library containing single-stranded cyclic nucleic acids. Such a method is complex and time-consuming, so there is still much room for improvement.
A transposase fragmentation kit, typified by the Nextera kit from Epicentre (since acquired by Illumina), can complete DNA fragmentation and adaptor ligation at the same time by means of the transposase, thereby reducing the time for sample preparation. Such a fragmentation and adaptor ligation method may be used in library construction.
Transposase fragmentation is far superior to other fragmentation methods in terms of throughput and simplicity of operation. However, such fragmentation also has shortcomings. For example, transposition realized by the transposase depends on a specific 19 bp Me sequence. Therefore, although the transposase may ligate different adaptor sequences to the 5′-terminal and the 3′-terminal of a target sequence respectively by embedding two completely different adaptor sequences, the target sequence after fragmentation will symmetrically contain a Me sequence at each terminal, with a 9 nt gap formed between the target sequence (the fragmented fragment) and the Me sequence due to the special function of the transposase. The identical Me sequences at the two terminals of the target sequence have an adverse influence on downstream technology applications. For example, when this adaptor ligation is combined with next-generation sequencing technology, the Me sequences located at the two ends of the same strand of the target sequence are complementary to each other, which easily results in internal annealing within a single-stranded molecule and thus hinders binding of an anchoring primer.
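The internal annealing problem can be illustrated with a short sequence-level sketch. The following assumes the commonly published 19 nt Tn5 mosaic-end sequence for the Me sequence and models a strand after the 9 nt gap has been filled in, so that one strand carries Me at its 5′ end and the reverse complement of Me at its 3′ end. The function names and the example insert are hypothetical and serve only to illustrate the point.

```python
# Assumed 19 nt Me (mosaic end) sequence; not stated in the text above.
ME = "AGATGTGTATAAGAGACAG"

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def gap_filled_strand(insert):
    """One strand of a transposase product after gap filling (assumed model):
    5'-Me-insert-revcomp(Me)-3'."""
    return ME + insert + revcomp(ME)


def terminal_stem_length(strand):
    """Count how many 5'-terminal bases can pair, in antiparallel fashion,
    with the 3'-terminal bases of the same strand (a perfect stem)."""
    stem = 0
    for k in range(len(strand) // 2):
        if COMPLEMENT[strand[k]] == strand[-1 - k]:
            stem += 1
        else:
            break
    return stem


strand = gap_filled_strand("GATTACAGATTACA")  # hypothetical insert
print(terminal_stem_length(strand))  # length of the self-complementary stem

# Control: a strand whose two ends are unrelated cannot form such a stem.
control = "ACGTACGTAC" + "GATTACAGATTACA" + "GGGGGGGGGG"
print(terminal_stem_length(control))
```

In this sketch the two ends of the transposase product can pair over the full length of the Me sequence, whereas the control strand with unrelated ends cannot, which is the intramolecular annealing that competes with the anchoring primer.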
At present, there is an urgent need for a simple method for constructing a library containing single-stranded cyclic nucleic acids, especially one suitable for exon sequencing.