Next-generation high-throughput sequencing, based on massively parallel DNA sequencing platforms, has revolutionized genomic analysis, including the de novo assembly of whole genomes. Nevertheless, assembly of complex genomes remains challenging. One reason is the presence of dispersed repeats, which introduce ambiguity during genome reconstruction. Another is the presence of heterozygous fragments derived from separate parental chromosomes, which causes the paired fragments from the parental chromosomes to be assembled separately after amplification.
Combining long reads assembled from short reads with haplotype isolation can help overcome the problems posed by both repeats and heterozygosity, because this approach spans highly repetitive regions and reduces the co-assembly of homologous chromosomes. Recently, Illumina has produced the TruSeq Synthetic Long-Read DNA library preparation kit to solve the problems caused by short reads. However, because this kit fragments the genomic DNA to lengths of 8 to 10 kb in its initial step, the resulting DNA fragments are not long enough to overcome the deviations caused by large repeats, especially when constructing a library from a genome containing many repeats. Another example is Long Fragment Read (LFR) technology, developed by Complete Genomics, which can construct a library with high accuracy using about 100 pg of human DNA per sample. Although LFR uses a 384-well plate to reduce the amount of DNA in each well to 10% to 20% of a haploid genome, the amount of genomic DNA in each well is still not low enough to improve the efficiency of assembly.
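To give a rough sense of why 10% to 20% of a haploid genome per well may still be too much, the following minimal sketch estimates how often a locus present in a well is covered by more than one fragment. It assumes, purely for illustration, that fragments land in wells independently and that the number of fragments covering a given locus in one well follows a Poisson distribution whose mean equals the haploid fraction per well. Neither this model nor the function name `overlap_probability` comes from the LFR method itself.

```python
import math


def overlap_probability(haploid_fraction: float) -> float:
    """Chance that a locus covered in a well is covered by two or more fragments.

    Illustrative assumption: the number of fragments covering a locus in one
    well is Poisson distributed with mean equal to `haploid_fraction`.
    """
    if haploid_fraction <= 0:
        raise ValueError("haploid_fraction must be positive")
    p_zero = math.exp(-haploid_fraction)   # no fragment covers the locus
    p_one = haploid_fraction * p_zero      # exactly one fragment covers it
    p_at_least_one = 1.0 - p_zero          # the locus is present in the well
    # Condition on the locus being present at all in this well.
    return (p_at_least_one - p_one) / p_at_least_one


# The 10% to 20% range is the per-well amount stated in the text above.
for fraction in (0.10, 0.20):
    share = overlap_probability(fraction)
    print(f"{fraction:.0%} of a haploid genome per well: "
          f"{share:.1%} of covered loci have overlapping fragments")
```

Under these assumptions, roughly 5% to 10% of the loci present in a well are covered by more than one fragment, and some of those overlaps will involve fragments from different parental chromosomes, which cannot be separated within that well.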
Thus, there is a need in the field for an improved method of constructing a sequencing library that increases the accuracy and efficiency of whole-genome sequencing.