The development of second-generation sequencing technology has driven revolutionary advances in biology and biomedical research. However, about 1% of bases are sequenced incorrectly owing to the inherent characteristics of high-throughput sequencing. Although a 1% error rate is tolerable in some applications, these base errors may mask a great deal of real information and hinder research in many situations, for example: determining whether a tissue or organ of a normal individual carries potentially carcinogenic mutation sites; determining the heterogeneity of DNA composition and latent small clonal colonies in cancer cell colonies; tracing the origin and division pattern of a cell by using a DNA mutation as a label in the cell; accurately obtaining the genotype of a highly hybridized cancer colony; calculating the rate at which mutations are generated during division of cancer cells or somatic cells; and finding pathogenic mutations in small colonies (e.g., cancer stem cells) during biomedical therapy. Hence, how to accurately determine DNA sequences by using currently available second-generation sequencing technologies is a vital problem.
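To make the effect of a 1% per-base error rate concrete, the following minimal sketch estimates how many erroneous reads are expected at a single site and how likely it is that errors alone produce at least a given number of non-reference reads. The function names are hypothetical, and the sketch assumes that errors are independent between reads and that any erroneous base counts as non-reference; real sequencing errors are not uniformly distributed.

```python
from math import comb


def expected_error_reads(depth, error_rate=0.01):
    """Expected number of reads carrying a sequencing error at one site."""
    return depth * error_rate


def prob_at_least_k_errors(depth, k, error_rate=0.01):
    """Probability that at least k of `depth` reads are erroneous at one site.

    Assumption: errors are independent, so the count is binomially distributed.
    """
    # Sum the binomial probabilities of observing fewer than k erroneous reads.
    p_fewer = sum(
        comb(depth, i) * error_rate**i * (1 - error_rate) ** (depth - i)
        for i in range(k)
    )
    return 1.0 - p_fewer


if __name__ == "__main__":
    depth = 1000  # illustrative sequencing depth, not taken from the text
    print("expected erroneous reads:", expected_error_reads(depth))
    print("P(at least 5 erroneous reads):", prob_at_least_k_errors(depth, 5))
```

Under these assumptions, a true mutation present in 0.1% of molecules would contribute on average one read at a depth of 1000, fewer than the erroneous reads expected at the same site, which is why such mutations are hard to distinguish from errors.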
So far, several attempts have been made to reduce second-generation sequencing errors from the biological and chemical side. For example, amplification-free library construction can effectively avoid errors generated during polymerase chain reaction amplification in library preparation, and strand-specific errors can be effectively screened out by adding labels to sample DNA and reference DNA. Other methods try to reduce the error rate of second-generation sequencing from the perspective of data analysis. Still other methods try to rectify errors generated during polymerase chain reaction amplification by using the breakpoint information of random DNA breaks or by adding labels to DNA templates prior to polymerase chain reaction amplification; the labels make it possible to determine which DNA molecules are derived from the same template, and rectification is achieved on that basis.
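The label-based rectification mentioned above can be illustrated with a minimal sketch: reads that share a label are treated as copies of the same template, and a per-position majority vote across each group suppresses errors that occur in only a few copies. This is an illustration under stated assumptions, not the procedure of any cited method; the function name, the input format of (label, sequence) pairs, and the thresholds `min_family_size` and `min_agreement` are hypothetical.

```python
from collections import Counter, defaultdict


def consensus_by_label(reads, min_family_size=3, min_agreement=0.6):
    """Build one consensus sequence per label.

    reads: iterable of (label, sequence) pairs. Reads with the same label
    are assumed to derive from the same template and to be aligned to the
    same start position.
    """
    # Group reads into families that share a label.
    families = defaultdict(list)
    for label, seq in reads:
        families[label].append(seq)

    consensus = {}
    for label, seqs in families.items():
        # Small families give too little evidence for a reliable vote.
        if len(seqs) < min_family_size:
            continue
        length = min(len(s) for s in seqs)
        bases = []
        for i in range(length):
            base, count = Counter(s[i] for s in seqs).most_common(1)[0]
            # Emit "N" when no base reaches the required agreement.
            bases.append(base if count / len(seqs) >= min_agreement else "N")
        consensus[label] = "".join(bases)
    return consensus


if __name__ == "__main__":
    example = [
        ("AAT", "ACGT"),
        ("AAT", "ACGT"),
        ("AAT", "ACTT"),  # one copy carries an error at the third base
        ("GGC", "TTTT"),  # single-read family, skipped
    ]
    print(consensus_by_label(example))
```

Note that an error introduced before or during the labeling step is shared by every read in the family and therefore survives the vote.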
These methods improve the accuracy of second-generation sequencing to a certain extent, but each still has drawbacks. For example, Kinde et al. (Kinde I, Wu J, Papadopoulos N, Kinzler K W, Vogelstein B (2011) Detection and quantification of rare mutations with massively parallel sequencing. Proc Natl Acad Sci USA 108:9530-9535) report a method in which labels are placed at the ends of specific primers and then incorporated into DNA molecules via polymerase chain reaction. When an error occurs in the polymerase chain reaction during the addition of labels, the error can hardly be removed in subsequent steps, which limits the ability of this method to determine extremely low-frequency sites. A serious limitation of adding exogenous labels to DNA is that the approach can be applied only to small genomes or to a small number of target genes, and cannot be used for comprehensive determination of a whole genome. The reason is that, in the labeling method, mutual rectification of the positive and negative DNA strands can be carried out only when identical and complementary labels are both detected, which requires a great sequencing depth that can hardly be achieved for a large genome.
Meanwhile, since peripheral blood can be readily collected without invasive effects on the body and its mutation information reflects the real mutations of the individual to a certain extent, determination of mutation information from free DNA in peripheral blood is widely used in antenatal diagnosis and cancer surveillance. However, free DNA in peripheral blood is degraded into fragments of 140-170 base pairs, and only thousands of copies exist in 1 milliliter of blood. Therefore, the problems to be solved are how to build DNA libraries effectively from such a small amount of DNA, and how to determine an extremely infrequent mutation in free DNA of peripheral blood with limited sequencing coverage.
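The constraint imposed by the low copy number can be sketched with a simple sampling calculation: if a sample contains only a few thousand copies of a locus, a rare mutation may not be present in the sample at all, or may be lost if only part of the DNA is converted into a library. The function name and the `conversion_efficiency` parameter are hypothetical, the figure of 3000 copies is only an illustrative value for "thousands of copies", and the sketch assumes each copy carries the mutation independently.

```python
def prob_mutant_captured(copies, mutant_fraction, conversion_efficiency=1.0):
    """Probability that at least one mutant molecule ends up in the library.

    copies: number of copies of the locus in the sample.
    mutant_fraction: fraction of copies carrying the mutation.
    conversion_efficiency: assumed fraction of copies converted into
        sequenceable library molecules (hypothetical parameter).
    """
    usable_copies = copies * conversion_efficiency
    # Complement of the probability that every usable copy is non-mutant.
    return 1.0 - (1.0 - mutant_fraction) ** usable_copies


if __name__ == "__main__":
    for efficiency in (1.0, 0.5, 0.1):
        p = prob_mutant_captured(3000, 0.0001, efficiency)
        print(f"efficiency {efficiency:.1f}: P(captured) = {p:.3f}")
```

With 3000 copies and a mutant fraction of 0.01%, the probability of capturing any mutant molecule is about 0.26 even at full conversion efficiency, and it falls further as library construction becomes less efficient.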
Most fossil DNA is contaminated by microorganisms, is present in very small amounts, and is seriously degraded. Therefore, a problem in studying ancient human DNA is how to effectively build second-generation high-throughput sequencing libraries and effectively enrich ancient human DNA when starting from a very small amount of seriously degraded fossil DNA.
In sum, there is a need for methods of building DNA sequencing libraries that allow rapid, effective and accurate sequencing.