As science has advanced, traditional Sanger sequencing can no longer fully meet research needs; genome sequencing requires a technology with lower cost, higher throughput and faster speed, and second-generation sequencing technology emerged in response. The core idea of second-generation sequencing is sequencing by synthesis, that is, determining the DNA sequence by detecting the label at the newly synthesized end. Existing platforms mainly include the Roche/454 FLX, the Illumina/Solexa Genome Analyzer and the Applied Biosystems SOLiD system. Taking Illumina products as an example, the GAII read length has increased from 36 bases in 2008 to 100 bases at present, and the throughput has increased from 48M reads/run in 2008 to 240M reads/run at present, so that the sequencing capacity has improved about 14-fold. Each run of the HiSeq 2000 can now sequence 3 human genomes at 30× coverage, approximately 300 G of data per run, and the processing time on the instrument has been reduced to 30 min. As second-generation sequencing technology matures, it is rapidly being applied in clinical research. Studies show that the genetic health of a fetus can be assessed by sequencing maternal plasma DNA, and that early cancer screening can be performed by sequencing the plasma DNA of the persons tested; second-generation sequencing technology therefore has strong prospects for application.
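The throughput figures quoted above can be checked with simple arithmetic. The following is a minimal sketch in Python; the read lengths, read counts, genome count and coverage are taken from the text, whereas the human genome size of about 3.1 × 10^9 bases is an assumption that the text does not state.

```python
# Figures quoted in the text for the Illumina GAII.
read_len_2008, read_len_now = 36, 100      # bases per read
reads_2008, reads_now = 48e6, 240e6        # reads per run

bases_2008 = read_len_2008 * reads_2008    # bases per run in 2008
bases_now = read_len_now * reads_now       # bases per run "at present"
fold_increase = bases_now / bases_2008

# HiSeq 2000: 3 human genomes at 30x coverage per run.
# ASSUMPTION: haploid human genome size of ~3.1e9 bases (not given in the text).
GENOME_SIZE = 3.1e9
hiseq_bases = 3 * 30 * GENOME_SIZE

print(f"GAII fold increase: {fold_increase:.1f}x")
print(f"HiSeq 2000 yield: {hiseq_bases / 1e9:.0f} Gb per run")
```

Under these assumptions the GAII capacity grows about 13.9-fold, consistent with the stated 14-fold improvement, and the HiSeq 2000 yield comes to roughly 280 Gb, in line with the approximately 300 G/run quoted above.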
Plasma DNA, also called circulating DNA, is the extracellular DNA in the blood; its length is approximately tens to hundreds of nucleotides. Plasma DNA can exist as DNA-protein complexes or as free DNA fragments. Under normal circumstances, plasma DNA derives from DNA released by a small number of aging and dying cells. In a healthy state, the generation and removal of circulating DNA are in dynamic balance, and circulating DNA is maintained at a relatively constant, low level. Circulating DNA reflects the state of cell metabolism in the human body and is an important indicator for health evaluation. Changes in the quantity and quality of peripheral blood circulating DNA are closely related to various diseases (including tumour, severe composite trauma, organ transplantation, pregnancy-related diseases, infectious diseases, organ failure and the like). As a non-invasive detection indicator, circulating DNA may become an important molecular marker for the early diagnosis, disease monitoring, therapeutic effect evaluation and prognosis evaluation of some diseases.
Since the presence of fetal DNA in maternal plasma was validated(3), non-invasive prenatal diagnosis and the detection of fetal chromosomal abnormalities have become a major research subject. In 2007, Professor Lu Yuming and his colleagues showed that the allelic ratio at a mutation site of placenta-specific gene 4 in maternal plasma messenger RNA (mRNA) could be used to judge whether the fetus has trisomy of chromosome 21(4). The ratio at a mutation site has also been used to judge whether the fetus has trisomy of chromosome 18(5). The limitation of these methods is that the mutation site is not common in the population, so the methods are suitable for only part of the population. During the same period, digital polymerase chain reaction (digital PCR, dPCR) was used to detect fetuses with chromosomal trisomy(6), (7). The advantage of dPCR is that it does not depend on any mutation site; however, its accuracy is insufficient, and it requires a large blood sample, which makes sampling more difficult.
In recent years, rapidly developing high-throughput sequencing technologies have addressed the problems above. These technologies include the Genome Analyzer(8) of Illumina, SOLiD(9) of Life Technologies, and HeliScope(10) of Helicos, each of which can read hundreds of millions or even billions of sequences in a single run. When these technologies are used to sequence the DNA in maternal plasma, a change in chromosome number in the trace amount of fetal DNA in the plasma can be detected(11), (12), (13). Detecting changes in fetal chromosome copy number from maternal plasma by high-throughput sequencing therefore has advantages; however, because of the high cost of sequencing, these technologies are not yet widely used. Moreover, the coefficient of variation (CV) of sequencing is high, and the accuracy and stability of detection still need to be improved. The CV of sequencing also means that, at present, the method is suitable only for a few chromosomes, such as chromosome 21 and chromosome 18, and is unsuitable for detecting changes in copy number of part of a chromosome. Detecting changes in local copy number of the fetal chromosomes from maternal blood thus remains an unsolved problem.
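The role of the CV can be illustrated with a small calculation. The sketch below is not the method of the cited studies: the read fractions are hypothetical, the fetal fraction of 10% is an assumption, and the comparison of expected signal to CV is only one simple way to express detectability. It relies on the fact that one extra fetal copy of a chromosome raises that chromosome's share of plasma reads by a relative amount equal to half the fetal fraction.

```python
import statistics


def coefficient_of_variation(values):
    """Return the sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


# HYPOTHETICAL fractions of reads mapped to one chromosome
# in a set of euploid reference samples.
reference_fractions = [0.0130, 0.0131, 0.0129, 0.0130, 0.0132, 0.0128]
cv = coefficient_of_variation(reference_fractions)

# ASSUMPTION: fetal DNA accounts for 10% of the plasma DNA.
fetal_fraction = 0.10
# A third fetal copy (3 instead of 2) raises the chromosome's share
# of reads by a relative amount of fetal_fraction / 2.
expected_relative_increase = fetal_fraction / 2

# Ratio of the expected signal to the run-to-run variation.
signal_to_cv = expected_relative_increase / cv
print(f"CV = {cv:.2%}, expected increase = {expected_relative_increase:.2%}, "
      f"signal/CV = {signal_to_cv:.1f}")
```

When the CV approaches the expected relative increase, the change can no longer be separated from noise. A sub-chromosomal region collects fewer reads than a whole chromosome, so its CV would typically be larger, which is one way to read the limitation described above.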
The improvement in sequencing efficiency and the spread of multi-sample mixed sequencing demand higher efficiency in sample preparation, particularly for large numbers of clinical samples; however, current methods of preparing clinical plasma samples have not kept pace with the improvement in sequencing capacity. The efficiency and cost of preparing clinical plasma DNA samples for second-generation high-throughput sequencing have therefore become the key to the wide adoption of high-throughput sequencing.
In essence, preparing plasma DNA samples for second-generation high-throughput sequencing means inserting DNA of a length suitable for sequencing into an existing sequencing vector, that is, ligating known sequencing linkers to both ends of the DNA to be sequenced. At present, construction of a plasma DNA library mainly comprises first performing end repair and 5′-terminus phosphorylation on the extracted plasma DNA, and then performing the main steps of dA-overhang addition, linker ligation and PCR (FIG. 1), with a purification step required after each of these steps. This method of constructing a plasma DNA sequencing library requires in total 6 main enzymes, 4 enzyme reaction systems and 4 rounds of cleaning and purification; it is therefore costly and complex to operate, demands considerable molecular biology skill from the experimenter, and makes it difficult to process multiple samples in parallel.
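The step counts given above can be tallied in a short sketch. The class and variable names are hypothetical, and the grouping of end repair and 5′-terminus phosphorylation into a single reaction system is an assumption drawn from the wording of the text rather than a stated protocol detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReactionStep:
    name: str
    purify_after: bool = True  # the text states purification follows each step


# ASSUMPTION: end repair and 5'-terminus phosphorylation share one reaction system.
CONVENTIONAL_WORKFLOW = [
    ReactionStep("end repair + 5'-terminus phosphorylation"),
    ReactionStep("dA-overhang addition"),
    ReactionStep("linker ligation"),
    ReactionStep("PCR"),
]

n_reactions = len(CONVENTIONAL_WORKFLOW)
n_purifications = sum(step.purify_after for step in CONVENTIONAL_WORKFLOW)
print(f"{n_reactions} enzyme reaction systems, {n_purifications} purifications")
```

Under this grouping the tally gives 4 enzyme reaction systems and 4 purifications, matching the figures in the text; every purification is a separate hands-on operation per sample, which is why the workflow is hard to run on many samples at once.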