Single nucleotide polymorphisms (SNPs) are highly valuable in molecular diagnostics, clinical testing, pathogen detection, forensics, genetic disease research, the development of individualized therapies and drugs, and many other fields (Gayet-Ageron et al., 2009). SNP detection is one of the main components of current genetic diagnosis. At the same time, genetic diagnosis based on SNP detection has become an important means of screening newborns or specific populations for genetic diseases. Therefore, an easy-to-operate, low-cost, high-throughput SNP detection method is key to genetic testing.
Compared with other high-throughput gene detection technologies, second-generation high-throughput sequencing is more accurate and more sensitive, and it offers higher throughput. With falling prices and an expanding range of applications, it has come to be used in many areas of life science and medical research. Using high-throughput sequencing technology for high-throughput SNP detection is one of the current research focuses.
Currently, second-generation sequencing technologies require the construction of a sequencing library, which is then sequenced. The general steps are DNA extraction, DNA fragmentation, fragment selection, library construction (including adapter ligation, amplification, and other steps), sequencing on the instrument, and finally data analysis. Among these steps, library construction takes the most time and effort, and because the sample DNA is amplified multiple times during library construction, the process is prone to bias. In a library built this way, all genome fragments have the same chance of being sequenced, so the method is suitable for whole-genome sequencing. If only one or a few genes, or only a specific part of a sequence, is to be detected, this method wastes sequencing space and makes data analysis more difficult. In addition, this approach involves complicated sample processing steps, produces large and complex sequencing data sets, requires a high initial amount of nucleic acid, and makes it difficult to sequence a large number of samples simultaneously.
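The waste of sequencing space described above can be illustrated with a rough calculation. The following is a minimal sketch, assuming perfectly uniform coverage of the genome; the function names, the genome size of about 3.1 Gb, the 50 kb target region, and the 100x depth are hypothetical values chosen for illustration and do not come from the source.

```python
def on_target_fraction(target_bp: int, genome_bp: int) -> float:
    """Expected fraction of sequenced bases that fall in the target region,
    assuming every genome fragment is equally likely to be sequenced."""
    if not 0 < target_bp <= genome_bp:
        raise ValueError("target_bp must be positive and no larger than genome_bp")
    return target_bp / genome_bp


def total_bases_needed(genome_bp: int, target_depth: int) -> int:
    """Total bases to sequence so that the target reaches target_depth on
    average. Under uniform coverage the whole genome is covered at that depth."""
    return genome_bp * target_depth


def off_target_bases(target_bp: int, genome_bp: int, target_depth: int) -> int:
    """Bases spent on regions outside the target (the wasted sequencing space)."""
    return (genome_bp - target_bp) * target_depth


if __name__ == "__main__":
    GENOME_BP = 3_100_000_000  # illustrative genome size (assumption)
    TARGET_BP = 50_000         # illustrative target region (assumption)
    DEPTH = 100                # illustrative mean depth on target (assumption)

    print("on-target fraction:", on_target_fraction(TARGET_BP, GENOME_BP))
    print("total bases needed:", total_bases_needed(GENOME_BP, DEPTH))
    print("off-target bases:", off_target_bases(TARGET_BP, GENOME_BP, DEPTH))
```

Under these assumptions, almost all sequenced bases fall outside the region of interest, which is why target enrichment is attractive when only a few genes are to be examined.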
For specific genes or regions (such as exons or the causative genes of certain single-gene diseases), sequencing requires an additional target sequence enrichment step during library construction. Currently, the most widely used approach is to enrich target sequences by hybridization capture. Widely used target sequence capture technologies are based mainly on solid-phase hybridization (Choi et al., 2009) or liquid-phase hybridization (Bainbridge et al., 2010). Existing commercial custom capture kits can be used (e.g., the NimbleGen sequence capture array or the Agilent SureSelect target enrichment system), but these kits are generally expensive, and once the chips are customized, the target sequences to be detected are fixed and cannot be changed. Besides hybridization-based target capture, non-hybridization sequence capture based on PCR has also been applied in gene sequencing studies, but it shares the disadvantages of multiplex PCR-based technologies. For example, some regions are not amplified effectively. Moreover, because polymerases introduce errors during amplification and all of the gene fragments are mixed and amplified together, the sequencing results are difficult to verify.
Illumina offers a different, PCR amplification-based method of library construction (TruSeq Custom Amplicon). Two probes hybridize to target-specific sequences and are thereby anchored at the 5′ and 3′ ends of the target sequence. DNA polymerase then extends to fill the gap between the two probes (i.e., the sequence of interest), followed by sequencing. This method requires different probes to be designed for each gene sequence to be measured. It is complex, and the quality of the library is strongly influenced by hybridization efficiency. When this method is used to detect low-frequency mutations, it wastes sequencing space, because the vast majority of the sequences being sequenced are wild-type; the sequencing depth must be increased to meet sensitivity requirements. When it is used for large-scale population screening for gene mutations, wild-type sequences take up most of the sequencing space, which increases the cost of sequencing. In addition, each sample requires up to several hundred ng of nucleic acid, which makes the method poorly suited to sequencing rare or difficult-to-obtain samples with low nucleic acid concentrations.
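The relationship between mutation frequency and required sequencing depth can be made concrete with a simple binomial model. The sketch below is illustrative only: it assumes error-free reads sampled independently, and the detection rule (at least `k` mutant reads), the detection probability, and the function names are hypothetical choices, not part of any method described here.

```python
from math import comb


def prob_at_least(k: int, depth: int, freq: float) -> float:
    """P(X >= k) for X ~ Binomial(depth, freq): the chance of seeing at least
    k mutant reads at a site sequenced to the given depth."""
    below = sum(
        comb(depth, i) * freq**i * (1 - freq) ** (depth - i) for i in range(k)
    )
    return 1.0 - below


def min_depth(freq: float, k: int, power: float, max_depth: int = 1_000_000) -> int:
    """Smallest depth at which at least k mutant reads are observed with
    probability >= power (simple linear search, adequate for a sketch)."""
    if not 0 < freq <= 1:
        raise ValueError("freq must be in (0, 1]")
    for depth in range(k, max_depth + 1):
        if prob_at_least(k, depth, freq) >= power:
            return depth
    raise ValueError("no depth up to max_depth reaches the requested power")


def expected_wild_type_reads(depth: int, freq: float) -> float:
    """Expected number of reads at the site that carry the wild-type sequence."""
    return depth * (1 - freq)


if __name__ == "__main__":
    # Hypothetical settings: 1% mutant frequency, >= 5 mutant reads, 95% power.
    depth = min_depth(freq=0.01, k=5, power=0.95)
    print("required depth:", depth)
    print("expected wild-type reads:", expected_wild_type_reads(depth, 0.01))
```

In this model the required depth grows roughly in inverse proportion to the mutation frequency, and nearly all of the additional reads are wild-type, which matches the cost concern raised above.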
To address the shortcomings of these approaches, a method of constructing high-throughput sequencing libraries for SNP detection is needed.