The completion of the Human Genome Project (HGP) in early 2000 (1) was a monumental achievement that required an incredible amount of combined effort among genome centers and scientists worldwide. The engine behind this decade-long project was the Sanger sequencing method, which remains the staple of large-scale genome sequencing in high-throughput genome sequencing centers. The main reason for this prolonged success is the basic, efficient, and elegant design of the Sanger dideoxy chain-terminating reaction (2). With incremental improvements in this DNA sequencing technology, including the use of laser-induced fluorescent excitation of energy transfer dyes (3), engineered DNA polymerases (4), and capillary electrophoresis (5), as well as advances in sample preparation, informatics, and sequence analysis software (6-9), the Sanger sequencing platform has been able to maintain its status as the champion of the sequencing world. Current state-of-the-art Sanger-based DNA sequencers can produce over 700 bases of clearly readable sequence in a single run from templates up to 30 kb in length (10-12). However, as with most technological inventions, the continual improvement of this sequencing platform has reached a plateau, with the current cost of producing a high-quality microbial genome draft sequence estimated at around $10,000 per megabase pair. Current DNA sequencers based on the Sanger method allow up to 384 samples to be analyzed in parallel. However, one drawback of using electrophoresis for DNA separation is the deterioration of resolution due to band compressions. DNA sequences that are repeat-rich and promote the formation of secondary structures, such as hairpins, affect electrophoretic mobility, which also results in band compressions. This is the main reason behind the maximum read-length limit of this sequencing method (13, 14).
From a physics and engineering standpoint, the limits of read length and parallelization achievable with capillary electrophoresis separation have already been reached (15).
At the onset of the post-HGP era, recognizing the limitations of the current sequencing platform, both the public sector (National Human Genome Research Institute, NHGRI) and the private genomic sciences sector (The J. Craig Venter Science Foundation and the Archon X Prize for Genomics) have called for the development of next-generation sequencing technology that will reduce the cost of sequencing 100-fold and 10,000-fold within the next 5 and 10 years, respectively (16-19). The development of a breakthrough DNA sequencing technology, which is already underway with heavy biotechnology industry involvement (20), will make genome sequencing affordable. Genome research will then be able to move from studying single genes one at a time to analyzing and comparing entire genomes. Recent data have demonstrated that the fundamental differences between many species, including between mammals, lie not in the overall number of genes but at the more subtle regulatory level (21). This has led to the desire to sequence more genomes of closely related species as well as more human genomes. In addition, personalized medicine, gene expression analysis, splice form analysis, and many other areas demand high-throughput sequencing projects that cannot be performed at current speeds and costs. To overcome the limitations of current sequencing technology based on electrophoresis with laser-induced fluorescence detection (22-24), new methods built on new paradigms must be developed to yield a sequencer that can meet these demands.
Potential sequencing methods making significant steps forward into the new sequencing era include pyrosequencing (25-26), mass spectrometry sequencing (27-29), sequence-specific detection of single-stranded DNA using engineered nanopores (30), sequencing of single DNA molecules (31), polony sequencing (32), and sequencing by synthesis using cleavable fluorescent reversible terminators (33).
While fluorescence-based sequencing by synthesis (SBS) methods have almost unlimited capacity for parallelization, restricted only by the resolution of the imaging system, to date they have been limited to read lengths of about 35 bases. The successful implementation of SBS depends effectively on the read length achievable for the target DNA template. One of the major factors that determines the read length in SBS is the number of available templates. Our laboratory has recently developed two powerful approaches for SBS: 1) hybrid SBS with nucleotide reversible terminators (NRTs, 3′-O-R1-dNTPs) in combination with fluorescently labeled dideoxynucleotides (ddNTPs-R2-fluorophore), and 2) SBS with cleavable fluorescent nucleotide reversible terminators (C-F-NRTs, 3′-O-R1-dNTPs-R2-fluorophore) (“Four-color DNA Sequencing with 3′-O-modified Nucleotide Reversible Terminators and Chemically Cleavable Fluorescent Dideoxynucleotides”. J. Guo, N. Xu, Z. Li, S. Zhang, J. Wu, D. Kim, M. S. Marma, Q. Meng, H. Cao, X. Li, S. Shi, L. Yu, S. Kalachikov, J. Russo, N. J. Turro, J. Ju. Proceedings of the National Academy of Sciences USA. 2008, 105, 9145-9150) (“Four-Color DNA Sequencing by Synthesis Using Cleavable Fluorescent Nucleotide Reversible Terminators”. J. Ju, D. Kim, L. Bi, Q. Meng, X. Bai, Z. Li, X. Li, M. S. Marma, S. Shi, J. Wu, J. R. Edwards, A. Romu, N. J. Turro. Proceedings of the National Academy of Sciences USA. 2006, 103, 19635-19640). In the first approach, the incorporation of a ddNTP-R2-fluorophore into a strand of DNA permanently terminates further extension of that template. In the second approach, the incorporation and cleavage of C-F-NRTs leave a tail on the modified nucleotide that may cause steric hindrance and lower the incorporation efficiency of the subsequent base. In both cases, the total number of sequenceable templates decreases after each cycle of the SBS reaction. Various means can be employed to minimize this rate of template reduction.
Among those, a powerful method termed template “walking” can potentially diminish the negative effect of template termination or reduction and extend the read length of SBS by at least two- to three-fold.
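The relationship between per-cycle template loss and read length can be illustrated with a minimal sketch. It is not a model taken from the cited work: it assumes that a fixed fraction of templates (`retention`, a hypothetical parameter) remains extendable after each SBS cycle, that sequencing stops once the remaining fraction falls below a hypothetical detection threshold (`min_fraction`), and, as a deliberately idealized assumption, that each walk restores the full template pool. The function names are illustrative only.

```python
def remaining_fraction(retention: float, cycles: int) -> float:
    """Fraction of templates still extendable after `cycles` SBS cycles,
    assuming a constant per-cycle retention (an assumed, simplified model)."""
    if not 0.0 < retention <= 1.0:
        raise ValueError("retention must be in (0, 1]")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return retention ** cycles


def max_cycles(retention: float, min_fraction: float) -> int:
    """Largest number of cycles for which the remaining template
    fraction is still at least `min_fraction`."""
    if not 0.0 < retention < 1.0:
        raise ValueError("retention must be in (0, 1)")
    if not 0.0 < min_fraction <= 1.0:
        raise ValueError("min_fraction must be in (0, 1]")
    cycles = 0
    fraction = 1.0
    # Keep cycling while one more cycle would still leave enough templates.
    while fraction * retention >= min_fraction:
        fraction *= retention
        cycles += 1
    return cycles


def read_length_with_walks(retention: float, min_fraction: float, walks: int) -> int:
    """Total read length over an initial round plus `walks` further rounds.

    ASSUMPTION (hypothetical, idealized): every walk fully restores the
    template pool, so each round contributes the same number of bases.
    """
    if walks < 0:
        raise ValueError("walks must be non-negative")
    return (walks + 1) * max_cycles(retention, min_fraction)


if __name__ == "__main__":
    # Illustrative values only: 98% retention per cycle, 50% threshold.
    print(max_cycles(0.98, 0.5))
    print(read_length_with_walks(0.98, 0.5, walks=2))
```

With the illustrative values above (98% retention, 50% threshold), the sketch yields 34 cycles for a single round and 102 bases after two walks. The three-fold gain follows directly from the idealized full-restoration assumption; any incomplete restoration would reduce it.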