DNA sequencing is driving genomics research and discovery. The completion of the Human Genome Project was a monumental achievement that required an incredible amount of combined effort among genome centers and scientists worldwide. This decade-long project was completed using the Sanger sequencing method, which remains the staple genome sequencing methodology in high-throughput genome sequencing centers. The main reason behind the prolonged success of this method is its simple, efficient, and elegant chemistry of dideoxy chain termination. With incremental improvements in Sanger sequencing (including the use of laser-induced fluorescent excitation of energy transfer dyes, engineered DNA polymerases, capillary electrophoresis, sample preparation, informatics, and sequence analysis software), the Sanger sequencing platform has been able to maintain its status. Current state-of-the-art Sanger-based DNA sequencers can produce over 700 bases of clearly readable sequence in a single run from templates up to 30 kb in length, and allow up to 384 samples to be analyzed in parallel. However, as with most technological inventions, the continual improvements in this sequencing platform have reached a plateau, with the current cost estimate for producing a high-quality microbial genome draft sequence at around $10,000 per megabase pair.
It is evident that exploiting the complete human genome sequence for clinical medicine and health care requires accurate, low-cost, and high-throughput DNA sequencing methods. Indeed, both the public (National Human Genome Research Institute, NHGRI) and private (The J. Craig Venter Science Foundation and Archon X prize for genomics) genomic sciences sectors have issued a call for the development of “next-generation” sequencing technology that will reduce the cost of sequencing to one ten-thousandth of its current cost over the next ten years. Accordingly, to overcome the limitations of current conventional sequencing technologies, a variety of new DNA sequencing methods have been investigated, including sequencing-by-synthesis (SBS) approaches such as pyrosequencing (Ronaghi et al. (1998) Science 281: 363-365), sequencing of single DNA molecules (Braslavsky et al. (2003) Proc. Natl. Acad. Sci. USA 100: 3960-3964), and polymerase colonies (“polony” sequencing) (Mitra et al. (2003) Anal. Biochem. 320: 55-65).
Some conventional next-generation sequencing technologies include single molecule optical detection methods, e.g., as used in technologies developed by PacBio; optical (clonal) methods, e.g., as used in technologies developed by Illumina; and fluorescently labeled nucleotide based methods (including those that use photodeprotection), e.g., as used in technology developed by LaserGen. Each of these methods has its own advantages and disadvantages, but the significant challenge up until now has remained conducting such sequencing analyses with ultra-low-cost instrumentation systems and truly low-cost, disposable reagents.
The concept of DNA sequencing-by-synthesis (SBS) was revealed in 1988 with an attempt to sequence DNA by detecting the pyrophosphate group that is generated when a nucleotide is incorporated by a DNA polymerase reaction (Hyman (1988) Anal. Biochem. 174: 423-436). Subsequent SBS technologies were based on additional ways to detect the incorporation of a nucleotide into a growing DNA strand. In general, conventional SBS uses an oligonucleotide primer designed to anneal to a predetermined position of the sample template molecule to be sequenced. The primer-template complex is presented with a nucleotide in the presence of a polymerase enzyme. If the nucleotide is complementary to the position on the sample template molecule that is directly 3′ of the end of the oligonucleotide primer, then the DNA polymerase will extend the primer with the nucleotide. The incorporation of the nucleotide and the identity of the inserted nucleotide can then be detected by, e.g., the emission of light, a change in fluorescence, a change in pH (see, e.g., U.S. Pat. No. 7,932,034), a change in enzyme conformation, or some other physical or chemical change in the reaction (see, e.g., WO 1993/023564 and WO 1989/009283; Seo et al. (2005) “Four-color DNA sequencing by synthesis on a chip using photocleavable fluorescent nucleotides,” PNAS 102: 5926-59). Upon each successful incorporation of a nucleotide, a signal is detected that reflects the occurrence, identity, and number of nucleotide incorporations. Unincorporated nucleotides can then be removed (e.g., by chemical degradation or by washing) and the next position in the primer-template can be queried with another nucleotide species.
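The query-detect-wash cycle described above can be illustrated with a minimal simulation. The sketch below is not drawn from any of the cited references; the function names (`sbs_flowgram`, `call_bases`), the flow order "TACG", and the idealized, error-free signal are all assumptions made for illustration. The template is given in the order in which the polymerase encounters its bases, and one nucleotide species is presented per flow.

```python
from itertools import cycle

# Watson-Crick pairing used to decide whether a presented nucleotide is incorporated.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sbs_flowgram(template, flow_order="TACG", max_flows=100):
    """Return a list of (nucleotide, incorporations) pairs, one per flow.

    `template` lists the template bases in the order the polymerase reads them,
    starting at the position next to the primer (hypothetical convention).
    """
    position = 0
    flows = []
    for flow_index, nucleotide in enumerate(cycle(flow_order)):
        if position >= len(template) or flow_index >= max_flows:
            break
        incorporated = 0
        # The polymerase keeps extending while the presented nucleotide is
        # complementary, so a homopolymer run yields a count greater than one.
        while (position < len(template)
               and COMPLEMENT[template[position]] == nucleotide):
            incorporated += 1
            position += 1
        # Recording the flow stands in for signal detection; moving on to the
        # next flow stands in for removal of unincorporated nucleotides.
        flows.append((nucleotide, incorporated))
    return flows


def call_bases(flows):
    """Rebuild the synthesized strand from the per-flow incorporation counts."""
    return "".join(nucleotide * count for nucleotide, count in flows)


flows = sbs_flowgram("AACGT")
synthesized = call_bases(flows)
```

In this idealized model, each flow's count corresponds to the "occurrence, identity, and number" of incorporations; a real detector would report a noisy signal that must be converted into such counts.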
While it has become apparent that next-generation sequencing has broad application to diagnostics including cancer, infectious diseases, companion drugs, and hereditary diseases, the extant next-generation sequencing systems are designed to sequence whole genomes and therefore the systems have a high cost per test (e.g., approximately $100 to $500 per test). Moreover, implementing the current commercial systems is also expensive (e.g., $75,000 to $700,000) and the sample-to-sequence workflow is laborious. As such, the extant technologies do not provide a sample-to-sequence system that is desirable for diagnostic applications.
Accordingly, it is a goal to generate high-quality data at a reasonable cost and to deliver next-generation sequencing data accurately and rapidly in an easy-to-use system. Companies such as PacBio have developed specific chemistries for implementation on their systems. At the same time, other companies such as VisiGen and Life Technologies have pursued alternative chemistries for addressing low-cost sequencing.
In particular, LaserGen has developed approaches using optical detection systems and certain reaction chemistries to produce and polymerize photo-deprotectable nucleotides that could be employed in next-generation sequencing applications, e.g., as described in U.S. Pat. Nos. 7,893,227; 7,897,737; 7,964,352; and 8,148,503. The LaserGen nucleotides have a photocleavable, fluorescent terminator moiety attached to the nucleotide base and a non-blocked 3′ hydroxyl on the ribose sugar. The photocleavable, fluorescent terminator provides a substrate for polymerization, e.g., a polymerase adds the nucleotide analog to the 3′ hydroxyl of the synthesized strand. While attached to the nucleotide at the 3′ end, the photocleavable, fluorescent terminator prevents additional nucleotide addition by the polymerase. Also, the fluorescent moiety provides for identification of the nucleotide added using an excitation light source and a fluorescence emission detector. Upon exposure to a light source of the appropriate wavelength, the light cleaves the photocleavable, fluorescent terminator from the 3′ end of the strand, thus removing the block to synthesis so that another nucleotide analog can be added to begin the cycle again. When used in a sequencing-by-synthesis reaction, the LaserGen fluorescently labeled nucleotide compounds offer a way to photodeprotect and at the same time allow for extension, e.g., by sterically unblocking the region in the enzyme so as to permit extension. However, these compounds suffer from having to use fluorescence to detect the presence or absence of a particular incorporation event. As such, the need to combine optical detection with optical cleavage of photodeprotectable labels in these technologies prevents such systems from achieving the lowest possible cost.
The system requires careful coordination of the excitation and deprotecting light sources so that excitation does not deprotect free or incorporated nucleotides. Accordingly, extant methods and previous ideas are inadequate for achieving the desired optimal cost for sequencing.
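The incorporate, detect, and photocleave cycle of a photocleavable terminator chemistry, as described above, can be sketched as a small state model. This is only an illustration under stated assumptions: the class `GrowingStrand` and the function `run_cycles` are hypothetical names, the chemistry is treated as ideal (every step succeeds and excitation never cleaves the terminator), and nothing here represents LaserGen's actual implementation.

```python
from dataclasses import dataclass, field

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


@dataclass
class GrowingStrand:
    """Hypothetical model of a primer being extended along a template."""
    template: str                      # template bases in the order they are read
    bases: list = field(default_factory=list)
    blocked: bool = False              # True while a terminator is still attached

    def incorporate(self):
        """Add one terminated nucleotide analog; return it, or None at the end."""
        if self.blocked:
            raise RuntimeError("terminator still attached; extension is blocked")
        if len(self.bases) >= len(self.template):
            return None
        base = COMPLEMENT[self.template[len(self.bases)]]
        self.bases.append(base)
        self.blocked = True            # the terminator prevents further addition
        return base

    def photocleave(self):
        """Model exposure to the deprotecting light: the block is removed."""
        self.blocked = False


def run_cycles(template):
    """Run incorporate -> detect -> photocleave until the template is exhausted."""
    strand = GrowingStrand(template)
    reads = []
    while True:
        base = strand.incorporate()
        if base is None:
            break
        reads.append(base)             # stands in for the fluorescence readout
        strand.photocleave()
    return "".join(reads)


read = run_cycles("AACGT")
```

Because the terminator allows exactly one addition per cycle, a homopolymer run is read one base at a time. The `RuntimeError` raised when `incorporate` is called before `photocleave` mirrors the point made above: detection and deprotection must be kept as separate, ordered steps.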