The modern era of automated, high-throughput nucleic acid sequencing began with large-scale application of the Sanger sequencing method for the Human Genome Project. The Sanger method used chain-terminating inhibitors, one for each nucleotide (C, T, A, and G), each of which contained a detectable label and, when incorporated into a nucleotide strand, inhibited further progress of the polymerase enzyme used in the method. (See, Sanger et al., “DNA sequencing with chain-terminating inhibitors,” Proc. Natl. Acad. Sci. USA, 74:5463-5467, 1977). These prematurely terminated sequences, which were complementary to the target sequence, were then separated and detected by gel electrophoresis. This method is still used today in basic and applied scientific research. The Sanger method has also been automated, providing a technology that some refer to as the “first generation” of polynucleotide sequencing. (See, Metzker, Michael L., “Sequencing technologies—the next generation,” Nature Rev. Genet., 11:31-46, 2010). However, science has marched on since disclosure of the Sanger method and today, more than three decades later, additional strategies for determining the sequence of a polynucleotide have been developed.
The new methods, commonly referred to as Next Generation Sequencing (NGS) or “massively parallel” methods, coupled with improvements in methods involving the polymerase chain reaction (PCR) for amplifying target polynucleotides, have catapulted the study of nucleotide sequences into increasingly diverse areas of application. Consequently, the market for sequencing was estimated to be over one billion USD in 2011, and is expected to double by 2016. (See, “Research and Markets: Next Generation Sequencing: Market Size, Segmentation, Growth and Trends by Provider 2011,” Business Wire, Nov. 30, 2011).
Whereas the Human Genome Project cost over three billion US dollars and required nearly thirteen years to complete, today a whole genome may be sequenced using NGS technology in 24 hours at a fraction of the cost. Over the first decade of the 21st century, technology innovation in NGS progressed towards a long-sought “$1000 genome.” (See, Wolinsky, Howard, “The thousand-dollar genome. Genetic brinkmanship or personalized medicine?” EMBO Rep., 8(10):900-903, 2007). In 2007, the genomic sequence of James D. Watson was obtained using NGS technologies at a cost of approximately one million US dollars. Dr. Watson's genetic sequence was published in 2008. (See, Wheeler et al., “The complete genome of an individual by massively parallel DNA sequencing,” Nature 452(7189): 872-876, 2008). Later in 2007, an individual could purportedly obtain their genomic sequence for the price of 350,000 US dollars. (See, Amy Harmon, “Gene map becomes a luxury item,” New York Times, Mar. 4, 2008). The $50,000 human genome was first offered around 2010. (See, Dewey et al., “Phased Whole-Genome Genetic Risk in a Family Quartet Using a Major Allele Reference Sequence,” PLoS Genetics 7(9): e1002280, 2011). In 2012, the ability to sequence an entire human genome within one day for a cost of approximately $1,000 was advertised. (See, Defrancesco, L., “Life Technologies promises $1,000 genome,” Nature Biotechnology 30(2): 126, 2012). Rather than striving to obtain the sequence information for a single human genome, scientists are now working towards obtaining the sequence information of 1,000 genomes.
These new high-throughput NGS (HT-NGS) methods may allow scientists to obtain the sequence of genes more quickly and at less cost. (See, Smith, Caitlin, “Whole Genome Sequencing Technologies Enhance Speed and Throughput,” Biocompare, Apr. 25, 2013 and Mardis, E R, “A decade's perspective on DNA sequencing technology,” Nature, 470(7333):198-203, 2011). However, it is recognized that the efficiency of HT-NGS is sometimes obtained at the cost of accuracy and fidelity. Error rates continue to be a concern in employing these methods, especially in the clinical fields. Thus, maintaining a low cost profile while increasing sensitivity remains a matter that receives much attention and continued innovation. Longer reads, i.e., longer continuous sequence determinations on longer strands of nucleic acid, offer higher fidelity, but are technically more difficult to achieve and sometimes require more time and resources to obtain. Shorter reads, i.e., sequence information obtained from shorter strands of nucleic acid, are easier to obtain and can be performed in massively parallel systems, offering higher fidelity than might otherwise be expected, but may be less useful in a clinical setting.
The issue of accuracy in NGS may be addressed by ensuring that sequences are not determined in a low-fidelity environment. For instance, in the context of sequencing by synthesis (SBS), polymerase enzymes are used to determine the identity of the next base needed in the growing strand and to catalyze its incorporation. Non-native nucleotides are often utilized in SBS methodologies to arrest the progression of strand synthesis, allowing determination of the identity of the incorporated base. However, the fidelity of polymerase enzymes drops when they are tasked with incorporating non-native nucleotides into the growing DNA strand. There is a long-felt need in the field of HT-NGS to increase fidelity in SBS and to develop nucleotides for use in sequential reversible termination that meet all assay requirements: providing efficient, quantitative termination and reversibility, providing acceptable accuracy, avoiding harsh chemical conditions, and not slowing down polymerase activity. The present application provides such non-native nucleotides useful in SBS methodologies.