The ability to sequence and re-sequence (a term that describes the sequencing of a new genome while making reference to the genome of a closely related organism, generally of the same species) deoxyribonucleic acid (DNA) has the potential to revolutionize biology and medicine. The ability to re-sequence segments of the genome of individual humans will enable the personalization of medicine, as the genetic differences between individuals carry information about how those individuals will respond differently to similar treatments [Ros00]. Most DNA sequencing is done today using capillary array DNA sequencers that detect fluorescent dyes appended to the 5- or 7-positions of pyrimidine or 7-deazapurine nucleobases attached to dideoxynucleotide analogs [Smi86][Ju95][Ju96][Khe96][Sal98]. These analogs, present as a fraction of the total nucleotide triphosphates, irreversibly terminate a growing DNA chain. Mutant polymerases have improved the uniformity and efficiency of termination, improving the quality of sequencing data [Tab87][Tab95].
While these strategies have created the post-genomic world, they have well-known limitations. Primary among them are that they are difficult to multiplex and that they are difficult to apply to small numbers of molecules, including single molecules.
In part to enhance multiplexing, in part for other reasons, sequencing by synthesis without using electrophoresis was introduced as a strategy in 1988 [Hym88]. This approach involves detecting the identity of each nucleotide as it is incorporated into the growing strand of DNA in a polymerase-catalyzed reaction. Such a strategy, coupled with the chip format and laser-induced fluorescent detection, was proposed to have the potential of increasing the throughput of DNA sequencing, largely due to multiplexed sequencing. These approaches are referred to herein as “sequencing by synthesis”, even though the term is somewhat inappropriate. Synthesis is, of course, involved in classical DNA sequencing, meaning that these approaches are perhaps better denoted as sequencing with analysis concurrent with synthesis.
A variety of architectures have been proposed for sequencing by synthesis [Che94] [Met94]. The pyrosequencing architecture employs four natural nucleotides (comprising a base of adenine (A), cytosine (C), guanine (G), or thymine (T)) and several other enzymes for sequencing DNA by synthesis [Ron98]. In this architecture, detection is based on the release of pyrophosphate during the DNA polymerase reaction. This is converted to adenosine triphosphate (ATP) by sulfurylase, which then generates visible light in the presence of firefly luciferase. The limitations of this procedure are also well known in the art. Each of the four nucleotides must be added and detected separately. The procedure is not likely to ever be able to sequence long runs.
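The flow-by-flow nature of pyrosequencing described above can be illustrated with a minimal sketch. It assumes an idealized model in which each of the four nucleotides is added in a fixed repeating order and the light signal equals the number of nucleotides incorporated in that flow. The function name, the flow order, and the convention that the template string lists bases in the order the polymerase encounters them are assumptions made for illustration; they do not come from the cited references.

```python
# Idealized pyrosequencing model (illustrative assumptions only):
# one nucleotide type is added per flow, and the light signal is taken
# to equal the number of bases incorporated during that flow.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def pyrosequence(template, flow_order="ACGT", max_flows=100):
    """Return a list of (nucleotide, signal) pairs, one per flow.

    `template` lists the template bases in the order the polymerase
    reads them (a hypothetical convention chosen for this sketch).
    """
    position = 0
    flows = []
    for i in range(max_flows):
        if position >= len(template):
            break  # template fully copied
        nucleotide = flow_order[i % len(flow_order)]
        incorporated = 0
        # The polymerase keeps extending while the added nucleotide
        # pairs with the next template base.
        while (position < len(template)
               and COMPLEMENT[template[position]] == nucleotide):
            incorporated += 1
            position += 1
        flows.append((nucleotide, incorporated))
    return flows


if __name__ == "__main__":
    print(pyrosequence("TTGCA"))
```

In this model every position in the read costs up to four separate additions and detections, which is the limitation noted above.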
Another architecture has a polymerase direct the incorporation, in a template-directed polymerization step, of a nucleoside triphosphate or thiotriphosphate (which is useful in certain architectures) having its 3′-hydroxyl group blocked by a removable protecting (or capping) group, which carries one of four tags (for example, fluorescent groups), each distinctive for one of the four nucleobases. Then, after the blocked nucleotide is incorporated and the nature of the nucleotide incorporated is determined by reading the tag, the 3′-protecting group is removed to generate a 3′-OH group at the 3′-end of the elongating primer. This permits the next cycle of sequencing to occur.
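The cycle just described (incorporate a 3′-blocked, tagged nucleotide; read the tag; remove the blocking group) can be sketched as a simple loop. This is an idealized illustration that assumes every step goes to completion and that incorporation is always faithful. The tag names, the dictionary representation of a nucleotide, and the function name are hypothetical.

```python
# Idealized cycle of sequencing by synthesis with a removable 3'-block.
# Tag names and data structures are hypothetical, chosen for illustration.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
TAG_FOR_BASE = {"A": "tag_A", "C": "tag_C", "G": "tag_G", "T": "tag_T"}
BASE_FOR_TAG = {tag: base for base, tag in TAG_FOR_BASE.items()}


def sequence_by_synthesis(template, cycles):
    """Return the bases read over `cycles` cycles.

    `template` lists template bases in the order the polymerase reads them.
    """
    primer = []  # nucleotides added to the elongating primer
    reads = []
    for cycle in range(min(cycles, len(template))):
        base = COMPLEMENT[template[cycle]]
        # Step 1: template-directed incorporation of a nucleotide whose
        # 3'-OH is blocked; the block prevents any further extension.
        nucleotide = {"base": base, "tag": TAG_FOR_BASE[base], "blocked": True}
        primer.append(nucleotide)
        # Step 2: read the tag to identify the incorporated nucleotide.
        reads.append(BASE_FOR_TAG[nucleotide["tag"]])
        # Step 3: remove the tag-carrying 3'-protecting group, regenerating
        # a free 3'-OH so that the next cycle can proceed.
        nucleotide["tag"] = None
        nucleotide["blocked"] = False
    return "".join(reads)


if __name__ == "__main__":
    print(sequence_by_synthesis("TTGCA", cycles=5))
```

Because extension halts after each incorporation, all four tagged nucleotides can be offered together, and one base is read per cycle.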
In this architecture, template-directed polymerization is done using a DNA polymerase or, conceivably, a reverse transcriptase [Mit03]. The identity of each nucleotide is determined as it is incorporated. The most common proposal to do this requires that the incorporated nucleotide carry a fluorescent tag. When the output is fluorescence, the strategy requires:
    (a) Four analogues of dATP, dTTP, dGTP, and dCTP that each carry a fluorescent dye with a different color, with the 3′-end blocked so that elongation is not possible.
    (b) The four analogues must be efficiently incorporated, to permit the elongation reaction to be completed before undesired reactions occur, and to avoid ragged ends arising from incomplete incorporation. For single molecule sequencing, incomplete incorporation is less critical, but still undesirable, as a cycle of sequence collection is missed.
    (c) The incorporation must be faithful. Mismatched incorporation, if not corrected by proofreading, will lead to the loss of strands if the polymerase does not efficiently extend a terminal mismatch. This will gradually erode the intensity of the signal, and may generate “out of phase” signals that confuse the reading of the output downstream. Large numbers of errors will, of course, confuse the primary signal. For single molecule sequencing, misincorporation may well mean the end of a read.
    (d) The dye and the group capping the 3′-OH group need to be removed with high yield to allow the incorporation of the next nucleotide to proceed. Less than 99% completion in each cycle will gradually erode the intensity of the signal, and may generate “out of phase” signals that confuse the reading of the output downstream. For single molecule sequencing, failure to cleave the 3′-OH blocking group may not create a decisive error, but it can lose a cycle of sequence data collection.
    (e) The growing strand of DNA should survive the washing, detecting and cleaving processes.
While reannealing is possible, conditions that allow the DNA primer and template to remain annealed are preferred.
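The effect of incomplete incorporation or cleavage on signal intensity can be made concrete with a short calculation. The sketch below assumes a deliberately simple model in which each cycle succeeds independently with the same per-cycle yield, so that the fraction of strands still in phase after n cycles is the yield raised to the power n. The function names and the 50% threshold used in the example are illustrative assumptions.

```python
import math


def in_phase_fraction(step_yield, cycles):
    """Fraction of strands still in phase after `cycles` cycles, assuming
    each cycle succeeds independently with probability `step_yield`."""
    return step_yield ** cycles


def max_cycles(step_yield, min_fraction):
    """Largest number of cycles for which the in-phase fraction stays at or
    above `min_fraction` (requires 0 < step_yield < 1, 0 < min_fraction < 1)."""
    return math.floor(math.log(min_fraction) / math.log(step_yield))


if __name__ == "__main__":
    for y in (0.95, 0.99, 0.999):
        print(f"yield {y}: in-phase after 100 cycles = "
              f"{in_phase_fraction(y, 100):.3f}, "
              f"cycles until below 50% = {max_cycles(y, 0.5)}")
```

Under this model, a 99% per-cycle yield leaves roughly 37% of the strands in phase after 100 cycles, which illustrates why yields well above 99% matter for long reads from an ensemble of molecules.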
In their most ambitious forms, sequencing-by-synthesis architectures were proposed that used the same nucleoside modification to block the 3′-end of the DNA and to introduce the fluorescent tag [Wel99]. The chemistry of this architecture is simple to envision. For example, if the fluorescent tag is attached to the 3′-position via an ester linkage, extension following incorporation would not be possible (there is no free 3′-OH group). This would give time to read the color of the fluorescent label, determining the nature of the nucleotide added. Then, the 3′-O acyl group could be removed by treatment with a mild nucleophile (such as hydroxylamine) under mild conditions (pH<10) to regenerate a free 3′-hydroxyl group, preparing the DNA for the next cycle.
The difficulty in implementing this elegant approach is the polymerases themselves. Crystal structures of polymerases show that the 3′-position in the deoxyribose unit is close to amino acid residues in the active site of the polymerase. The structure of the ternary complexes of rat DNA polymerase beta, a DNA template-primer, and dideoxycytidine triphosphate (ddCTP) from the Kraut laboratory, as well as a variety of structures for other polymerases from other sources solved in other laboratories, illustrates this fact. The polymerase, therefore, is not likely to be able to handle large substituents at the 3′-position. For example, to accept even 2′,3′-dideoxynucleoside analogues, where the substituent at this position is smaller, mutated polymerases are often beneficial.
Ju et al., in U.S. Pat. No. 6,664,079, noted these problems as they outlined a proposal for sequencing by synthesis based on 3′-OH blocking groups. They suggested that this problem might be addressed using nucleotide analogues where the tag, such as a fluorescent dye or a mass tag, is linked through a cleavable linker to the nucleotide base or an analogue of the nucleotide base, such as to the 5-position of the pyrimidines (T and C) and to the 7-position of the purines (G and A). Bulky substituents are known to be accepted at these positions; indeed, these are the sites that carry the tags in classical dideoxy sequencing. Tags at these positions should, in principle, allow the 3′-OH group to be blocked by a cleavable moiety that is small enough to be accepted by DNA polymerases. In this architecture, cleavage steps would be required to remove both the tag (to make the system clean for the addition of the next tag) and the 3′-blocking group, to permit the next cycle of extension to occur [Mit03][Seo04].
U.S. Pat. No. 6,664,079 then struggled to find a small chemical group that might be accepted by polymerases and could be removed under conditions that were not so harsh as to destroy the DNA being sequenced. U.S. Pat. No. 6,664,079 cited a literature report that 3′-O-methoxy-deoxynucleotides are good substrates for several polymerases [Axe78]. It noted, correctly, that the conditions for removing a 3′-O methyl group were too stringent to permit this blocking group to be removed under any conditions that were likely to leave the DNA being sequenced intact.
An ester group was also discussed as a way to cap the 3′-OH group of the nucleotide. U.S. Pat. No. 6,664,079 discarded this capping group based on a report that esters are cleaved in the active site of DNA polymerase [Can95]. It should be noted that this report is questionable, and considers only a single polymerase. Therefore, the instant disclosure teaches that it is possible that a formyl group could be used in this architecture. 3′-O formylated 2′-deoxynucleoside triphosphates are preparable as intermediates in the Ludwig-Eckstein triphosphate synthesis, if the 3′-O acetyl group that is traditionally used is replaced by a formyl group, and the final alkaline deprotection step is omitted.
Chemical groups with electrophiles such as ketone groups were also considered and discarded by U.S. Pat. No. 6,664,079, as not being suitable for protecting the 3′-OH of the nucleotide in enzymatic reactions due to the existence of strong nucleophiles (such as amino groups) in the polymerase. It should be noted that the 3′-keto 2′-deoxyribose unit is not stable to decomposition via beta elimination reactions, as is well known in the literature studying the mechanism of ribonucleotide reductases.
U.S. Pat. No. 6,664,079 then cited a literature report that 3′-O-allyl-dATP is incorporated by Vent (exo-) DNA polymerase into the growing strand of DNA [Met94]. U.S. Pat. No. 6,664,079 noted that this group, and the methoxymethyl (MOM) group, which has a similar size, might be used to cap the 3′-OH group in a sequencing-by-synthesis format. This patent noted that these groups can be cleaved chemically using transition metal reagents [Ire86][Kam99].
These suggestions therefore define the invention disclosed in U.S. Pat. No. 6,664,079. Briefly, the essence of this invention is an architecture where the triphosphates of four nucleotide analogues, each labeled with a unique cleavable tag attached to the nucleobase, and each having the 3′-OH unit capped with an allyl group (the MOM group not having high utility in this context), are used as the extension groups in the sequencing by synthesis strategy.
This architecture, to date, has never been reduced to practice to give a practical process for sequencing DNA by synthesis. This is again because of the polymerases. While the allyl group is small, no polymerases have yet been shown to incorporate these analogues to the extent and with the efficiency needed to effectively reduce this invention to practice. Therefore, U.S. Pat. No. 6,664,079 cannot be said to have enabled the sequencing-by-synthesis strategy.