The ability to sequence and re-sequence (a term that describes the sequencing of a new genome while making reference to the genome of a closely related organism, generally of the same species) deoxyribonucleic acid (DNA) has the potential to revolutionize biology and medicine. The ability to re-sequence segments of the genome of individual humans will enable the personalization of medicine, as the genetic differences between individuals carry information about how those individuals will respond differently to similar treatments [Ros00].
Most DNA sequencing is done today using capillary array DNA sequencers that detect fluorescent dyes appended to the 5- or 7-positions of pyrimidine or 7-deazapurine nucleobases attached to dideoxynucleotide analogs [Smi86][Ju95][Ju96][Khe96][Sa198]. These analogs, present as a small fraction of the total nucleotide triphosphates, stochastically and irreversibly terminate an elongating DNA chain, because they lack a 3′-OH group. Mutant polymerases have improved the uniformity and efficiency of termination, improving the quality of sequencing data [Tab87][Tab95].
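The stochastic termination described above can be illustrated numerically. The following is a minimal sketch, not a model of any real instrument: it assumes a single hypothetical per-position termination probability `p_term` (set by the fraction of dideoxynucleotide analog) that is identical and independent at every position, which real polymerases follow only approximately. The function names are illustrative.

```python
import random


def termination_fraction(position, p_term):
    """Expected fraction of chains that end exactly at `position` (1-based),
    assuming the same independent termination probability at every step."""
    if not 0.0 < p_term < 1.0:
        raise ValueError("p_term must lie strictly between 0 and 1")
    # The chain must survive (position - 1) steps, then terminate.
    return (1.0 - p_term) ** (position - 1) * p_term


def simulate_fragment_lengths(n_chains, template_length, p_term, seed=0):
    """Extend n_chains chains along a template; each step terminates with
    probability p_term. Chains that never terminate are recorded at the
    full template length."""
    rng = random.Random(seed)
    lengths = []
    for _ in range(n_chains):
        length = template_length  # default: chain ran off the template
        for position in range(1, template_length + 1):
            if rng.random() < p_term:
                length = position
                break
        lengths.append(length)
    return lengths


if __name__ == "__main__":
    for position in (1, 100, 500):
        print(position, termination_fraction(position, 0.01))
```

Under this assumption the fragment population decays geometrically with length, which is one way to see why the analog must be only a small fraction of the total triphosphates if long fragments are to be represented.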
While this sequencing strategy has created the “post-genomic world”, it has well known limitations. Primary among them is that it is difficult to multiplex; each sequence must be determined separately on a separate capillary. Further, it is not exquisitely sensitive; it cannot determine the sequence of a small number of molecules, and is insufficiently sensitive to sequence a single molecule of DNA. Further, the irreversibly terminated elongating DNA strands cannot be cloned. Further, the irreversible termination does not introduce a moiety into the oligonucleotide that can be later used to recover the product strand.
In part to enhance multiplexing, in part for other reasons, sequencing by synthesis without using electrophoresis was proposed as a strategy in 1988 [Hym88]. Generically, the strategy involves detecting the identity of each nucleotide at the same time as it is incorporated into the growing strand of DNA in a polymerase-catalyzed reaction. A variety of architectures have been proposed for performing “sequencing by synthesis” [Che94][Met94]. These differ in the way that the nature of the nucleotide that was just incorporated in each step of the synthesis is determined. They also differ in the tactic used to prevent the addition of the following nucleotide until the identity of the nucleotide that was just incorporated has been determined.
For example, in the pyrosequencing architecture [Ron98], a “minus” strategy is used to look at single nucleotide incorporations. Here, only one of the four natural nucleoside triphosphates is incubated in the reaction at any one time. Detection is based on the release of pyrophosphate during the DNA polymerase reaction, indicating the addition to the elongating chain of the added triphosphate, or the absence of the release of pyrophosphate. The pyrophosphate is detected through its conversion to adenosine triphosphate (ATP) by sulfurylase, which then generates visible light in the presence of firefly luciferase.
The limitations of this procedure are also well known in the art. First, the amount of pyrophosphate must be quantitated to distinguish between the addition of a single nucleotide of the type added, or of several in a “homosequence run”. While this is readily done for runs of one, two, or three nucleotides, it becomes progressively more difficult as the runs become longer. Further, each of the four nucleoside triphosphates must be added separately. Polymerases are well known to misincorporate when they are not presented with the complementing triphosphate. This creates undesired termination, in many cases, or “ragged ends” in others.
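The homosequence-run problem can be made concrete with an idealized flowgram. The sketch below is illustrative only: it assumes that the light signal is exactly proportional to the number of nucleotides added in a flow, that incorporation is complete and error-free, and (in `threshold_margin`) that the noise has a standard deviation proportional to the signal. The flow order `TACG`, the parameter `cv`, and all function names are assumptions, not taken from the cited work.

```python
def flowgram(new_strand, flow_order="TACG", n_flows=16):
    """Idealized pyrosequencing signal: one (base, run_length) pair per flow.

    new_strand is the sequence to be synthesized, written 5'->3'.
    Only one triphosphate is present in each flow, so the polymerase adds
    the whole run of that base (or nothing) before the next flow.
    """
    position = 0
    flows = []
    for i in range(n_flows):
        base = flow_order[i % len(flow_order)]
        run = 0
        while position < len(new_strand) and new_strand[position] == base:
            run += 1
            position += 1
        flows.append((base, run))
    return flows


def call_sequence(flows):
    """Rebuild the synthesized sequence from (base, run_length) pairs."""
    return "".join(base * run for base, run in flows)


def threshold_margin(run_length, cv):
    """Distance from the ideal signal to the nearest decision threshold
    (half a unit away), in standard deviations, assuming noise with a
    standard deviation of cv * run_length."""
    return 0.5 / (cv * run_length)


if __name__ == "__main__":
    flows = flowgram("TTTACG", n_flows=4)
    print(flows, call_sequence(flows))
    for n in (1, 3, 8):
        print(n, threshold_margin(n, cv=0.05))
```

With proportional noise the margin shrinks as 1/n, which matches the observation above that runs of one, two, or three nucleotides are readily quantitated while longer runs become progressively harder to call.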
Another architecture uses a polymerase to direct the incorporation, in a template-directed polymerization step, of a nucleoside triphosphate or thiotriphosphate (which is useful in certain architectures) having its 3′-hydroxyl group blocked by a removable protecting (or blocking) group. This blocking group prevents the polymerase from adding additional nucleotides until the blocking group is removed. In practice, this provides an arbitrarily long time to determine the nature of the added nucleotide.
A frequent proposal for this architecture is to place different distinctive tags on the four nucleobases. These tags may be distinctively colored fluorescent groups, although other tags have been proposed. Then, after the blocked nucleotide is incorporated, the nature of the nucleotide incorporated is determined by reading the fluorescence that comes from the tag. After this is done, the 3′-protecting group is removed to generate a 3′-OH group at the 3′-end of the elongating primer, the tag is removed, and the next cycle of sequencing is initiated. In this architecture, template-directed polymerization is done using a DNA polymerase or, conceivably, a reverse transcriptase [Mit03].
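The incorporate, read, and cleave cycle described above can be sketched as a small state machine. This is a toy illustration under stated assumptions: incorporation is perfectly faithful and complete, cleavage removes both the tag and the 3′-blocking group in one step, and the dye colors in `DYE_FOR_BASE` are a hypothetical assignment. None of the names below come from the cited references.

```python
from dataclasses import dataclass, field
from typing import List, Optional

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# Hypothetical color assignment; any four distinguishable tags would do.
DYE_FOR_BASE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
BASE_FOR_DYE = {dye: base for base, dye in DYE_FOR_BASE.items()}


@dataclass
class GrowingStrand:
    bases: List[str] = field(default_factory=list)
    blocked: bool = False        # True while the 3'-end carries the blocking group
    tag: Optional[str] = None    # dye currently attached to the terminal nucleobase


def incorporate(strand, template_base):
    """Add the complementary blocked, tagged nucleotide to the strand."""
    if strand.blocked:
        raise RuntimeError("3'-end is blocked; cleave before extending")
    base = COMPLEMENT[template_base]
    strand.bases.append(base)
    strand.blocked = True
    strand.tag = DYE_FOR_BASE[base]


def read_tag(strand):
    """Infer the identity of the terminal nucleotide from its dye."""
    return BASE_FOR_DYE[strand.tag]


def cleave(strand):
    """Remove the tag and regenerate a free 3'-OH for the next cycle."""
    strand.blocked = False
    strand.tag = None


def run_cycles(template, n_cycles):
    """template lists the template bases in the order they are copied."""
    strand = GrowingStrand()
    calls = []
    for template_base in template[:n_cycles]:
        incorporate(strand, template_base)
        calls.append(read_tag(strand))
        cleave(strand)
    return "".join(calls)


if __name__ == "__main__":
    print(run_cycles("TACG", 4))
```

The `RuntimeError` in `incorporate` stands in for the role of the blocking group: no second nucleotide can be added until the cleavage step has run, which is what gives an arbitrarily long time to read the tag.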
When the output is fluorescence, this implementation of the strategy requires:
(a) Four analogues of dATP, dTTP, dGTP, and dCTP, each carrying a fluorescent dye with a different color, with the 3′-end blocked so that elongation is not possible.
(b) The four analogues must be efficiently incorporated, to allow the elongation reaction to be completed before undesired reactions occur, and to avoid ragged ends arising from incomplete incorporation. For single molecule sequencing, failure to incorporate is still undesirable, as a cycle of sequence collection is missed.
(c) The incorporation must be faithful. Mismatched incorporation, if not corrected by proofreading, will lead to the loss of strands if the polymerase does not efficiently extend a terminal mismatch. This will gradually erode the intensity of the signal, and may generate “out of phase” signals that confuse the reading of the output downstream. Large numbers of errors will, of course, confuse the primary signal. For single molecule sequencing, misincorporation may well mean the end of a read.
(d) The dye and the group blocking the 3′-OH group need to be removed with high yield to allow the incorporation of the next nucleotide to proceed. Incomplete removal (less than 99% completion in each cycle) will gradually erode the intensity of the signal, and may generate “out of phase” signals that confuse the reading of the output downstream. For single molecule sequencing, failure to cleave the 3′-OH blocking group may not create a decisive error, but it can lose a cycle of sequence data collection.
(e) The growing strand of DNA should survive the washing, detecting and cleaving processes. While reannealing is possible, conditions that allow the DNA primer and template to remain annealed are preferred.
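The effect of per-cycle yield on signal intensity follows from simple compounding. The sketch below assumes, as a simplification, that incorporation and cleavage together succeed with one constant probability `step_yield` per cycle and that a strand which fails once never rejoins the in-phase population. The function names and the 50% cutoff in the usage example are illustrative choices.

```python
import math


def in_phase_fraction(step_yield, n_cycles):
    """Fraction of strands still in phase after n_cycles, assuming a
    constant, independent per-cycle yield."""
    if not 0.0 < step_yield <= 1.0:
        raise ValueError("step_yield must be in (0, 1]")
    return step_yield ** n_cycles


def max_cycles(step_yield, min_fraction):
    """Largest number of cycles for which the in-phase fraction is still
    at least min_fraction."""
    if not 0.0 < step_yield < 1.0:
        raise ValueError("step_yield must lie strictly between 0 and 1")
    if not 0.0 < min_fraction <= 1.0:
        raise ValueError("min_fraction must be in (0, 1]")
    return math.floor(math.log(min_fraction) / math.log(step_yield))


if __name__ == "__main__":
    for step_yield in (0.95, 0.99, 0.999):
        print(step_yield, max_cycles(step_yield, 0.5))
```

At a 99% per-cycle yield, roughly a third of the strands remain in phase after 100 cycles under these assumptions, which is why requirement (d) sets so high a bar for the cleavage chemistry.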
In their most ambitious forms, sequencing-by-synthesis architectures would use the same nucleoside modification to block the 3′-end of the DNA and to introduce the fluorescent tag [Wel99]. For example, if a fluorescent tag is attached to the 3′-position via an ester linkage, replacing the hydrogen atom of the 3′-OH group of the nucleoside triphosphate, extension following incorporation would not be possible (there is no free 3′-OH group). This would give time to read the color of the fluorescent label, determining the nature of the nucleotide added. Then, the 3′-O acyl group could be removed by treatment with a mild nucleophile (such as hydroxylamine) under mild conditions (pH<10) to regenerate a free 3′-hydroxyl group, preparing the DNA for the next cycle.
The difficulty in implementing this elegant approach is the polymerases themselves. Any tag that fluoresces in a useful region of the electromagnetic spectrum must be large, on the order of 1 nm. Crystal structures of polymerases show that the 3′-position in the deoxyribose unit is close to amino acid residues in the active site of the polymerase, which do not offer the incoming triphosphate the space to accommodate a tag of that size. The structure of the ternary complexes of rat DNA polymerase beta, a DNA template-primer, and dideoxycytidine triphosphate (ddCTP) from the Kraut laboratory, as well as a variety of structures for other polymerases from other sources solved in other laboratories, illustrates this fact. The polymerase, therefore, is not likely to be able to handle substituents having a tag of this size at the 3′-position. Indeed, polymerases do not work well with any modification of the 3′-OH group of the incoming triphosphate. For example, to accept even 2′,3′-dideoxynucleoside analogues (where the 3′-moiety is smaller than in the natural nucleoside), mutated polymerases are often beneficial.
Ju et al., in U.S. Pat. No. 6,664,079, noted these problems as they outlined a proposal for sequencing by synthesis based on 3′-OH blocking groups. Therefore, they argued that the prior art had not been enabled, even though it specified many details of an architecture for sequencing by synthesis. They suggested that this problem might be addressed using nucleotide analogues where the tag, such as a fluorescent dye or a mass tag, is linked through a cleavable linker to the nucleotide base or an analogue of the nucleotide base, such as to the 5-position of the pyrimidines (T and C) and to the 7-position of the purines (G and A). Bulky substituents are known to be accepted at this position; indeed, these are the sites that carry the fluorescent tags in classical dideoxy sequencing. According to Ju et al., tags at this position should, in principle, allow the 3′-OH group to be blocked by a cleavable moiety that is small enough to be accepted by DNA polymerases. In this architecture, multiple cleavage steps might be required to remove both the tag (to make the system clean for the addition of the next tag) and the 3′-blocking group, to permit the next cycle of extension to occur [Mit03][Seo04].
U.S. Pat. No. 6,664,079 then struggled to find a small chemical group that might be accepted by polymerases, and could be removed under conditions that were not so harsh as to destroy the DNA being sequenced or the architecture supporting the sequences. U.S. Pat. No. 6,664,079 cited a literature report that 3′-O-methoxy-deoxynucleotides are good substrates for several polymerases [Axe78]. It noted, correctly, that the conditions for removing a 3′-O methyl group were too harsh to permit this blocking group to be removed under any conditions that were likely to leave the DNA being sequenced, or the primer that was being used, largely intact.
An ester group was also discussed as a way to cap the 3′-OH group of the nucleotide. U.S. Pat. No. 6,664,079 discarded this blocking group based on a report that esters are cleaved in the active site of DNA polymerase [Can95]. It should be noted that this report is questionable, and considers only a single polymerase. Therefore, in a modification not considered by Ju et al., a formyl group may be used in this architecture. The 3′-O formylated 2′-deoxynucleoside triphosphates are preparable as intermediates in the Ludwig-Eckstein triphosphate synthesis, if the 3′-O acetyl group that is traditionally used is replaced by a formyl group, and the final alkaline deprotection step is omitted.
U.S. Pat. No. 6,664,079 then cited a literature report that 3′-O-allyl-dATP is incorporated by Vent (exo-) DNA polymerase into the growing strand of DNA [Met94]. U.S. Pat. No. 6,664,079 noted that this group, and the methoxymethyl (MOM) group, which has a similar size, might be used to cap the 3′-OH group in a sequencing-during-synthesis format. This patent noted that these groups can be cleaved chemically using transition metal reagents [Ire86][Kam99], or through acidic reagents (for the MOM group).
These suggestions therefore define the invention proposed in U.S. Pat. No. 6,664,079. Briefly, the essence of that invention is an architecture where the triphosphates of four nucleotide analogues, each labeled with a distinctive cleavable tag attached to the nucleobase, and each having the hydrogen of the 3′-OH group replaced by an allyl group or a MOM group, are used as the triphosphates in the sequencing by synthesis architecture, and the products are oligonucleotides prepared by polymerase incorporation that have this replacement.
This architecture, to date, has never been reduced to practice. This is again because of the polymerases. While the allyl group is small, to date, no polymerases have been shown to incorporate these to the extent and with the efficiency needed to effectively reduce this invention to practice. Therefore, U.S. Pat. No. 6,664,079 cannot be said to have enabled the sequencing-by-synthesis strategy. Further, more recent literature has described the use of Therminator variants to incorporate these MOM- and allyl-protected nucleoside triphosphates. Therminator has many disadvantages that make it difficult to apply in practice. Not the least of these is the affinity with which it binds to template-primer overhangs and single stranded DNA, an affinity that makes it difficult to wash in repetitive sequencing-during-synthesis architectures.
Recent patent applications have disclosed a smaller 3′-blocking group, one that has fewer than three heavy (that is, non-hydrogen) atoms. Their disclosures taught that such a blocking group is useful for an efficient sequencing-during-synthesis architecture, either with natural polymerases or with polymerases in which one of the amino acids in contact with the ribose ring is mutated. These might include the formyl unit, where the hydrogen atom of the 3′-OH group is replaced by a COH unit.
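The size criterion stated above (fewer than three non-hydrogen atoms) can be checked mechanically for the blocking groups named in this section. The sketch below counts heavy atoms from the molecular formula of the group that replaces the hydrogen of the 3′-OH. The formulas are the standard ones for these groups; the dictionary, the function names, and the use of heavy-atom count as the only measure of size are illustrative simplifications.

```python
import re

# Formula of the group that replaces the hydrogen atom of the 3'-OH.
BLOCKING_GROUPS = {
    "methyl": "CH3",
    "formyl": "CHO",
    "amino": "NH2",
    "allyl": "C3H5",
    "methoxymethyl (MOM)": "C2H5O",
}


def heavy_atom_count(formula):
    """Count non-hydrogen atoms in a simple molecular formula such as C3H5."""
    count = 0
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if element != "H":
            count += int(number) if number else 1
    return count


def small_enough(formula, limit=3):
    """True if the group has fewer than `limit` heavy atoms."""
    return heavy_atom_count(formula) < limit


if __name__ == "__main__":
    for name, formula in BLOCKING_GROUPS.items():
        print(name, heavy_atom_count(formula), small_enough(formula))
```

By this count the allyl and MOM groups each have three heavy atoms and fall outside the criterion, while the formyl and amino groups fall inside it. The methyl group also passes on size alone, but the conditions needed to remove it are too harsh, so size is a necessary rather than a sufficient test.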
In the disclosed invention, the preferred replacement is NH2 or NHR. There, the 3′-O-amino group is used as a removable protecting group for the sequencing-by-synthesis scheme. The 3′-O-amino group is chosen because it is as small a moiety as can form a stable 3′-O blocking group. The small size of the 3′-modification makes it most likely to be accepted by DNA polymerases during template-directed DNA polymerization [Hen04].
Further, contact by DNA polymerases with the 3′-end of the incoming triphosphate is frequently made with an amino acid with an aromatic side chain (Phe or Tyr) [Gar99]. The size of this can be reduced (to His), generating the possibility that if any particular natural polymerase does not work, then these can be mutated, followed by a round of in vitro directed evolution [Gha01], to generate polymerases that accept 3′-O-amino triphosphates with acceptable specifications.
The hydroxylamine functionality is stable in water, and displays several other advantages:
(a) 3′-O-Amino-2′-deoxynucleosides [DeC90][Kon85][Bur94][Coo94] are directly synthesizable from the xylo-2′-deoxyribonucleosides via a Mitsunobu reaction with N-hydroxyphthalimide.
(b) The 3′-O-amino-2′-deoxynucleoside blocking group is small, even smaller than the speculative —OSH unit (which is considered in the instant invention) and the azido unit (which is incorporated by reverse transcriptases when they accept azidothymidine triphosphate, for example).
(c) The 3′-O-amino-2′-deoxynucleoside functionality has much of the hydrogen bonding potential of the 3′-OH group. While not wishing to be bound by theory, these derivatives may form a network of hydrogen bonds to the catalytic magnesium ion, as suggested by crystallography for the natural substrate, and may therefore fit into the active site of various polymerases.
(d) In some cases, a polymerase can be improved by replacing the Phe or Tyr (depending on the polymerase) [Eva00][Gar99] that blocks the 3′-position of the incoming triphosphate with a slightly smaller aromatic and/or hydrophobic group (His, Phe, or Val).
(e) A large number of reagents are known that cleave the N—O linkage in hydroxylamines and O-alkoxyamines. These are discussed in greater detail below. Oxidative conditions are provided by bleach, nitrous acid at pH 6 under conditions where the nucleobases are not significantly modified, nitroso compounds, iodate, or potassium ferrate in 1 M NaCl, 50 mM potassium phosphate buffer, 25° C.; this generates the free —OH group and N2O, which is trapped. Reductive conditions include catalytic hydrogenation. The preferred approaches include addition-elimination cycles where the amino group of the alkoxyamine adds to an electrophile (such as maleimide or a naphthoquinone) and then ejects the alcohol as a leaving group.
(f) Once incorporated, the product 3′-O-amino-oligo-2′-deoxyribonucleosides themselves have value, through capture architectures that exploit the 3′-blocking group or, after its removal, as the starting point for cloning and further elongation processes.
With this 3′-O blocking group, other features of the architecture of the state-of-the-art sequencing-by-synthesis approach can be adopted. In particular, the linkers that hold the fluorescent labels to the nucleobases in the Ju architecture might be cleaved using the same reagent that is used to remove the amino group from the terminal 3′-O-amino-2′-deoxynucleoside.