1. Technical Field of the Invention
This invention relates to the preparation and use of activating groups for enhancing the reactivity between fluorescent dyes and nucleic acid chain terminators, for use in nucleic acid sequencing methods.
2. Prior Art
The present Invention relates to novel families of compounds for use in nucleic acid sequencing. More specifically, the present Invention discloses and claims compounds (as well as their method of synthesis) for use as activating groups. These compounds act as linking agents that allow the attachment of fluorescent dyes to certain derivatized nucleotides, thus enabling the investigator to determine the sequence of the bases on the nucleic acid chain of interest. What follows is a brief description of a typical experimental setting for the novel compounds of the present Invention, to provide context and to allow the advantages of the present Invention over the prior art to be assessed. Finally, the compounds used in the prior art shall be discussed.
The most fundamental information regarding a particular nucleic acid isolate is its sequence--that is, the nucleotide bases that make up the isolate and the precise order of these bases on the molecule. For example, DNA is a large molecule, often presented visually in the shape of a ladder. This ladder is actually composed of two mirror image pieces, each piece comprising a single vertical brace to which the rungs are attached, and one half of each rung. Thus the "half rungs" of each piece or strand fit together to form a complete ladder. The rungs of the ladder correspond to nitrogenous bases; in the case of DNA, there are only four types: adenine (A), guanine (G), cytosine (C), and thymine (T). Each rung therefore consists of two bases. Moreover, the two bases comprising each rung can only be certain complementary pairs: A can only fit with T, and G can only fit with C; thus, a rung can be A-T, G-C, T-A, or C-G, but not T-G, for instance. Significantly, this means that if the sequence of one piece of the ladder is known, then the sequence of the other piece can be readily determined, since it can be inferred from the required pairings. This other piece is known as the "complementary strand." In summary, "sequencing" a DNA strand means determining the identity and order of its rungs (or actually half rungs), e.g., A-T-G-C-A, etc. Knowledge of this sequence allows investigators to determine the proteins encoded by that sequence and to compare that sequence with others of known function.
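By way of illustration only, the pairing rule described above (A with T, G with C) can be sketched in a few lines of Python. The function name complementary_strand and the example sequence are hypothetical and form no part of the present Invention; the sketch pairs bases position by position, exactly as in the ladder analogy, and makes no attempt to model strand orientation.

```python
# Pairing rule from the text: A fits only with T, and G fits only with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(strand):
    """Return the base paired with each position of `strand`, in the same order.

    Illustrative helper (hypothetical name). A base other than A, T, G, or C
    raises KeyError, since no pairing is defined for it.
    """
    return "".join(PAIRS[base] for base in strand.upper())


# The "half rungs" A-T-G-C-A imply the half rungs of the other strand.
print(complementary_strand("ATGCA"))  # TACGT
```

Applying the function twice returns the original sequence, which reflects the point made above: knowing either strand is sufficient to infer the other.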
At present, several nucleic acid sequencing methods are widely used. One is the dideoxy chain termination method, discussed in Sanger et al., Proc. Natl. Acad. Sci. 74:5463-67 (1977). This reference is hereby incorporated by reference into the present Application. Another is the chemical degradation method, discussed in Maxam et al., Proc. Natl. Acad. Sci. 74:560-564 (1977); a third group comprises the hybridization methods, discussed in Drmanac et al., Genomics 4:114-28, and Khrapko, FEB 256:118-22 (1989). Of these, the dideoxy chain termination method disclosed in Sanger et al. is the one to which the present Invention is primarily, though not exclusively, directed.
Briefly, the Sanger et al. method involves isolating a single strand of the nucleic acid of interest, which is used as a template for primed synthesis of a complementary strand. It is this complementary strand whose sequence will be determined; the sequence of the original strand can then be inferred. To construct the complementary strand, four different reaction vessels are used, corresponding to the four types of bases comprising DNA: A, T, G, and C. To the first reaction vessel, the original strand is added (which acts as the template upon which synthesis of the complementary strand occurs), along with DNA polymerase, which catalyzes the synthesis of the complementary strand, and finally the bases (A, T, G, and C), which are the building blocks for the new strand. In each reaction vessel, a portion of one of the four bases is derivatized; that is, it is chemically altered such that no further synthesis of the strand can occur after that base is added to the chain of bases comprising the complementary strand. Thus, in the first reaction vessel, some of the "A" bases will be derivatized so that chain synthesis stops (i.e., no further synthesis can occur) upon incorporation of that derivatized base into the DNA strand. This derivatized base is called a "chain terminator." In the reaction vessel, millions of complementary DNA strands are being synthesized. Since both normal "A" and the derivatized version of "A" (i.e., the chain terminator) are added to the reaction vessel, some strands will incorporate the chain terminator at the first occurrence of "A" on the chain (i.e., where the original strand has a "T"), others will incorporate normal "A" there and the chain terminator at the second occurrence, still others at the third, and so forth.
Thus, after some time, the reaction vessel will contain a mixture of partial complementary strands of various chain lengths, depending upon the point in the sequential synthesis at which they were terminated. They all have one common feature: each stops at an "A," because a chain terminator was incorporated into the complementary strand at that point. Therefore, the investigator can in theory determine every point on a hypothetical complete complementary strand at which an "A" occurs. This is true because, if there are millions of partially synthesized complementary strands in the vessel, one would expect statistically that the vessel contains at least one partially synthesized chain (which must terminate at "A") corresponding to every occurrence of an "A" in the hypothetical completed complementary strand. Thus, if the hypothetical completed complementary strand had 100 "A" bases in it, then one would expect partially synthesized complementary strands of 100 different lengths, since the synthesis of each one would have stopped upon incorporation of a chain-terminating "A" (though not upon incorporation of a normal "A"). If the same method is performed in three other reaction vessels for T, G, and C, then the entire sequence of the complementary strand can be determined.
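The reasoning above can be illustrated with a simplified Python sketch. It is an idealized model only: it assumes that every possible termination point is represented in each vessel (the statistical expectation stated above) and that fragment lengths can be measured exactly. The function names terminated_lengths and read_sequence, and the example strand, are hypothetical and are not taken from Sanger et al.

```python
def terminated_lengths(complement, base):
    """Lengths of the partial strands expected in the vessel for `base`.

    Idealized assumption: one fragment ends at every occurrence of `base`
    in the hypothetical completed complementary strand.
    """
    return [i + 1 for i, b in enumerate(complement) if b == base]


def read_sequence(vessels):
    """Rebuild the strand from the four vessels' fragment lengths.

    Each length occurs in exactly one vessel, and that vessel names the
    base at that position; sorting the lengths gives the order of bases.
    """
    base_at = {}
    for base, lengths in vessels.items():
        for length in lengths:
            base_at[length] = base
    return "".join(base_at[n] for n in sorted(base_at))


complement = "TACGTTAG"  # hypothetical completed complementary strand
vessels = {base: terminated_lengths(complement, base) for base in "ATGC"}

print(vessels["A"])            # [2, 7]
print(read_sequence(vessels))  # TACGTTAG
```

In this model the "A" vessel alone reveals only the positions of "A" (here, the second and seventh bases); it is the combination of all four vessels that yields the complete sequence.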
What is needed, therefore, is some means to "mark" or "tag" the chain terminators so that they can be identified. This allows the investigator to determine precisely where the chain stopped; this information is valuable because the investigator then knows what base is present at the point of termination (in the prior example, it would be "A"). One means that has proven quite successful is to attach cyanine dyes to the chain terminators. These dyes fluoresce when exposed to ultraviolet light, thus signaling their presence. For reasons which shall be discussed below, affixing the dyes to the chain terminators is problematic. Thus, the present Invention is directed to a family of compounds that enables the dye and chain terminator to be readily affixed.
The process of affixing the dye to the chain terminator typically involves derivatizing, or "activating," the dye. Compounds which activate the dye, such as those disclosed in the present Invention, are typically referred to as "activating groups." Presently, para-nitrophenol (PNP) and N-hydroxysuccinimide (NHS or OSu) are used as activating groups for cyanine dyes. These compounds allow the coupling reaction between the dideoxynucleotide-amine (ddNTP-NH2) and the cyanine dye, which results in the desired dye-labeled terminator. These activating groups of the prior art (PNP and NHS) have numerous shortcomings, however. First, the coupling of dye-PNP with ddNTP gives a lower yield than that of dye-NHS. Second, and most significantly, both dye-PNP and dye-NHS give a mixture of two major products, namely the mono-substituted product and the di-substituted product (i.e., two chain terminators per dye molecule). The di-substituted product is undesirable. Consequently, a large excess of activated dye is needed for the coupling reaction in order to favor the formation of the mono-substituted product. Finally, the prior art activating groups give relatively low yields of the desired product.
The present Invention is directed to a novel family of activating groups designed to provide greater selectivity for the mono-substituted product over the di-substituted product. The activating groups of the present Invention also provide greater yields of the desired end product. This family of compounds is based on the N-hydroxyphthalimide (BOSu) structure.