DNA sequencing is one of the cornerstone analytical techniques of modern molecular biology. The development of reliable methods for sequencing has led to great advances in the understanding of the organization of genetic information and has made possible the manipulation of genetic material (i.e. genetic engineering).
There are currently two general methods for sequencing DNA: the Maxam-Gilbert chemical degradation method [A. M. Maxam et al., Meth. in Enzym., Vol. 65, 499-559 (1980)] and the Sanger dideoxy chain termination method [F. Sanger et al., Proc. Nat. Acad. Sci. USA, Vol. 74, 5463-5467 (1977)]. Both techniques generate a set of DNA fragments that are analyzed by electrophoresis. They differ in the methods used to prepare these fragments.
With the Maxam-Gilbert technique, DNA fragments are prepared through base-specific chemical cleavage of the piece of DNA to be sequenced. The piece of DNA is first 5'-end-labeled with ³²P and then divided into four portions. Each portion is subjected to a different set of chemical treatments designed to cleave DNA at positions adjacent to a given base (or bases). As a result, all labeled fragments have the same 5'-terminus as the original piece of DNA, and their 3'-termini are defined by the positions of cleavage. The cleavage is carried out under conditions that generate DNA fragments of lengths convenient for separation by gel electrophoresis.
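The following Python sketch shows how base-specific cleavage turns sequence into fragment lengths. It models the logic only, not the chemistry. It assumes the commonly used set of four reactions (G, G+A, C+T and C) and exactly one cut per labeled molecule. It also assumes that a cut at a base leaves a 5'-labeled fragment ending just before that base. The functions `maxam_gilbert_fragments` and `read_mg_ladder` are illustrative names.

```python
from __future__ import annotations

# Assumed reaction set: each reaction cleaves adjacent to the listed base(s).
MG_REACTIONS = {
    "G": {"G"},
    "G+A": {"G", "A"},
    "C+T": {"C", "T"},
    "C": {"C"},
}


def maxam_gilbert_fragments(seq: str) -> dict[str, list[int]]:
    """Lengths of the 5'-labeled fragments produced in each reaction (toy model).

    A cut at 0-based position i leaves a labeled piece of length i
    (bases 0..i-1); a cut at position 0 leaves no labeled DNA and is ignored.
    """
    seq = seq.upper()
    return {
        name: [i for i, base in enumerate(seq) if base in bases and i > 0]
        for name, bases in MG_REACTIONS.items()
    }


def read_mg_ladder(lanes: dict[str, list[int]], n: int) -> str:
    """Infer bases 1..n-1 from which lanes contain a band of each length."""
    g, ga, ct, c = (set(lanes[k]) for k in ("G", "G+A", "C+T", "C"))
    calls = []
    for length in range(1, n):
        if length in g:
            calls.append("G")
        elif length in ga:  # band in G+A but not in G
            calls.append("A")
        elif length in c:
            calls.append("C")
        elif length in ct:  # band in C+T but not in C
            calls.append("T")
        else:
            calls.append("N")  # no band: base cannot be assigned
    return "".join(calls)
```

In this model, `read_mg_ladder(maxam_gilbert_fragments("TGACGA"), 6)` recovers `"GACGA"`. The first base is not read because a cut there leaves no labeled fragment.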
With Sanger's technique, DNA fragments are produced through partial enzymatic copying (i.e. synthesis) of the piece of DNA to be sequenced. In the most common version, the piece of DNA to be sequenced is inserted, using standard techniques, into a "sequencing vector", a large, circular, single-stranded piece of DNA such as the bacteriophage M13. This becomes the template for the copying process. A short oligonucleotide whose sequence is complementary to a region of the template just upstream from the insert is annealed to the template to serve as a primer for the synthesis. In the presence of the four natural deoxyribonucleoside triphosphates (dNTPs), a DNA polymerase extends the primer from its 3'-end to produce a complementary copy of the template in the region of the insert. To produce a complete set of sequencing fragments, four reactions are run in parallel. Each reaction contains the four dNTPs along with a single dideoxyribonucleoside triphosphate (ddNTP) terminator, a different one for each base. (A ³²P-labeled dNTP is added to afford labeled fragments.) If a dNTP is incorporated by the polymerase, chain extension can continue. If the corresponding ddNTP is selected, the chain is terminated, because the dideoxynucleotide lacks the 3'-hydroxyl group needed to form the next phosphodiester bond. The ratio of ddNTP to dNTPs is adjusted to generate DNA fragments of appropriate lengths. Each of the four reaction mixtures will thus contain a distribution of fragments with the same dideoxynucleoside residue at the 3'-terminus and a primer-defined 5'-terminus.
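The following Python sketch shows how four parallel dideoxy reactions encode sequence as fragment length. It rests on idealized assumptions:

- Every possible termination site is represented in the reaction.
- The template is written 3'→5', so the copy reads 5'→3' in the same order.
- Fragment length counts the primer plus the newly added bases.

`sanger_fragments` is a hypothetical helper, not part of any published protocol.

```python
from __future__ import annotations

# Watson-Crick pairing used by the polymerase when copying the template.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def sanger_fragments(template_3to5: str, primer_len: int) -> dict[str, list[int]]:
    """Fragment lengths in each ddNTP reaction, keyed by the terminating base.

    template_3to5 is the insert region read in the direction the polymerase
    travels, so base i of the copy is the complement of template_3to5[i].
    """
    copy = [COMPLEMENT[b] for b in template_3to5.upper()]
    lanes: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for i, base in enumerate(copy):
        # Incorporating the ddNTP at copy position i stops the chain there.
        lanes[base].append(primer_len + i + 1)
    return lanes
```

For example, `sanger_fragments("TGCAAC", primer_len=10)` gives `{"A": [11], "C": [12], "G": [13, 16], "T": [14, 15]}`. The ddGTP reaction contains two fragment sizes because G occurs twice in the copy.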
The terms "terminator", "chain terminator" and "chain terminating substrate" are used interchangeably throughout to denote a substrate which can be incorporated onto the 3'-end of a DNA or RNA chain by an enzyme which replicates nucleic acids in a template-directed manner but, once incorporated, prevents further chain extension. In contrast, the natural deoxynucleotide substrates can be considered to be "chain propagating substrates".
The term "nucleoside" is used throughout to denote a heterocyclic base-sugar unit composed of one molecule of pyrimidine or purine (or derivatives thereof) and one molecule of a ribose sugar (or derivatives or functional equivalents thereof). The term "nucleotide" is used throughout to denote either a nucleoside or its phosphorylated derivative.
In both the Sanger and Maxam-Gilbert methods, base sequence information, which generally cannot be determined directly by physical methods, is converted into chain-length information, which can be determined by electrophoretic separation. Under denaturing conditions (high temperature, urea present, etc.), short DNA fragments migrate as if they were stiff rods. If a gel matrix is employed for the electrophoresis, the DNA fragments will be sorted by size. The single-base resolution required for sequencing can usually be obtained for DNA fragments containing up to several hundred bases.
To determine a full sequence, the four sets of fragments produced by either Maxam-Gilbert or Sanger methodology are subjected to electrophoresis in four parallel lanes. This results in the fragments being spatially resolved along the length of the gel. The pattern of labeled fragments is typically transferred to photosensitive film by autoradiography (i.e. an exposure is produced by sandwiching the gel and the film for a period of time). The developed film shows a continuum of bands distributed in the four lanes, often referred to as a sequencing ladder. The ladder is read by visually scanning the film (starting with the short, faster moving fragments) and determining the lane in which the next band occurs for each step on the ladder. Since each lane is associated with a given base (or combination of bases in the Maxam-Gilbert case), the linear progression of lane assignments translates directly into base sequence.
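Reading the ladder amounts to sorting bands by chain length and noting which lane each band falls in. The following minimal Python sketch assumes that each lane is identified with a single base (the Sanger case). It also assumes that band positions have already been converted to chain lengths. `read_ladder` is an illustrative name. A missing rung is reported as `N`, and a length seen in two lanes is treated as an error rather than guessed.

```python
from __future__ import annotations


def read_ladder(lanes: dict[str, list[int]]) -> str:
    """Read a four-lane ladder from the shortest (fastest) band upward.

    lanes maps the base assigned to each lane to the chain lengths of its bands.
    """
    band_to_lane: dict[int, str] = {}
    for base, lengths in lanes.items():
        for n in lengths:
            if n in band_to_lane:
                raise ValueError(
                    f"length {n} appears in lanes {band_to_lane[n]} and {base}"
                )
            band_to_lane[n] = base
    if not band_to_lane:
        return ""
    shortest, longest = min(band_to_lane), max(band_to_lane)
    # Each step up the ladder is one base longer; a gap means a missing band.
    return "".join(band_to_lane.get(n, "N") for n in range(shortest, longest + 1))
```

With lanes `{"A": [11], "C": [12], "G": [13, 16], "T": [14, 15]}`, the function returns `"ACGTTG"`. Dropping the band at length 14 gives `"ACGNTG"`.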
The Sanger and Maxam-Gilbert methods for DNA sequencing are conceptually elegant and efficacious but they are operationally difficult and time-consuming. Analysis of these techniques shows that many of the problems stem from the use of a single radioisotopic reporter. [A reporter can be defined as a chemical group which has a physical or chemical characteristic which can be readily measured or detected by appropriate physical or chemical detector systems or procedures. Ready detectability can be provided by such characteristics as color change, luminescence, fluorescence, or radioactivity; or it may be provided by the ability of the reporter to serve as a ligand recognition site to form specific ligand-ligand complexes which contain groups detectable by conventional (e.g., colorimetric, spectrophotometric, fluorometric or radioactive) detection procedures. The ligand-ligand complexes can be in the form of protein-ligand, enzyme-substrate, antibody-antigen, carbohydrate-lectin, protein-cofactor, protein-effector, nucleic acid-nucleic acid or nucleic acid-ligand complexes.]
The use of short-lived radioisotopes such as ³²P at high specific activity is problematic from both a logistical and a health-and-safety point of view. The short half-life of ³²P requires that reagent needs be anticipated several days in advance and that the reagent be used promptly. Once ³²P-labeled DNA sequencing fragments have been generated, they are prone to radiolytic self-destruction and must be subjected to electrophoretic analysis immediately. The large electrophoresis gels required to achieve single-base separation generate large volumes of contaminated buffer, creating waste disposal problems. The autoradiography required for subsequent visualization of the labeled DNA fragments in the gel is a slow process (overnight exposures are common) and adds considerable time to the overall operation. Finally, there are the possible health risks associated with the use of such potent radioisotopes.
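The logistical burden follows from simple exponential decay. The following small Python sketch assumes a ³²P half-life of about 14.3 days. It ignores radiolytic damage to the DNA itself, which only makes matters worse. `fraction_remaining` is an illustrative name.

```python
P32_HALF_LIFE_DAYS = 14.3  # approximate physical half-life of phosphorus-32


def fraction_remaining(days: float, half_life_days: float = P32_HALF_LIFE_DAYS) -> float:
    """Fraction of the initial activity left after `days` of decay."""
    return 0.5 ** (days / half_life_days)
```

Under this model, about 71% of the activity remains after one week. After two half-lives (about four weeks), only 25% remains.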
The use of only a single reporter to analyze the position of four bases lends considerable operational complexity to the overall process. The chemical/enzymatic steps must be carried out in separate vessels and electrophoretic analysis must be carried out in four parallel lanes. Thermally induced distortions in mobility result in skewed images of labeled DNA fragments (e.g. the smile effect) which, in turn, lead to difficulties in comparing the four lanes. These distortions often limit the number of bases that can be read on a single gel.
The long times required for autoradiographic imaging along with the necessity of using four parallel lanes force a "snapshot" mode of visualization. Since simultaneous spatial resolution of a large number of bands is needed, very large gels must be used. This results in additional problems: large gels are difficult to handle and are slow to run, adding more time to the overall process.
Finally, there is a problem of manual interpretation. Conversion of a sequencing ladder into a base sequence is a time-intensive, error-prone process requiring the full attention of a highly skilled scientist. Numerous attempts have been made to automate the reading and some mechanical aids do exist, but the process of interpreting a sequence gel is still painstaking and slow.
To address these problems, replacing ³²P/autoradiography with an alternative, non-radioisotopic reporter/detection system has been considered. Such a detection system would have to be exceptionally sensitive to match the sensitivity of ³²P, since each band on a sequencing gel contains on the order of 10⁻¹⁶ mole of DNA. Fluorescence is one detection method capable of reaching this level of sensitivity. DNA fragments could be labeled with one or more fluorescent labels (fluorescent dyes). Excitation with an appropriate light source would produce a characteristic emission from the label, thus identifying the band.
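To put the figure of 10⁻¹⁶ mole per band in perspective, multiply it by Avogadro's number. This gives the approximate number of DNA molecules a detector must register in one band:

```latex
N \approx 10^{-16}\,\text{mol} \times 6.022\times10^{23}\,\text{mol}^{-1} \approx 6\times10^{7}\ \text{molecules}
```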
The use of fluorescent labels, as opposed to radioisotopic labels, would allow easier tailoring of the detection system to this particular application. For example, one could use four different fluorescent labels distinguishable on the basis of some emission characteristic (e.g. spectral distribution, lifetime, polarization). A given label could then be linked uniquely with the sequencing fragments associated with a given base. With this linkage established, the fragments could be combined and resolved in a single lane, and the base assignment could be made directly on the basis of the chosen emission characteristic.
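The following Python sketch shows single-lane base assignment, using the emission maximum as the distinguishing characteristic. It makes several assumptions that are not taken from the text above:

- The four dye emission maxima in `DYE_PEAKS_NM` are hypothetical values.
- Bands are recorded as they pass a fixed detector.
- The 3 nm matching tolerance is arbitrary.

`call_bases` is an illustrative name.

```python
from __future__ import annotations

# Hypothetical emission maxima (nm) for four dyes, one dye per terminator/base.
DYE_PEAKS_NM = {"G": 510.0, "A": 520.0, "C": 530.0, "T": 540.0}


def call_bases(
    bands: list[tuple[float, float]],
    peaks: dict[str, float] = DYE_PEAKS_NM,
    tolerance_nm: float = 3.0,
) -> str:
    """Assign a base to each band detected in a single lane.

    bands holds (arrival_time, observed_emission_peak_nm) pairs recorded at a
    fixed detector; shorter fragments arrive first, so time order is sequence order.
    """
    calls = []
    for _, observed in sorted(bands, key=lambda band: band[0]):
        # Pick the reference dye whose emission maximum is closest.
        base, ref = min(peaks.items(), key=lambda item: abs(item[1] - observed))
        calls.append(base if abs(ref - observed) <= tolerance_nm else "N")
    return "".join(calls)
```

A band whose emission matches no dye within the tolerance is reported as `N` rather than forced into the nearest class.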
So far, two attempts to develop a fluorescence-based DNA sequencing system have been described. The first system, developed at the California Institute of Technology, has been disclosed in L. M. Smith, West German Pat. Appl. #DE 3446635 A1 (1984); L. E. Hood et al., West German Pat. Appl. #DE 3501306 A1 (1985); L. M. Smith et al., Nucleic Acids Research, Vol. 13, 2399-2412 (1985); and L. M. Smith et al., Nature, Vol. 321, 674-679 (1986). This system conceptually addresses the problems described above, but the specifics of the implementation render Smith's approach only partially successful. For example, the emission maxima of the fluorescently-labeled DNA sequencing fragments used in this system span a wide wavelength range. This makes it difficult to excite all four dyes efficiently with a single monochromatic source. More importantly, dyes with different net charges perturb electrophoretic mobility to significantly different degrees. This makes it difficult or impossible to perform single-lane sequencing with the set of dyes used in this system. These difficulties are explicitly pointed out by Smith et al.
In general, the methodology used to prepare the fluorescence-labeled sequencing fragments creates difficulties. For Maxam-Gilbert sequencing, 5'-labeled oligonucleotides are enzymatically ligated to "sticky-ended", double-stranded fragments of DNA produced through restriction cleavage. This limits one to sequencing fragments produced in this fashion. For Sanger sequencing, 5'-labeled oligonucleotides are used as primers, and four special primers are required. To use a new vector system, one has to go through the complex process of synthesizing and purifying four new dye-labeled primers. The same is true whenever a special primer is needed.
The use of labeled primers is inferior in other respects as well. The polymerization reactions must still be carried out in separate vessels. As in the Maxam-Gilbert and Sanger sequencing systems, effectively all fragments derived from the labeled primer will be fluorescently labeled. Thus, the resulting sequencing pattern will retain most of the common artifacts (e.g. false or shadow bands, pile-ups) which arise when enzymatic chain extension is interrupted by processes other than incorporation of a chain terminator.
In a second approach, W. Ansorge et al., J. Biochem. Biophys. Methods, Vol. 13, 315-323 (1986), have disclosed a non-radioisotopic DNA sequencing technique in which a single 5'-tetramethylrhodamine fluorescent label is covalently attached to the 5'-end of a 17-base oligonucleotide primer. This primer is enzymatically extended in four vessels through the standard dideoxynucleotide sequencing chemistry to produce a series of enzymatically copied DNA fragments of varying length. Each of the four vessels contains a dideoxynucleotide chain terminator corresponding to one of the four DNA bases which allows terminal base assignment from conventional electrophoretic separation in four gel lanes. The 5'-tetramethylrhodamine fluorescent label is excited by an argon ion laser beam passing through the width of the entire gel. Although this system has the advantage that a fluorescent reporter is used in place of a radioactive reporter, all of the disadvantages associated with conventional sequencing and with preparing labeled primers still remain.
Until now, no one has created a DNA sequencing system which combines the advantages of fluorescence detection with terminator labeling. If appropriate fluorescently-labeled chain terminators could be devised, labeled sequencing fragments would be produced only when a labeled chain terminator is enzymatically incorporated into a sequencing fragment, eliminating many of the artifacts associated with other labeling methods. If each of the four chain terminators needed to sequence DNA were covalently attached to a different distinguishable fluorescent reporter, it should be possible, in principle, to incorporate all four terminators during a single primer extension reaction and then to analyze the resulting sequencing fragments in a single gel lane. If such fluorescently-labeled chain terminators could be devised, these compounds would probably also be useful for other types of enzymatic labeling of nucleic acids. In particular, analogs of fluorescently-labeled chain terminators could be designed to use other, non-fluorescent, reporters or to serve as chain-propagating substrates for enzymes which replicate nucleic acids in a template-directed manner (e.g., reverse transcriptase, RNA polymerase or DNA polymerase). Introducing a reporter into DNA in a manner useful for sequencing is one of the most difficult nucleic acid labeling problems. Compounds and/or strategies developed for DNA sequencing are also likely to be applicable to many other labeling problems.
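The following toy model in Python illustrates the difference between primer labeling and terminator labeling. Each product of a single extension reaction is represented as a chain length together with the terminator that ended it. Chains that stopped for some other reason, such as a polymerase pause, are represented with `None` instead of a terminator. The representation and the function `visible_bands` are illustrative assumptions, not a description of a specific system.

```python
from __future__ import annotations

from typing import Optional


def visible_bands(
    products: list[tuple[int, Optional[str]]], labeling: str
) -> list[tuple[int, str]]:
    """Bands a detector would see for a given labeling strategy (toy model).

    products: (chain length, terminating base) pairs; the base is None when the
    chain stopped for a reason other than incorporation of a terminator.
    """
    if labeling == "primer":
        # Every chain grown from a labeled primer carries the label,
        # whatever stopped it, so premature stops show up as bands too.
        return sorted((n, "label") for n, _ in products)
    if labeling == "terminator":
        # Only chains capped by a labeled terminator are seen, and the
        # terminator's own label reports which base ended the chain.
        return sorted((n, end) for n, end in products if end is not None)
    raise ValueError(f"unknown labeling strategy: {labeling!r}")
```

Take the products `[(11, "A"), (12, "C"), (13, None), (13, "G"), (15, None)]`. Under primer labeling, the stalled chains at lengths 13 and 15 are indistinguishable from genuine bands. Under terminator labeling, only the three terminated chains are reported, each with its base identity.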
To be useful as a chain-terminating substrate for fluorescence-based DNA sequencing, a substrate must contain a fluorescent label and it must be accepted by an enzyme useful for sequencing DNA. Suitable substrate candidates are expected to be derivatives or analogs of the naturally-occurring nucleotides. Because of the expectation that a fluorescent label and a nucleotide will not fit into the active site of a replication enzyme at the same time, a well-designed substrate must have the fluorescent label separated from the nucleotide by a connecting group of sufficient length and appropriate geometry to position the fluorescent label away from the active site of the enzyme. The nature of the connecting group can vary with both the label and the enzyme used. For ease of synthesis and adaptability to variations in label and/or enzyme requirements, however, it is most convenient to consider the connecting group as consisting of a linker which is attached to the nucleotide and to the fluorescent label.
In the design of fluorescently-labeled chain terminators for DNA sequencing, the linker must satisfy several requirements:
1) one must be able to attach the same or a functionally equivalent linker to all four bases found in DNA;
2) the linker must not prevent the labeled nucleotide from being utilized effectively as a chain terminating substrate for an enzyme useful for DNA sequencing;
3) the linker (plus optional spacer and label) must perturb the electrophoresis of oligonucleotides to which it is attached in a manner which is independent of the base to which it is attached;
4) the attachment of the linker to the base and the spacer or label must be stereoselective and regioselective to produce a single, well-defined nucleotide substrate; and
5) the linker should preferably contain a primary or secondary amine for coupling with the label.
Although five different types of amine linkers have been disclosed for attaching labels to nucleotides and oligonucleotides (see below), none of these linkers meet all five of the requirements listed above for use in a chain terminating substrate useful in DNA sequencing.
Bergstrom et al., J. Am. Chem. Soc., Vol. 98, 1587 (1976), disclose a method for attaching alkene-amino and acrylate side-chains to nucleosides by Pd(II)-catalyzed coupling of 5-mercurio-uridines to olefins. Ruth, PCT/US84/00279, discloses the use of the above side-chains as linkers for the attachment of reporters to non-enzymatically synthesized oligonucleotides. Langer et al., Proc. Nat. Acad. Sci. USA, Vol. 78, 6633 (1981), disclose the use of allylamino linkers for the attachment of reporters to nucleotides. These linkers have several disadvantages. The appropriate mercurial nucleotide precursors are difficult to prepare regioselectively. Some of these nucleotide/olefin coupling reactions generate product mixtures that are difficult to separate. Vinyl-substituted nucleosides are potentially labile. Furthermore, the only reporters that have been incorporated with this linker are biotin and digoxigenin [Schmitz et al., Analytical Biochemistry, Vol. 192, 222-231 (1991)]. These reporters have the disadvantage that they must be detected via a complex with avidin, streptavidin, or anti-digoxigenin antibodies, proteins which bind tightly to them. These proteins, and thus indirectly biotin and digoxigenin, are detected by attaching fluorescent or enzymatic reporters to them. For some applications, such as fluorescent in situ hybridization, direct fluorescent tagging would provide a superior method for tagging DNA. Klevan et al., WO 86/02929, disclose a method for attaching linkers to the N4 position of cytidine and the N6 position of adenosine. The disadvantage of this method is that uridine and guanosine have no analogous site for attaching a linker.
Another potential linker which might satisfy the five requirements listed above is an alkynylamino linker. In such a linker, one end of the triple bond is attached to the nucleoside and the other end is attached to a group which contains a primary or secondary amine. To ensure chemical stability, the amine should not be directly attached to the triple bond. Some methods of attaching alkyne groups to nucleosides have been disclosed (see below).
Barr et al., J. Chem. Soc., Perkin Trans. I, 1263-1267 (1978), disclose the syntheses of 5-ethynyluridine, 2'-deoxy-5-ethynyluridine, 5-ethynylcytosine, 5-ethynylcytidine, 2'-deoxy-5-ethynylcytidine and the α-anomers of the 2'-deoxyribonucleosides. The 2'-deoxyribonucleosides were prepared in four steps: constructing the heterocycles, coupling them with a functionalized 2-deoxy sugar, separating the anomeric mixtures, and removing the protecting groups on the sugars.
Bergstrom et al., J. Am. Chem. Soc., Vol. 100, 8106 (1978), disclose the palladium-catalyzed coupling of alkenes with 5-mercuri or 5-iodo derivatives of uracil nucleosides. This method was reported to fail in analogous reactions of alkynes with uracil nucleoside derivatives.
Vincent et al., Tetrahedron Letters, Vol. 22, 945-947 (1981), disclose the synthesis of 5-alkynyl-2'-deoxyuridines by the reaction of O-3',5'-bis(trimethylsilyl)deoxyuridine with alkynylzinc reagents in the presence of palladium or nickel catalysts [dichlorobis(triphenylphosphine)palladium(II), dichlorobis(benzonitrile)palladium(II) or dichloro[ethylenebis(diphenylphosphine)]nickel(II)].
Robins et al., J. Org. Chem., Vol. 48, 1854-1862 (1983), disclose a method for coupling terminal alkynes to 5-iodo-1-methyluracil and to 5-iodouracil nucleosides protected as their p-toluyl esters. The alkynes have the form HC≡CR, where R is H, alkyl, phenyl, trialkylsilyl, hydroxyalkyl or protected hydroxyalkyl. The coupling is carried out in the presence of bis(triphenylphosphine)palladium(II) chloride and copper(I) iodide in warm triethylamine. 3',5'-Di-O-acetyl-5-iodo-2'-deoxyuridine was reacted with hexyne, 4-(p-toluyloxy)butyne, 4-(tetrahydropyranyloxy)butyne or 4-(trityloxy)butyne. In these cases, the major products were the cyclized furano[2,3-d]pyrimidin-2-ones rather than the desired alkynyluridines.
None of the above references discloses a method for attaching an alkynylamino linker to nucleosides. The methodology of Bergstrom et al. fails with alkynes, and that of Barr et al. is not directly applicable. The catalysts used by Robins et al. and Vincent et al. could promote numerous undesirable side reactions when the alkyne contains an amino group, e.g., cyclization or intermolecular nucleophilic addition of the amine to the alkyne. Moreover, coupling reactions have been reported only with iodonucleosides that contain the electron-deficient uracil base. Pd-catalyzed coupling reactions generally work best with electron-deficient aryl iodides. Problems may therefore be anticipated in coupling alkynes to any of the other three bases, all of which are more electron-rich than uracil.
There remains a need for alkynylamino nucleotides and for methods permitting their preparation.