The present invention relates generally to a method for obtaining nucleic acid sequence information. More specifically, the present invention provides certain modified nucleotides referred to as Simtides for use in nucleic acid sequencing reactions.
Nucleic acid sequencing is a critical analytical technique used in the field of molecular biology. The development of reliable methods for sequencing has led to great advances in the understanding of the organization of genetic information and has laid the foundation for the detailed analysis of the structure and function of genes. Several methods have been developed to determine the nucleotide sequence of nucleic acids.
Two general methods currently used to sequence DNA are the Maxam-Gilbert chemical degradation method (A. M. Maxam et al., Methods in Enzymology 65, 499-559 (1980)) and the Sanger dideoxy chain termination method (F. Sanger, et al., Proc. Natl. Acad. Sci. USA 74, 5463-5467 (1977)). Both of these techniques are detailed in Molecular Cloning: A Laboratory Manual (Sambrook, Fritsch, Maniatis, eds., Cold Spring Harbor Laboratory Press, 1989), the disclosure of which is incorporated herein by reference.
With the Maxam-Gilbert technique, DNA fragments are prepared through base-specific chemical cleavage of the piece of DNA to be sequenced. The piece of DNA to be sequenced is first 5'-end-labeled with ³²P and then divided into four portions. Each portion is subjected to a different set of chemical treatments designed to cleave DNA at positions adjacent to a given base (or bases). The result is that all labeled fragments will have the same 5'-terminus as the original piece of DNA and will have 3'-termini defined by the positions of cleavage. This treatment is performed under conditions that generate DNA fragments of convenient lengths for separation by gel electrophoresis.
With the Sanger technique, DNA fragments are produced through partial enzymatic copying (i.e., synthesis) of the piece of DNA to be sequenced. In the most common version, the piece of DNA to be sequenced is inserted, using standard techniques, into a "sequencing vector", a large circular, single-stranded piece of DNA such as the bacteriophage M13. This becomes the template for the copying process. A short piece of DNA with a sequence complementary to a region of the template just upstream from the insert is annealed to the template to serve as a primer for the synthesis. In the presence of the four natural deoxyribonucleoside triphosphates (dNTPs), a DNA polymerase will extend the primer from the 3'-end to produce a complementary copy of the template in the region of the insert. To produce a complete set of sequencing fragments, four reactions are run in parallel, each containing the four dNTPs along with a single dideoxyribonucleoside triphosphate (ddNTP) terminator, one for each base. A ³²P-labeled or fluorophore-labeled dNTP is added to afford labeled fragments. If a dNTP is incorporated by the polymerase, chain extension can continue. If the corresponding ddNTP is selected, the chain is terminated. The ratio of ddNTP to dNTP is adjusted to generate DNA fragments of appropriate lengths. Each of the four reaction mixtures will, thus, contain a distribution of fragments with the same dideoxynucleoside residue at the 3'-terminus and a primer-defined 5'-terminus.
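The chain-termination logic can be illustrated with a minimal sketch. It is a simplification made for illustration only: the primer is omitted, every possible termination product is enumerated rather than sampled at a given ddNTP-to-dNTP ratio, and the template sequence and function names are hypothetical.

```python
# Watson-Crick pairing used by the polymerase when copying the template.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def synthesized_strand(template_3to5: str) -> str:
    """Return the new strand (5'->3') copied from a template written 3'->5'."""
    return "".join(COMPLEMENT[base] for base in template_3to5)


def termination_fragments(new_strand: str, terminator: str) -> list[str]:
    """Enumerate every product that ends where the given ddNTP was incorporated."""
    return [
        new_strand[: i + 1]
        for i, base in enumerate(new_strand)
        if base == terminator
    ]


template = "TACGGTCA"  # hypothetical insert, written 3'->5'
strand = synthesized_strand(template)

# One reaction per terminator (ddATP, ddCTP, ddGTP, ddTTP).
reactions = {dd: termination_fragments(strand, dd) for dd in "ACGT"}

for dd, fragments in reactions.items():
    print(f"dd{dd}TP reaction:", fragments)
```

Each reaction yields fragments that share a 5'-terminus and end in the same base, which is the property the subsequent size separation relies on.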
Fragments generated utilizing the Sanger method of sequencing may be end-labeled, via, for example, the utilization of primers having labeled nucleotides incorporated into their sequence. Alternatively, molecules may be end-labeled via the utilization of labeled dideoxynucleosides or other modified chain-terminating nucleotides or nucleotide mimics. Molecules can also be labeled internally by the utilization of one or more labeled nucleotides incorporated during the synthesis step of the process.
In both the Sanger and Maxam-Gilbert methods, base sequence information, which generally cannot be directly determined by physical methods, is converted into chain-length information, which can be determined. This determination can be accomplished through electrophoretic separation. Under denaturing conditions (e.g., high temperature, presence of urea, etc.), short DNA fragments migrate as if they were stiff rods. If a gel matrix is employed for the electrophoresis, the DNA fragments are sorted by size. The single-base resolution required for sequencing can usually be obtained for DNA fragments containing up to several hundred bases. To determine a full sequence, the four sets of fragments produced by either Maxam-Gilbert or Sanger methodology are subjected to electrophoresis. This results in the fragments being spatially resolved along the length of the gel.
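The conversion of chain-length information back into base sequence can be sketched as follows. This is an idealized illustration that assumes perfect single-base resolution and one band per length; the lane contents and the function name `read_gel` are hypothetical.

```python
def read_gel(lane_lengths: dict[str, list[int]]) -> str:
    """Read a sequence from four lanes by ordering all bands by fragment length.

    lane_lengths maps each base to the fragment lengths observed in its lane.
    """
    # Pool the bands from all lanes, tagging each with the base of its lane.
    bands = sorted(
        (length, base)
        for base, lengths in lane_lengths.items()
        for length in lengths
    )
    # Shortest fragments correspond to positions nearest the labeled end.
    return "".join(base for _, base in bands)


# Hypothetical band positions, expressed as fragment lengths in bases.
lanes = {"A": [1, 6], "C": [4, 5], "G": [3, 7], "T": [2, 8]}
sequence = read_gel(lanes)
print(sequence)
```

Reading the pooled bands from shortest to longest gives the base at each successive position.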
Dyes such as, for example, infrared dyes, fluorescent dyes, colorimetric dyes, chemiluminescent dyes, and/or other detectable molecules, can be used instead of the ³²P label in the foregoing sequencing reactions. Molecules other than dideoxynucleotides may also be used as chain terminators in these reactions.
One method of discriminating dyes in these types of reactions is described in U.S. patent application Ser. No. 07/057,566 (Prober et al.) filed Jun. 12, 1987, abandoned, entitled "Method, System, and Reagents for DNA Sequencing". This system is available from E. I. Du Pont de Nemours and Company (Wilmington, Del.), and is known as the Genesis™ 2000. The system comprises a means for detecting the presence of radiant energy from closely-related yet distinguishable reporters or labels that are covalently attached to compounds which function as chain-terminating nucleotides in a modified Sanger DNA chain-elongation method. Distinguishable fluorescent reporters are attached to each of the four dideoxynucleotide bases represented in Sanger DNA-sequencing reactions, i.e., dideoxynucleotides of adenine (A), guanine (G), cytosine (C), and thymine (T). These reporter-labeled chain-terminating reagents are substituted for unlabeled chain terminators in the traditional Sanger method and are combined in reactions with the corresponding deoxynucleotides, an appropriate primer, template, and polymerase. The resulting mixture contains DNA fragments of varying length that differ from each other by one base and terminate on the 3'-end with uniquely labeled chain terminators corresponding to one of the four DNA bases. This labeling method allows elimination of the customary radioactive label contained in one of the deoxynucleotides of the traditional Sanger method.
Detection of these reporter labels can be accomplished with two stationary photomultiplier tubes (PMTs) that receive differing wavelength bands of fluorescent emissions from laser-stimulated reporters attached to chain terminators on the DNA fragments. These fragments can be electrophoretically separated in space and/or time to move along an axis perpendicular to the sensing area of the PMTs. The fluorescent emissions first pass through a dichroic or other wavelength-selective filter or filters, placed so as to direct one characteristic wavelength to one PMT and the other characteristic wavelength to the other PMT. In this manner, different digital signals are created in each PMT that can be ratioed to produce a third signal that is unique to a given fluorescent reporter, even if a series of fluorescent reporters have closely-spaced emission wavelengths. This system is capable of detecting reporters with closely-spaced emissions whose maxima differ by only 5 to 7 nm. Therefore, the sequential base assignments in a DNA strand of interest can be made on the basis of the unique ratio derived for each of the four reporter-labeled chain terminators which correspond to each of the four bases in DNA.
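The ratio-based base assignment can be sketched in a few lines. This is an illustrative sketch only, not the algorithm of the referenced system: the reference ratios, the signal values, and the nearest-ratio rule are all assumptions chosen for the example.

```python
# Hypothetical characteristic ratios (PMT 1 signal / PMT 2 signal) for the
# four reporter-labeled terminators. Real values depend on dyes and filters.
REFERENCE_RATIOS = {"A": 0.4, "C": 0.8, "G": 1.5, "T": 2.5}


def call_base(pmt1: float, pmt2: float) -> str:
    """Assign a base by comparing the two-channel ratio to reference ratios."""
    if pmt2 <= 0:
        raise ValueError("PMT 2 signal must be positive to form a ratio")
    ratio = pmt1 / pmt2
    # Choose the reporter whose characteristic ratio is closest (assumed rule).
    return min(REFERENCE_RATIOS, key=lambda b: abs(REFERENCE_RATIOS[b] - ratio))


# Hypothetical peak intensities (PMT 1, PMT 2), in order of detection.
peaks = [(41.0, 100.0), (240.0, 100.0), (152.0, 100.0), (79.0, 100.0)]
called = "".join(call_base(p1, p2) for p1, p2 in peaks)
print(called)
```

Because only the ratio is used, the assignment does not depend on absolute peak height, although noisy or overlapping peaks would call for a more careful treatment than this nearest-ratio rule.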
Although the base information in the Genesis™ system is contained in fluorescent labels, the information may also be contained in colorimetric labels (S. Beck, Anal. Biochem. 164(2), 514-520 (1987)), chemiluminescent labels (S. Beck, Nucleic Acids Research 17, 5115-5123 (1989)), or other labels.
The Genesis™ DNA sequencer is designed to take advantage of the dideoxy chain termination chemistry. In order to employ this chemistry, it is necessary to use four chemically-similar dyes to distinguish the four bases, A, C, G, and T. Unless the dyes are carefully chosen and exhaustively evaluated, their electrophoretic mobility may differ in some DNA sequences, leading to a scrambling of sequence information. The four dyes, chosen for similar electrophoretic mobility, have overlapping emission and excitation spectra. The need to distinguish these dyes without the excessive light loss of extremely narrow-band filters led to a two-channel detection scheme in which the ratio of two signals is used to determine which base has passed the detector. When peaks are well-resolved and noise-free, the ratiometric signals are easy to interpret. However, to maximize the amount of sequence information that can be obtained from each run, it is necessary to accurately interpret the two-channel signal under conditions of poor peak resolution and significant noise.
Sequencers employing primer chemistry have also been described (L. M. Smith, et al., Nucleic Acids Research 13, 2399-2412 (1985), and W. Ansorge, et al., J. Biochem. Biophys. Meth. 13, 315-323 (1986)). These sequencers employ four channels, one for each base. Other sequencers, such as that described by Kambara, et al., Biotechnology 6, 816-821 (1988), employ one signal in each of four electrophoresis lanes. These systems employ yet another class of data analysis methods, since the results from four separate lanes must be registered or aligned in the proper time sequence.
Sequencers employing primer chemistry, such as Hunkapiller, et al., U.S. Pat. No. 4,811,218, are not so restricted in the selection of dyes that may be used to tag the DNA fragments. These sequencers can employ four signal channels (one for each base) and, thus, do not require the complex algorithms needed to interpret ratiometric signals. On the other hand, these sequencers do not enjoy the advantages of terminator chemistry. In particular, primer chemistry requires four separate reaction tubes for each sample to be sequenced, while terminator chemistry requires only one.
The present invention is applicable to any sequencing strategies employing the Sanger or Maxam-Gilbert methods, or modifications thereof. The present invention is also applicable to any sequencing strategy where a label is associated with a nucleotide or nucleic acid. Examples of such sequencing strategies are described in U.S. Pat. No. 5,667,972, entitled "Method of Sequencing of Genomes by Hybridization of Oligonucleotide Probes", and in U.S. Pat. No. 5,652,103, entitled "Method for Sequencing Synthetic Oligonucleotides Containing Non-Phosphodiester Internucleotide Linkages".
Modified nucleotides carrying a detectable moiety (i.e., reporter), either radioisotopic or non-radioisotopic, have been useful in nucleic acid probes and oligonucleotide labeling. Nucleic acid probes containing a modified nucleotide that has a reporter group attached via a linker arm to a base have been reported. For example, Langer, et al., Proc. Natl. Acad. Sci. USA 78(11), 6633-6637 (1981), describes the attachment of biotin to the C-5 position of dUTP by an allylamine linker arm. The attachment of biotin and other reporter groups to the 5-position of pyrimidines via a linker arm is also discussed in U.S. Pat. No. 4,711,955. Nucleotides labeled via a linker arm attached to the 5- or other positions of pyrimidines are also suggested in U.S. Pat. No. 4,948,882.
Bisulfite-catalyzed transamination of the N4-position of cytosine with bifunctional amines is described by Schulman, et al., Nucleic Acids Research 9(5), 1203-1217 (1981) and Draper, et al., Biochemistry 19, 1774-1781 (1980). By this method, fluorescent labels are attached via linker arms to cytidine or cytidine-containing polynucleotides. The attachment of biotin to the N4-position of cytidine is disclosed in U.S. Pat. No. 4,828,979, and the linking of detectable moieties to cytidine at the N4-position is also set forth in U.S. Pat. Nos. 5,013,831 and 5,241,060.
U.S. Pat. No. 5,407,801 describes the preparation of an oligonucleotide triplex wherein a linker arm is conjugated to deoxycytidine via bisulfite-catalyzed transamination. The linker arms disclosed therein include an aminoalkyl or carboxyalkyl linker arm.
U.S. Pat. No. 5,405,950 describes cytidine analogs in which a linker arm is attached to the N4-position of the cytosine base. The linker arm is terminated with a protecting group, which prevents the penultimate residue of the linker arm from reacting with the N-hydroxysuccinimide ester of biotin amino caproic acid.
Historically, several essential criteria had to be satisfied in order for a modified nucleotide to be generally suitable as a substitute for a labeled form of a naturally occurring nucleotide. First, the modified compound had to contain a substituent that was unique, i.e., not normally associated with nucleotides or polynucleotides. Second, the molecules had to react specifically with chemical or biological reagents to provide a sensitive detection system. Third, the analogs had to be relatively efficient substrates for commonly studied nucleic acid enzymes, since numerous practical applications require that the analog be enzymatically metabolized (e.g., the analogs themselves had to function as substrates for nucleic acid polymerases). Accordingly, ring structures of bases were not modified at positions that sterically, or in any other way, interfered with the normal Watson-Crick hydrogen bonding potential of the bases. Otherwise, the substituents would yield compounds that were inactive as polymerase substrates. Substitution at ring positions that altered the normal "anti" nucleoside conformation had to be avoided, since such conformational changes usually render nucleotide derivatives unacceptable as polymerase substrates. Normally, such considerations limit substitution positions to the 5-position of a pyrimidine and the 7-position of a purine or a 7-deazapurine. Fourth, the detection system required the ability to interact with substituents incorporated into polynucleotides so as to be compatible with nucleic acid hybridization methodologies. Thus, it was preferable that detectable moieties be attached to the purine or pyrimidine through a chemical linkage or "linker arm" so that it could readily interact with antibodies, other detector proteins, or chemical reagents.
For reactions requiring hybridization steps, such as the sequencing strategy discussed in U.S. Pat. No. 5,667,972 described above, linkages that attach detectable molecules to nucleotides have had to withstand all experimental conditions to which normal nucleotides and polynucleotides are routinely subjected (e.g., extended hybridization times at elevated temperatures, phenol and organic solvent extraction, electrophoresis, etc.).
While the art has made significant strides, there still exists a need for modified nucleotides which have the ability to replace one or more natural nucleotides and which facilitate labeling of single or double stranded DNA or RNA for use in sequencing reactions. For example, labeled dideoxynucleotides have historically been more difficult than natural nucleotides to incorporate efficiently into growing nucleic acid chains in sequencing reactions. The method of the present invention addresses this need.
The method of the present invention permits detectable molecules to be associated with nucleic acids before, during, or subsequent to sequencing reactions to facilitate labeling and detection. All types of labels (i.e., detectable molecules) may be utilized in the practice of the present invention, including but not limited to infrared, fluorescent and colorimetric labels. In the present invention, multiple reactions can be performed in each sequencing lane and any nucleic acid polymerase may be used. A single reaction condition can be utilized for all types of labels, particularly where labels are associated with linker molecules subsequent to incorporation. Furthermore, sequencing can be performed utilizing chain termination or primer labeling techniques.