Nucleotide sequencing methods are valuable for molecular biological investigations. The development of rapid and reliable nucleotide sequencing methods has led to great advances in the understanding of the organization of genetic information and has laid the foundation for the detailed analysis of the structure and function of genes.
Two general methods used to sequence DNA are the Maxam-Gilbert chemical degradation method (A. M. Maxam et al., Methods in Enzymology 65, 499-559 (1980)) and the Sanger dideoxy chain termination method (F. Sanger et al., Proc. Natl. Acad. Sci. USA 74, 5463-5467 (1977)). Both of these techniques are detailed in Molecular Cloning: A Laboratory Manual (Sambrook, Fritsch, Maniatis, eds., Cold Spring Harbor Laboratory Press, 1989).
With the Maxam-Gilbert technique, DNA fragments are prepared through base-specific chemical cleavage of the piece of DNA to be sequenced. The piece of DNA to be sequenced is first 5′-end-labeled with 32P and then divided into four portions. Each portion is subjected to a different set of chemical treatments designed to cleave DNA at positions adjacent to a given base (or bases). The result is that all labeled fragments will have the same 5′-terminus as the original piece of DNA and will have 3′-termini defined by the positions of cleavage. This treatment is performed under conditions that generate DNA fragments of convenient lengths for separation by gel electrophoresis.
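The relationship between cleavage position and labeled fragment length can be illustrated with a minimal sketch. It assumes a simplified model in which each molecule is cleaved once, the targeted base is removed, and only the 5′-labeled piece is detected; the function name `cleavage_fragment_lengths` and the example sequence are hypothetical and serve only as illustration.

```python
def cleavage_fragment_lengths(seq, targets):
    """Return lengths of 5'-labeled fragments for one base-specific reaction.

    seq     -- strand written 5'->3' (hypothetical example input)
    targets -- bases attacked by the chemical treatment, e.g. "G" or "AG"

    Simplifying assumption: cleavage removes the targeted base, so the
    labeled fragment contains only the nucleotides 5' of that base.
    """
    lengths = []
    for index, base in enumerate(seq):
        # index equals the number of nucleotides 5' of this base;
        # a cleavage at the very first base leaves no labeled fragment.
        if base in targets and index > 0:
            lengths.append(index)
    return lengths


example = "GATCGA"
g_lane = cleavage_fragment_lengths(example, "G")    # G-specific treatment
ag_lane = cleavage_fragment_lengths(example, "AG")  # purine treatment
c_lane = cleavage_fragment_lengths(example, "C")    # C-specific treatment
ct_lane = cleavage_fragment_lengths(example, "CT")  # pyrimidine treatment
```

In this model every fragment shares the labeled 5′-terminus, and its length alone records where the targeted base was located.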
With the Sanger technique, DNA fragments are produced through partial enzymatic copying (i.e., synthesis) of the piece of DNA to be sequenced. In this method, the piece of DNA to be sequenced may be inserted, using standard techniques, into a “sequencing vector”, a large circular, single-stranded piece of DNA such as the bacteriophage M13. This becomes the template for the copying process. Alternatively, instead of using a sequencing vector, polymerase chain reaction (PCR) can be used to produce templates directly from genomic DNA, eliminating the need for cloning. A short piece of DNA with a sequence complementary to a region of the template just upstream from the region to be sequenced is annealed to the template and serves as a primer for the synthesis. In the presence of the four natural deoxyribonucleoside triphosphates (dNTPs), a DNA polymerase will extend the primer from the 3′-end to produce a complementary copy of the template. To produce a complete set of sequencing fragments, four reactions are run in parallel, each containing the four dNTPs along with a single dideoxyribonucleoside triphosphate (ddNTP) terminator, one for each base. 32P-labeled or fluorophore-labeled dNTP or ddNTP is added to afford labeled fragments. If a dNTP is incorporated by the polymerase, chain extension can continue. If the corresponding ddNTP is incorporated instead, the chain is terminated. The ratio of ddNTP to dNTP is adjusted to generate DNA fragments of appropriate lengths. Each of the four reaction mixtures will, thus, contain a distribution of fragments with the same dideoxynucleoside residue at the 3′-terminus and a primer-defined 5′-terminus. In some modern methods, the four reactions are combined into a single reaction by labeling the four ddNTPs with differently colored fluorophores. When the product of this reaction is separated by capillary electrophoresis, the individual fragments are distinguished by size and color.
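The fragment sets produced by the four parallel termination reactions can be sketched as follows. This is an idealized illustration only: it assumes that termination occurs at every possible position in some molecule of the reaction, and the names `sanger_fragments`, `template`, and `primer_len` are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the strand a polymerase would synthesize, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def sanger_fragments(template, primer_len):
    """Idealized fragment lengths for the four ddNTP reactions.

    template   -- template region to be copied, written 5'->3' (assumed input)
    primer_len -- length of the annealed primer (assumed input)

    Returns the newly synthesized strand and, for each ddNTP, the lengths
    of the fragments ending in that terminator. Each length counts the
    primer plus the extension up to and including the terminating base.
    """
    new_strand = reverse_complement(template)
    lanes = {base: [] for base in "ACGT"}
    for position, base in enumerate(new_strand, start=1):
        lanes[base].append(primer_len + position)
    return new_strand, lanes


new_strand, lanes = sanger_fragments("TTGCAC", primer_len=17)
```

Every fragment begins at the primer-defined 5′-terminus, so the terminating base and the fragment length together identify one position of the synthesized strand.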
Fragments generated utilizing the Sanger method of sequencing may be end-labeled, via, for example, the utilization of primers having labeled nucleotides incorporated into their sequence. Alternatively, molecules may be end-labeled via the utilization of labeled dideoxynucleosides or other modified chain-terminating nucleotides or nucleotide mimics. Molecules can also be labeled internally by the utilization of one or more labeled nucleotides incorporated during the synthesis step of the process.
In both the Sanger and Maxam-Gilbert methods, base sequence information, which generally cannot be directly determined by physical methods, is converted into chain-length information, which can be determined. This determination can be accomplished through electrophoretic separation. Under denaturing conditions (e.g., high temperature or the presence of urea), short DNA fragments migrate as if they were stiff rods. If a gel matrix is employed for the electrophoresis, the DNA fragments are sorted by size. The single-base resolution required for sequencing can usually be obtained for DNA fragments containing up to several hundred bases. To determine a full sequence, the four sets of fragments produced by either Maxam-Gilbert or Sanger methodology are subjected to electrophoresis. This results in the fragments being spatially resolved along the length of the gel.
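The conversion of chain-length information back into base sequence can be illustrated with a short sketch. It assumes one lane per base, as in the Sanger case, and perfect single-base resolution; the function name `read_gel` and the lane contents are hypothetical example data.

```python
def read_gel(lanes):
    """Reconstruct a sequence from per-base lists of fragment lengths.

    lanes -- mapping from base ("A", "C", "G", "T") to the fragment
             lengths observed in that lane (hypothetical example data)

    Assumes each fragment length appears in exactly one lane, i.e. the
    bands are fully resolved and free of artifacts.
    """
    # Pool the bands from all four lanes and order them shortest first,
    # which is the order in which the bases occur from the 5'-end.
    bands = sorted(
        (length, base)
        for base, lengths in lanes.items()
        for length in lengths
    )
    return "".join(base for _, base in bands)


example_lanes = {"A": [22, 23], "C": [21], "G": [18, 20], "T": [19]}
sequence = read_gel(example_lanes)
```

Reading the pooled bands from shortest to longest yields the sequence in the 5′ to 3′ direction, which is the same operation performed by eye on an autoradiograph.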
Dyes such as infrared dyes, fluorescent dyes, colorimetric dyes, chemiluminescent dyes, and/or other detectable molecules can be used instead of the 32P label in the foregoing sequencing reactions. Molecules other than dideoxynucleotides may also be used as chain terminators in these reactions.
One method of discriminating dyes in these types of reactions is described in U.S. Pat. No. 5,242,796 entitled “Method, System, and Reagents for DNA Sequencing”. This system is available from E.I. Du Pont de Nemours and Company (Wilmington, Del.), and is known as the Genesis 2000. The system detects the presence of radiant energy from closely-related yet distinguishable reporters or labels that are covalently attached to compounds which function as chain-terminating nucleotides in a modified Sanger DNA chain-elongation method. Distinguishable fluorescent reporters are attached to each of the four dideoxynucleotide bases represented in Sanger DNA-sequencing reactions, i.e., dideoxynucleotides of adenine (A), guanine (G), cytosine (C), and thymine (T). These reporter-labeled chain-terminating reagents are substituted for unlabeled chain terminators in the traditional Sanger method and are combined in reactions with the corresponding deoxynucleotides, an appropriate primer, template, and polymerase. The resulting mixture contains DNA fragments of varying length that differ from each other by one base and terminate on the 3′-end with uniquely labeled chain terminators corresponding to one of the four DNA bases. This labeling method allows elimination of the customary radioactive label contained in one of the deoxynucleotides of the traditional Sanger method.
Detection of these reporter labels can be accomplished with two stationary photomultiplier tubes (PMTs) that receive differing wavelength bands of fluorescent emissions from laser-stimulated reporters attached to chain terminators on the DNA fragments. These fragments can be electrophoretically separated in space and/or time to move along an axis perpendicular to the sensing area of the PMTs. The fluorescent emissions first pass through a dichroic or other wavelength-selective filter or filters, placed so as to direct one characteristic wavelength to one PMT and the other characteristic wavelength to the other PMT. In this manner, a different digital signal is created in each PMT, and the two signals can be ratioed to produce a third signal that is unique to a given fluorescent reporter, even if a series of fluorescent reporters have closely-spaced emission wavelengths. This system is capable of detecting reporters with efficiently-spaced emissions whose maxima differ by only 5 to 7 nm. Therefore, the sequential base assignments in a DNA strand of interest can be made on the basis of the unique ratio derived for each of the four reporter-labeled chain terminators, which correspond to the four bases in DNA. Although the base information in the Genesis system is contained in fluorescent labels, the information may also be contained in colorimetric labels (S. Beck, Anal. Biochem. 164(2), 514-520 (1987)), chemiluminescent labels (S. Beck, Nucleic Acids Research 17, 5115-5123 (1989)), or other labels.
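The ratiometric base assignment can be illustrated with a minimal sketch. It is not the algorithm of the Genesis system; it assumes a normalized ratio, channel 1 divided by the sum of both channels, in place of a plain quotient to avoid division by zero, and the reference ratios assigned to each base are hypothetical values chosen only for illustration.

```python
# Hypothetical characteristic ratios for the four reporters; real values
# would have to be measured for the actual dyes and filters used.
REFERENCE_RATIOS = {"A": 0.20, "C": 0.40, "G": 0.60, "T": 0.80}


def call_base(ch1, ch2, references=REFERENCE_RATIOS):
    """Assign a base from the two PMT signals recorded at a peak.

    ch1, ch2 -- signal intensities from the two detectors (assumed inputs)

    Returns the base whose reference ratio is closest to the observed
    normalized ratio ch1 / (ch1 + ch2), or None when there is no signal.
    """
    total = ch1 + ch2
    if total <= 0:
        return None  # no usable signal at this position
    ratio = ch1 / total
    return min(references, key=lambda base: abs(references[base] - ratio))


# Hypothetical peak intensities for four successive peaks.
peaks = [(21, 79), (58, 42), (410, 590), (8.1, 1.9)]
calls = [call_base(ch1, ch2) for ch1, ch2 in peaks]
```

Because the ratio does not depend on the absolute peak height, the same reference values apply to strong and weak peaks alike, which is the property that makes a two-channel scheme workable for four reporters.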
The Genesis DNA sequencer is designed to take advantage of the dideoxy chain termination chemistry. In order to employ this chemistry, it is necessary to use four chemically-similar dyes to distinguish the four bases, A, C, G, and T. Unless the dyes are carefully chosen and exhaustively evaluated, their electrophoretic mobility may differ in some DNA sequences, leading to a scrambling of sequence information. The four dyes, chosen for similar electrophoretic mobility, have overlapping emission and excitation spectra. The need to distinguish these dyes without the excessive light loss of extremely narrow-band filters led to a two-channel detection scheme in which the ratio of two signals is used to determine which base has passed the detector. When peaks are well-resolved and noise-free, the ratiometric signals are easy to interpret. However, to maximize the amount of sequence information that can be obtained from each run, it is necessary to accurately interpret the two-channel signal under conditions of poor peak resolution and significant noise.
Sequencers employing primer chemistry have also been described (L. M. Smith, et al., Nucleic Acids Research 13, 2399-2412 (1985), and W. Ansorge, et al., J. Biochem. Biophys. Meth. 13, 315-323 (1986)). These sequencers employ four channels, one for each base. Other sequencers, such as that described by Kambara, et al., Biotechnology 6, 816-821 (1988), employ one signal in each of four electrophoresis lanes. These systems employ yet another class of data analysis methods, since the results from four separate lanes must be registered or aligned in the proper time sequence.
Although advances in nucleotide sequencing methods have led to an expansion of useful applications of nucleotide sequencing technology, certain limitations remain. For example, regions of a genome that are repetitive, that assume a secondary, tertiary or quaternary structure, or that are polymorphic may be difficult to sequence. Moreover, it may be difficult or impossible to sequence a region of a genome that is near or adjacent to difficult-to-sequence regions using certain sequencing techniques. Also, with dye-terminator sequencing protocols, the first 50 nucleotides or so do not provide useful sequence results, making it difficult or impossible to sequence short nucleotide segments. For many applications of genomic sequencing, difficult-to-sequence regions may be ignored or omitted from a particular study. In clinical diagnostics, however, the ability to obtain the sequence of a particular nucleotide segment may have great clinical value. Thus, methods that improve the ability to obtain nucleotide sequences in or around difficult-to-sequence genomic regions are desired.