DNA sequencing is one of the cornerstone analytical techniques of modern molecular biology. The development of reliable methods for sequencing has led to great advances in the understanding of the organization of genetic information and has made possible the manipulation of genetic material (i.e. genetic engineering).
There are currently two general methods for sequencing DNA: the Maxam-Gilbert chemical degradation method [A. M. Maxam et al., Meth. in Enzym. 65 499-559 (1980)] and the Sanger dideoxy chain termination method [F. Sanger, et al., Proc. Nat. Acad. Sci. U.S.A. 74 5463-5467 (1977)]. A common feature of these two techniques is the generation of a set of DNA fragments which are analyzed by electrophoresis. The techniques differ in the methods used to prepare these fragments.
With the Maxam-Gilbert technique, DNA fragments are prepared through base-specific chemical cleavage of the piece of DNA to be sequenced. The piece of DNA to be sequenced is first 5'-end-labeled with ³²P and then divided into four portions. Each portion is subjected to a different set of chemical treatments designed to cleave DNA at positions adjacent to a given base (or bases). The result is that all labeled fragments will have the same 5'-terminus as the original piece of DNA and will have 3'-termini defined by the positions of cleavage. This treatment is done under conditions which generate DNA fragments of convenient lengths for separation by gel electrophoresis.
With Sanger's technique, DNA fragments are produced through partial enzymatic copying (i.e. synthesis) of the piece of DNA to be sequenced. In the most common version, the piece of DNA to be sequenced is inserted, using standard techniques, into a large, circular, single-stranded piece of DNA such as the bacteriophage M13. This becomes the template for the copying process. A short piece of DNA with its sequence complementary to a region of the template just upstream from the insert is annealed to the template to serve as a primer for the synthesis. In the presence of the four natural deoxyribonucleoside triphosphates (dNTP's), a DNA polymerase will extend the primer from the 3'-end to produce a complementary copy of the template in the region of the insert. To produce a complete set of sequencing fragments, four reactions are run in parallel, each containing the four dNTP's along with a single dideoxyribonucleoside triphosphate (ddNTP) terminator, one for each base. (³²P-labeled dNTP is added to afford labeled fragments.) If a dNTP is incorporated by the polymerase, chain extension can continue. If the corresponding ddNTP is selected, the chain is terminated. The ratio of ddNTP to dNTP's is adjusted to generate DNA fragments of appropriate lengths. Each of the four reaction mixtures will, thus, contain a distribution of fragments with the same dideoxynucleoside residue at the 3'-terminus and a primer-defined 5'-terminus.
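The chain-termination step described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function name, the example template, and the single parameter p_term (standing in for the effect of the ddNTP to dNTP ratio) are hypothetical and are not taken from the cited references. Fragment lengths are counted in bases added beyond the primer.

```python
import random

# Watson-Crick pairing used to decide which nucleotide the polymerase adds.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sanger_fragment_lengths(template, terminator, p_term, n_chains, rng):
    """Simulate one of the four reactions and return the set of fragment lengths.

    template   : template bases in the order the polymerase reads them
    terminator : base carried by the ddNTP in this reaction (A, C, G or T)
    p_term     : assumed chance that the ddNTP is selected instead of the dNTP
    n_chains   : number of primer extensions to simulate
    rng        : a random.Random instance, so runs are reproducible
    """
    lengths = set()
    for _ in range(n_chains):
        for position, template_base in enumerate(template, start=1):
            incorporated = COMPLEMENT[template_base]
            # Only the base matching this reaction's ddNTP can terminate.
            if incorporated == terminator and rng.random() < p_term:
                lengths.add(position)  # chain ends with a dideoxy residue here
                break
    return lengths


if __name__ == "__main__":
    template = "TGCATGCCAGTA"  # hypothetical insert; copied strand is ACGTACGGTCAT
    for base in "ACGT":
        found = sanger_fragment_lengths(template, base, 0.3, 5000, random.Random(1))
        print(base, sorted(found))
```

Raising p_term in this sketch shifts the distribution toward short fragments, and lowering it shifts it toward long ones, which is the qualitative effect the text attributes to adjusting the ddNTP to dNTP ratio.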
In both methods, base sequence information which generally cannot be directly determined by physical methods has been converted into chain-length information which can be determined. This determination can be accomplished through electrophoretic separation. Under denaturing conditions (high temperature, urea present, etc.) short DNA fragments migrate as stiff rods. If a gel matrix is employed for the electrophoresis, the DNA fragments will be sorted by size. The single-base resolution required for sequencing can usually be obtained for DNA fragments containing up to several hundred bases.
To determine a full sequence, the four sets of fragments produced by either Maxam-Gilbert or Sanger methodology are subjected to electrophoresis in four parallel lanes. This results in the fragments being spatially resolved along the length of the gel. The pattern of labeled fragments is typically transferred to film by autoradiography (i.e. an exposure is produced by sandwiching the gel and the film for a period of time). The developed film shows a continuum of bands distributed between the four lanes often referred to as a sequencing ladder. The ladder is read by visually scanning the film (starting with the short, faster moving fragments) and determining the lane in which the next band occurs for each step on the ladder. Since each lane is associated with a given base (or combination of bases in the Maxam-Gilbert case), the linear progression of lane assignments translates directly into base sequence.
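The ladder-reading procedure just described amounts to sorting the bands from all four lanes by fragment length and recording the lane in which each successive band falls. A minimal sketch follows, assuming the Sanger case in which each lane corresponds to exactly one base (the combination lanes of the Maxam-Gilbert case are not handled). The function name and the example band positions are hypothetical.

```python
def read_ladder(lanes):
    """Convert four lanes of bands into a base sequence.

    lanes: dict mapping a base ("A", "C", "G", "T") to the fragment
           lengths observed as bands in that lane.
    Returns the sequence read from the shortest fragment to the longest.
    """
    band_to_base = {}
    for base, lengths in lanes.items():
        for length in lengths:
            if length in band_to_base:
                # Two lanes showing a band at the same step is ambiguous.
                raise ValueError(f"bands in two lanes at length {length}")
            band_to_base[length] = base
    # Short fragments move fastest, so reading starts with them.
    return "".join(band_to_base[length] for length in sorted(band_to_base))


if __name__ == "__main__":
    lanes = {
        "A": {1, 5, 11},
        "C": {2, 6, 10},
        "G": {3, 7, 8},
        "T": {4, 9, 12},
    }
    print(read_ladder(lanes))  # prints ACGTACGGTCAT
```

The sketch assumes ideal data. On a real autoradiogram the relative band positions must be judged across lanes by eye, which is where lane-to-lane distortions make the comparison difficult.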
The Sanger and Maxam-Gilbert methods for DNA sequencing are conceptually elegant and efficacious but they are operationally difficult and time-consuming. Analysis of these techniques shows that many of the problems stem from the use of a single radioisotopic reporter.
The use of short-lived radioisotopes such as ³²P at high specific activity is problematic from both a logistical and a health-and-safety point of view. The short half-life of ³²P requires that reagent needs be anticipated several days in advance and that the reagent be used promptly. Once labeled DNA sequencing fragments are generated, they are prone to self-destruction and must be immediately subjected to electrophoretic analysis. The large electrophoresis gels required to achieve single-base separation lead to large volumes of contaminated buffer which must be disposed of properly. The autoradiography required to subsequently visualize the labeled DNA fragments in the gel is a slow process (overnight exposures are common) and adds considerable time to the overall operation. Finally, there are the possible health risks associated with use of such potent radioisotopes.
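The logistical constraint imposed by the short half-life can be made concrete with a simple decay calculation. The sketch below assumes a half-life of about 14.3 days for ³²P, a commonly quoted value that is not stated in the text. The function name is hypothetical.

```python
# Approximate half-life of phosphorus-32 in days (assumed literature value).
P32_HALF_LIFE_DAYS = 14.3


def fraction_remaining(days, half_life_days=P32_HALF_LIFE_DAYS):
    """Fraction of the original activity left after the given number of days."""
    return 0.5 ** (days / half_life_days)


if __name__ == "__main__":
    for days in (0, 3, 7, 14.3, 28.6):
        print(f"{days:5.1f} days: {fraction_remaining(days):.2f} of initial activity")
```

Under this assumption, roughly 29% of the activity is lost after one week of storage and half is lost after about two weeks.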
The use of only a single reporter to analyze the position of four bases lends considerable operational complexity to the overall process. The chemical/enzymatic steps must be run in separate containers and electrophoretic analysis must be carried out in four parallel lanes. Thermally induced distortions in mobility result in skewed images of labeled DNA fragments (e.g. the smile effect) which, in turn, lead to difficulties in comparing the four lanes. These distortions often limit the number of bases that can be read on a single gel.
The long times required for autoradiographic imaging, along with the necessity of using four parallel lanes, force one into a "snapshot" mode of visualization. Since one needs simultaneous spatial resolution of a large number of bands, one is forced to use large gels, typically 40 cm or more in length. This results in additional problems: large gels are difficult to handle and are slow to run, adding more time to the overall process.
Finally, there is a problem of manual interpretation. Conversion of a sequencing ladder into a base sequence is a time-intensive, error-prone process requiring the full attention of a highly skilled scientist. Numerous attempts have been made to automate the reading, and some mechanical aids do exist, but the process of interpreting a sequence gel is still painstaking and slow.
To address these problems one can consider replacing ³²P/autoradiography with some alternative, non-radioisotopic reporter/detection system. Such a detection system would have to be exceptionally sensitive to achieve a sensitivity comparable to ³²P; each band on a sequencing gel contains on the order of 10⁻¹⁶ mole of DNA. One method of detection which is capable of reaching this level of sensitivity is fluorescence. DNA fragments could be labeled with one or more fluorescent dyes. Excitation with an appropriate light source would result in a characteristic emission from the dye, thus identifying the band.
The use of a fluorescent dye as opposed to a radioisotopic label would allow one to more easily tailor the detection system for this particular application. For example, the use of four different fluorescent dyes distinguishable on the basis of some emission characteristic (e.g. spectrum, life-time, polarization) would allow one to uniquely link a given tag with the sequencing fragments associated with a given base. With this linkage established, the fragments could be combined and resolved in a single lane and the base assignment could be made directly on the basis of the chosen emission characteristic.
The "real-time" nature of fluorescence detection would allow one either to rapidly scan a gel containing spatially resolved bands (resolution in space) or sit at a single point on the gel and detect bands as they sequentially pass through the detection zone (resolution in time). Large gels would not necessarily be required. Furthermore, a "real-time", single lane detection mode would be very amenable to fully automated base assignment and data transfer.
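The single-lane, resolution-in-time mode described above can be sketched as follows. The detector is assumed to report, at each sampling instant, one intensity per dye at a fixed point on the gel. A band is taken to be any run of consecutive samples in which the strongest channel meets a threshold, and the base is assigned from the dye that dominates at the band's strongest sample. The dye names, the threshold, and the function name are all hypothetical. The sketch also assumes that adjacent bands are separated by at least one below-threshold sample.

```python
def call_bases(trace, dye_to_base, threshold):
    """Assign bases from a time-ordered, four-channel fluorescence trace.

    trace       : list of dicts {dye_name: intensity}, one dict per sample
    dye_to_base : dict linking each dye to the base whose fragments carry it
    threshold   : minimum intensity for a sample to count as part of a band
    """
    calls = []
    in_band = False
    best = None  # (intensity, dye) of the strongest sample in the current band
    for sample in trace:
        dye, intensity = max(sample.items(), key=lambda item: item[1])
        if intensity >= threshold:
            if not in_band or intensity > best[0]:
                best = (intensity, dye)
            in_band = True
        elif in_band:
            # Signal fell below threshold: the band has passed the detector.
            calls.append(dye_to_base[best[1]])
            in_band = False
    if in_band:  # trace ended while a band was still in the detection zone
        calls.append(dye_to_base[best[1]])
    return "".join(calls)


if __name__ == "__main__":
    dye_to_base = {"dye1": "A", "dye2": "C", "dye3": "G", "dye4": "T"}
    trace = [
        {"dye1": 0.1, "dye2": 0.0, "dye3": 0.0, "dye4": 0.1},
        {"dye1": 0.9, "dye2": 0.1, "dye3": 0.0, "dye4": 0.1},
        {"dye1": 0.1, "dye2": 0.1, "dye3": 0.0, "dye4": 0.0},
        {"dye1": 0.1, "dye2": 0.2, "dye3": 0.8, "dye4": 0.0},
        {"dye1": 0.1, "dye2": 0.1, "dye3": 1.0, "dye4": 0.0},
        {"dye1": 0.0, "dye2": 0.1, "dye3": 0.1, "dye4": 0.0},
        {"dye1": 0.0, "dye2": 0.1, "dye3": 0.1, "dye4": 0.7},
    ]
    print(call_bases(trace, dye_to_base, threshold=0.5))  # prints AGT
```

Because bands arrive in order of increasing fragment length, the order of calls is the base sequence, and no lane-to-lane comparison is needed.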
Several attempts to develop a fluorescence-based DNA sequencing system have been described. One system, developed by a group at the California Institute of Technology, has been disclosed in L. M. Smith, West German Pat. Appl. #DE 3446635 A1 (1984); L. E. Hood et al., West German Pat. Appl. #DE 3501306 A1 (1985); and L. M. Smith et al., Nucleic Acids Research, 13 2399-2412 (1985). This system conceptually addresses the problems described in the previous section, but the specifics of the implementation appear to render this approach only partially successful.
The Cal Tech system employs four sets of DNA sequencing fragments, each labeled with one of four fluorescent dyes. Two representative sets of fluorescent dyes are described. Each set is comprised of dyes from at least two different structural classes.
The emission maxima are spread over a large range (approximately 100 nm) to facilitate discrimination between the four, but unfortunately the absorption (excitation) maxima are also comparably spread. This makes it very difficult to efficiently excite all four dyes with a single monochromatic source and adequately detect the resulting emissions.
In contrast, the use of dyes with closely spaced absorption (and corresponding emission) peaks to enhance the excitation efficiency causes other difficulties. A detection system for DNA sequencing must be able to distinguish between four different dye emission spectra in order to identify the individual labeled fragments. These emissions are typically of relatively low intensity. Therefore, the detection system must have a high degree of sensitivity (better than 10⁻¹⁶ moles DNA per band) and selectivity, along with a means to minimize stray light and background noise, in order to meet desired performance characteristics. The system also must be able to frequently monitor the detection area in order to avoid missing any fragments that migrate through the gel past the detection window. Such a detection system should be relatively cost efficient to allow for multiple detection devices within a single instrument without detrimentally affecting mill cost.
Many detection devices are known which utilize fluorescence in a detection scheme. One such device is discussed in "Quantitative Fluorescence Analysis of Different Conformational Forms of DNA Bound to the Dye . . . and Separated by Electrophoresis" by Naimski et al., Anal. Biochem., 106, 471-475 (1980). In this electrophoresis/detection system a glass tube is filled with agarose gel for separating the relatively large DNA fragments. A scanning monochromator is then used as the detection system for defining each of the large fragments. It is known that scanning monochromators can accurately measure a wide range of spectral characteristics; however, much light is lost due to the limited ability of the monochromator and its associated optics to collect and disperse emitted light. These detection techniques limit the fraction of light that can be sensed and measured. Consequently, their sensitivity for low-light applications is limited. Additionally, collecting light sequentially, one wavelength at a time, is typically inefficient.
The detection apparatus disclosed by Smith et al. (see above) uses a series of narrow band interference filters in order to select the wavelengths impinging upon one or more photodetectors. This type of system has the advantage of being rather simple and inexpensive; however, it does have substantial deficiencies. The specific system described uses a filter photometer which can either use multiple interchangeable filters with one photodetector or multiple stationary filters with corresponding detectors. The first of these devices, a rotary filter with a single detector (see FIG. 3 of Smith et al.), has the disadvantage of limiting the time period during which each of the filtered regions can be measured. The detector time must be shared among the different filters in order to distinguish among different emission spectra. The Smith et al. system has additional optical difficulties which need not be dealt with here.
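The cost of sharing one detector among several filters can be estimated with a short calculation. This is a sketch under assumptions that do not come from Smith et al.: each filter receives an equal share of the detector time, no time is lost while filters change, and detection is shot-noise limited, so that the signal-to-noise ratio scales with the square root of the light collected. The function name is hypothetical.

```python
import math


def relative_snr(n_filters):
    """Per-channel signal-to-noise ratio of a time-shared detector.

    The value is relative to a dedicated detector per filter that observes
    continuously, assuming equal dwell times and shot-noise-limited detection.
    """
    dwell_fraction = 1.0 / n_filters  # share of the time spent on each filter
    return math.sqrt(dwell_fraction)


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(f"{n} filter(s): dwell {1.0 / n:.2f}, relative SNR {relative_snr(n):.2f}")
```

With four filters, each channel is observed for a quarter of the time, and under these assumptions its signal-to-noise ratio falls to half that of a continuously observing detector.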
More serious problems still result from using dyes which have different net charges. The conventional sequencing gel displayed in the Smith et al., Nucleic Acids Research paper illustrates T-lanes produced from primers labeled with each of four dyes. It is clear that there are significant differential perturbations in the electrophoretic mobilities. A complete set of sequencing fragments bearing these four dyes will, when combined, show considerable overlap and perhaps even misordering when subjected to electrophoresis in a single lane. This effect, combined with the aforementioned large dynamic range in signal intensity, makes it difficult to perform single-lane sequencing with this dye set.
Finally, the methodology used to prepare the fluorescence-labeled sequencing fragments creates difficult sequencing conditions. For Maxam-Gilbert sequencing, 5'-labeled oligonucleotides are enzymatically ligated to "sticky ended", double-stranded fragments of DNA produced through restriction cleavage. This limits one to sequencing fragments produced in this fashion. For Sanger sequencing, 5'-labeled oligonucleotides are used as primers. Four special primers are required. To use a new vector system one has to go through the complex process of synthesizing and purifying four new dye-labeled primers.
A second approach to automation of non-radiolabel DNA sequencing was disclosed by Ansorge, W., et al., J. Biochem. Biophys. Methods, 13:315-323 (1986), in which a single fluorescent label was covalently attached to the 5' end of a 17-base oligonucleotide primer. This primer was reacted in four vessels with the standard dideoxynucleotide sequencing chemistry method that was modified to omit the radiolabeled nucleotide, to produce sets of enzymatically copied DNA fragments of varying length. Each of the four vessels contained a dideoxynucleotide chain terminator corresponding to one of the four DNA bases which allowed terminal base assignment from conventional electrophoretic separation in four gel lanes. Each fragment carried a 5'-tetramethylrhodamine fluorescent label which was excited by an argon ion laser passing through the width of the entire gel. Fluorescent emissions of DNA bands resolved over time were collected from the four lanes with separate, stationary means for each lane comprising imaging optics, field apertures, light guides, filter assemblies, and photomultipliers in series.
One advantage claimed by this approach is the need for fewer moving parts in the apparatus due to stationary detectors, which allows continuous monitoring of the four gel lanes. This monitoring method, although more complex than that used with one lane, reportedly offers the advantage of determining the presence of a labeled band for base assignment relative to the absence of bands in the remaining three lanes to improve confidence in the assignment. In fact, the use of a single label requires the use of four lanes for base assignment, and the system as presented is incapable of further simplification to improve the capacity or throughput of the instrument. The system is also limited in potential accuracy by the requirement for faithful lane-to-lane relative positioning for a single sequence analysis. Operational complexities such as thermal gradients and gel impurities may defeat this positional integrity to produce local gel distortions which affect band mobilities and may, in turn, compromise base sequence assignments.
The use of labeled primers by Ansorge et al. and Smith et al. is inferior in other respects as well. The polymerization reactions must still be carried out in separate vessels. All DNA fragments--be they bona fide termination fragments or extraneous fragments--will be labeled. This is similar to the existing system where effectively all fragments containing incorporated adenosine nucleotides are labeled. Thus, the resulting sequencing pattern will retain most of the artifacts (e.g. false or shadow bands, pile-ups) encountered in the current methods.
Finally, Brumbaugh, J. A. et al. in European Patent Application 85103155.9, published Oct. 9, 1985, disclosed a system and method for post-labeling strands of DNA which optionally contained pre-marked nucleosides. The pre-marking could be accomplished by covalent attachment of biotin to a desired chain terminating nucleotide before the nucleotide was used in a modification of the Sanger DNA chain termination method. However, the pre-marked nucleotide was not detectable in the disclosed system. The pre-marked strands of DNA prepared in separate vessels corresponding to the A, T, C, and G DNA bases, were electrophoretically separated and then exposed to a complementary binding material, typically avidin, which had a fluorophore such as fluorescein covalently attached to it. The fluorophore was detected and the signal presence was related to the particular vessel or gel/lane corresponding to A, T, C, or G originally prepared. This post-labeling method requires the preparation and subsequent electrophoretic separation of marked DNA strands in separate vessels and gels/lanes, respectively. There is no disclosure of any method or system capable of labeling DNA strands differentially in the same vessel simultaneously during the reactions of a chain termination method, or differentiating labels during strand detection in a single gel/lane of a suitable detection system.