Since the genetic information is represented by the sequence of the four DNA building blocks deoxyadenosine-(dpA), deoxyguanosine-(dpG), deoxycytidine-(dpC) and deoxythymidine-5'-phosphate (dpT), DNA sequencing is one of the most fundamental technologies in molecular biology and the life sciences in general. The ease and the rate by which DNA sequences can be obtained greatly affects related technologies such as development and production of new therapeutic agents and new and useful varieties of plants and microorganisms via recombinant DNA technology. In particular, unraveling the DNA sequence helps in understanding human pathological conditions including genetic disorders, cancer and AIDS. In some cases, very subtle differences such as a one nucleotide deletion, addition or substitution can create serious, in some cases even fatal, consequences. Recently, DNA sequencing has become the core technology of the Human Genome Sequencing Project (e.g., J. E. Bishop and M. Waldholz, 1991, Genome: The Story of the Most Astonishing Scientific Adventure of Our Time--The Attempt to Map All the Genes in the Human Body, Simon & Schuster, New York). Knowledge of the complete human genome DNA sequence will certainly help to understand, to diagnose, to prevent and to treat human diseases. To be able to tackle successfully the determination of the approximately 3 billion base pairs of the human genome in a reasonable time frame and in an economical way, rapid, reliable, sensitive and inexpensive methods need to be developed, which also offer the possibility of automation. The present invention provides such a technology.
Recent reviews of today's methods together with future directions and trends are given by Barrell (The FASEB Journal 5, 40-45 (1991)), and Trainor (Anal. Chem. 62, 418-26 (1990)).
Currently, DNA sequencing is performed by either the chemical degradation method of Maxam and Gilbert (Methods in Enzymology 65, 499-560 (1980)) or the enzymatic dideoxynucleotide termination method of Sanger et al. (Proc. Natl. Acad. Sci. USA 74, 5463-67 (1977)). In the chemical method, base specific modifications result in a base specific cleavage of the radioactively or fluorescently labeled DNA fragment. With the four separate base specific cleavage reactions, four sets of nested fragments are produced which are separated according to length by polyacrylamide gel electrophoresis (PAGE). After autoradiography, the sequence can be read directly since each band (fragment) in the gel originates from a base specific cleavage event. Thus, the fragment lengths in the four "ladders" directly translate into a specific position in the DNA sequence.
In the enzymatic chain termination method, the four base specific sets of DNA fragments are formed by starting with a primer/template system elongating the primer into the unknown DNA sequence area and thereby copying the template and synthesizing a complementary strand by DNA polymerases, such as Klenow fragment of E. coli DNA polymerase I, a DNA polymerase from Thermus aquaticus, Taq DNA polymerase, or a modified T7 DNA polymerase, Sequenase (Tabor et al., Proc. Natl. Acad. Sci. USA 84, 4767-4771 (1987)), in the presence of chain-terminating reagents. Here, the chain-terminating event is achieved by incorporating into the four separate reaction mixtures in addition to the four normal deoxynucleoside triphosphates, dATP, dGTP, dTTP and dCTP, only one of the chain-terminating dideoxynucleoside triphosphates, ddATP, ddGTP, ddTTP or ddCTP, respectively, in a limiting small concentration. The four sets of resulting fragments produce, after electrophoresis, four base specific ladders from which the DNA sequence can be determined.
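The way four base specific ladders translate into a sequence can be illustrated with a small sketch. It is an idealized model, not a description of any particular protocol: it assumes that every position of the newly synthesized strand yields exactly one terminated fragment, that fragment lengths are counted from the first nucleotide added to the primer, and that the function names (synthesized_strand, termination_ladders, read_sequence) are purely illustrative.

```python
# Idealized model of Sanger chain termination (illustrative assumptions only).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def synthesized_strand(template_3to5):
    """Return the strand copied by the polymerase, written 5'->3',
    for a template region written 3'->5'."""
    return "".join(COMPLEMENT[base] for base in template_3to5)


def termination_ladders(new_strand):
    """For each of the four ddNTP reactions, list the fragment lengths
    at which chain termination can occur (one ladder per base)."""
    ladders = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        ladders[base].append(length)
    return ladders


def read_sequence(ladders):
    """Read the sequence by walking through all fragment lengths in
    ascending order and noting which ladder contains each length."""
    base_at_length = {
        length: base for base, lengths in ladders.items() for length in lengths
    }
    return "".join(base_at_length[n] for n in sorted(base_at_length))


if __name__ == "__main__":
    ladders = termination_ladders(synthesized_strand("TACGGTCA"))
    print(ladders)
    print(read_sequence(ladders))
```

In this model each fragment length appears in exactly one ladder, which is why the four lanes read together give an unambiguous sequence; real gels deviate from this through missing, weak or compressed bands.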
A recent modification of the Sanger sequencing strategy involves the degradation of phosphorothioate-containing DNA fragments obtained by using alpha-thio dNTP instead of the normally used ddNTPs during the primer extension reaction mediated by DNA polymerase (Labeit et al., DNA 5, 173-177 (1986); Amersham, PCT-Application GB86/00349; Eckstein et al., Nucleic Acids Res. 16, 9947 (1988)). Here, the four sets of base-specific sequencing ladders are obtained by limited digestion with exonuclease III or snake venom phosphodiesterase, subsequent separation on PAGE and visualization by radioisotopic labeling of either the primer or one of the dNTPs. In a further modification, the base-specific cleavage is achieved by alkylating the sulphur atom in the modified phosphodiester bond followed by a heat treatment (Max-Planck-Gesellschaft, DE 3930312 A1). Both methods can be combined with the amplification of the DNA via the Polymerase Chain Reaction (PCR).
On the upfront end, the DNA to be sequenced has to be fragmented into sequenceable pieces of currently not more than 500 to 1000 nucleotides. Starting from a genome, this is a multi-step process involving cloning and subcloning steps using different and appropriate cloning vectors such as YAC, cosmids, plasmids and M13 vectors (Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, 1989). Finally, for Sanger sequencing, the fragments of about 500 to 1000 base pairs are integrated into a specific restriction site of the replicative form I (RF I) of a derivative of the M13 bacteriophage (Vieira and Messing, Gene 19, 259 (1982)) and then the double-stranded form is transformed to the single-stranded circular form to serve as a template for the Sanger sequencing process. This template has a binding site for a universal primer, obtained by chemical DNA synthesis (Sinha, Biernat, McManus and Koster, Nucleic Acids Res. 12, 4539-57 (1984); U.S. Pat. No. 4,725,677), upstream of the restriction site into which the unknown DNA fragment has been inserted. Under specific conditions, unknown DNA sequences integrated into supercoiled double-stranded plasmid DNA can be sequenced directly by the Sanger method (Chen and Seeburg, DNA 4, 165-170 (1985); Lim et al., Gene Anal. Techn. 5, 32-39 (1988)). With the Polymerase Chain Reaction (PCR) (PCR Protocols: A Guide to Methods and Applications, Innis et al., editors, Academic Press, San Diego (1990)), cloning or subcloning steps could be omitted by directly sequencing off chromosomal DNA by first amplifying the DNA segment by PCR and then applying the Sanger sequencing method (Innis et al., Proc. Natl. Acad. Sci. USA 85, 9436-9440 (1988)). In this case, however, the DNA sequence in the region of interest must be known at least to the extent needed to bind a sequencing primer.
In order to be able to read the sequence from PAGE, detectable labels have to be used in either the primer (very often at the 5'-end) or in one of the deoxynucleoside triphosphates, dNTP. Labeling with radioisotopes such as .sup.32 P, .sup.33 P, or .sup.35 S is still the most frequently used technique. After PAGE, the gels are exposed to X-ray films and silver grain exposure is analyzed. The use of radioisotopic labeling creates several problems. Most labels useful for autoradiographic detection of sequencing fragments have relatively short half-lives which can limit the useful time of the labels. The emission of high energy beta radiation, particularly from .sup.32 P, can lead to breakdown of the products via radiolysis so that the sample should be used very quickly after labeling. In addition, high energy radiation can also cause a deterioration of band sharpness by scattering. Some of these problems can be reduced by using the less energetic isotopes such as .sup.33 P or .sup.35 S (see, e.g., Ornstein et al., Biotechniques 3, 476 (1985)). Here, however, longer exposure times have to be tolerated. Above all, the use of radioisotopes poses significant health risks to the experimentalist and, in heavy sequencing projects, decontamination and handling of the radioactive waste are further severe problems and burdens.
In response to the above mentioned problems related to the use of radioactive labels, non-radioactive labeling techniques have been explored and, in recent years, integrated into partly automated DNA sequencing procedures. All these improvements utilize the Sanger sequencing strategy. The fluorescent label can be tagged to the primer (Smith et al., Nature 321, 674-679 (1986) and EPO Patent No. 87300998.9; Du Pont De Nemours EPO Application No. 0359225; Ansorge et al. J. Biochem. Biophys. Methods 13, 325-32 (1986)) or to the chain-terminating dideoxynucleoside triphosphates (Prober et al. Science 238, 336-41 (1987); Applied Biosystems, PCT Application WO 91/05060). Based on either labeling the primer or the ddNTP, systems have been developed by Applied Biosystems (Smith et al., Science 235, G89 (1987); U.S. Pat. Nos. 570,973 and 689,013), Du Pont De Nemours (Prober et al. Science 238, 336-341 (1987); U.S. Pat. Nos. 881,372 and 57,566), Pharmacia-LKB (Ansorge et al. Nucleic Acids Res. 15, 4593-4602 (1987) and EMBL Patent Application DE P3724442 and P3805808.1) and Hitachi (JP 1-90844 and DE 4011991 A1). A somewhat similar approach was developed by Brumbaugh et al. (Proc. Natl. Acad. Sci. USA 85, 5610-14 (1988) and U.S. Pat. No. 4,729,947). An improved method for the Du Pont system using two electrophoretic lanes with two different specific labels per lane has been described (PCT Application WO 92/02635). A different approach uses fluorescently labeled avidin and biotin labeled primers. Here, the sequencing ladders ending with biotin are reacted during electrophoresis with the labeled avidin, which results in the detection of the individual sequencing bands (Brumbaugh et al., U.S. Pat. No. 594,676).
More recently, even more sensitive non-radioactive labeling techniques for DNA using chemiluminescence triggerable and amplifiable by enzymes have been developed (Beck, O'Keefe, Coull and Koster, Nucleic Acids Res. 17, 5115-5123 (1989) and Beck and Koster, Anal. Chem. 62, 2258-2270 (1990)). These labeling methods were combined with multiplex DNA sequencing (Church et al. Science 240, 185-188 (1988)) to provide for a strategy aimed at high throughput DNA sequencing (Koster et al., Nucleic Acids Res. Symposium Ser. No. 24, 318-321 (1991), University of Utah, PCT Application No. WO 90/15883); this strategy still suffers from the disadvantage of being very laborious and difficult to automate.
In an attempt to simplify DNA sequencing, solid supports have been introduced. In most cases published so far, the template strand for sequencing (with or without PCR amplification) is immobilized on a solid support, most frequently utilizing the strong biotin-avidin/streptavidin interaction (Orion-Yhtyma Oy, U.S. Pat. No. 277,643; M. Uhlen et al. Nucleic Acids Res. 16, 3025-38 (1988); Cemu Bioteknik, PCT Application No. WO 89/09282 and Medical Research Council, GB, PCT Application No. WO 92/03575). The primer extension products synthesized on the immobilized template strand are purified of enzymes, other sequencing reagents and by-products by a washing step, then released under denaturing conditions, which break the hydrogen bonds between the Watson-Crick base pairs, and subjected to PAGE separation. In a different approach, the primer extension products (not the template) from a DNA sequencing reaction are bound to a solid support via biotin/avidin (Du Pont De Nemours, PCT Application WO 91/11533). In contrast to the above mentioned methods, here, the interaction between biotin and avidin is overcome by employing denaturing conditions (formamide/EDTA) to release the primer extension products of the sequencing reaction from the solid support for PAGE separation. As solid supports, beads (e.g., magnetic beads (Dynabeads) and Sepharose beads), filters, capillaries, plastic dipsticks (e.g., polystyrene strips) and microtiter wells have been proposed.
All methods discussed so far have one central step in common: polyacrylamide gel electrophoresis (PAGE). In many instances, this represents a major drawback and limitation for each of these methods. Preparing a homogeneous gel by polymerization, loading of the samples, the electrophoresis itself, detection of the sequence pattern (e.g., by autoradiography), removing the gel and cleaning the glass plates to prepare another gel are very laborious and time-consuming procedures. Moreover, the whole process is error-prone, difficult to automate, and, in order to improve reproducibility and reliability, highly trained and skilled personnel are required. In the case of radioactive labeling, autoradiography itself can consume from hours to days. In the case of fluorescent labeling, at least the detection of the sequencing bands is performed automatically when using the laser-scanning devices integrated into commercially available DNA sequencers. One problem related to fluorescent labeling is the influence of the four different base-specific fluorescent tags on the mobility of the fragments during electrophoresis; another is a possible overlap in the spectral bandwidth of the four specific dyes, which reduces the discriminating power between neighboring bands and hence increases the probability of sequence ambiguities. Artifacts are also produced by base-specific interactions with the polyacrylamide gel matrix (Frank and Koster, Nucleic Acids Res. 6, 2069 (1979)) and by the formation of secondary structures which result in "band compressions" and hence do not allow one to read the sequence. This problem has, in part, been overcome by using 7-deazadeoxyguanosine triphosphates (Barr et al., Biotechniques 4, 428 (1986)). However, the reasons for some artifacts and conspicuous bands are still under investigation and the gel electrophoretic procedure needs further improvement.
A recent innovation in electrophoresis is capillary zone electrophoresis (CZE) (Jorgenson et al., J. Chromatography 352, 337 (1986); Gesteland et al., Nucleic Acids Res. 18, 1415-1419 (1990)) which, compared to slab gel electrophoresis (PAGE), significantly increases the resolution of the separation, reduces the time for an electrophoretic run and allows the analysis of very small samples. Here, however, other problems arise due to the miniaturization of the whole system such as wall effects and the necessity of highly sensitive on-line detection methods. Compared to PAGE, another drawback is created by the fact that CZE is only a "one-lane" process, whereas in PAGE samples in multiple lanes can be electrophoresed simultaneously.
Due to the severe limitations and problems related to having PAGE as an integral and central part in the standard DNA sequencing protocol, several methods have been proposed to do DNA sequencing without an electrophoretic step. One approach calls for hybridization or fragmentation sequencing (Bains, Biotechnology 10, 757-58 (1992) and Mirzabekov et al., FEBS Letters 256, 118-122 (1989)) utilizing the specific hybridization of known short oligonucleotides (e.g., octadeoxynucleotides, which give 65,536 different sequences) to a complementary DNA sequence. Positive hybridization reveals a short stretch of the unknown sequence. Repeating this process by performing hybridizations with all possible octadeoxynucleotides should theoretically determine the sequence. In a completely different approach, rapid sequencing of DNA is done by unilaterally degrading one single, immobilized DNA fragment by an exonuclease in a moving flow stream and detecting the cleaved nucleotides by their specific fluorescent tag via laser excitation (Jett et al., J. Biomolecular Structure & Dynamics 7, 301-309, (1989); United States Department of Energy, PCT Application No. WO 89/03432). In another system proposed by Hyman (Anal. Biochem. 174, 423-436 (1988)), the pyrophosphate generated when the correct nucleotide is attached to the growing chain on a primer-template system is used to determine the DNA sequence. The enzymes used and the DNA are held in place by solid phases (DEAE-Sepharose and Sepharose) either by ionic interactions or by covalent attachment. In a continuous flow-through system, the amount of pyrophosphate is determined via bioluminescence (luciferase). A synthesis approach to DNA sequencing is also used by Tsien et al. (PCT Application No. WO 91/06678).
Here, the incoming dNTP's are protected at the 3'-end by various blocking groups such as acetyl or phosphate groups, which are removed before the next elongation step; this makes the process very slow compared to standard sequencing methods. The template DNA is immobilized on a polymer support. To detect incorporation, a fluorescent or radioactive label is additionally incorporated into the modified dNTP's. The same patent application also describes an apparatus designed to automate the process.
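The hybridization approach mentioned earlier in this paragraph can be illustrated with a short sketch. It checks the probe count quoted for octadeoxynucleotides (4 to the power 8) and rebuilds a toy sequence from the set of probes that would hybridize to it. The function names and the toy sequence are hypothetical, the starting probe is assumed to be known, and the greedy extension only succeeds when every overlap is unambiguous, which is not guaranteed for real sequences containing repeats.

```python
from itertools import product


def all_probes(k):
    """Enumerate every possible oligonucleotide probe of length k."""
    return ["".join(p) for p in product("ACGT", repeat=k)]


def spectrum(sequence, k):
    """Set of k-mers contained in the sequence, i.e. the probes that
    would give a positive hybridization signal in an ideal experiment."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def reconstruct(kmers, start):
    """Greedy reconstruction: extend from the (assumed known) starting
    k-mer while exactly one unused k-mer overlaps by k-1 bases."""
    k = len(start)
    remaining = set(kmers) - {start}
    sequence = start
    while True:
        candidates = [m for m in remaining if m[:-1] == sequence[-(k - 1):]]
        if len(candidates) != 1:  # stop at the end or at an ambiguity
            break
        sequence += candidates[0][-1]
        remaining.remove(candidates[0])
    return sequence


if __name__ == "__main__":
    print(len(all_probes(8)))          # number of octanucleotide probes
    toy = "ATGCCAGT"                   # hypothetical unknown sequence
    hits = spectrum(toy, 4)            # 4-mers used only to keep the toy small
    print(reconstruct(hits, "ATGC"))
```

The sketch also shows why the method is only "theoretically" complete: as soon as two probes share the same overlap, the extension step can no longer decide between them.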
Mass spectrometry, in general, provides a means of "weighing" individual molecules by ionizing the molecules in vacuo and making them "fly" by volatilization. Under the influence of combinations of electric and magnetic fields, the ions follow trajectories depending on their individual mass (m) and charge (z). In the range of molecules with low molecular weight, mass spectrometry has long been part of the routine physical-organic repertoire for analysis and characterization of organic molecules by the determination of the mass of the parent molecular ion. In addition, by arranging collisions of this parent molecular ion with other particles (e.g., argon atoms), the molecular ion is fragmented, forming secondary ions by the so-called collision induced dissociation (CID). The fragmentation pattern/pathway very often allows the derivation of detailed structural information. Many applications of mass spectrometric methods are known in the art, particularly in the biosciences, and can be found summarized in Methods in Enzymology, Vol. 193: "Mass Spectrometry" (J. A. McCloskey, editor), 1990, Academic Press, New York.
Due to the apparent analytical advantages of mass spectrometry in providing high detection sensitivity, accuracy of mass measurements, detailed structural information by CID in conjunction with an MS/MS configuration and speed, as well as on-line data transfer to a computer, there has been considerable interest in the use of mass spectrometry for the structural analysis of nucleic acids. Recent reviews summarizing this field include K. H. Schram, "Mass Spectrometry of Nucleic Acid Components," Biomedical Applications of Mass Spectrometry 34, 203-287 (1990); and P. F. Crain, "Mass Spectrometric Techniques in Nucleic Acid Research," Mass Spectrometry Reviews 9, 505-554 (1990). The biggest hurdle to applying mass spectrometry to nucleic acids is the difficulty of volatilizing these very polar biopolymers. Therefore, "sequencing" has been limited to low molecular weight synthetic oligonucleotides, either by determining the mass of the parent molecular ion and thereby confirming the already known sequence, or by confirming the known sequence through the generation of secondary ions (fragment ions) via CID in an MS/MS configuration. For ionization and volatilization, these studies have used, in particular, fast atom bombardment (FAB mass spectrometry) or plasma desorption (PD mass spectrometry). As an example, the application of FAB to the analysis of protected dimeric blocks for chemical synthesis of oligodeoxynucleotides has been described (Koster et al. Biomedical Environmental Mass Spectrometry 14, 111-116 (1987)).
Two more recent ionization/desorption techniques are electrospray/ionspray (ES) and matrix-assisted laser desorption/ionization (MALDI). ES mass spectrometry was introduced by Fenn et al. (J. Phys. Chem. 88, 4451-59 (1984); PCT Application No. WO 90/14148) and current applications are summarized in recent review articles (R. D. Smith et al., Anal. Chem. 62, 882-89 (1990) and B. Ardrey, Electrospray Mass Spectrometry, Spectroscopy Europe, 4, 10-18 (1992)). The molecular weights of the tetradecanucleotide d(CATGCCATGGCATG) (SEQ ID NO:1) (Covey et al. "The Determination of Protein, Oligonucleotide and Peptide Molecular Weights by Ionspray Mass Spectrometry," Rapid Communications in Mass Spectrometry, 2, 249-256 (1988)), of the 21-mer d(AAATTGTGCACATCCTGCAGC) (SEQ ID NO:2) and, without details being given, of a tRNA with 76 nucleotides (Methods in Enzymology, 193, "Mass Spectrometry" (McCloskey, editor), p. 425, 1990, Academic Press, New York) have been published. As a mass analyzer, a quadrupole is most frequently used. The determination of molecular weights in femtomole amounts of sample is very accurate due to the presence of multiple ion peaks, all of which can be used for the mass calculation.
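How multiple ion peaks can each contribute to the mass calculation may be sketched as follows. This is a minimal illustration, not a description of any instrument's software: it assumes that neighboring peaks differ by exactly one charge, that each charge corresponds to one proton gained (positive adduct) or lost (negative adduct), and that the molecular mass of 4260.0 Da is a hypothetical value chosen only to generate test peaks.

```python
PROTON = 1.00728  # approximate proton mass in Da


def mz(mass, charge, adduct):
    """m/z of an ion carrying `charge` charges, each contributing
    `adduct` Da (+PROTON for protonation, -PROTON for deprotonation)."""
    return mass / charge + adduct


def deconvolve_pair(mz_lower_charge, mz_higher_charge, adduct):
    """From two adjacent peaks with charges z and z+1, solve for z and
    the neutral mass M, using (m/z - adduct) = M / z."""
    z = round((mz_higher_charge - adduct) / (mz_lower_charge - mz_higher_charge))
    return z, z * (mz_lower_charge - adduct)


def average_mass(peaks, adduct):
    """Average the mass estimates from all adjacent peak pairs.
    `peaks` must be sorted by descending m/z (ascending charge)."""
    estimates = [
        deconvolve_pair(a, b, adduct)[1] for a, b in zip(peaks, peaks[1:])
    ]
    return sum(estimates) / len(estimates)


if __name__ == "__main__":
    true_mass = 4260.0  # hypothetical oligonucleotide mass
    peaks = [mz(true_mass, z, -PROTON) for z in (3, 4, 5)]
    print(deconvolve_pair(peaks[0], peaks[1], -PROTON))
    print(average_mass(peaks, -PROTON))
```

Because every adjacent pair yields an independent estimate of the same mass, averaging over the charge series reduces the influence of the error in any single peak position.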
MALDI mass spectrometry, in contrast, can be particularly attractive when a time-of-flight (TOF) configuration is used as a mass analyzer. MALDI-TOF mass spectrometry was introduced by Hillenkamp et al. ("Matrix Assisted UV-Laser Desorption/Ionization: A New Approach to Mass Spectrometry of Large Biomolecules," Biological Mass Spectrometry (Burlingame and McCloskey, editors), Elsevier Science Publishers, Amsterdam, pp. 49-60, 1990). Since, in most cases, no multiple molecular ion peaks are produced with this technique, the mass spectra, in principle, look simpler compared to ES mass spectrometry. Although DNA molecules up to a molecular weight of 410,000 daltons could be desorbed and volatilized (Williams et al., "Volatilization of High Molecular Weight DNA by Pulsed Laser Ablation of Frozen Aqueous Solutions," Science, 246, 1585-87 (1989)), this technique has so far only been used to determine the molecular weights of relatively small oligonucleotides of known sequence, e.g., oligothymidylic acids up to 18 nucleotides (Huth-Fehre et al., "Matrix-Assisted Laser Desorption Mass Spectrometry of Oligodeoxythymidylic Acids," Rapid Communications in Mass Spectrometry, 6, 209-13 (1992)) and a double-stranded DNA of 28 base pairs (Williams et al., "Time-of-Flight Mass Spectrometry of Nucleic Acids by Laser Ablation and Ionization from a Frozen Aqueous Matrix," Rapid Communications in Mass Spectrometry, 4, 348-351 (1990)). In one publication (Huth-Fehre et al., 1992, supra), it was shown that a mixture of all the oligothymidylic acids from n=12 to n=18 nucleotides could be resolved.
In U.S. Pat. No. 5,064,754, RNA transcripts extended by DNA both of which are complementary to the DNA to be sequenced are prepared by incorporating NTP's, dNTP's and, as terminating nucleotides, ddNTP's which are substituted at the 5'-position of the sugar moiety with one or a combination of the isotopes .sup.12 C, .sup.13 C, .sup.14 C, .sup.1 H, .sup.2 H, .sup.3 H, .sup.16 O, .sup.17 O and .sup.18 O. The polynucleotides obtained are degraded to 3'-nucleotides, cleaved at the N-glycosidic linkage and the isotopically labeled 5'-functionality removed by periodate oxidation and the resulting formaldehyde species determined by mass spectrometry. A specific combination of isotopes serves to discriminate base-specifically between internal nucleotides originating from the incorporation of NTP's and dNTP's and terminal nucleotides caused by linking ddNTP's to the end of the polynucleotide chain. A series of RNA/DNA fragments is produced, and in one embodiment, separated by electrophoresis, and, with the aid of the so-called matrix method of analysis, the sequence is deduced.
In Japanese Patent No. 59-131909, an instrument is described which detects nucleic acid fragments separated either by electrophoresis, liquid chromatography or high speed gel filtration. Mass spectrometric detection is achieved by incorporating into the nucleic acids atoms which normally do not occur in DNA such as S, Br, I or Ag, Au, Pt, Os, Hg. The method, however, is not applied to sequencing of DNA using the Sanger method. In particular, it does not propose a base-specific correlation of such elements to an individual ddNTP.
PCT Application No. WO 89/12694 (Brennan et al., Proc. SPIE-Int. Soc. Opt. Eng. 1206, (New Technol. Cytom. Mol. Biol.), pp. 60-77 (1990); and Brennan, U.S. Pat. No. 5,003,059) employs the Sanger methodology for DNA sequencing by using a combination of either the four stable isotopes .sup.32 S, .sup.33 S, .sup.34 S, .sup.36 S or .sup.35 Cl, .sup.37 Cl, .sup.79 Br, .sup.81 Br to specifically label the chain-terminating ddNTP's. The sulfur isotopes can be located either in the base or at the alpha-position of the triphosphate moiety whereas the halogen isotopes are located either at the base or at the 3'-position of the sugar ring. The sequencing reaction mixtures are separated by an electrophoretic technique such as CZE and transferred to a combustion unit in which the sulfur isotopes of the incorporated ddNTP's are converted to SO.sub.2 at about 900.degree. C. in an oxygen atmosphere. The SO.sub.2 generated, with masses of 64, 65, 66 or 68, is determined on-line by mass spectrometry using, e.g., as mass analyzer, a quadrupole with a single ion-multiplier to detect the ion current.
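The SO.sub.2 masses quoted above follow directly from the nominal isotope masses, as the short sketch below shows. It assumes that both oxygen atoms are .sup.16 O, and the assignment of each sulfur isotope to a particular base is purely hypothetical, chosen here only to show how a stream of detected masses would translate into base calls.

```python
OXYGEN_NOMINAL = 16                  # nominal mass of 16-O, assumed for both oxygens
SULFUR_ISOTOPES = (32, 33, 34, 36)   # the four stable sulfur isotopes

# Hypothetical isotope-to-terminator assignment, for illustration only.
ISOTOPE_TO_BASE = {32: "A", 33: "C", 34: "G", 36: "T"}


def so2_nominal_mass(sulfur_isotope):
    """Nominal mass of SO2 formed from one sulfur isotope and two 16-O atoms."""
    return sulfur_isotope + 2 * OXYGEN_NOMINAL


# Lookup from detected SO2 mass back to the terminating base.
MASS_TO_BASE = {so2_nominal_mass(s): b for s, b in ISOTOPE_TO_BASE.items()}


def call_bases(detected_masses):
    """Translate SO2 masses, in order of fragment elution, into bases."""
    return "".join(MASS_TO_BASE[m] for m in detected_masses)


if __name__ == "__main__":
    print([so2_nominal_mass(s) for s in SULFUR_ISOTOPES])
    print(call_bases([64, 68, 66]))
```

Since the four SO.sub.2 masses are distinct, a single detector channel scanning these masses is enough to tell the four terminators apart, which is the point of the isotope labeling.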
A similar approach is proposed in U.S. Pat. No. 5,002,868 (Jacobson et al., Proc. SPIE-Int. Soc. Opt. Eng. 1435, (Opt. Methods Ultrasensitive Detect. Anal. Tech. Appl.), 26-35 (1991)) using Sanger sequencing with four ddNTP's specifically substituted at the alpha-position of the triphosphate moiety with one of the four stable sulfur isotopes as described above and subsequent separation of the four sets of nested sequences by tube gel electrophoresis. The only difference is the use of resonance ionization spectroscopy (RIS) in conjunction with a magnetic sector mass analyzer as disclosed in U.S. Pat. No. 4,442,354 to detect the sulfur isotopes corresponding to the specific nucleotide terminators, and by this, allowing the assignment of the DNA sequence.
EPO Patent Applications No. 0360676 A1 and 0360677 A1 also describe Sanger sequencing using stable isotope substitutions in the ddNTP's such as D, .sup.13 C, .sup.15 N, .sup.17 O, .sup.18 O, .sup.32 S, .sup.33 S, .sup.34 S, .sup.36 S, .sup.19 F, .sup.35 Cl, .sup.37 Cl, .sup.79 Br, .sup.81 Br and .sup.127 I or functional groups such as CF.sub.3 or Si(CH.sub.3).sub.3 at the base, the sugar or the alpha position of the triphosphate moiety according to chemical functionality. The Sanger sequencing reaction mixtures are separated by tube gel electrophoresis. The effluent is converted into an aerosol by the electrospray/thermospray nebulizer method and then atomized and ionized by a hot plasma (7000 to 8000.degree. K) and analyzed by a simple mass analyzer. An instrument is proposed which enables one to automate the analysis of the Sanger sequencing reaction mixture consisting of tube electrophoresis, a nebulizer and a mass analyzer.
The application of mass spectrometry to perform DNA sequencing by the hybridization/fragment method (see above) has been recently suggested (Bains, "DNA Sequencing by Mass Spectrometry: Outline of a Potential Future Application," Chimicaoggi 9, 13-16 (1991)).