An important property of biological samples which often must be determined is their molecular weight. The most common method used to perform this measurement is to electrophorese the biomolecule through an acrylamide or agarose gel, visualize the position in the gel by staining or autoradiography, and determine the sizes by comparison to molecular weight standards of known sizes.
A related technology which uses similar sizing and detection techniques is DNA or RNA sequencing. DNA is a long thread-like macromolecule composed of a chain of deoxyribonucleotides, each of which contains one of the four nitrogenous bases adenine (A), cytosine (C), guanine (G), or thymine (T). Similarly, RNA is composed of a long chain of ribonucleotides. The order of these nucleotides is the genetic code of the organism from which the DNA was isolated. The determination of this order is, therefore, a most important goal for scientists working in biological fields. Manual methods to sequence DNA involve either synthesis of new DNA in the presence of dideoxyribonucleotide terminators, using as a template the DNA whose sequence needs to be determined, or the degradation of the DNA to be sequenced using base-specific chemical treatments. In each case, a nested set of radioactively labelled DNA fragments is generated which represents the sequence of the DNA. For example, in the dideoxy method (Sanger, F. et al., (1977) PROC. NATL. ACAD. SCI. U.S.A., 74, 5463-5467), the template DNA, whose sequence is to be determined, is incubated with an oligonucleotide primer, four deoxyribonucleoside 5'-triphosphates (dATP, dCTP, dGTP and dTTP), and a DNA polymerase. The primer anneals to a specific complementary position in the template DNA that is defined by the order of the bases in the primer. The DNA polymerase then begins to catalyze DNA synthesis in the 5' to 3' direction by incorporating the deoxyribonucleoside 5'-triphosphate that is complementary to the next base in the template DNA. A complementary nucleotide is defined as one that follows the base-pairing rules, which require that an A of one strand of a double-stranded DNA molecule always pairs with a T of the other strand and that a C of one strand always pairs with a G of the other strand.
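The base-pairing rules just stated can be illustrated with a short sketch. This is only an illustration, not part of the described method; the function names `complement` and `reverse_complement` are hypothetical and chosen for this example.

```python
# Base-pairing rules as stated in the text: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement, in the same left-to-right order."""
    return "".join(PAIRS[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Return the complementary strand written in its own 5' to 3' direction."""
    return complement(strand)[::-1]


if __name__ == "__main__":
    template = "ATCG"  # hypothetical template fragment, written 5' to 3'
    print(complement(template))
    print(reverse_complement(template))
```

Because the two strands of a duplex run antiparallel, the strand synthesized by the polymerase corresponds to the reverse complement of the template when both are written 5' to 3'.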
In addition to the ability of DNA polymerases to incorporate normal nucleotides into the newly synthesized strand, many polymerases can also incorporate dideoxyribonucleoside 5'-triphosphates. Dideoxyribonucleotides are identical to deoxyribonucleotides except that they lack the 3'-hydroxyl group on the ribose sugar. When these nucleotide analogs are incorporated into a growing DNA chain, synthesis terminates because the chain no longer bears the 3'-hydroxyl needed to add subsequent nucleotides. In the dideoxy sequencing method, four separate sequencing reactions are run, each containing one of the four dideoxyribonucleotides (each reaction also contains the four normal deoxyribonucleotides, one labeled with .sup.32 P or .sup.35 S). Incorporation of the dideoxy analogs occurs occasionally and randomly in place of a normal nucleotide at complementary positions in the template, so that each reaction generates a heterogeneous population of product DNA molecules, each beginning with the primer (and thus sharing a common 5'-terminus) and each terminating with the dideoxynucleotide that was included in that reaction.
The radioactively labeled products from each of the four dideoxy sequencing reactions are denatured to separate the newly synthesized DNA from the template and then electrophoresed in adjacent lanes on a polyacrylamide gel such that the DNA product molecules are separated based on their chain length. The presence of a band in the gel represents the presence of the corresponding complementary nucleotide in the template at a specific distance from the primer. Comparison and analysis of the bands present in each of the four lanes allows the sequence of the template DNA to be deduced.
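The way a sequence is deduced from the four lanes can be sketched as follows. This is a simplified, idealized model that assumes every possible termination product is present and perfectly resolved; the names `dideoxy_lanes` and `read_gel` and the primer length of 20 are hypothetical.

```python
def dideoxy_lanes(new_strand: str, primer_len: int) -> dict:
    """Chain lengths expected in each of the four lanes (ddA, ddC, ddG, ddT).

    new_strand is the sequence synthesized beyond the primer, 5' to 3'.
    """
    lanes = {base: [] for base in "ACGT"}
    for i, base in enumerate(new_strand):
        # A chain terminated here is the primer plus i + 1 added nucleotides.
        lanes[base].append(primer_len + i + 1)
    return lanes


def read_gel(lanes: dict) -> str:
    """Read the synthesized sequence by ordering bands from shortest to longest."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


if __name__ == "__main__":
    lanes = dideoxy_lanes("GATTC", primer_len=20)
    print(lanes)            # band lengths per lane
    print(read_gel(lanes))  # sequence of the newly synthesized strand
```

The sequence read this way is that of the newly synthesized strand; the template sequence follows from it by the base-pairing rules.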
In the alternative chemical DNA sequencing method, chemicals that effect random partial cleavage of the DNA at G, G+A, C+T, and C are added in four individual reactions to a single-stranded DNA fragment containing a .sup.32 P label at the 5' end. The resulting fragments are processed as in the dideoxy method to determine the DNA sequence. Maxam, A. and Gilbert, W. (1977) PROC. NATL. ACAD. SCI. U.S.A., 74, 560-564.
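The logic for reading the four chemical-cleavage lanes (G, G+A, C+T, C) can be sketched in the same spirit. This is an idealized illustration only: bands are indexed by the position of the cleaved base rather than by the true fragment length, and the names `chemical_lanes` and `read_chemical_gel` are hypothetical.

```python
def chemical_lanes(seq: str) -> dict:
    """Positions (1-based, from the labeled 5' end) cleaved in each reaction."""
    lanes = {"G": set(), "G+A": set(), "C+T": set(), "C": set()}
    for pos, base in enumerate(seq, start=1):
        if base == "G":
            lanes["G"].add(pos)
            lanes["G+A"].add(pos)
        elif base == "A":
            lanes["G+A"].add(pos)
        elif base == "C":
            lanes["C"].add(pos)
            lanes["C+T"].add(pos)
        elif base == "T":
            lanes["C+T"].add(pos)
        else:
            raise ValueError(f"unexpected base {base!r} at position {pos}")
    return lanes


def read_chemical_gel(lanes: dict) -> str:
    """Deduce each base from the combination of lanes showing a band."""
    called = []
    for pos in sorted(lanes["G+A"] | lanes["C+T"]):
        if pos in lanes["G+A"]:
            # Band in both G and G+A means G; band in G+A alone means A.
            called.append("G" if pos in lanes["G"] else "A")
        else:
            # Band in both C and C+T means C; band in C+T alone means T.
            called.append("C" if pos in lanes["C"] else "T")
    return "".join(called)


if __name__ == "__main__":
    print(read_chemical_gel(chemical_lanes("GATTACA")))
```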
Automated DNA sequencing instruments based on the dideoxy method are described in U.S. Pat. Nos. and 4,811,218 and Prober et al., Science 16 October 1987, 238; pp. 336-341. Both of these systems require the incorporation of four fluorescent dyes into the dideoxy-terminated product DNA, which is then run on a polyacrylamide gel. The discrete-length product molecules are detected near the bottom of the gel by their emitted fluorescence following excitation with a laser. In these automated systems, many more sequences can be analyzed per gel and the sequences determined accurately out to 500 bases or greater. Furthermore, data can be recorded faster since no manual gel reading step is required. Finally, the automated sequencers use non-isotopic detection methods, so there are no added costs associated with radioactive waste disposal.
Although these instruments offer some advantages over manual methods, they still suffer from numerous drawbacks which are inherent in the use of a polyacrylamide gel to resolve the DNA fragments. For example, this method remains labor-intensive since a gel must be poured and disposed of for each sequencing run. Also, the accuracy of the sequencing can be impacted by artifacts generated by a non-uniform gel matrix or even by a particular sequence as it electrophoreses down the gel. Furthermore, although more sequences can be determined on one gel than can be done manually, 10 to 12 hours are still required to obtain this data.
These problems associated with sequencing are minor when one is considering the generation of the sequence of a small genome, but they become monumental when contemplating sequencing the human genome, estimated to contain over 3 billion base pairs.
Mass spectral methods are well known. Pulsed mass spectroscopic methods, Burlingame, A. L. et al., (1990) ANAL. CHEM., 62, 268R-303R (and references therein), such as time-of-flight (TOF) and Fourier transform ion-cyclotron-resonance mass spectroscopy (FTICR-MS), have the inherent ability to simultaneously analyze all of the components of a complex mixture in a single 200 millisecond experiment. The most significant feature of a mass spectroscopic-based method is that it does not require electrophoretic or chromatographic separation prior to analysis, thus reducing the analysis time by at least three orders of magnitude.
A major obstacle, until now, for implementing mass spectroscopy for analysis of large biomolecules has been the lack of an appropriate interface between the water-based biological system and the high vacuum required for mass analysis. Prior studies have used techniques such as secondary ion mass spectroscopy, Aberth, W. et al., (1982) ANAL. CHEM., 54, 2029-2034, fast atom bombardment, Griffen, D. et al., (1989) BIOMED. & ENV. MASS SPECTROM., 17, 105, .sup.252 Cf plasma desorption, Sundqvist, B. et al., (1985) MASS SPECTROM. REV., 4, 421-460, electrospray, Fenn, J. B. et al., (1990) MASS SPECTROM. REVIEWS, 9, 37-70, and thermospray, Straub, K. et al., (1990) RAPID COMMUN. MASS SPECTROM., 4, 267-271, Pramanik, B. C. et al., (1989) ANAL. BIOCHEM., 176, 269-277, in an attempt to transport biomolecules from the solid phase to the gas phase. These methods suffer either from severe sample decomposition or multiple charging problems. Other obstacles for mass spectral DNA sequencing methods include: guaranteeing adequate mass resolution at 30,000-200,000 AMU (100-500 base strands); accomplishing selective and efficient ionization of DNA strands; and avoiding multiple ionization and/or fragmentation of DNA strands.
Laser vaporization may be used for the desorption of biological molecules into the gas phase, Karas, M. et al., (1989) BIOMED. & ENV. MASS SPECTROM., 18, 841-843. Proteins with molecular weights approaching 175,000 daltons have been molecularly desorbed with this technique and detected using TOF methods, Karas, M. et al., (1989) BIOMED. & ENV. MASS SPECTROM., 18, 841-843. Recently, Cotter et al., (1990) RAPID COMMUN. MASS SPECTROM., 4, 99-102 have demonstrated matrix-assisted laser vaporization and high resolution TOF detection of oligodeoxyribonucleotides with masses up to 1797 daltons (6 bases). In this case, the positive molecular ion peak was intense with no apparent strand cleavage. Autoradiographic studies by Williams et al., (1989) SCIENCE, 246, 1585-1587, suggest that extremely long DNA strands, containing up to 1,200 nucleotides, can be transported into the gas phase intact.
Resonance-enhanced multiphoton ionization is a tool for the study of materials that is based on exciting an atom or molecule with a laser through specific rovibronic states until the ionization energy is surpassed, as shown graphically in FIG. 1.
Resonance-enhanced multiphoton ionization (REMPI) has been used to ionize many different biomolecules, including nucleotides and nucleosides, Li, L. et al., (1989) INT. JOURNAL OF MASS SPEC. & ION PROCESSES, 88, 197-210, peptides, amino acids, Grotemyer, J. et al., (1987) INT. J. MASS SPECTROM. ION PROCESSES, 78, 69-83, hormones, catecholamines, Pang, H. M. et al. (1988) APPL. SPECTROSCOPY, 42, 1200-1206, and purines, Li, L. et al., (1989) INT. JOURNAL OF MASS SPEC. & ION PROCESSES, 88, 197-210.
TOF mass spectrometry has detected proteins with masses approaching 175,000 AMU, Karas, M., Ingendoh, A., Bahr, U., Hillenkamp, F. (1989) BIOMED. & ENV. MASS SPECTROM., 18, 841-843. This mass would correspond to a DNA strand approximately 530 bases in length. Finally, the extremely high sensitivity of a TOF mass spectrometer allows the detection of ultra-low sample amounts in the sub-attomole range.
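The correspondence between 175,000 AMU and roughly 530 bases can be checked with a short calculation, and the same sketch shows how flight time depends on mass in an idealized linear TOF instrument. The average mass of about 330 daltons per nucleotide is an assumption inferred from the figures above, and the 20 kV accelerating potential and 1 m drift length are hypothetical values chosen only for illustration; they are not parameters of any instrument described here.

```python
import math

AVG_NT_MASS_DA = 330.0                 # assumed average mass per nucleotide
DALTON_KG = 1.66053906660e-27          # kilograms per dalton
ELEMENTARY_CHARGE_C = 1.602176634e-19  # coulombs


def bases_for_mass(mass_da: float, avg_nt_mass_da: float = AVG_NT_MASS_DA) -> float:
    """Approximate strand length (in bases) for a given mass in daltons."""
    return mass_da / avg_nt_mass_da


def flight_time_s(mass_da: float, accel_volts: float = 20_000.0,
                  drift_m: float = 1.0, charge: int = 1) -> float:
    """Idealized drift time: the ion gains energy q*V, then coasts drift_m metres."""
    mass_kg = mass_da * DALTON_KG
    kinetic_j = charge * ELEMENTARY_CHARGE_C * accel_volts
    velocity = math.sqrt(2.0 * kinetic_j / mass_kg)
    return drift_m / velocity


if __name__ == "__main__":
    print(round(bases_for_mass(175_000)))      # approximate strand length
    for mass in (1_797, 30_000, 175_000):      # masses mentioned in the text
        print(mass, f"{flight_time_s(mass) * 1e6:.1f} microseconds")
```

Because flight time scales with the square root of mass at fixed charge, singly charged strands of different lengths arrive at different times, which is what allows a nested set of fragments to be resolved in a single measurement.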
The difficulties of the prior art are overcome by the methods described herein to analyze an organic sample and/or to determine the base sequence of a nucleic acid.
It is an object of the present invention to use current sequencing technology with a mass spectral method to directly analyze the products of enzymatic DNA sequencing reactions.
It is an object of the present invention to solve inherent problems of the prior art described above using a combination of the following techniques: (i) laser vaporization methods to desorb the liquid phase DNA strands into the gas phase; (ii) pulsed molecular beam nozzle techniques to transport the gas phase strands from a flowing helium atmosphere into the vacuum system; (iii) laser ionization methods to resonantly ionize a "tag" molecule on each DNA strand; and (iv) time-of-flight methods for high mass analysis.
It is an object of this invention to desorb biomolecules by mixing the sample of interest in an excess of a "matrix," or chromophore, which is specifically chosen to absorb light where the biomolecule does not. The chromophore absorbs the extremely high powered light (10.sup.6 to 10.sup.9 watts) that is delivered in the short laser pulse (5 ns).
It is an object of this invention to use this energy which is deposited in a short time so that all of the matrix and biomolecules are transported into the gas phase before thermal equilibrium can be attained.
It is an object of the present invention to place a single positive charge on the vaporized molecules using a technique called resonance-enhanced multiphoton ionization (REMPI). REMPI has been shown to be a very powerful tool for the analytical study of biological materials. The technique is based on selectively exciting an atom or molecule with a laser through specific vibronic states until the ionization energy is surpassed (FIG. 1).
It is an object of the present invention to place on each vaporized DNA molecule a single charge by selective ionization of a covalently attached chromophore or "tag." These charged ions are then detected using time-of-flight (TOF) mass spectrometry.
It is an object of the present invention to use the combination of a solution-phase laser vaporization method with the ability to measure high masses using a TOF mass spectrometer to provide a rapid (less than 5 sec) method to completely analyze all of the nested strands produced from a given enzymatic dideoxy sequencing reaction.
It is a further object of the present invention to determine the sequences of the bases of a nucleic acid sample. Prior techniques are extremely slow and are highly labor intensive.
It is also a further object to describe an improved apparatus for passing tagged biological samples from a vaporizing source to an apparatus which would permit the detection of the sequences of the components of a biological sample, such as a nucleic acid (for example, DNA or RNA).