Deoxyribonucleic acid (DNA) sequencing is one of the basic techniques of biology. It is at the heart of molecular biology and plays a rapidly expanding role in the rest of biology. The Human Genome Project is a multi-national effort to read the entire human genetic code. It is the largest project ever undertaken in biology, and has already begun to have a major impact on medicine. The development of cheaper and faster sequencing technology will ensure the success of this project. Indeed, a substantial effort to improve sequencing technology has been funded by the NIH and DOE branches of the Human Genome Project, but so far without a substantial impact on current practices (Sulston and Waterston, Nature 376:175, 1995).
In the past two decades, determination and analysis of nucleic acid sequence has formed one of the building blocks of biological research. This, along with new investigational tools and methodologies, has allowed scientists to study genes and gene products in order to better understand the function of these genes, as well as to develop new therapeutics and diagnostics.
Two different DNA sequencing methodologies that were developed in 1977 are still in wide use today. Briefly, the enzymatic method described by Sanger (Proc. Natl. Acad. Sci. (USA) 74:5463, 1977), which utilizes dideoxy-terminators, involves the synthesis of a DNA strand from a single-stranded template by a DNA polymerase. The Sanger method of sequencing depends on the fact that dideoxynucleotides (ddNTPs) are incorporated into the growing strand in the same way as normal deoxynucleotides (albeit at a lower efficiency). However, ddNTPs differ from normal deoxynucleotides (dNTPs) in that they lack the 3′-OH group necessary for chain elongation. When a ddNTP is incorporated into the DNA chain, the absence of the 3′-hydroxy group prevents the formation of a new phosphodiester bond, and the DNA fragment is terminated with the ddNTP complementary to the base in the template DNA. The Maxam and Gilbert method (Maxam and Gilbert, Proc. Natl. Acad. Sci. (USA) 74:560, 1977) employs chemical degradation of the original DNA (in both cases the DNA must be clonal). Both methods produce populations of fragments that begin from a particular point and terminate at every base that is found in the DNA fragment that is to be sequenced. The termination of each fragment is dependent on the location of a particular base within the original DNA fragment. The DNA fragments are separated by polyacrylamide gel electrophoresis, and the order of the DNA bases (adenine, cytosine, thymine, guanine; also known as A, C, T, G, respectively) is read from an autoradiograph of the gel.
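The way a sequence is read back from a nested set of terminated fragments can be illustrated with a minimal sketch. It is an idealized model, not a description of any laboratory protocol: it assumes exactly one fragment terminates at every position, that fragments are resolved perfectly by length, and the function names `terminated_fragments` and `read_gel` are hypothetical.

```python
def terminated_fragments(synthesized):
    """Group fragment lengths by the base at which each fragment terminates.

    `synthesized` is the sequence of the newly synthesized strand, read
    from the primer outward. Idealized: one fragment per position.
    """
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for length, base in enumerate(synthesized, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Recover the sequence by ordering all fragments from shortest to longest."""
    ordered = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in ordered)


lanes = terminated_fragments("GATTACA")
print(lanes)
print(read_gel(lanes))
```

Each entry in `lanes` plays the role of one termination reaction, and sorting by length stands in for the electrophoretic separation.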
A cumbersome DNA pooling sequencing strategy (Church and Kieffer-Higgins, Science 240:185, 1988) is one of the more recent approaches to DNA sequencing. A pooling sequencing strategy consists of pooling a number of DNA templates (samples) and processing the samples as pools. In order to separate the sequence information at the end of the processing, the DNA molecules of interest are ligated to a set of oligonucleotide “tags” at the beginning. The tagged DNA molecules are pooled, amplified and chemically fragmented in 96-well plates. After electrophoresis of the pooled samples, the DNA is transferred to a solid support and then hybridized with a sequential series of specific labeled oligonucleotides. These membranes are then probed as many times as there are tags in the original pool, producing, in each round of probing, autoradiographs similar to those from standard DNA sequencing methods. Thus each reaction and gel yields a quantity of data equivalent to that obtained from conventional reactions and gels multiplied by the number of probes used. If alkaline phosphatase is used as the reporter enzyme, a 1,2-dioxetane substrate can be used and detected in a chemiluminescent assay format. However, the major disadvantage of this pooling strategy is that the sequences can only be read by Southern blotting the sequencing gel and hybridizing the resulting membrane once for each clone in the pool.
In addition to advances in sequencing methodologies, advances in speed have occurred due to the advent of automated DNA sequencing. Briefly, these methods use fluorescently labeled components in place of radiolabeled components. Fluorescent dyes are attached either to the sequencing primers or to the ddNTP-terminators. Robotic components now utilize polymerase chain reaction (PCR) technology, which has led to the development of linear amplification strategies. Current commercial sequencing allows all 4 dideoxy-terminator reactions to be run in a single lane. Each dideoxy-terminator reaction is represented by a unique fluorescent primer (one fluorophore for each base type: A, T, C, G). Only one template DNA (i.e., DNA sample) is represented per lane. Current gels permit the simultaneous electrophoresis of up to 64 samples in 64 different lanes. Different ddNTP-terminated fragments are detected by irradiating the gel lane with light and then detecting the light emitted by the fluorophore. Each electrophoresis step is about 4-6 hours long. Each electrophoresis separation resolves about 400-600 nucleotides (nt); therefore, about 6000 nt can be sequenced per hour per sequencer.
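The throughput figure quoted above follows from simple arithmetic on the stated lane count, read length, and run time. The sketch below reproduces that calculation; the function name `throughput_nt_per_hour` is illustrative, and the midpoint values (500 nt, 5 hours) are assumptions chosen from within the stated ranges.

```python
def throughput_nt_per_hour(lanes, read_length_nt, run_hours):
    """Nucleotides read per hour by one sequencer running one gel."""
    return lanes * read_length_nt / run_hours


# 64 lanes, one template per lane, using the ranges given in the text.
slowest = throughput_nt_per_hour(64, 400, 6)   # shortest reads, longest run
typical = throughput_nt_per_hour(64, 500, 5)   # assumed midpoint values
fastest = throughput_nt_per_hour(64, 600, 4)   # longest reads, shortest run

print(round(slowest), round(typical), round(fastest))
```

With the midpoint values the result is 6400 nt per hour, consistent with the figure of about 6000 nt per hour per sequencer.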
The use of mass spectrometry for the study of monomeric constituents of nucleic acids has also been described (Hignite, In Biochemical Applications of Mass Spectrometry, Waller and Dermer (eds.), Wiley-Interscience, Chapter 16, p. 527, 1972). Briefly, for larger oligomers, significant early success was obtained by plasma desorption for protected synthetic oligonucleotides up to 14 bases long, and for unprotected oligos up to 4 bases in length. As with proteins, the applicability of electrospray ionization mass spectrometry (ESI-MS) to oligonucleotides has been demonstrated (Covey et al., Rapid Comm. in Mass Spec. 2:249-256, 1988). These species are ionized in solution, with the charge residing at the acidic bridging phosphodiester and/or terminal phosphate moieties, and yield in the gas phase multiply charged molecular anions, in addition to sodium adducts.
Sequencing DNA with <100 bases by the common enzymatic ddNTP technique is more complicated than it is for larger DNA templates, so chemical degradation is sometimes employed. However, the chemical decomposition method requires about 50 pmol of radioactive ³²P end-labeled material, 6 chemical steps, electrophoretic separation, and film exposure. For small oligonucleotides (<14 nt) the combination of electrospray ionization (ESI) and Fourier transform (FT) mass spectrometry (MS) is far faster and more sensitive. Dissociation products of multiply charged ions measured at high (10⁵) resolving power represent consecutive backbone cleavages, providing the full sequence in less than one minute on a sub-picomole quantity of sample (Little et al., J. Am. Chem. Soc. 116:4893, 1994). For molecular weight measurements, ESI/MS has been extended to larger fragments (Potier et al., Nuc. Acids Res. 22:3895, 1994). ESI/FTMS appears to be a valuable complement to classical methods for sequencing and for pinpointing mutations in oligonucleotides as large as 100-mers. Spectral data have recently been obtained by loading 3×10⁻¹³ mol of a 50-mer using a more sensitive ESI source (Valaskovic, Anal. Chem. 68:259, 1995).
Another approach to DNA sequencing by mass spectrometry is one in which DNA is labeled with individual isotopes of an element, and the mass spectral analysis simply has to distinguish the isotopes after a mixture of DNA fragments of different sizes has been separated by electrophoresis. (The approach described above, by contrast, relies on the resolving power of the mass spectrometer both to separate and to detect DNA oligonucleotides of different lengths, a difficult proposition at best.) All of the procedures described below employ the Sanger procedure to convert a sequencing primer to a series of DNA fragments that vary in length by one nucleotide. The enzymatically synthesized DNA molecules each contain the original primer, a replicated sequence of part of the DNA of interest, and the dideoxy terminator. That is, a set of DNA molecules is produced that contain the primer and differ in length from each other by one nucleotide residue.
Brennen et al. (Biol. Mass Spec., New York, Elsevier, p. 219, 1990) have described methods that use the four stable isotopes of sulfur as DNA labels, enabling one to detect DNA fragments that have been separated by capillary electrophoresis. Using the α-thio analogues of the ddNTPs, a single sulfur isotope is incorporated into each of the DNA fragments. Therefore each of the four types of DNA fragments (ddTTP, ddATP, ddGTP, ddCTP-terminated) can be uniquely labeled according to the terminal nucleotide; for example, ³²S for fragments ending in A, ³³S for G, ³⁴S for C, and ³⁶S for T. The labeled fragments are mixed together for electrophoresis. As they emerge from the electrophoresis column, fractions of a few picoliters are obtained by a modified ink-jet printer head and then subjected to complete combustion in a furnace. This process oxidizes the thiophosphates of the labeled DNA to SO₂, which is subjected to analysis in a quadrupole or magnetic sector mass spectrometer. The SO₂ mass unit representation is 64 for fragments ending in A, 65 for G, 66 for C, and 68 for T. Maintenance of the resolution of the DNA fragments as they emerge from the column depends on taking sufficiently small fractions. Because the mass spectrometer is coupled directly to the capillary gel column, the rate of analysis is determined by the rate of electrophoresis. This process is unfortunately expensive, liberates radioactive gas and has not been commercialized. Two other basic constraints also operate on this approach: (a) no other components with a mass of 64, 65, 66, or 68 (isobaric contaminants) can be tolerated, and (b) the percent natural abundances of the sulfur isotopes (³²S is 95.0, ³³S is 0.75, ³⁴S is 4.2, and ³⁶S is 0.11) govern the sensitivity and cost. Since ³²S is 95% naturally abundant, the other isotopes must be enriched to >99% to eliminate contaminating ³²S. Isotopes that are <1% abundant are quite expensive to obtain at 99% enrichment; even when ³⁶S is purified 100-fold it contains as much or more ³⁴S as it does ³⁶S.
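The correspondence between sulfur isotope label, SO2 mass, and terminal base can be written out as a short sketch. It uses nominal (integer) masses and assumes both oxygen atoms are oxygen-16; the names `SULFUR_LABELS`, `so2_nominal_mass`, and `call_bases` are hypothetical, and the example mass series is made up purely for illustration.

```python
# Isotope assignment taken from the example in the text:
# terminal base -> mass number of the sulfur isotope label.
SULFUR_LABELS = {"A": 32, "G": 33, "C": 34, "T": 36}

OXYGEN_16 = 16  # assumption: both oxygens in SO2 are oxygen-16


def so2_nominal_mass(sulfur_mass_number):
    """Nominal mass of SO2 formed from one sulfur isotope and two oxygen-16 atoms."""
    return sulfur_mass_number + 2 * OXYGEN_16


# Lookup from detected SO2 mass back to the terminal base.
MASS_TO_BASE = {so2_nominal_mass(s): base for base, s in SULFUR_LABELS.items()}


def call_bases(detected_masses):
    """Translate SO2 masses, in order of fragment elution, into a base sequence."""
    return "".join(MASS_TO_BASE[m] for m in detected_masses)


print(MASS_TO_BASE)
print(call_bases([65, 64, 68]))  # hypothetical elution series
```

The lookup fails with a `KeyError` on any mass other than 64, 65, 66, or 68, which mirrors constraint (a): an isobaric contaminant at one of those four masses would be indistinguishable from a genuine label.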
Gilbert has described an automated DNA sequencer (EPA, 92108678.2) that consists of an oligomer synthesizer, an array on a membrane, a detector which detects hybridization and a central computer. The synthesizer synthesizes and labels multiple oligomers of arbitrary predicted sequence. The oligomers are used to probe immobilized DNA on membranes. The detector identifies hybridization patterns and then sends those patterns to a central computer which constructs a sequence and then predicts the sequence of the next round of synthesis of oligomers. Through an iterative process, a DNA sequence can be obtained in an automated fashion.
Brennen has described a method for sequencing nucleic acids based on ligation of oligomers (U.S. Pat. No. 5,403,708). Methods and compositions are described for forming a ligation product hybridized to a nucleic acid template. A primer is hybridized to a DNA template, and then a pool of random extension oligonucleotides is also hybridized to the primed template in the presence of ligase(s). The ligase enzyme covalently ligates the hybridized oligomers to the primer. Modifications permit the determination of the nucleotide sequence of one or more members of a first set of target nucleotide residues in the nucleic acid template that are spaced at intervals of N nucleotides. In this method, a labeled ligation product is formed in which the position and type of label incorporated provide information concerning the nucleotide residue in the nucleic acid template with which the labeled nucleotide residue is base paired.
Koster has described a method for sequencing DNA by mass spectrometry after degradation of the DNA by an exonuclease (PCT/US94/02938). The method described is simple in that the DNA sequence is determined directly (the Sanger reaction is not used). DNA is cloned into standard vectors, the 5′ end is immobilized, the strands are then sequentially degraded from the 3′ end by an exonuclease, and the enzymatic products (nucleotides) are detected by mass spectrometry.
Weiss et al. have described an automated hybridization/imaging device for fluorescent multiplex DNA sequencing (PCT/US94/11918). The method is based on the concept of hybridizing enzyme-linked probes to a membrane containing size separated DNA fragments arising from a typical Sanger reaction.
The demand for sequencing information is larger than can be supplied by the currently existing sequencing machines, such as the ABI377 and the Pharmacia ALF. One of the principal limitations of the current technology is the small number of tags which can be resolved using the current tagging system. The Church pooling system discussed above uses more tags, but the use and detection of these tags is laborious.
The present invention discloses novel compositions and methods which may be utilized to sequence nucleic acid molecules with much greater speed and sensitivity than the methods described above, and further provides other related advantages.