This invention relates to the field of the determination of DNA sequences and the uses of automated techniques for such determination.
The ability to sequence DNA has become a core technology in molecular biology, and has contributed greatly to the understanding of DNA structural organization and gene function. The facility with which DNA sequencing may be accomplished will substantially affect the rate of development of related technologies, including the production of new therapeutic agents, useful plant varieties and microorganisms via recombinant DNA technology and the understanding of human genetic disorders and pathology through gene mapping and chromosomal sequence analysis.
Initially, researchers focused on reading the genetic code and the translation of the nucleotide sequence into the amino acid sequence of a protein. This occurs by a process of DNA transcription into mRNA, and then actual synthesis of the protein on ribosomes. In eucaryotic cells, large specific segments of the initial transcript of mRNA, termed introns, are transcribed but are excised during an intermediary processing step. Much of the chromosomal DNA is not translated, and its specific function is largely unknown. This "intervening" or intron DNA was first thought to be excess genetic material. However, as biologists begin to unravel the details of cell differentiation and the processes controlling gene transcription, it is now believed that the specific sequences of certain portions of some of these large regions of transcribed but untranslated DNA may also provide important regulatory signals.
The potential applications which derive from DNA sequencing have only begun to be explored. On a large scale, analysis of human chromosomal DNA is considered vital to understanding human pathological conditions, including genetic disease, AIDS and cancer, because often only subtle differences, even single nucleotide substitutions, can lead to serious disorders. Serious consideration is now being given to the sequencing of the entire human genome, approximately 3 billion base pairs. The success of this project will depend on rapid, sensitive, inexpensive automated methods to sequence DNA.
The fundamental approach to determination of DNA sequence has been well established. Restriction endonucleases are employed to cleave chromosomal DNA into specific smaller segments, and recombinant cloning techniques are then used to purify and generate analyzable quantities of DNA. The specific sequence of each segment can then be determined by either the Maxam-Gilbert chemical cleavage method or, preferably, the Sanger dideoxy-terminated enzymatic method. In either case, a set of all possible fragments ending in a specific base is generated. The individual fragments can be resolved electrophoretically by molecular weight, and the sequence of the original DNA segment is then derived by knowing the identity of the terminal base in each fragment.
In its broadest aspect, this invention is directed to methods and reagents for sequencing DNA and other polynucleotides. In particular, this invention describes reagents and methods for automating and increasing the sensitivity of both the Sanger, Proc. Natl. Acad. Sci. USA, 74, 5463 (1977) and Gish and Eckstein, Science, 240, 1520-1522 (1988), procedures for sequencing polynucleotides. The methods of the present invention are based on mass spectrometric determination of each of the four component terminal nucleotide residues, where the information regarding the identity of the individual nucleotides is contained in the mass of stable nuclide markers.
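As a minimal illustration of how the identity of a terminal nucleotide can be carried by the mass of a stable nuclide marker, the following Python sketch maps detected marker masses, listed in order of fragment elution, to bases. The use of the four stable sulfur isotopes and the particular isotope-to-base assignment are assumptions made only for illustration; they are not taken from the text above, and the names `ISOTOPE_TO_BASE` and `call_bases` are hypothetical.

```python
# Hypothetical assignment of stable sulfur isotopes (by mass number) to the
# four terminal bases. The assignment itself is an illustrative assumption.
ISOTOPE_TO_BASE = {32: "A", 33: "C", 34: "G", 36: "T"}


def call_bases(marker_masses):
    """Translate marker mass numbers, in order of fragment elution, to bases.

    Raises ValueError if a detected mass matches none of the assigned markers.
    """
    bases = []
    for mass in marker_masses:
        if mass not in ISOTOPE_TO_BASE:
            raise ValueError(f"no base is assigned to marker mass {mass}")
        bases.append(ISOTOPE_TO_BASE[mass])
    return "".join(bases)


# Fragments eluting shortest first, each carrying one marker:
print(call_bases([32, 33, 34, 36, 32]))  # ACGTA
```

Because the four markers differ by discrete mass numbers, each detection maps to exactly one base or to none; there is no overlapping signal to deconvolve in this simplified picture.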
2. Summary Of The Prior Art
In the Sanger dideoxy method (Proc. Natl. Acad. Sci., USA, 74, 5463 (1977)), the DNA to be sequenced is exposed to a DNA polymerase, a cDNA primer, and a mixture of the four component deoxynucleotides, plus one of the four possible 2',3'-dideoxynucleotides. The DNA to be sequenced is typically a single stranded DNA clone prepared in the phage vector M13, although Chen and Seeburg have disclosed a method for applying the Sanger method to supercoiled plasmid DNA (DNA 4:165-170 (1985)). In addition, Innis et al., Proc. Natl. Acad. Sci., USA 85, 9436-9440 (1988), have disclosed a method for direct sequencing of chromosomal DNA amplified by the polymerase chain reaction. For any DNA template, however, the principle behind the dideoxy chain termination method remains the same. There is a competition for incorporation of the normal deoxy- and the dideoxy-nucleotide by the polymerase into the growing complementary chain. When a dideoxynucleotide is incorporated, further chain extension is prevented. Since there is a finite probability that this chain terminating event may occur at each complementary site of the appropriate base, a mixture of all possible fragments ending in that dideoxy base will be generated. This mixture of fragments can be separated by size via gel electrophoresis. When the experiment is repeated with each dideoxy base, four mixtures of fragments, each terminating in a specific residue, are produced. When this set of mixtures is chromatographed in four adjacent lanes, so that fragment lengths in the four mixtures can be correlated with each other, the sequence of the original DNA is determined by relating the fragment length to the identity of the terminating dideoxy base.
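The read-out step described above, relating fragment length to the identity of the terminating base, can be sketched in Python as follows. This is a simplified illustration, not a description of any particular instrument: the function name `read_sequence` is hypothetical, fragment lengths are assumed to be counted in nucleotides added beyond the primer, and every length is assumed to be observed in exactly one lane.

```python
def read_sequence(lanes):
    """Reconstruct the synthesized strand from four dideoxy lanes.

    lanes maps each terminating base ("A", "C", "G", "T") to the fragment
    lengths observed in that lane. Reading the lengths in ascending order
    gives the newly synthesized (complementary) strand, 5' to 3'.
    """
    base_at_length = {}
    for base, lengths in lanes.items():
        for length in lengths:
            if length in base_at_length:
                # Two lanes showing the same length is an ambiguous band.
                raise ValueError(f"length {length} appears in more than one lane")
            base_at_length[length] = base
    return "".join(base_at_length[n] for n in sorted(base_at_length))


# Illustrative (made-up) band positions for a seven-nucleotide extension:
lanes = {"A": [1, 5], "C": [2, 6], "G": [3], "T": [4, 7]}
print(read_sequence(lanes))  # ACGTACT
```

The sketch also makes the dependence on lane-to-lane alignment explicit: the result is only correct if lengths measured in different lanes are directly comparable.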
Maxam and Gilbert, Methods in Enzymology, 65, 499-500 (1980), disclosed a method for DNA sequencing using chemical cleavage. In this method, each end of a DNA fragment to be sequenced is labeled. This DNA fragment is then cleaved preferentially at one of the nucleotides, under conditions favoring one cleavage per strand. This procedure is then repeated for each of the other three nucleotides. The four samples are then run side by side on an electrophoretic gel. Autoradiography identifies the position of a particular nucleotide by the length of the fragments produced by cleavage at that particular nucleotide. This method suffers from the same drawbacks as the Sanger method.
The position of the fragment in gel electrophoresis is usually revealed by staining or by autoradiography. In autoradiography methods, the fragments have typically been labeled with ³²P or ³⁵S radionuclides where either the DNA primer or one of the component deoxynucleotides has been tagged, and that label incorporated in a specific or random fashion. After fractionation of the fragments on acrylamide gels, the gels are used to expose films. This presents a number of difficulties. For example, the short half-life of ³²P requires that the sequencing experiment be anticipated days in advance so that fresh label can be used. Additionally, the high energy beta radiation emitted by the ³²P leads to scission of the phosphodiester linkages within the DNA fragments synthesized in the sequencing reaction and thus requires immediate fractionation of sequencing reaction products. The use of ³⁵S (Ornstein, et al., Biotechniques, 3, 476 (1985)), which has a longer half-life and less energetic emission, somewhat ameliorates these problems, but requires much longer times of exposure to film for the development of a usable autoradiograph, often in the range of one to three days. Whichever radionuclide is used, the fact that a single type of label is used for each sequencing reaction requires that each set of reaction products be fractionated in a separate lane on the sequencing gel. Common problems in running sequencing gels include uneven heating and the presence of impurities, either of which can cause adjacent lanes on the sequencing gel to run in an uneven fashion, making the comparison of fragment migration in adjacent lanes, and thus DNA sequence determination, difficult or impossible. The use of unstable radionuclides also poses a health risk to the investigator.
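The practical effect of label half-life can be illustrated with a short Python sketch of exponential decay. The half-life values used (about 14.3 days for ³²P and about 87.4 days for ³⁵S) are approximate literature figures that do not appear in the text above, and the function name `fraction_remaining` is hypothetical.

```python
# Approximate half-lives in days; assumed literature values, not taken
# from the text above.
HALF_LIFE_DAYS = {"32P": 14.3, "35S": 87.4}


def fraction_remaining(nuclide, days):
    """Fraction of the initial activity left after the given number of days."""
    return 0.5 ** (days / HALF_LIFE_DAYS[nuclide])


# Compare how much label survives a one-week delay before use.
for nuclide in ("32P", "35S"):
    print(nuclide, round(fraction_remaining(nuclide, 7), 2))
```

Under these assumed values, a week of storage costs the ³²P label a substantially larger share of its activity than the ³⁵S label, which is why fresh ³²P label must be ordered shortly before the experiment.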
An alternate method of detection was developed by the California Institute of Technology group (Smith, et al., Nature, 321, 674 (1986)) in which the terminal base residues are labeled with a fluorescent marker attached to the DNA primer. If four fluorescent markers of different spectral emission maxima are used, then the four separate sets of polymerase fragments can be combined and co-chromatographed. This method is also disclosed in EPO Patent No. 87300998.9.
A second variation of the fluorescent tagging approach has recently been reported by the DuPont group (Science, 238, 336 (1987)) wherein a unique fluorescent moiety is attached directly to the dideoxy nucleotide. This may represent an improvement over the CalTech primer tagging approach in that a single polymerase experiment can now be run with a mixture of the four dideoxy termination bases. However, one trade-off for this simplification is potential replication errors by the polymerase, arising from mis-incorporation of the modified dideoxynucleotide base analogs.
These modified Sanger methods are an improvement over the original Sanger method in the extent to which DNA can be sequenced because the chromatographic ambiguities have been reduced. However, a number of limitations are associated with the use of fluorescent labels in these modified Sanger reactions. In particular, there are chromatographic differences among fragments arising from the unique mobilities of the different organic fluorescent markers. Moreover, there are difficulties in distinguishing individual fluorescent markers because of overlap in their spectral bandwidths. Finally, there is a low sensitivity of detection inherent in the extinction coefficients of the fluorescent markers.
All of the above variants of the Sanger method for sequencing have used slab gel electrophoresis to effect size separation of the DNA fragments. The casting and loading of slab gels is a skilled but intrinsically manual operation. The only aspect of this process which has been automated with any success is the reading of the gel by certain commercial devices with some type of laser scanner/spectrophotometer.
A labeling method is needed which eliminates chromatographic ambiguity by imparting to each sequencing reaction product its own specific tag, but in which this specific tag is "invisible" to the chromatographic apparatus, i.e., does not affect the chromatographic mobility of the different sequencing products differentially. Additionally, a label detection system is needed which is much more sensitive than the fluorescence system, and which can make distinctions among labels based upon characteristics which separate them discretely, rather than by trying to distinguish between broad overlapping traits. Ideally, a stable, non-radioactive label would be used, eliminating the short useful lifetime of the label and products containing the label, as well as potential health risks to investigators.
Eckstein and Goody, Biochemistry, 15, 1685 (1976), disclose a method of chemical synthesis for adenosine-5'-(O-1-thiotriphosphate) and adenosine-5'-(O-2-thiotriphosphate).
Eckstein, Accounts Chem. Res., 12, 204 (1978), discloses a group of phosphorothioate analogs of nucleotides.
Gish and Eckstein, Science, 240, 1520-1522 (1988), disclose an alternative method for sequencing DNA and RNA employing base specific chemical cleavage of phosphorothioate analogs of the nucleotides which were incorporated in a cDNA sequence.
Japanese Patent No. 59-131,909 (1986) discloses a nucleic acid detection apparatus which detects nucleic acid fragments which are separated by electrophoretic techniques, liquid chromatography, or high speed gel filtration. Detection is achieved by utilizing nucleic acids into which S, Br, I, or Ag, Au, Pt, Os, Hg or similar metallic elements have been introduced. These elements are generally absent in natural nucleic acids. Introduction of one of these elements into a nucleotide of a nucleic acid allows that nucleic acid or fragment thereof to be detected by means of atomic absorption, plasma emission or mass spectroscopy. However, this reference does not suggest or disclose any application of the described methods or apparatus to the sequencing of DNA, such as by the Sanger method. Specifically, it does not teach that a plurality of specific isotopes may be used to identify the specific terminal nucleotide residues. Nor does it teach that by total combustion of DNA to oxides of carbon, hydrogen, nitrogen and phosphorus, the detection sensitivity by mass spectrometry for trace elements, such as sulfur which is not normally found in DNA, is vastly improved. The combustion step, which is one aspect of the present application, is essential to eliminate the myriad of fragment ions from DNA. These fragment ions would normally mask the presence of trace ions of SO₂ in conventional mass spectrometry. What this reference does disclose is that DNA may be tagged (by undisclosed means) with trace elements, including sulfur, as an aid to detection of DNA, and that these trace elements may be detected by a variety of means, including mass spectrometry.
Details of DNA sequencing are found in Current Protocols in Molecular Biology, John Wiley & Sons, N.Y., N.Y., F. M. Ausubel, et al., eds., (1987), Chapter 7 of which is hereby incorporated by reference. Smith, et al., Anal. Chem. 60, 438-441 (1988), describes capillary zone electrophoresis-mass spectrometry using an electrospray ionization interface and is hereby incorporated by reference.