1. Field of the Invention
This invention relates to the field of the determination of DNA sequences and the uses of automated techniques for such determinations.
The ability to sequence DNA has become a core technology in molecular biology, and is one of the great achievements of the last decade. The ease with which sequencing may be accomplished will substantially affect the rate of development of new drugs produced by recombinant technology, the creation of new biosynthetic pathways and the promise of genetic engineering through manipulation of microorganisms, plants and mammalian cells, and the better understanding of a wide range of disease processes.
Initially, researchers focused on reading the genetic code and the translation of the nucleotide sequence into protein amino acid sequence. This occurs by a process of DNA transcription into mRNA, and then actual synthesis of the protein on ribosomes. In eukaryotic cells, large specific segments of mRNA, introns, may be excised during an intermediary processing step. Much of the chromosomal DNA is not translated, and its specific function is largely unknown. This "excess" DNA was first thought to be excess genetic material. However, as biologists begin to unravel the details of cell differentiation and the processes controlling gene transcription, it is now felt that the specific sequence of these large portions of untranslated DNA may also provide important regulatory signals.
The potential applications which derive from DNA sequencing have only begun to be explored. On a large scale, analysis of human chromosomal DNA is considered vital to understanding genetic diseases, AIDS and cancer because only subtle differences exist between normal DNA and DNA involved in pathological conditions. Serious consideration is now being given to the sequencing of the entire human genome--approximately 3 billion base pairs. The success of this project will depend on rapid, sensitive, inexpensive automated methods to sequence DNA.
The fundamental approach to determination of DNA sequence has been well established. Restriction endonucleases are employed to cleave chromosomal DNA into smaller segments, and recombinant cloning techniques are then used to purify and generate analyzable quantities of DNA. The specific sequence of each segment can then be determined by either the Maxam-Gilbert chemical cleavage method or, preferably, the Sanger dideoxy-terminated enzymatic method. In either case, a set of all possible fragments ending in a specific base is generated. The individual fragments can be resolved electrophoretically by increasing molecular weight, and the sequence of the original DNA segment is then derived by knowing the identity of the terminal base in each fragment.
In its broadest aspect, this invention is directed to methods and reagents for sequencing DNA and other polynucleotides. In particular, this invention describes reagents and methods for automating and increasing the sensitivity of both the Sanger and Gish and Eckstein procedures for sequencing polynucleotides. The methods of the present invention are based on mass spectrometric determination of the four component terminal nucleotide residues, where the information regarding the identity of the individual nucleotides is contained in the mass of stable nuclide markers.
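The idea of encoding base identity in the mass of a stable nuclide marker can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration only: the use of the four stable sulfur isotopes as markers, their detection as sulfur dioxide ions at nominal masses 64, 65, 66 and 68 (oxygen taken as mass 16), the particular isotope-to-base assignment, and the names `SO2_MASS_TO_BASE` and `call_bases`. The text above does not specify any of these.

```python
from typing import List, Tuple

# HYPOTHETICAL assignment of marker masses to terminal bases.
# Nominal masses of SO2 formed from 32S, 33S, 34S and 36S with 16O.
SO2_MASS_TO_BASE = {64: "A", 65: "C", 66: "G", 68: "T"}


def call_bases(peaks: List[Tuple[int, int]]) -> str:
    """Translate detected peaks into a base sequence.

    Each peak is (elution_order, nominal_mass), where elution_order
    ranks the fragments by increasing length. The marker mass of each
    fragment identifies its terminal base.
    """
    ordered = sorted(peaks)  # shortest fragment first
    return "".join(SO2_MASS_TO_BASE[mass] for _, mass in ordered)


if __name__ == "__main__":
    detected = [(2, 66), (1, 64), (3, 68), (4, 65)]
    print(call_bases(detected))
```

Because each of the four terminators would carry a marker of distinct mass, all four sets of fragments could in principle be separated together and read from a single detector trace. The sketch ignores contributions from minor oxygen isotopes to neighboring masses.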
B. Summary Of The Prior Art
In the Sanger dideoxy method (Proc. Nat. Acad. Sci., 74, 5463 (1977)), the DNA to be sequenced is exposed to a DNA polymerase, a cDNA primer, and a mixture of the four component deoxynucleotides, plus one of the four possible 2',3'-dideoxy nucleotides. There is a competition for incorporation of the normal deoxy- and the dideoxy- nucleotide by the polymerase into the growing complementary chain. When a dideoxy nucleotide is incorporated, further chain extension is prevented. Since there is a finite probability that this chain terminating event may occur at each complementary site of the appropriate base, a mixture of all possible fragments ending in that dideoxy base will be generated. This mixture of fragments can be separated by size on gel electrophoresis. When the experiment is repeated with each dideoxy base, four mixtures of fragments, each terminating in a specific residue, are produced. When this set of mixtures is chromatographed in four adjacent lanes, so that fragment lengths in the four mixtures can be correlated with each other, the sequence of the original DNA is determined by relating the fragment length to the identity of the terminating dideoxy base.
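The logic of the four-lane readout described above can be sketched in a few lines. This is an idealized illustration, not a model of the chemistry: it assumes every possible termination site yields a detectable fragment, that the template is written in the order the polymerase reads it, and that fragment lengths are resolved exactly. The function names `sanger_fragment_lengths` and `read_gel` are hypothetical.

```python
from typing import Dict, List

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sanger_fragment_lengths(template: str, primer_len: int) -> Dict[str, List[int]]:
    """Return, for each dideoxy base, the lengths of the terminated fragments.

    The primer is assumed to pair with the first `primer_len` template
    positions; extension starts immediately after it.
    """
    lanes: Dict[str, List[int]] = {base: [] for base in "ACGT"}
    for i in range(primer_len, len(template)):
        incorporated = COMPLEMENT[template[i]]
        # Termination at template position i gives a fragment of length i + 1.
        lanes[incorporated].append(i + 1)
    return lanes


def read_gel(lanes: Dict[str, List[int]]) -> str:
    """Reconstruct the newly synthesized sequence from the four lanes."""
    base_at_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    # Reading the lanes from shortest to longest fragment gives the sequence.
    return "".join(base_at_length[n] for n in sorted(base_at_length))


if __name__ == "__main__":
    lanes = sanger_fragment_lengths("TACGGATCCA", primer_len=3)
    print(lanes)
    print(read_gel(lanes))
```

Each fragment length appears in exactly one lane, so sorting all lengths across the four lanes and noting which lane each came from yields the complementary strand one base at a time.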
The position of the fragment in gel electrophoresis is usually revealed by staining or by autoradiography. In autoradiography methods, the fragments have typically been labeled with ³²P or ³⁵S radionuclides, where either the DNA primer or one of the component deoxynucleotides has been tagged, and that label is incorporated in a specific or random fashion. The potential number of residues which can be sequenced by this method is limited by the experimental ability to correlate the results of the four chromatograms (electrophoretograms).
An alternate method of detection was developed by the California Institute of Technology group (Smith, et al., Nature, 321, 674 (1986)) in which the identity of the terminal base residues is contained in a fluorescent marker attached to the DNA primer. If four fluorescent markers of different spectral emission maxima are used, then the four separate sets of polymerase fragments can be combined and co-chromatographed. This method is also disclosed in EPO Patent No. 87300998.9.
A second variation of the fluorescent tagging approach has recently been reported by the DuPont group (Science, 238, 335 (1987)) wherein a unique fluorescent moiety is attached directly to the dideoxy nucleotide. This may represent an improvement over the CalTech primer tagging approach in that a single polymerase experiment can now be run with a mixture of the four dideoxy terminating bases. However, one trade-off for this simplification is potential transcription errors by the polymerase, arising from mis-incorporation of the modified dideoxynucleotide base analogs.
These modified Sanger methods are an improvement over the original Sanger method in the extent to which DNA can be sequenced because the chromatographic ambiguities have been reduced. However, a number of limitations are associated with the use of fluorescent labels in these modified Sanger reactions. In particular, there are chromatographic differences among fragments arising from the unique mobilities of the different organic fluorescent markers. Moreover, there is an inability to distinguish individual fluorescent markers because of overlap in their spectral bandwidths. Finally, there is a low sensitivity of detection inherent in the extinction coefficients of the fluorescent markers.
All of the above variants of the Sanger method of sequencing have used slab gel electrophoresis to effect separation of the DNA fragments. The casting and loading of slab gels is a skilled but intrinsically manual operation. The only aspect of this process which has been automated with any success is the reading of the gel, by commercial devices which use some type of laser scanner/spectrophotometer.
Eckstein and Goody, Biochemistry, Vol. 15, No. 8, p. 1685 (1976), discloses a method of chemical synthesis for adenosine-5'-(O-1-thiotriphosphate) and adenosine-5'-(O-2-thiotriphosphate).
Eckstein, Accounts Chem. Res., Vol. 12, p. 204, (1978), discloses a group of phosphorothioate analogs of nucleotides.
Maxam and Gilbert, Methods Enzym., 65:499-500 (1980), disclosed a method for DNA sequencing using chemical cleavage. In this method, each end of a nucleotide to be sequenced is labeled. This nucleotide sample is then broken preferentially at one of the nucleotides, under conditions favoring one break per strand. This procedure is then repeated for each of the other three nucleotides. The four samples are then run side by side on an electrophoretic gel. Autoradiography identifies the position of a particular nucleotide by the length of the fragments produced by cleavage at that particular nucleotide. This method suffers from the same drawbacks as the Sanger method by requiring long periods of autoradiography and restricting the length of fragments which can be sequenced.
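Why the cleavage is run "under conditions favoring one break per strand" can be illustrated with a simple probability sketch. It rests on assumptions not stated in the text: each susceptible site is cleaved independently with the same probability `p`, the label sits at the start of the strand, and only the fragment still attached to the label is detected. The function name `labeled_fragment_yields` is hypothetical.

```python
from typing import Dict


def labeled_fragment_yields(seq: str, base: str, p: float) -> Dict[int, float]:
    """Fraction of labeled strands giving each detectable fragment length.

    A labeled fragment of length `pos` is seen only if the strand is cut
    at position `pos` (0-based) and at none of the earlier sites of the
    same base, since an earlier cut would shorten the labeled piece.
    """
    yields: Dict[int, float] = {}
    earlier_sites = 0
    for pos, residue in enumerate(seq):
        if residue == base:
            yields[pos] = p * (1 - p) ** earlier_sites
            earlier_sites += 1
    return yields


if __name__ == "__main__":
    # With heavy cleavage the longer fragments fade quickly;
    # with light cleavage the band intensities stay comparable.
    print(labeled_fragment_yields("AGGTG", "G", p=0.5))
    print(labeled_fragment_yields("AGGTG", "G", p=0.05))
```

Under this model a small `p` keeps the yields of short and long labeled fragments similar, at the cost of leaving most strands uncut, which is one way to see why limited cleavage is used and why very long fragments become hard to detect.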
Gish and Eckstein, Science, 240, 1520-1522 (1988), disclose an alternative method for sequencing DNA and RNA employing base specific chemical cleavage of phosphorothioate analogs of the nucleotides which were incorporated in a cDNA sequence.
Ornstein, et al., Biotechniques, Vol. 3, No. 6, p. 476 (1985), discloses the advantages of using ³⁵S nucleotides rather than ³²P labelling in sequencing DNA.
Japanese Patent No. 59-131,909 (1986), discloses a nucleic acid detection apparatus which detects nucleic acid fragments which are separated by electrophoretic techniques, liquid chromatography, or high speed gel filtration. Detection is achieved by utilizing nucleic acids into which S, Br, I, or Ag, Au, Pt, Os, Hg or similar metallic elements have been introduced. These elements are generally absent in the natural nucleic acids. Introduction of one of these elements into a nucleotide on a nucleic acid allows that nucleic acid or fragment thereof to be detected by means of atomic absorption, plasma emission or mass spectroscopy. However, this reference does not anticipate any application of the described methods or apparatus to the sequencing of DNA, such as by the Sanger method. Specifically, it does not teach that a plurality of specific isotopes may be used to identify the specific terminal nucleotide residues. Nor does it teach that by total combustion of DNA to oxides of carbon, hydrogen, nitrogen and phosphorus, the detection sensitivity by mass spectrometry for trace elements, such as sulfur which is not normally found in DNA, is vastly improved. The combustion step, which is one aspect of the present application, is essential to eliminate the myriad of fragment ions from DNA. These fragment ions would normally mask the presence of trace ions of SO₂ in conventional mass spectrometry.
What this reference does disclose is that DNA may be tagged (by undisclosed means) with trace elements, including sulfur, as an aid to detection of DNA, and that these trace elements may be detected by a variety of means, including mass spectrometry.
Details of DNA sequencing are found in Current Protocols in Molecular Biology, John Wiley & Sons, N.Y., N.Y., edited by F. M. Ausubel, et al., 1978, Chapter 7, which is incorporated herein by reference. Smith, et al., Anal. Chem., 1988, 60, 438-441, describes capillary zone electrophoresis--mass spectrometry using an electrospray ionization interface and is incorporated herein by reference.