The present invention relates generally to the investigation of the sequencing of DNA. More particularly, the present invention relates to a method of and apparatus for automating the sequencing of DNA which increases the rate at which DNA can be sequenced as well as improving the reliability and accuracy of the sequencing determination.
DNA sequencing is essential to the practice of biotechnology, genetic engineering and many other disciplines that depend on determining the genetic information contained in DNA. The sequencing of DNA is the process of determining the order of the nucleic acid bases that make up a strand of DNA. DNA is composed of four bases, denoted A for adenine, G for guanine, C for cytosine and T for thymine, and the sequence of these bases uniquely describes each piece of DNA. Sequencing is therefore a crucial step in genetic engineering and biotechnology, since it provides the precise code of genetic information contained in a sample of DNA.
DNA is double stranded, and the term base pairs is therefore often used, since each base on one strand is opposed by its complementary base on the other strand. An enormous number of bases must be sequenced in order to read a piece of DNA; even a simple piece of DNA from a bacterial cell would likely comprise several thousand bases. The Human Genome Project, a large, multi-year national project funded by the United States Government to sequence human DNA, is attempting to sequence the approximately 10^9 bases found in human DNA.
DNA sequencing is a very labor-intensive process, and with the large amounts of DNA that must be sequenced for the biotechnology industry to progress, methods and apparatus to automate this process are very desirable. Much has been written about DNA sequencing and genetic engineering, and the reader is referred to the many references on this subject for additional background information.
Two methods of DNA sequencing have been developed. The first, by Maxam and Gilbert, is described by A. M. Maxam and W. Gilbert in Proc. Natl. Acad. Sci. USA, Vol. 74, page 560 (1977). The second is described by F. Sanger, S. Nicklen and A. R. Coulson in Proc. Natl. Acad. Sci. USA, Vol. 74, page 5463 (1977). Both methods involve a number of steps that must be performed before the DNA fragments are ready to be detected to yield a sequence. Those steps are detailed in the two references above and will not be reviewed here. Although the two techniques differ, each eventually yields four samples of DNA fragments, each sample containing fragments that end at a given base. For instance, one sample contains fragments that end in base A, another contains fragments that end in base G, and so forth.
The task is to separate those fragments by size and see what order they are in. If the shortest fragment in all of the four samples is one that ends in T, then the first base of the sequence is T. If the second shortest one ends in C, then the next base in the sequence is C, and so forth, until all of the fragments are separated in order of increasing length and the sequence is determined.
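The reading rule just described can be sketched in a few lines of Python. This is a minimal illustration, not the base-calling code of any instrument: it assumes each fragment's length is already known exactly, and the function and variable names are hypothetical.

```python
from typing import Dict, List, Tuple


def call_bases(samples: Dict[str, List[int]]) -> str:
    """Read a sequence from four terminator samples.

    `samples` maps each base ("A", "C", "G", "T") to the lengths, in bases,
    of the fragments in the sample that ends in that base.
    """
    # Label every fragment length with the base its sample ends in.
    tagged: List[Tuple[int, str]] = [
        (length, base) for base, lengths in samples.items() for length in lengths
    ]
    # Order all fragments from shortest to longest across the four samples.
    tagged.sort()
    # With single-base resolution, no two fragments should share a length.
    lengths = [length for length, _ in tagged]
    if len(lengths) != len(set(lengths)):
        raise ValueError("two fragments have the same length; cannot order them")
    # The base of the n-th shortest fragment is the n-th base of the sequence.
    return "".join(base for _, base in tagged)


samples = {"T": [1, 4], "C": [2], "A": [3, 6], "G": [5]}
print(call_bases(samples))  # TCATGA
```

The shortest fragment ends in T, so the sequence starts with T; the duplicate-length check stands in for the case where the separation fails to resolve two fragments.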
To perform this size separation and fragment detection, the first methods of manual DNA sequencing used polyacrylamide gel electrophoresis to separate the fragments. Polyacrylamide gels can resolve fragments that differ in length by a single base, a resolution that sequencing requires. Each fragment is labeled with a radioactive isotope that emits beta particles, typically phosphorus-32 (P-32). Each of the four samples is then separated by size in its own lane of the gel, and the four lanes are typically side by side. After electrophoresis, a sheet of x-ray film is placed against the gel for a number of hours, often a couple of days, so that the emissions from the P-32 expose the film. When the film is developed, the fragments show up as dark bands, and the sequence can then be read from the order in which the bands appear, from the bottom to the top of the film.
Automating DNA sequencing involves automating the detection of the fragments on the electrophoresis gel and then automatically determining the DNA base sequence from the order of the detected fragments, using the algorithm above implemented in a microprocessor. Because of the time needed to expose x-ray film to the beta radiation of P-32, and other considerations involving the use of radioisotopes, new methods of tagging and sequencing based on fluorescence were developed. See Biophysical and Biochemical Aspects of Fluorescence Spectroscopy, edited by T. Gregory Dewey, Plenum Press, 1997; "Large Scale and Automated Sequence Determination," by T. Hunkapiller et al., Science, Vol. 254, pages 59-67 (1991); and "DNA Sequencing: Present Limitations and Prospects for the Future," by B. Barrell, The FASEB Journal, Vol. 5, pages 40-45 (1991).
Fluorescence tagging of the fragments involves attaching a fluorescent compound, or fluorophore, to each fragment, analogously to the attachment of the radioactive label. These fluorescent labels were found not to adversely affect gel electrophoresis or the sequencing process.
Fluorescence is an optical method that involves stimulating the fluorescent molecule by shining light on it at an optical wavelength that is optimum for that molecule. Fluorescent light is then given off by the molecule at a characteristic wavelength that is typically slightly longer than the stimulation wavelength. By focusing the light at the stimulating wavelength down to a point on the gel and then detecting the presence of any optical radiation at the characteristic wavelength of light from the fluorescent molecule, the presence at that point of fragments of DNA tagged with that fluorescent molecule may be determined.
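As a rough illustration of this point-detection decision, the Python sketch below checks that a detection band sits on the long-wavelength side of the stimulating light and flags a tagged fragment when the counts in that band exceed the expected background by a noise margin. The dye wavelengths, band half-width and 3-sigma threshold are illustrative assumptions, as is the shot-noise-limited background model; none of them come from the text above.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Fluorophore:
    """Excitation and emission peaks of a hypothetical dye, in nanometres."""

    excitation_nm: float
    emission_nm: float

    def __post_init__(self) -> None:
        # Fluorescence is emitted at a longer wavelength than the stimulating light.
        if self.emission_nm <= self.excitation_nm:
            raise ValueError("emission must be longer in wavelength than excitation")


def emission_band(dye: Fluorophore, half_width_nm: float = 10.0) -> Tuple[float, float]:
    """Detection band centred on the dye's emission peak.

    Rejects a band wide enough to pass scattered stimulating light.
    """
    low, high = dye.emission_nm - half_width_nm, dye.emission_nm + half_width_nm
    if low <= dye.excitation_nm:
        raise ValueError("band would admit light at the stimulation wavelength")
    return low, high


def tag_present(counts: float, background: float, k: float = 3.0) -> bool:
    """Decide whether a tagged fragment sits at the illuminated point.

    counts: photons detected in the emission band with the stimulating light on.
    background: expected counts in that band when no tagged fragment is present.
    Assumes shot-noise-limited (Poisson) background, so noise ~ sqrt(background).
    """
    return counts - background > k * math.sqrt(max(background, 1.0))


dye = Fluorophore(excitation_nm=490.0, emission_nm=520.0)  # illustrative values
print(emission_band(dye))         # (510.0, 530.0)
print(tag_present(180.0, 100.0))  # True: 80 counts above a noise level of ~10
print(tag_present(120.0, 100.0))  # False: 20 counts is under the 3-sigma cut of 30
```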
Two methods of implementing an automated DNA sequencing instrument are known in the art. One, reported by Smith et al., in Nature, Vol 321, pages 674-679 (1986), puts a different fluorescent tag on each of the four samples of fragments described above. Thus, the sample of fragments that end in the base A are tagged by one fluorophore; the sample of fragments that end in the base G are tagged by another fluorophore, and so on for the other two samples. Each fluorophore can be distinguished by its own stimulation and emission wavelengths of light.
In the Smith et al. method, all four samples are electrophoresed together in the same lane, and the differences in their tags are used to distinguish them. This avoids the use of four separate lanes, which is an advantage because fragments often do not progress consistently from one lane to another, and difficulties in determining the sequence often arise as a result.
Another method, reported by Ansorge et al. in J. Biochem. Biophys. Methods, Vol. 13, pages 315-323 (1986) and Nucleic Acids Res., Vol. 15(11), pages 4593-4602 (1987), uses one fluorescent tag for all fragments but employs four separate lanes of gel electrophoresis, in a manner similar to radioactively labeled sequencing. That approach has the potential disadvantage that four lanes, with different fragment migration rates caused by local temperature variations and other inconsistencies within the gel, could limit the reliability of the sequence determination.
Fluorescence tagging and the detection of natural fluorescence in molecules are well-known methods in analytical chemistry and biology. The methods described above were adapted to DNA sequencing by creating fluorescent tags that can be bound to fragments of DNA. The instruments used to detect fluorescence consist of the following parts. A light source, such as a broadband lamp or a laser, provides the stimulating light. An optical filter selects the light at the desired stimulation wavelength and directs it onto the sample. Optical filters are available at essentially any wavelength and are typically built up by depositing thin-film layers whose optical thicknesses are a fraction, commonly one quarter, of the desired transmission wavelength. The light that exits the optical filter then stimulates the fluorescent molecule in the sample.
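The layer-thickness rule for thin-film interference filters can be made concrete with a short Python sketch. A quarter-wave layer has optical thickness n·t = λ/4, so its physical thickness is t = λ/(4n). The refractive indices used below are approximate illustrative values for a high-index and a low-index coating material, and the function names are hypothetical; the sketch only computes thicknesses and does not model the filter's transmission.

```python
from typing import List, Tuple


def quarter_wave_thickness_nm(design_wavelength_nm: float, refractive_index: float) -> float:
    """Physical thickness t of a layer whose optical thickness n*t is one quarter wavelength."""
    if design_wavelength_nm <= 0 or refractive_index <= 0:
        raise ValueError("wavelength and refractive index must be positive")
    return design_wavelength_nm / (4.0 * refractive_index)


def quarter_wave_stack(design_wavelength_nm: float, n_high: float, n_low: float,
                       pairs: int) -> List[Tuple[str, float]]:
    """Alternating high/low-index quarter-wave layers, as (label, thickness_nm).

    Such a stack strongly reflects near the design wavelength; a bandpass filter
    typically places a half-wave spacer between two such stacks (not modelled here).
    """
    t_high = quarter_wave_thickness_nm(design_wavelength_nm, n_high)
    t_low = quarter_wave_thickness_nm(design_wavelength_nm, n_low)
    layers: List[Tuple[str, float]] = []
    for _ in range(pairs):
        layers.append(("H", t_high))
        layers.append(("L", t_low))
    return layers


# Illustrative indices, roughly those of titanium dioxide (H) and silica (L).
for label, thickness in quarter_wave_stack(520.0, n_high=2.35, n_low=1.46, pairs=2):
    print(label, round(thickness, 1))
```

At a 520 nm design wavelength this gives layers of about 55.3 nm (high index) and 89.0 nm (low index), which is why such coatings are described as thin films at a fraction of the wavelength.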
The molecule then emits light at its characteristic fluorescent wavelength. This light is collected by a suitable lens and is then passed through a second optical filter centered at the characteristic wavelength before being brought to a detection device such as a photomultiplier tube, a photoconductive cell, or a semiconductor optical detector. Therefore, only light at the desired characteristic wavelength is detected to determine the presence of the fluorescent molecule.
The Ansorge method uses only a single light source, one stimulation optical filter, one fluorescence emission optical filter and one optical detector. This apparatus is mechanically scanned across the gel to detect fragments in each of the four lanes. Mechanical scanning is a disadvantage because of the slow rate at which the apparatus can be scanned, the difficulty of keeping the scan aligned, and the limited repeatability and durability of scanning mechanisms.
The Smith method uses only one lane of electrophoresis, so it avoids mechanically scanning the apparatus across four lanes. However, it requires multiple stimulation wavelengths and multiple detection wavelengths. To provide these, a mechanical wheel carrying four optical filters is rotated in the beam of stimulating light, selecting the four stimulation wavelengths in sequence from the broadband light source. At the same time, a second mechanical wheel in the detection optical path selects the detection wavelength corresponding to the fluorophore currently being stimulated. Two mechanical rotating devices must therefore be built and operated in synchronism to produce the correct result.
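The synchronism requirement can be illustrated with a small Python simulation. Each wheel is modelled as a callable that advances one slot and returns the slot index, and a reading is attributed to a base only when both wheels report the same slot. The slot table, wavelengths and function names are hypothetical placeholders, not the filters or dyes of the Smith et al. instrument.

```python
import itertools
from typing import Callable, Dict, List, Tuple

# Illustrative slot table: (base, stimulation filter nm, detection filter nm).
# Placeholder wavelengths only; not the dyes or filters of any real instrument.
SLOTS: List[Tuple[str, float, float]] = [
    ("A", 460.0, 505.0),
    ("C", 490.0, 530.0),
    ("G", 520.0, 560.0),
    ("T", 550.0, 590.0),
]


def one_rotation(
    stimulation_wheel: Callable[[], int],
    detection_wheel: Callable[[], int],
    read_detector: Callable[[float, float], float],
) -> Dict[str, float]:
    """Step both wheels through every slot once and attribute each reading to a base.

    Each wheel callable advances its wheel one slot and returns the slot index.
    A reading is only meaningful if both wheels are on the same slot.
    """
    signals: Dict[str, float] = {}
    for _ in SLOTS:
        stim_slot = stimulation_wheel()
        det_slot = detection_wheel()
        if stim_slot != det_slot:
            raise RuntimeError(
                f"wheels out of sync: stimulation slot {stim_slot}, detection slot {det_slot}"
            )
        base, stim_nm, det_nm = SLOTS[stim_slot]
        signals[base] = read_detector(stim_nm, det_nm)
    return signals


# Simulated run: the fragment under the beam carries the "G" dye.
stim = itertools.cycle(range(len(SLOTS)))
det = itertools.cycle(range(len(SLOTS)))
signals = one_rotation(lambda: next(stim), lambda: next(det),
                       lambda stim_nm, det_nm: 100.0 if det_nm == 560.0 else 2.0)
print(max(signals, key=signals.get))  # G
```

If the detection wheel lags or leads by one slot, every reading would be credited to the wrong base, which is why the sketch refuses to return a result rather than silently mislabeling it.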
Maintaining synchronism and optical alignment in such an electromechanical device is a severe complication, as pointed out in chapter 3 of the Dewey text, which is a review of automated DNA sequencing methods. A commercially available DNA sequencer that operates in a manner similar to that described above is the ABI Model 373A, available from Applied Biosystems, Inc., of Foster City, Calif. Hunkapiller, in his April 1992 article in Science, points out that this sequencer has relatively long run times because of the mechanical operation of its four-color optics. That is a significant disadvantage in DNA sequencing, and it is overcome by the present invention.
Another prior art system for automated DNA sequencing was developed by DuPont of Wilmington, Del., and marketed as the Genesis 2000. It uses four fluorescent tags, one for each base, and runs them in a single lane. This system uses two fixed-wavelength optical interference filters, each mounted on its own photomultiplier tube light detector. The center wavelengths and bandwidths of these filters are chosen so that one lies on the short-wavelength side of the emission spectra of the fluorescent tags and the other on the long-wavelength side. The ratio of the two detected signals then indicates which base is being detected at a given time. For instance, if the signal from the short-wavelength detector is much greater than the other, the tag with the shortest emission wavelength is inferred to be present. Likewise, if the signal from the long-wavelength detector is much greater, the tag with the longest emission wavelength is assumed to be present. If the two signals are close in magnitude, one of the two tags with intermediate emission wavelengths is assumed to be present, namely the one lying toward the side whose detector gives the larger signal. This technique, although workable, offers limited resolution, and Dewey reports that the system is no longer commercially available.
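The two-detector ratio rule can be sketched in Python by mapping the fraction of light in the long-wavelength channel onto four bins. This is not DuPont's actual processing, which the text does not describe: the assignment of bases to dyes in order of emission wavelength and the bin boundaries are arbitrary placeholders, and a real instrument would calibrate them against known standards.

```python
from bisect import bisect_right
from typing import Sequence


def classify_by_ratio(
    short_signal: float,
    long_signal: float,
    dyes_short_to_long: Sequence[str] = ("A", "C", "G", "T"),
    boundaries: Sequence[float] = (0.25, 0.5, 0.75),
) -> str:
    """Pick one of four dyes from two filtered detector signals.

    The long-wavelength fraction f = long / (short + long) runs from 0 (all light
    in the short-wavelength channel) to 1 (all in the long-wavelength channel).
    `boundaries` split that range into four bins, one per dye.
    """
    total = short_signal + long_signal
    if total <= 0:
        raise ValueError("no fluorescence detected")
    fraction_long = long_signal / total
    return dyes_short_to_long[bisect_right(boundaries, fraction_long)]


print(classify_by_ratio(90.0, 10.0))  # A: short channel dominates
print(classify_by_ratio(55.0, 45.0))  # C: close, short slightly larger
print(classify_by_ratio(45.0, 55.0))  # G: close, long slightly larger
print(classify_by_ratio(10.0, 90.0))  # T: long channel dominates
```

Only one number, the ratio, separates four classes, so noise that shifts the ratio across a boundary changes the call; this is one way to see the limited resolution noted above.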
Yet another prior art system for automated DNA sequencing has been developed by LI-COR, Inc., of Lincoln, Nebr., and is marketed as the Model 4000L Automated DNA Sequencer. The LI-COR system apparently uses a single fluorescent tag that emits radiation in the near-infrared portion of the spectrum. Such a tag is used to reduce background fluorescence, because the ordinary glass from which the gel plates are made, called "float glass," does not fluoresce in the infrared wavelength range. Float glass does, however, fluoresce in the visible wavelength range, a problem that has been noted in the prior art.
In addition, one method for reducing or eliminating the background fluorescence produced by float glass when irradiated with light in the visible wavelength range is to "chop" the laser light. This works because the fluorescence emissions of the constituents of the glass have long lifetimes, while the fluorescent tags themselves have short emission lifetimes. The interfering emissions from the float glass therefore do not follow the rapid chopping of the stimulating laser light and become a background signal that can be subtracted. The present invention overcomes this background fluorescence problem in a more elegant and effective manner through the use of an acousto-optic tunable filter or acousto-optic modulator.
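The prior-art chopping technique (not the acousto-optic approach of the present invention) can be illustrated with a toy Python simulation. The short-lifetime tag emission is assumed to switch on and off with the laser, while the long-lifetime glass emission is modelled as a first-order response that barely follows the chopping; subtracting the mean "laser off" signal from the mean "laser on" signal then recovers the tag signal. All amplitudes, lifetimes and the chopping period are hypothetical values in arbitrary units.

```python
from typing import List, Tuple


def simulate_chopped_signal(
    periods: int = 200,
    half_period: int = 50,
    tag_amplitude: float = 1.0,
    glass_amplitude: float = 5.0,
    glass_lifetime: float = 5000.0,
) -> Tuple[List[bool], List[float]]:
    """Detector samples under square-wave ("chopped") excitation.

    Times are in sample intervals. The tag's lifetime is taken as much shorter
    than one sample, so its emission switches on and off with the laser. The
    glass emission is modelled as a first-order response with a lifetime much
    longer than the chopping period, so it hardly follows the chopping.
    """
    laser_on: List[bool] = []
    samples: List[float] = []
    glass = glass_amplitude / 2.0  # start near the glass emission's average level
    for i in range(periods * 2 * half_period):
        on = (i // half_period) % 2 == 0
        drive = glass_amplitude if on else 0.0
        glass += (drive - glass) / glass_lifetime
        tag = tag_amplitude if on else 0.0
        laser_on.append(on)
        samples.append(tag + glass)
    return laser_on, samples


def on_minus_off(laser_on: List[bool], samples: List[float]) -> float:
    """Mean signal with the laser on minus mean signal with it off."""
    on = [s for s, flag in zip(samples, laser_on) if flag]
    off = [s for s, flag in zip(samples, laser_on) if not flag]
    return sum(on) / len(on) - sum(off) / len(off)


flags, samples = simulate_chopped_signal()
on_only = [s for s, flag in zip(samples, flags) if flag]
print(round(sum(on_only) / len(on_only), 2))   # about 3.5: tag plus glass background
print(round(on_minus_off(flags, samples), 2))  # about 1.0: glass background removed
```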
Recently, methods of preparing DNA fragments for sequencing using capillary techniques have been described. The present invention can be used with such methods, as well as with any other methods or techniques of preparing DNA fragments, as long as the fragments carry fluorescent tags that emit light in response to an impinging light beam.