The present invention relates to DNA sequencing and genotyping, and more particularly, to DNA sequencers and genotypers that use optical fluorescence detection techniques.
The basic biological characteristics of a living organism are contained in its genes or genetic code. In humans, for example, a person's biological characteristics are controlled by the genetic code contained in 23 chromosome pairs. Each chromosome contains differing genes.
The specific details of a genetic code are contained in long double helical molecules called deoxyribonucleic acid or DNA. The DNA consists of long sequences of paired nucleotides, each carrying one of four bases: adenine, cytosine, guanine, and thymine, commonly referred to by the letters A, C, G, and T, respectively. In the double helix, the A and T nucleotides are complementary and the C and G nucleotides are complementary. Thus, the DNA molecules consist of two complementary strands that are bound together by the complements.
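The pairing rule described above can be illustrated with a short sketch. The function names `complementary_strand` and `reverse_complement` are hypothetical and chosen only for this illustration; the mapping itself is just the A-T and C-G complementarity stated in the text.

```python
# Pairing stated in the text: A pairs with T, and C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complementary_strand(strand: str) -> str:
    """Return the base-by-base complement, in the same left-to-right order."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the complement read in the opposite direction.

    The two strands of a double helix run antiparallel, so this is the
    sequence of the partner strand as it would conventionally be written.
    """
    return complementary_strand(strand)[::-1]


print(complementary_strand("GATTACA"))  # CTAATGT
print(reverse_complement("GATTACA"))    # TGTAATC
```

Because each base has exactly one partner, knowing the sequence of one strand determines the sequence of the other.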
It is often advantageous to know the sequence of the DNA nucleotides associated with a particular gene. For example, genetic defects can be detected by analyzing an organism's genes. The DNA nucleotides for several bacteria and viruses have been sequenced, and currently, sequencing of the entire human genome is in progress. The entire human DNA consists of approximately 3 billion nucleotides or base pairs.
Existing high speed DNA sequencers use electrophoresis gel techniques, in conjunction with fractioning enzymes and fluorescent tags or markers, to separate residual DNA sequence fragments as they travel through a gel. More specifically, each DNA fragment has an incrementally different molecular weight and size. Because the mobility of these DNA fragments through a gel is related to the fragment's weight, structure, and charge, the differing fragments travel through the gel at differing speeds. Thus, the time it takes a fragment to travel through the gel, i.e., its mobility, relates to the fragment's size and charge.
Generally, four fluorescent tags are used to visualize DNA fragments. These tags bind on the residual fragments in accordance with the exposed end base, if using dye terminator chemistry, or are attached to primers that are used to initiate the sequencing reaction, if using dye primer chemistry. The sequence is read by causing the fluorescent markers to fluoresce. The four fluorescent tags generally are selected to have a strong fluorescence peak that is separated from the strong fluorescence peak of the remaining tags. An optical instrument detects the emitted fluorescence signals.
Existing DNA sequencers use an optical filter having a pass-band that is centered about the appropriate wavelength to distinguish between the dyes, and thus the fragments. The optical instrument typically includes a simple spectrometer or a filter wheel and a photomultiplier. The filter wheel has several colored filters, each filter passing light within a wavelength band corresponding to the spectral peak of one of the tags. A simple spectrograph has a wavelength-dependent light disperser such as a prism. The light disperser spreads, generally along a line, the different wavelengths of fluorescent light from the DNA fragments traveling in the gel. Four detectors are placed along the spreading line of the spectrograph at differing locations that correspond to the wavelengths associated with the fluorescent tags.
Fluorescent dyes have been found to be good fluorescent tags. Thus, for example, using dye primer chemistry, the tag usually used in association with the C base is fluorescein-5-isothiocyanate (FITC), which has an emission or fluorescence peak at about 525 nanometers. The tag often associated with the T base is Texas Red, which has a fluorescence peak at about 620 nanometers. The tag often associated with the G base is tetramethylrhodamine isothiocyanate (TRITC), which has a fluorescence peak at about 580 nanometers, and the marker usually associated with the A base is 4-fluoro-7-nitro-benzofurazan (NBD-fluoride), which has a fluorescence peak at about 540 nanometers. Commercially, four universal primers, respectively labeled with dyes called FAM (C), TAMRA (G), JOE (A), and ROX (T), are available from Applied Biosystems, Inc. (ABI) of Foster City, Calif.
The fluorescent dyes indicated above are subject to bleaching which limits the excitation light's power level and thus the intensity of the emitted fluorescence signal from the dyes. The upper limits on fluorescence intensity limit the signal-to-noise ratio (SNR) and eventually the system's throughput.
Accordingly, there exists a need for a sequencing or genotyping system that has increased throughput and sensitivity over systems using four dyes that are distinguished by their respective fluorescence peaks. The present invention satisfies these needs.
A sequencing or genotyping system according to the present invention includes an imaging spectrograph that records the entire emission spectra across a plurality of lanes in an electrophoresis sequencing gel. The system includes spectral shape matching to improve dye identification, thereby allowing the use of dyes having nearly any emission spectra and allowing greater than four dye multiplexing.
In a first embodiment of the invention, the system includes a plurality of electrophoresis lanes. Each lane has a respective first and second end, and an electrophoresis medium between the first and second ends. Each lane is loaded with fluorescently-tagged charged molecules having differing mobilities and chemical properties. An electrical potential, of appropriate polarity, is applied between the first and second ends and causes charged molecules applied at the first end to travel toward the second end at a rate proportional to each molecule's mobility such that the charged molecules are separated along the lane based on the molecule's mobility. A "read zone" extends substantially along an image line and intersects the plurality of electrophoresis lanes near the second ends. The system also includes a light source and an imaging spectrometer. The light source illuminates the read zone with excitation light to cause the charged molecules to fluoresce and produce fluorescent light. The imaging spectrometer spectrally images the fluorescent light onto a two-dimensional imaging plane. The first dimension of the imaging plane is associated with a distance along the image line of the read zone and the second dimension is associated with the wavelength of the fluorescent light. The imaging spectrometer simultaneously images the fluorescent light onto the two-dimensional imaging plane without any scanning motions or delays.
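The two-dimensional imaging plane can be pictured as a frame whose rows index position along the image line and whose columns index wavelength. The following is a minimal sketch under stated assumptions, not the system's actual processing: the function names are hypothetical, the dispersion is assumed linear across a 450 to 900 nanometer range, and each lane is assumed to occupy an equal-width, contiguous band of rows.

```python
def column_to_wavelength(col, n_cols, wl_min=450.0, wl_max=900.0):
    """Map a pixel column to a wavelength in nm (assumes linear dispersion)."""
    return wl_min + (wl_max - wl_min) * col / (n_cols - 1)


def lane_spectrum(frame, lane, n_lanes):
    """Sum the pixel rows belonging to one lane into a single spectrum.

    frame is a list of rows; each row lists intensity per wavelength column.
    Assumes lanes occupy equal-width, contiguous bands of rows.
    """
    rows_per_lane = len(frame) // n_lanes
    start = lane * rows_per_lane
    band = frame[start:start + rows_per_lane]
    return [sum(column) for column in zip(*band)]


# Toy frame: 4 rows (position along the image line) by 3 wavelength columns.
frame = [
    [0, 1, 0],
    [0, 3, 0],
    [5, 0, 0],
    [5, 0, 2],
]
print(lane_spectrum(frame, 0, 2))   # [0, 4, 0]
print(lane_spectrum(frame, 1, 2))   # [10, 0, 2]
print(column_to_wavelength(1, 3))   # 675.0
```

Because a single frame already contains every lane and every wavelength, one exposure yields a full spectrum per lane with no filter changes or scanning.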
The system may further include a camera having a two-dimensional pixel array. The camera generates video signals based on the intensity of light incident upon the pixel array. A display may display a graph of the chemical properties of the molecules crossing the imaging line versus time.
In more detailed features of the invention, the imaging spectrograph further comprises a linear entrance aperture with discrete locations along the aperture corresponding to locations along the first dimension of the image plane. Further, a plurality of optical fibers couple the fluorescent light from the read zone to corresponding locations along the entrance aperture. Alternatively, the imaging spectrograph may include an optical lens system that directly images the read zone. Further, the system may include a processor that compares the detected fluorescent light received from molecules of a given mobility with reference spectral profiles for the fluorescent tags to identify the associated fluorescent tag and thus the molecule's associated chemical property.
In other more detailed features of the invention, associated with genotyping or DNA fingerprinting, the charged molecules include at least 10 different genetic markers of a genome, and a fluorescent tag, having a unique spectral profile, is associated with each genetic marker. The processor compares the detected fluorescent light with the reference spectral profiles by calculating weighting factors, for each reference spectral profile, based on a deconvolution of the detected fluorescent light with the reference spectral profile for each of the fluorescent tags. Further, at least one fluorescent dye may be attached to charged molecules of a known mobility, to provide a mobility calibration reference. The processor calculates the mobility of any charged molecules of unknown mobility, based on the unknown molecule's travel rate and the mobility calibration reference. Additionally, the fluorescent tags used in adjacent lanes may differ from each other. The processor tracks lane drift along the image line using the difference between the fluorescent tags in adjacent lanes.
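The mobility calibration step can be sketched as a simple ratio. This is only an illustration under assumptions that the text does not state: the reference and the unknown molecules are taken to travel the same distance in the same lane under the same field at constant velocity, so that mobility scales with travel rate and therefore inversely with arrival time at the read zone. The function name and parameters are hypothetical.

```python
def calibrated_mobility(ref_mobility, ref_arrival_s, unknown_arrival_s):
    """Estimate an unknown mobility from a reference of known mobility.

    Assumes both species cover the same distance in the same field, so
    travel rate (distance / time) is proportional to mobility and
    mobility_unknown = mobility_ref * (t_ref / t_unknown).
    """
    if ref_arrival_s <= 0 or unknown_arrival_s <= 0:
        raise ValueError("arrival times must be positive")
    return ref_mobility * ref_arrival_s / unknown_arrival_s


# A molecule that takes twice as long as the reference to reach the read
# zone has half the reference mobility (arbitrary units).
print(calibrated_mobility(2.0, 100.0, 200.0))  # 1.0
```

Running the reference in the same lane as the unknowns means that lane-to-lane variations in field or gel affect both equally and cancel in the ratio.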
In yet other more detailed features of the invention, the charged molecules are DNA fragments and each fragment is tagged with a fluorescent dye marker that identifies the fragment's end base. Thus, the fluorescent tags are chosen to correspond to the four nucleobases: A, C, T, and G. For example, the four primer dyes FITC, TRITC, NBD-fluoride, and Texas Red may be used, corresponding respectively to C, G, A, and T. Preferably, the light source generates excitation light at wavelengths of about 488 nanometers and 514 nanometers.
The lanes are preferably located on a planar substrate, such as a gel, and the excitation light is a laser beam that travels along the image line to simultaneously illuminate the lanes. A mirror reflects the laser beam back through the read zone along the image line to increase the intensity and uniformity of the fluorescent light. A lens couples the laser beam from a laser to the gel to increase the intensity of the excitation light traveling along the image line.
Further, optical fibers may couple the fluorescent light from the image line to an entrance slit on the imaging spectrometer. A cylindrical lens focuses the fluorescent light from the imaging line onto ends of the optical fibers and a mirror is located behind the read zone such that fluorescent light traveling away from the fiber end is reflected back toward the optical fiber ends.
In still other more detailed features of the invention, the second dimension of the imaging plane corresponds to the spectral range from about 450 nanometers to about 900 nanometers, and the resolution along the second dimension is less than four nanometers. Also, the excitation light is narrowband light having a wavelength within the spectral range of the second dimension so that excitation light scattered by the electrophoresis gel is imaged on a pixel array on the imaging plane. The electrical signals, generated by the pixel array based on the intensity of light collected by pixels corresponding to the wavelength of the excitation light, provide a monitor of the excitation light's intensity.
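Two consequences of these figures can be worked through in a short sketch. The function names, the 2 nanometer window around the laser line, and the idea of dividing the spectrum by the monitored excitation level are assumptions made for illustration; the text states only that the scattered excitation light provides a monitor of the excitation intensity.

```python
import math


def min_spectral_bins(wl_min=450.0, wl_max=900.0, resolution_nm=4.0):
    """Fewest equal-width bins whose width does not exceed resolution_nm."""
    return math.ceil((wl_max - wl_min) / resolution_nm)


def excitation_level(spectrum, wavelengths, laser_nm, half_width_nm=2.0):
    """Sum the intensity in pixels lying within half_width_nm of the laser line."""
    return sum(
        intensity
        for intensity, wl in zip(spectrum, wavelengths)
        if abs(wl - laser_nm) <= half_width_nm
    )


def normalize_by_excitation(spectrum, wavelengths, laser_nm):
    """Divide a spectrum by the monitored excitation level (an assumed use)."""
    level = excitation_level(spectrum, wavelengths, laser_nm)
    if level == 0:
        raise ValueError("no scattered excitation light detected")
    return [intensity / level for intensity in spectrum]


print(min_spectral_bins())  # 113

wavelengths = [480.0, 488.0, 496.0, 520.0]
spectrum = [1.0, 10.0, 2.0, 40.0]
print(excitation_level(spectrum, wavelengths, 488.0))         # 10.0
print(normalize_by_excitation(spectrum, wavelengths, 488.0))  # [0.1, 1.0, 0.2, 4.0]
```

A 450 nanometer span sampled at four nanometers or finer therefore needs at least 113 spectral bins, and the bins at the laser wavelength double as a built-in power meter for each frame.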
The system may further include first and second electrodes for applying the electrical potential to the lanes at the first and second ends, respectively. The first and second electrodes extend across all of the plurality of lanes. Also, a plurality of loading electrodes may be situated in each lane near the first electrode for loading the charged molecules into the lanes.
The electrophoresis medium may be a gel having a thickness of 200 microns which is sandwiched between two glass plates. An index matching buffer may optically couple the light from the light source to the electrophoresis medium. The read zone may further include fluorescent markers at each end of the image line for indicating the read zone. Also, the plurality of electrophoresis lanes may include 384 separate parallel lanes. A heater may be thermally coupled to the electrophoresis medium for maintaining the medium within a predetermined temperature range.
The present invention is further embodied in a method for identifying molecules. In a most preferred embodiment, a method for sequencing DNA (or other nucleobase-containing molecules) is provided. For example, in a method of sequencing DNA, first, DNA fragments tagged with fluorescent dyes are produced. The dyes indicate the end base of the DNA fragment to which they are attached. Next, the DNA fragments are separated according to mobility using electrophoresis of the fragments on a plurality of electrophoresis lanes. The separated DNA fragments form fragment groups of slightly different mobility. Next, the fragment groups are excited with excitation light causing the fluorescent tags to fluoresce. Then, a hyperspectral image is formed of the separated DNA fragments. The hyperspectral image simultaneously covers all of the electrophoresis lanes and a broad spectral range. Next, the fluorescent dye associated with DNA fragments of a particular molecular weight is identified by fitting the spectra emitted by a fragment group with reference spectra associated with the fluorescent dyes.
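The final fitting step can be sketched as an ordinary least-squares fit of the measured spectrum to a weighted sum of the reference spectra, with the base called from the largest weight. This is one plausible reading of "fitting", not necessarily the method intended by the text. The reference values below are made-up toy numbers on five wavelength bins, not real dye spectra, and all function names are hypothetical.

```python
def solve_linear(matrix, rhs):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("reference spectra are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def fit_weights(measured, references):
    """Least-squares weights w minimizing |measured - sum_k w_k * ref_k|^2."""
    names = list(references)
    # Normal equations: (R R^T) w = R y, where the rows of R are the references.
    gram = [[sum(a * b for a, b in zip(references[i], references[j]))
             for j in names] for i in names]
    proj = [sum(a * b for a, b in zip(references[i], measured)) for i in names]
    return dict(zip(names, solve_linear(gram, proj)))


def call_base(measured, references):
    """Return the base whose reference spectrum carries the largest weight."""
    weights = fit_weights(measured, references)
    return max(weights, key=weights.get)


# Toy reference spectra keyed by end base (illustrative values only).
TOY_REFERENCES = {
    "C": [1.0, 0.5, 0.0, 0.0, 0.0],
    "A": [0.2, 1.0, 0.4, 0.0, 0.0],
    "G": [0.0, 0.2, 1.0, 0.3, 0.0],
    "T": [0.0, 0.0, 0.3, 1.0, 0.4],
}

# A fragment group that is mostly "G" with a small "A" contribution.
measured = [0.02, 0.28, 0.94, 0.27, 0.0]
print(call_base(measured, TOY_REFERENCES))  # G
```

Applying `call_base` to each fragment group in order of arrival at the read zone would yield the base sequence for a lane. Because the fit uses the whole spectral shape rather than a single peak, dyes with overlapping emission can still be separated as long as their reference spectra are linearly independent.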
In another method of identifying molecules tagged with fluorescent dyes, in accordance with the present invention, a hyperspectral profile is formed of a molecule's fluorescence emission. The hyperspectral profile covers a wavelength range from 450 nanometers to 900 nanometers with a resolution of less than 5 nanometers. Next, the hyperspectral profile is convolved with a reference spectral profile of each dye over the entire spectral range to generate a weighting factor for each dye. Finally, the contribution of a particular dye to the molecule's emitted fluorescence is indicated based on the weighting factor.
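One simple way to turn a whole-range comparison into a single number per dye is a normalized inner product of the measured profile with each reference profile. The sketch below uses that interpretation as an assumption; the text does not specify the exact computation, and the function names are hypothetical.

```python
import math


def weighting_factor(profile, reference):
    """Normalized inner product of two spectra over all wavelength bins.

    Returns 1.0 when the shapes match exactly (up to a positive scale)
    and 0.0 when the spectra do not overlap at all.
    """
    dot = sum(p * r for p, r in zip(profile, reference))
    norm = (math.sqrt(sum(p * p for p in profile))
            * math.sqrt(sum(r * r for r in reference)))
    return dot / norm if norm else 0.0


def dye_contributions(profile, references):
    """Compute one weighting factor per dye from its reference profile."""
    return {name: weighting_factor(profile, ref)
            for name, ref in references.items()}


# Toy three-bin spectra (illustrative values only).
references = {"dye_1": [1.0, 0.0, 0.0], "dye_2": [0.0, 1.0, 1.0]}
factors = dye_contributions([0.0, 2.0, 2.0], references)
print(max(factors, key=factors.get))  # dye_2
```

Normalizing by the spectrum norms makes the factor insensitive to overall brightness, so a faint fragment group and a bright one tagged with the same dye score alike.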
The present invention is additionally embodied in a method for loading charged molecules of differing mobilities into a plurality of closely spaced electrophoresis lanes. Each lane has a first electrophoresis electrode located at a first end of the lane, a second electrophoresis electrode located at a second end of the lane, and a loading electrode located near the lane's first end. First, a solution containing the charged molecules is applied over the loading electrode of the first lane. Next, a first voltage is applied to the first and second electrophoresis electrodes of the first lane. The first voltage has the same polarity as the charged molecules. The charged molecules are then loaded into the first lane by applying a large second voltage, having a polarity opposite the polarity of the charged molecules, to the loading electrode of the first lane. A third voltage is applied to the loading electrode of previously loaded lanes while the loading process is repeated for the remaining electrophoresis lanes. The third voltage is a reduced value of the first voltage. After loading the lanes, a fourth voltage is applied between the first electrophoresis electrode and the second electrophoresis electrode to cause the charged molecules to migrate through the electrophoresis lane at a rate proportional to the molecules' mobility.
In this preferred method, the charged molecules are negatively charged DNA fragments. Further, the first voltage is between about −1 and −2 volts, the second voltage is between about +1 and +2 volts, the third voltage is between about −0.1 and −0.2 volts, and the fourth voltage is between about −2000 and −4000 volts.
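The loading sequence above can be written out as a step-by-step voltage schedule. This is a sketch for illustration only: the class and function names are hypothetical, the default voltages are midpoints picked from the stated ranges, and the loading-electrode voltages during the final run are left unset because the text does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadingVoltages:
    hold_v: float = -1.5     # first voltage, on the electrophoresis electrodes
    load_v: float = 1.5      # second voltage, on the loading electrode being loaded
    retain_v: float = -0.15  # third voltage, on already-loaded loading electrodes
    run_v: float = -3000.0   # fourth voltage, between the electrophoresis electrodes


def loading_schedule(n_lanes, v=LoadingVoltages()):
    """Return the ordered steps for loading n_lanes lanes and starting the run."""
    steps = []
    for lane in range(n_lanes):
        # Previously loaded lanes are held at the reduced third voltage
        # while the current lane's loading electrode gets the second voltage.
        loading = {prev: v.retain_v for prev in range(lane)}
        loading[lane] = v.load_v
        steps.append({
            "phase": "load",
            "lane": lane,
            "electrophoresis_v": v.hold_v,
            "loading_electrode_v": loading,
        })
    # After every lane is loaded, apply the fourth voltage to start migration.
    steps.append({
        "phase": "run",
        "lane": None,
        "electrophoresis_v": v.run_v,
        "loading_electrode_v": {},  # not specified in the text
    })
    return steps


for step in loading_schedule(3):
    print(step["phase"], step["lane"], step["loading_electrode_v"])
```

For negatively charged DNA fragments, the positive second voltage draws the fragments onto the loading electrode of the lane being loaded, while the small negative third voltage on previously loaded lanes discourages further uptake there.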
Finally, the present invention is further embodied in a method of identifying fluorescently-tagged DNA base fragment groups. First, the fragment groups are excited with excitation light to cause a fluorescent tag associated with a respective fragment group to fluoresce and produce fluorescent light. Next, the fluorescent light emitted by each fragment group is detected over a sufficiently large wavelength range to produce a spectral profile of the fluorescent light from each fragment group. Next, the detected spectral profile from each fragment group is compared with reference spectral profiles associated with the fluorescent tags to generate a weighting factor. Finally, the fluorescent tags associated with each fragment group are identified based on the fragment group's weighting factor. In a more detailed feature of the method, the wavelength range extends from about 450 nanometers to about 900 nanometers.