Electrophoresis is a well-known technique for separating small amounts of macromolecules. Increasingly, electrophoresis has become an indispensable tool for the biotechnology and other industries and is used extensively in a variety of applications, including the separation, identification and preparation of samples of nucleic acids, proteins and carbohydrates. Of increasing interest in the broader field of electrophoresis are capillary electrophoresis and electrophoresis on a microchip.
Capillary electrophoresis is gaining popularity as a powerful separation technology. This is largely due to the impressive benefits that capillary electrophoresis provides, including the high-efficiency separation of a large variety of compounds, such as amino acids, peptides, proteins, polymerase chain reaction (PCR) products, oligonucleotides, carbohydrates, vitamins, organic acids, polymers, chiral drugs, dyes, surfactants, and the like. Since capillaries have a large surface area relative to their small volume, resulting in high cooling efficiency, high voltages can be applied in analyzing small quantities of samples at high speed and in high resolution. Capillary electrophoresis represents a separation platform that is highly suitable for massively multiplexing and efficiently automating most of the separations typically attained by labor-intensive slab gel electrophoresis, reducing the time required to obtain results from hours to minutes. Separated components are quickly identified by online detectors during the analysis, in contrast to the time-consuming staining steps required for slab gel separations. Current electrophoresis systems collect time domain data that indicate the presence of separated species.
For the present discussion, a “capillary” refers to any tube that can be used in capillary electrophoretic operations. Any capillaries suitable for performing capillary electrophoresis may be used in the present invention. These include, but are not limited to, fused silica capillary tubes. The tubes may have inner channel diameters in the range of about 20 μm to 1000 μm. Preferably, the inner channel diameters of these capillaries range from about 25 μm to 150 μm.
One of the most important applications of electrophoresis systems is deoxyribonucleic acid (DNA) sequencing, in which the sequence of the four bases within a particular sample of DNA is determined. A conventional capillary array electrophoresis system is configured to perform a high-throughput analysis on biological samples, e.g., DNA sequencing, using a highly sensitive laser-induced fluorescence detection method.
In four-color fluorescent sequencing, each sample fragment is tagged with one of four fluorescent dyes, sometimes referred to as “tags”. Each of the four tags preferentially binds to fragments terminating with one of four bases, i.e., guanine (“G”), adenine (“A”), thymine (“T”), or cytosine (“C”). These samples are then excited with a laser beam, either while they are still migrating through the capillaries, i.e., on-column detection, or after they elute from output ends of the capillaries, i.e., sheath-flow detection, as described in U.S. Pat. No. 5,741,412 to Dovichi et al., causing the samples to emit fluorescence light. The emitted fluorescence light is detected as the tagged fragment migrates through a detection zone and is subsequently analyzed. The identity of the fluorescent tag and the corresponding terminal base can be determined from the wavelength range of the fluorescence of the tag. The relative sizes of a series of fragments can be determined from the detection order because, in the absence of errors, smaller DNA fragments migrate faster and reach the detection zone prior to larger fragments. Accordingly, the sequence of bases in a DNA molecule can be determined from the fluorescence wavelengths of the tags bound to sequentially detected fragments.
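The ordering argument above can be illustrated with a minimal Python sketch. It assumes each detected peak has already been reduced to a detection time and a dominant emission channel. The channel-to-base mapping and the peak values are hypothetical and are not taken from any cited system.

```python
# Hypothetical mapping from dominant emission channel to terminal base.
# Real dye assignments depend on the chemistry and are not given in the text.
CHANNEL_TO_BASE = {0: "G", 1: "A", 2: "T", 3: "C"}


def call_sequence(detections):
    """Return the base sequence implied by a list of (time, channel) peaks.

    Smaller fragments reach the detection zone first, so sorting the peaks
    by detection time orders them by fragment length.
    """
    ordered = sorted(detections, key=lambda peak: peak[0])
    return "".join(CHANNEL_TO_BASE[channel] for _, channel in ordered)


# Illustrative peaks: (detection time in arbitrary units, dominant channel).
peaks = [(12.4, 2), (10.1, 0), (11.3, 1), (13.0, 3)]
print(call_sequence(peaks))
```

In practice the dominant channel is not read directly from the detector, because the dye spectra overlap. That is the reason a color calibration step is needed.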
It is known to then color separate these different dyes by the use of separate electrophoreses of reference DNA fragments containing the fluorescent dyes being used in the sample. For example, if an electrophoresis run uses four separate fluorescent dyes, one has to conduct four additional electrophoresis runs, one for each dye, for calibration purposes. The data from these additional runs are used to obtain color calibration information, typically in the form of a calibration matrix which allows one to separate the contributions of the individual dyes when analyzing the spectrum of fluoresced light resulting from the excitation of a species tagged with a particular dye. The calibration matrix is then used for subsequent multiplex runs until the color separation ability is exhausted, or until a component of the system is altered, rendering the calibration matrix invalid. This process can only be accomplished with the use of pure dye standards and requires dedicated tubes and wells for each of the dye standards, thus lowering efficiency, throughput, and accuracy.
Various attempts have been made to perform multiple capillary electrophoresis separations simultaneously. U.S. Pat. No. 6,027,627 to Li et al. discloses an automated capillary electrophoresis apparatus having a plurality of capillaries which are filled with migration medium and have first ends into which samples are injected and second ends from which components included in the samples are eluted.
U.S. Pat. No. 5,998,796 to Li et al. discloses a detector system suitable for use with an electrophoretic apparatus. FIG. 1 illustrates the disclosed detector system. In this figure, a laser 20 emits a beam 24 of light which impinges on a plurality of capillaries 22 aligned parallel to one another. The light impinges on each capillary, causing tagged DNA fragments, or other tagged species, within each capillary 22a to fluoresce. The fluoresced light 26 passes through a transmission grating beam splitter 38, a lens 32 and a filter 35 before it is received on a CCD detector array 31 belonging to a camera 30. Light detected at the camera 30 is then sent on to a computer 34 where it may be viewed, in an appropriate form, on a display 36. The CCD detector array 31 preferably includes 1024×256 pixels. The first pixel dimension (1024 pixels) covers the 96 parallel capillaries, each capillary being focused onto at least one of the 1024 rows. The number of rows per capillary can be increased by selecting a lens with a different focal length or changing other optical parameters. In this system, a fluorescence spectrum, e.g., as represented by the 1st order components, is created for each capillary and detected. The second pixel dimension (256 pixels) is focused on the spectrum spread by the transmission grating. The separated, fluoresced light from a given capillary 22a is detected by pixels of a particular column 39 of the array 31, with the 0th order component being detected by a first pixel 39a and the 1st order component being detected by at least one of a plurality of second pixels 39b spaced apart from the first pixel.
FIGS. 2a and 2b show the effect of a detector on incoming light 26 from tagged DNA samples of a single capillary. For simplicity, only the transmission grating beam splitter 38 and one pixel column 31a of the detector array 31, comprising a plurality of pixels 31b, are shown in FIG. 2a. The incoming light 26 is separated into a 0th order component 40 and a 1st order component 41. As shown in FIG. 2a, the 0th and 1st order components are spatially separated from each other as they impinge on the pixel column 31a. This separation allows one to use the intensities of both the 0th order and the 1st order transmitted incoming light components when performing subsequent analyses for identifying particular fluorophores and, hence, the corresponding nucleotides.
As is known to those skilled in the art of DNA sequencing using capillary electrophoresis, each of the four DNA nucleotides is typically tagged with one of four fluorophores which fluoresce in overlapping wavelengths. Thus, in FIG. 2a, the detected 1st order light 41 comprises four sub-bands, designated 41a, 41b, 41c, and 41d, each corresponding to a region along the column of pixels 31a in which a particular one of the four fluorophores dominates.
FIG. 2b shows the relative intensity of fluorescence of the four fluorophores as a function of relative pixel number. Here, increasing pixel number corresponds to increasing wavelength. In FIG. 2b, curves 42a, 42b, 42c, and 42d correspond to the fluorescence emission spectra of the four fluorophores, each of which is shown to be dominant in a corresponding one of the four pixel regions 41a, 41b, 41c, and 41d of FIG. 2a.
As stated above, in FIG. 2a, the pixel column 31a corresponds to the detector output for a single capillary. For that one capillary, data is available for a number of contiguous pixels, including a small number of pixels which have 0th order information, and a larger number of pixels which have 1st order information. This offers some flexibility in performing subsequent analysis to determine exactly which fluorophore is present at any given time.
The spectrum of interest should include the wavelengths of light at which the dyes are known to fluoresce. The spectrum of interest for each capillary is spread over P contiguous pixels, and these are divided into R channels of Q contiguous pixels, so that R=P/Q. For example, in a system with 30 contiguous pixels, there may be 10 channels of 3 contiguous pixels. R should be at least as large as, and preferably greater than, the number of dyes M being used.
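The division of P pixels into R channels can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `bin_spectrum` is hypothetical, the intensities are made up, and binning is assumed to mean summing the Q pixels of each channel (an average would serve equally well for the purpose of the example).

```python
def bin_spectrum(pixels, q):
    """Sum each run of q contiguous pixel intensities into one channel.

    Assumes binning means summation; the pixel count P must be a
    multiple of Q so that R = P / Q is an integer.
    """
    if q <= 0 or len(pixels) % q != 0:
        raise ValueError("pixel count P must be a positive multiple of Q")
    return [sum(pixels[i:i + q]) for i in range(0, len(pixels), q)]


# P = 30 made-up pixel intensities for one capillary, binned with Q = 3.
pixels = list(range(1, 31))
channels = bin_spectrum(pixels, 3)
print(len(channels))  # R = P / Q
print(channels)
```

With P=30 and Q=3 this yields R=10 channel values per capillary per readout, matching the example in the text.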
The detector then outputs the spectrum having R light intensity values for each capillary and each time that data is fed to a processor. The processor then maps the spectrum of R intensity values for each capillary, onto values which help determine what dye has been detected in a specific capillary. This is typically done by multiplying color calibration coefficients by the vector of intensity values, for each capillary.
The principle behind the color calibration coefficients is that a spectrum of received light intensities in each of the channels is caused by the spectrum of a single dye (tagging a corresponding base) weighted by the effects (color calibration coefficients) of the detection system.
If I0(n), I1(n), . . . I9(n) represent the measured intensities of the R=10 channels at the nth set of outputs from the CCD (after preprocessing including detection, binning, and baseline subtraction), B0(n), B1(n), . . . , B3(n) is a vector representing the contribution (presence 1 or absence 0) from the M=4 bases, and Cij are coefficients of a known 10×4 matrix which maps the bases onto the detected channels, having the relationship of Equation 1:

$$
\begin{pmatrix} I_0(n) \\ I_1(n) \\ I_2(n) \\ \vdots \\ I_9(n) \end{pmatrix}
=
\begin{pmatrix}
C_{00} & C_{01} & C_{02} & C_{03} \\
C_{10} & C_{11} & C_{12} & C_{13} \\
C_{20} & C_{21} & C_{22} & C_{23} \\
\vdots & \vdots & \vdots & \vdots \\
C_{90} & C_{91} & C_{92} & C_{93}
\end{pmatrix}
\begin{pmatrix} B_0(n) \\ B_1(n) \\ B_2(n) \\ B_3(n) \end{pmatrix}
\tag{1}
$$

Equation 1 can be rewritten as Equation 2:

$$
I(n) = C\,B(n) \tag{2}
$$

Given a vector of intensities output by a CCD for each separation lane, the theory of determining the presence or absence of each of the M=4 bases from the R=10 wavelength channels is fairly well established. This is simply a particular case of an over-determined system in which a smaller number of unknowns is determined from a greater number of equations.
Multiplying both sides of Equation 2 by the transpose of C and then by the inverse of the resulting square matrix, Equation 2 can be written as Equation 3:

$$
B(n) = (C^{T} C)^{-1} C^{T} I(n) \tag{3}
$$

where B0(n), . . . , B3(n) now represent the unknown values of the individual bases as functions of time index n, each value being reflective of the relative likelihood of the corresponding dye tagging that base being present; I0(n), I1(n), . . . , I9(n) are the fluorescence intensities of the ten channels; the Cij are the coefficients of wavelength channel i under known base j; $C^{T}$ is the transpose of the matrix C; and $A=(C^{T}C)^{-1}C^{T}$ is the pseudo-inverse of matrix C. While in the above analysis C is a 10×4 matrix because a total of ten channels and four bases are used, in the general case C is an R×M matrix wherein R≥M, and R and M are both integers greater than 2.
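As a minimal sketch of Equation 3, the Python below solves the normal equations (C^T C) B = C^T I, which gives the same result as applying the pseudo-inverse when the columns of C are linearly independent. The 10×4 matrix is made up for illustration (triangular, overlapping dye spectra), and the helper names `transpose`, `matmul`, `solve`, and `unmix` are hypothetical. A production system would more likely rely on a numerical library.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def solve(a, b):
    """Solve a x = b for square a by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    aug = [list(map(float, a[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("C^T C is singular; dye spectra not independent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def unmix(c, intensities):
    """Least-squares estimate B = (C^T C)^-1 C^T I via the normal equations."""
    ct = transpose(c)
    ctc = matmul(ct, c)
    cti = [sum(x * y for x, y in zip(row, intensities)) for row in ct]
    return solve(ctc, cti)


# Made-up 10x4 calibration matrix: each dye is a triangular peak centered on
# a different channel, overlapping its neighbors (R = 10, M = 4).
centers = [1, 4, 6, 8]
C = [[max(0.0, 1.0 - abs(i - c) / 3.0) for c in centers] for i in range(10)]

# Simulate Equation 2 for a fragment carrying only the third dye.
true_b = [0, 0, 1, 0]
I = [sum(cij * bj for cij, bj in zip(row, true_b)) for row in C]

estimate = unmix(C, I)
print([round(b, 6) for b in estimate])
```

With noise-free input the estimate reproduces the simulated base vector up to rounding. With real data the four values would be compared to decide which dye is most likely present at time index n.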
Typically, in prior art systems, the calibration matrix C is determined at the time the system is created. More particularly, the calibration matrix C is specific to a set of dyes used, and is constant for all separation lanes in a system. If such a prior art system is then modified, such as by upgrading to a new set of optical filters, the calibration matrix C needs to be re-calibrated.
One drawback of a constant calibration matrix is that the 0th order and 1st order spectral intensities from various capillaries in the capillary array do not fall on the same-positioned pixel as do the 0th order and 1st order spectral images from the remaining capillaries, but rather are offset by a skew of a single pixel or map onto more than one pixel. The binning process for 1st order intensities for these abnormal capillaries results in a spectrum which would be slightly different than if the binning process started one pixel over.
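The effect of a one-pixel skew on binning can be shown with a small Python sketch. The pixel intensities are made up, the function name `bin_from` is hypothetical, and binning is again assumed to be a sum over Q contiguous pixels.

```python
def bin_from(pixels, start, q, r):
    """Bin r channels of q contiguous pixels beginning at index start."""
    return [sum(pixels[start + k * q:start + (k + 1) * q]) for k in range(r)]


# Made-up 1st order intensities along one pixel column (one broad peak).
column = [0, 1, 3, 6, 9, 6, 3, 1, 0, 0, 0, 0, 0]

aligned = bin_from(column, 0, 3, 4)  # binning starts at the expected pixel
skewed = bin_from(column, 1, 3, 4)   # binning starts one pixel over
print(aligned)
print(skewed)
```

The same light pattern produces two different channel spectra depending on where binning starts, so a calibration matrix derived for one alignment does not exactly describe a capillary whose image is offset by one pixel.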
In general, different dye sets have different spectra. As a consequence, each dye set has a different calibration matrix. A further disadvantage of using a single calibration matrix for a multi-lane separation system is therefore that one cannot run multiple dye sets in different separation lanes. U.S. patent application Ser. No. 09/676,526, filed Oct. 2, 2000, provides a method and apparatus for a multi-lane electrophoretic separation apparatus that simultaneously utilizes multiple calibration matrices to calibrate for different dyes used to tag migrating species. Each calibration matrix is calculated “on the fly” based on the data received from that electrophoresis run.