The use of capillary electrophoresis (CE) has greatly improved DNA sequencing rates compared to conventional slab gel electrophoresis. Part of the improvement in speed, however, has been offset by the loss of the ability (inherent in slab gels) to accommodate multiple lanes in a single run. Highly multiplexed capillary electrophoresis, by making possible hundreds or even thousands of parallel sequencing runs, represents an attractive approach to overcoming the current throughput limitations of existing DNA sequencing instrumentation.
Excitation and Detection Geometry. Various excitation and detection systems have been developed to accommodate parallel arrays in capillary electrophoresis. Laser-induced fluorescence (LIF) detection has been the major method employed in the automation of DNA sequencing. The incident laser beam and the collected fluorescence light are typically perpendicular to each other in order to reduce background noise due to light scattering. On-column excitation and detection are generally performed from above the parallel array through transparent windows formed in the capillaries. For example, in one system a beam expander and a cylindrical lens are used to distribute the laser light into a thin line that intersects the axes of the capillaries, which are mounted in a grooved block so as to reduce cross-talk (K. Ueno et al., Anal. Chem., 66, 1424 (1994)). Although a low detection limit and uniform distribution of excitation intensities can be achieved with this system, a long laser line compared to the array width has to be used due to the Gaussian intensity distribution. Thus, half of the laser light in the array region is wasted due to the longer laser line and the presence of the spacer grooves. Cross-talk, though manageable, is still in the range of 10% of the observed signal.
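The trade-off between uniform excitation and wasted laser light follows from the Gaussian profile itself, and can be illustrated with a short calculation. The sketch below is only an illustration under stated assumptions: it treats the laser line as an ideal one-dimensional Gaussian, ignores the spacer grooves, and uses a hypothetical function name and uniformity tolerance that do not come from the cited system.

```python
import math


def usable_power_fraction(min_relative_intensity: float) -> float:
    """Fraction of the power in a 1-D Gaussian laser line that falls inside
    the central region where the intensity stays at or above
    `min_relative_intensity` times the peak intensity.

    For any Gaussian profile this fraction is erf(sqrt(-ln r)), independent
    of how the beam width is defined. (Idealized model, an assumption.)
    """
    if not 0.0 < min_relative_intensity <= 1.0:
        raise ValueError("min_relative_intensity must be in (0, 1]")
    return math.erf(math.sqrt(-math.log(min_relative_intensity)))


if __name__ == "__main__":
    # Hypothetical uniformity requirements across the capillary array.
    for r in (0.95, 0.90, 0.50):
        print(f"intensity >= {r:.0%} of peak: "
              f"{usable_power_fraction(r):.1%} of the line power is usable")
```

Under this idealized model, requiring every capillary to receive at least 90% of the peak intensity leaves only about 35% of the line power inside the array region, which is consistent with the need for a laser line much longer than the array.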
On-column detection has also been carried out using axial-beam laser-induced fluorescence detection by inserting optical fibers into an end of each separation capillary (J. A. Taylor et al., Anal. Chem., 65, 956 (1993)). However, the intrusion of optical fibers into the separation capillaries affects the electroosmotic flow and increases the possibility for contamination and clogging. Furthermore, the detection limit is higher.
A type of side-entry excitation in a single capillary system has also been reported (R. N. Zare et al., U.S. Pat. No. 4,675,300 (1987)). In that system, an optical fiber is used to deliver coherent light to a translucent portion of a capillary, and fluorescence is detected through the translucent portion using a second optical fiber positioned perpendicular to the first optical fiber. This method suffers from excess stray light contamination and lower collimation efficiency.
Increased laser power is generally advantageous in providing a larger analyte signal. However, fluorophores are easily bleached, i.e., their fluorescing characteristic is destroyed by the laser beam, even at the milliwatt level, negating any increase in excitation intensity. Thus an LIF geometry that produces high resolution analyte signals while using a lower power laser (i.e., less than 50 mW) would represent a needed improvement in the art.
Detection Methods and Devices. Highly multiplexed CE imposes great demands on the detection system. For example, in one approach, a two-color confocal fluorescence scanner is employed for 25 capillaries (X. C. Huang et al., Anal. Chem., 64, 967 (1992)). A mechanical stage is used to translate the capillary array across the optical region. Since data acquisition is sequential and not truly parallel, its use for hundreds of capillaries is limited. To be compatible with the high speed provided by CE and the high throughput of a large capillary array, a fast, sensitive, image array detector is required.
Recently, charge-coupled devices (CCDs) have been used as two-dimensional (n × m) image array detectors to pursue high-speed, high-throughput DNA sequencing. For example, a multiple sheath-flow apparatus and four-color detection system are used by S. Takahashi et al. (Anal. Chem., 66, 1021 (1994)). Two laser beams are combined into one to cross the flow streams in an array of 20 capillaries in a line for excitation, and a CCD is used for simultaneous detection perpendicular to the excitation beam. Superior stray-light rejection can be achieved with this system. However, many challenges remain in scaling up from 20 to hundreds or thousands of capillaries. Misalignment of individual sheath flows, turbulence in the flow paths, improper matching of the laser beam waist over a long distance with the core diameters containing the eluted fragments, and the possible need to incorporate an extra space between the capillaries to accommodate the sheath flow are just a few of the problems associated with scale-up. Moreover, CCD detectors make major data analysis and storage demands on a system. CCDs read one array row at a time, and the time spent reading any particular row cannot be lengthened or shortened as desired in response to the amount of information in that row. A two-dimensional image array detection system that allowed random addressing and variable exposure times would significantly reduce data storage and analysis demands, and save considerable amounts of time as well.
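The data-volume argument can be made concrete with simple arithmetic. The following sketch compares the number of pixel values produced per exposure by a full-frame readout with the number produced when only the pixels that image the capillaries are addressed. The detector size, capillary count, and pixels per capillary are hypothetical values chosen for illustration, and the function names are not taken from any detector interface.

```python
def full_frame_values(n_rows: int, n_cols: int) -> int:
    """Pixel values produced when every row of an n x m array is read."""
    return n_rows * n_cols


def addressed_values(n_capillaries: int, pixels_per_capillary: int) -> int:
    """Pixel values produced when only the pixels imaging each capillary
    are read (assumes the detector supports random addressing)."""
    return n_capillaries * pixels_per_capillary


def reduction_factor(n_rows: int, n_cols: int,
                     n_capillaries: int, pixels_per_capillary: int) -> float:
    """Ratio of full-frame data volume to addressed data volume."""
    return (full_frame_values(n_rows, n_cols)
            / addressed_values(n_capillaries, pixels_per_capillary))


if __name__ == "__main__":
    # Hypothetical case: 512 x 512 detector, 100 capillaries, 10 pixels each.
    print(full_frame_values(512, 512))            # values per full frame
    print(addressed_values(100, 10))              # values per addressed read
    print(reduction_factor(512, 512, 100, 10))    # data reduction per exposure
```

In this hypothetical case the addressed readout produces 1,000 values per exposure instead of 262,144, a reduction of more than two hundred fold before any variable exposure times are considered.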
Nucleotide Identification in DNA Sequencing Experiments--"Base Calling". It is unlikely that capillary electrophoresis will ever provide migration times that are reproducible enough among a group of capillaries to allow running four sets of fragments generated from a single DNA sample in a DNA sequencing analysis (one set of fragments for each of the nucleotide bases A, T, C, and G) in separate capillaries. Thus, methods have been developed to distinguish the four bases run on a single capillary. The one-color, four-intensity scheme is least desirable because of difficulties in controlling the polymerase and maximizing the signal-to-noise ratio (S/N) (H. Swerdlow et al., Anal. Chem., 63, 2835-2841 (1991)). The two-color, two-intensity scheme provides the advantages of a simpler optical arrangement, good light collection, and a straightforward algorithm (R. A. Mathies et al., Anal. Chem., 64, 2149-2154 (1992); D. Chen et al., Nucl. Acids Res., 20, 4873-4880 (1992)). However, like the one-color, four-intensity scheme, this scheme also assumes uniform incorporation of label by the polymerase, which is often an incorrect assumption.
The technology in most common use is therefore still the four-color scheme originally reported by F. Sanger et al. (Proc. Natl. Acad. Sci. U.S.A., 74, 5463-5467 (1977)). Many optical arrangements have been developed for base calling with four-dye labels (S. Carson et al., Anal. Chem., 65, 3219-3226 (1993); R. Tomisaki et al., Anal. Sci., 10, 817-820 (1994); A. E. Karger et al., Nucl. Acids Res., 19, 4955-4962 (1991)). The four standard dyes (FAM and JOE, which are fluorescein derivatives, and ROX and TAMRA, which are rhodamine derivatives, available as the PRISM dyes from ABD division of Perkin Elmer, Foster City, Calif.) are by no means spectrally distinct, either in excitation or in emission. Currently available commercial instruments therefore use fairly narrow interference filters for emission and two laser wavelengths for excitation. Still, a complicated set of emission ratios has to be employed for base calling. Monochromator-based spectral identification of the labels in principle offers the best selectivity. However, one needs to disperse the total fluorescence over many pixels to obtain a spectrum. This adds to the amount of raw data acquired and increases the acquisition time and the data work-up effort. Monochromators also do not have the favorable f-numbers for light collection that simple filters possess.
The so-called two-color sequencing scheme developed at DuPont is actually a four-label method (J. M. Prober et al., Science, 238, 336-341 (1987)). The optics are simplified and the ratio-based base calling algorithm is fairly straightforward. However, the four labels have emission bands that are very closely spaced. Even though the intensity ratios (used for base calling) are relatively independent of the incorporation rate of the polymerase reaction, spectral interference and a low S/N (low transmission of the bandpass filters) can lead to ambiguities. Thus, while there exist various proven base calling schemes, there is much room for improvement in terms of accuracy, speed and simplicity.
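The general idea of ratio-based base calling from two detection channels can be sketched as follows. This is a minimal illustration, not the algorithm of the cited work: the reference ratios, the assignment of ratios to bases, the signal threshold, and the function name are all hypothetical. Each peak is assigned to the label whose reference ratio is nearest on a logarithmic scale, and peaks with too little total signal are left uncalled.

```python
import math

# Hypothetical channel-1 / channel-2 intensity ratios for the four labels.
# Real values depend on the dyes and filters and must be calibrated.
REFERENCE_RATIOS = {"A": 0.25, "C": 0.6, "G": 1.6, "T": 4.0}


def call_base(ch1: float, ch2: float, min_total: float = 100.0) -> str:
    """Return the base whose reference ratio is closest (in log space) to
    ch1 / ch2, or "N" when the signal is too weak to call."""
    if ch1 <= 0 or ch2 <= 0 or ch1 + ch2 < min_total:
        return "N"  # low S/N: refuse to call rather than guess
    log_ratio = math.log(ch1 / ch2)
    return min(
        REFERENCE_RATIOS,
        key=lambda base: abs(log_ratio - math.log(REFERENCE_RATIOS[base])),
    )


if __name__ == "__main__":
    # The same ratio at two different peak heights gives the same call,
    # mirroring the independence from polymerase incorporation rate.
    print(call_base(400.0, 100.0), call_base(4000.0, 1000.0))
    # A weak peak is left uncalled.
    print(call_base(40.0, 10.0))
```

Because only the ratio enters the decision, peak height does not change the call. Closely spaced reference ratios combined with noisy intensities, however, move the measured ratio toward a neighboring label, which is the source of the ambiguities noted above.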
Sieving Medium. Further gains in sequencing rates should be possible by optimization of the sieving medium, which is also known as a separation medium, sieving matrix, or separation matrix. Crosslinked polymers such as polyacrylamide have been used as matrices in capillary gel electrophoresis (CGE) because of their known utility in slab gels for the separation of proteins and DNA. However, because of its instability over time, the irreproducibility of the polymerization process, and the fragile nature of the medium, crosslinked polyacrylamide in CE has not been reported to last for more than a few runs, and is therefore not suitable for large-scale DNA sequencing, especially in multiplexed operation (H. Swerdlow et al., Electrophoresis, 13, 475-483 (1992)). Thus, alternative sieving matrices are needed.
Low- to moderate-viscosity entangled polymers have been used to overcome some of the above problems. Unlike crosslinked gels, they are more easily replaceable and more stable for use at higher temperatures and greater electric field strengths. Linear polyacrylamide (0% C, i.e., where the percentage of crosslinker is 0%) has been used for the size separation of DNA or proteins (D. N. Heiger et al., J. Chromatogr., 516, 33-48 (1990); M. C. Ruiz-Martinez et al., Anal. Chem., 65, 2851-2858 (1993)). In addition, methyl cellulose (W. A. M. Crehan et al., J. Liq. Chromatogr., 15, 1063-1080 (1992)), hydroxyalkyl cellulose (S. Nathakarnkitkool et al., Electrophoresis, 13, 18-31 (1992)), polyhydroxy- and polyethyleneglycol-methacrylate (T. Zewert et al., Electrophoresis, 13, 817-824 (1993)), and polyvinylalcohol (M. H. Kleemiss et al., Electrophoresis, 14, 515-522 (1993)) also have been employed for DNA separations.
Several important problems remain before entangled polymers can be routinely used for large-scale DNA sequencing. Replacement of the sieving matrix after every run has not been as easy as expected. The high pressures found to be needed to effect complete matrix replacement (e.g., 1.25 × 10³ pounds per square inch (psi), 6.46 × 10⁵ torr, in M. C. Ruiz-Martinez et al., Anal. Chem., 65, 2851-2858 (1993)) may preclude the use of otherwise simple, automated schemes for flushing out a large number of capillaries in an array. In addition, the preparation of the linear polyacrylamide polymer solutions is difficult to control and to reproduce. The polymerization process depends critically on oxygen content, temperature, time for complete reaction, reagent purity and contamination. While one day the Human Genome Project may drive commercial manufacturers to produce "standard" polymer mixtures, at the present time only a 10% solution (700,000 to 1,000,000 M_n) and a solid (8,000,000 M_n) polyacrylamide product are available.
A separate but related problem is the internal coating of the capillary tubes. Typically, the fused-silica capillaries used in DNA sequencing by CE have been pretreated with a bonded coating. These are mostly variations of a bonded polyacrylamide layer. The reason for the coating is to reduce or eliminate the electroosmotic flow (EOF) that exists in bare fused-silica capillaries. EOF can actually expel the sieving matrix from the capillary. Even when EOF is low, the fact that it is opposite to the migration direction of DNA fragments means long separation times. Since the net motion is dictated by (μ_DNA − μ_EOF), representing the corresponding difference in mobilities (Δμ), the large fragments are affected much more severely than the short fragments. Where EOF is present, variability in migration times makes it difficult to analyze samples containing larger DNA fragments. Unfortunately, the coating designed to reduce EOF degrades with use. This is not surprising since polyacrylamide, when used as the sieving medium, also breaks down with time on interaction with the typical buffers used for DNA sequencing. There is definitely a need for better surface treatment procedures for the capillary columns to retain their integrity over many runs.
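The disproportionate effect of EOF on large fragments can be seen by working through the net-mobility relation numerically. The sketch below assumes a uniform field and constant mobilities. The mobility values, field strength, and capillary length are hypothetical round numbers chosen for illustration, not measured data.

```python
def migration_time(mu_dna: float, mu_eof: float,
                   field: float, length: float) -> float:
    """Time (s) for a fragment to travel `length` (cm) to the detector.

    mu_dna, mu_eof: mobility magnitudes in cm^2/(V s); EOF opposes the DNA.
    field: electric field strength in V/cm.
    """
    net_mobility = mu_dna - mu_eof  # the difference in mobilities
    if net_mobility <= 0:
        raise ValueError("EOF equals or exceeds DNA mobility: no net migration")
    return length / (net_mobility * field)


def relative_slowdown(mu_dna: float, mu_eof: float) -> float:
    """Factor by which EOF lengthens the migration time, compared with a
    capillary having no EOF; independent of field and length."""
    return mu_dna / (mu_dna - mu_eof)


if __name__ == "__main__":
    mu_eof = 0.5e-4    # hypothetical residual EOF mobility
    mu_small = 3.0e-4  # hypothetical short (fast) fragment
    mu_large = 1.5e-4  # hypothetical long (slow) fragment
    print(relative_slowdown(mu_small, mu_eof))  # about 1.2
    print(relative_slowdown(mu_large, mu_eof))  # about 1.5
    print(migration_time(mu_large, mu_eof, field=200.0, length=40.0))
```

With these numbers the same residual EOF lengthens the migration time of the short fragment by about 20% and that of the long fragment by about 50%. Any run-to-run drift in EOF is amplified in the same way, which is why the larger fragments show the greater variability in migration time.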