This invention relates generally to the field of nucleic acid hybridization on membranes. More particularly, this invention relates to a method for automated multiplex sequencing of DNA.
Large scale nucleotide sequencing initiatives, such as a project to sequence the human genome, have created a need for increased efficiency and productivity. J. Watson, 248 Science 44 (1990). Automation of the various steps involved in sequencing is one area in which gains in efficiency and productivity are being made.
Multiplex sequencing, one scheme for reducing the number of sequencing reactions and electrophoresis steps, involves the processing of a mixture of sequencing templates followed by sequential hybridization with selected probes. G. Church & S. Kieffer-Higgins, 240 Science 185 (1988); U.S. Pat. No. 4,942,124. In this method, many sequencing templates, each carrying a short known sequence or tag, are processed together. A single DNA preparation yields a mixture of templates. Sequencing reactions are performed on the mixture in the absence of any label, and the mixed reaction products are fractionated by electrophoresis, transferred to a membrane, and probed sequentially by hybridization with labeled oligonucleotides specific for each tag. Each hybridization step reveals the nucleotide sequence of one component of the template mixture. Between hybridizations the labeled probe is removed to permit the next hybridization without interference from the previous probe. The advantages of multiplex sequencing come from the parallel processing of template preparations and sequencing reactions, and the simultaneous electrophoresis of mixtures of templates. Multiplex sequencing can reduce the time, effort, and resources needed for these steps by about a factor of the number of different sequencing templates in the mixture.
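The scale of the savings described above can be illustrated with a short calculation. The following is a minimal sketch, not part of the cited method: the function name `multiplex_workload` and the example figures (40 templates in mixtures of 10) are hypothetical and chosen only for illustration. It assumes that each mixture needs one set of sequencing reactions and one electrophoresis run, and that each template in a mixture is revealed by one hybridization.

```python
import math


def multiplex_workload(n_templates, pool_size):
    """Compare conventional and multiplex step counts (illustrative only).

    n_templates: total number of sequencing templates (hypothetical input)
    pool_size:   number of tagged templates mixed together (hypothetical input)
    """
    # One reaction set and one electrophoresis run per mixture of templates.
    pools = math.ceil(n_templates / pool_size)
    return {
        # Conventional sequencing: one reaction set and gel run per template.
        "conventional_runs": n_templates,
        # Multiplex sequencing: one reaction set and gel run per mixture.
        "multiplex_runs": pools,
        # Each template still needs its own tag-specific hybridization.
        "hybridizations": n_templates,
        # Reduction in reaction/electrophoresis steps.
        "reduction_factor": n_templates / pools,
    }


if __name__ == "__main__":
    print(multiplex_workload(40, 10))
```

Under these assumptions the reduction in reactions and electrophoresis runs equals the number of templates per mixture, while the number of hybridizations remains equal to the number of templates.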
The savings made in sequencing reactions and electrophoresis by multiplex sequencing are offset to some extent, however, by new steps that are unnecessary in conventional sequencing protocols. Hybridization of the membrane is an added step that is repeated with each specific probe. Fortunately, however, the hybridization process is automatable. P. Richterich et al., 7 Bio/Techniques 52 (1989). A remaining problem is the acquisition of sequence data in electronic form. Automated sequencing machines are available that detect fluorescently labeled sequencing products as they migrate through a gel. The data acquired in this way are then interpreted by an algorithm that yields a called sequence. Most large-scale sequencing efforts have turned toward such machines as the only way of obtaining sufficient efficiency.
Conventionally, hybridization probes have been labeled with radioisotopes. Although radioactive probes can detect minute quantities of DNA, they are hazardous and unstable, and high-resolution direct imaging of radioactive signals is not straightforward. Non-radioactive methods of DNA detection have been developed in recent years. The most sensitive methods involve enzymatic conversion of substrates to colored, J. Leary et al., 80 Proc. Nat'l Acad. Sci. USA 4045 (1983), or chemiluminescent products, J. Voyta et al., 34 J. Clin. Chem. 1157 (1988); A. Schaap et al., 28 Tetrahedron Lett. 1159 (1987); I. Bronstein et al., 180 Anal. Biochem. 95 (1989). In this approach, an enzyme is linked to a probe, and an enzyme substrate that yields a colored or chemiluminescent product is applied to the membrane. After the enzyme acts on the substrate, the result is a pattern of color or light corresponding to the pattern of the target DNA on the membrane. Although colorimetric detection of sequence ladders has been achieved, P. Richterich et al., 7 Bio/Techniques 52 (1989), the inability to remove the colored product from the membrane precludes its use for sequential probing.
As currently practiced, automated DNA sequencing makes use of fluorescent labels for DNA detection. L. Smith et al., 321 Nature 674 (1986); W. Ansorge et al., 15 Nucleic Acids Res. 4593 (1987); J. Prober et al., 238 Science 336 (1987). In these methods fluorescence detection occurs while the DNA is in the gel. Under such conditions, a single fluorescent moiety per DNA molecule is sufficient for detection. Attempts at fluorescent detection in multiplex sequencing revealed a grossly inadequate limit of detection for DNA sequencing purposes. A. Karger et al., 1206 Proc. SPIE 78 (1990). Background fluorescence from most membranes adds large quantities of noise, T. Chu et al., 13 Electrophoresis 105 (1992); U.S. Pat. No. 5,112,736, so that a much more intense signal is required to achieve an adequate signal-to-noise ratio than is required in a gel. Low fluorescence membranes, such as amine derivatized polypropylene (e.g., U.S. Pat. No. 5,112,736), are known; however, such low fluorescence membranes are restricted by a limit of detection about 100-fold too high for multiplex sequencing, and the membranes are more fragile than nylon membranes.
Chemiluminescent hybridization signals are typically imaged by exposure to X-ray film, although other methods are known, such as imaging with a CCD (charge-coupled device) camera. U.S. Pat. No. 5,162,654. However, the light output from chemiluminescence is quite low. Although enzymatic turnover results in many chemiluminescent molecules per target DNA molecule, at most one photon is emitted for each product molecule produced, and in practice there is only about 1 photon emitted per 10^4 molecules. Due to the low level of light emitted, a sensitive, low-noise detector, such as a cryogenically cooled CCD, is required for imaging, and a long exposure time is needed. A fully automated system based on chemiluminescence could be constructed, but it would be expensive and slow.
In the most straightforward operational mode, a CCD image is acquired as a snapshot, analogous to the operation of a photographic camera. The major advantages of digital imaging, in particular fast visualization, high sensitivity, quantitative imaging, and computer readable format, have been well documented. E. Ribeiro et al., 194 Anal. Biochem. 174 (1991); P. Jackson et al., 9 Electrophoresis 330 (1988); P. Jackson, 270 Biochem. J. 705 (1990); K. Chan et al., 63 Anal. Chem. 746 (1991); M. Lanan et al., 31 Biopolymers 1095 (1991); M. Lanan et al., 64 Anal. Chem. 1967 (1992); A. Karger et al., 1206 Proc. SPIE 78 (1990); K. Misiura et al., 18 Nucleic Acids Res. 4345 (1990); D. Pollard-Knight et al., 185 Anal. Biochem. 84 (1990); Z. Boniszewski et al., 11 Electrophoresis 432 (1990). When compared to other methods of visualization, however, such as autoradiography using isotope labels and X-ray film, the most obvious limitation of CCD imaging lies in the dimensions of the sensor arrays most commonly used in analytical applications. Their limited size rules out the recording of high-resolution electropherograms on a single frame. The large number of bands that can be resolved by high-resolution electrophoretic methods far exceeds the number of bands that can be adequately sampled on arrays having 512 to 768 CCD elements along their long axis, such as those referenced above.
One way to obtain a CCD image with adequate sampling over the entire surface of a sequencing electropherogram is to manually merge partially overlapping individual frames on a computer screen using an image analysis tool. P. Jackson, 270 Biochem. J. 705 (1990). However, this procedure is time consuming and labor intensive, and the quality of the resulting composite image is compromised by discontinuities.
Another solution would be to use larger CCD arrays. CCD arrays consisting of 2048 elements square are commercially available, although at prices that are often prohibitive for analytical applications. Considering that several thousand data points need to be collected when several hundred bands are being separated, even a state-of-the-art, 4-megapixel CCD area array will fall short of the most demanding requirements of high-resolution separations. DNA sequencing, for example, requires sampling capability for well above 500 bands on a single lane, translating into much more than 2048 data points.
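The sampling argument above can be checked with simple arithmetic. The following is a minimal sketch under a stated assumption: the figure of 5 data points per band is hypothetical, chosen only because it is consistent with the statement that more than 500 bands require well over 2048 data points. The function names are likewise illustrative.

```python
def required_samples(n_bands, samples_per_band):
    """Data points needed along a lane to sample every band."""
    return n_bands * samples_per_band


def max_bands(array_length, samples_per_band):
    """Largest number of bands a single frame of the given length can sample."""
    return array_length // samples_per_band


if __name__ == "__main__":
    SAMPLES_PER_BAND = 5  # hypothetical sampling density, not from the source

    needed = required_samples(500, SAMPLES_PER_BAND)
    print("data points needed for 500 bands:", needed)

    # Array lengths mentioned in the text: 512 to 768, and 2048 elements.
    for length in (512, 768, 2048):
        print(length, "elements ->", max_bands(length, SAMPLES_PER_BAND), "bands")
```

With this assumed sampling density, even a 2048-element axis accommodates fewer than the 500 bands cited for a DNA sequencing lane.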
Continuous data acquisition using an area CCD can be achieved by operating the CCD camera in Time Delay and Integration (TDI) mode. Line scan CCD cameras are also available, but TDI mode provides greater sensitivity than line scan. TDI operation adds the capability of continuous data acquisition independent of the array length. This has been shown for two high-speed, fluorescence DNA sequencing formats: capillary electrophoresis, A. Karger et al., 18 Nucleic Acids Res. 4955 (1991), and ultrathin slab gels, A. Kostichka et al., 10 Bio/Technology 78 (1992). TDI mode has also been used to monitor migrating fluorescent bands in capillary electrophoresis along the length of the column with a CCD camera. J. Sweedler et al., 63 Anal. Chem. 496 (1991). A TDI system for fluorescence detection on membranes is needed for automation of multiplex sequencing.
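The principle of TDI operation can be illustrated by a small simulation. The following is a minimal sketch of an idealized, noise-free TDI readout, not a description of any particular camera: the function name `tdi_scan` and the toy scene are hypothetical. It assumes that the image moves across the sensor by exactly one row per clock cycle and that the accumulated charge is shifted by one row in synchrony, so that each line of the scene is integrated once in every row before it is read out.

```python
def tdi_scan(scene, n_stages):
    """Simulate idealized TDI readout of a scene moving past an area CCD.

    scene:    list of scene lines, each a list of intensities (one per column)
    n_stages: number of CCD rows along the direction of motion
    Returns one output line per scene line, each integrated n_stages times.
    """
    width = len(scene[0])
    n_lines = len(scene)
    # charge[k] is the charge packet currently held in sensor row k.
    charge = [[0.0] * width for _ in range(n_stages)]
    output = []
    for t in range(n_lines + n_stages - 1):
        # Exposure: at clock t, sensor row k views scene line (t - k).
        for k in range(n_stages):
            s = t - k
            if 0 <= s < n_lines:
                for x in range(width):
                    charge[k][x] += scene[s][x]
        # Shift: the last row is read out and an empty row enters at the top.
        finished = charge.pop()
        charge.insert(0, [0.0] * width)
        # The first (n_stages - 1) readouts are empty start-up packets.
        if t >= n_stages - 1:
            output.append(finished)
    return output


if __name__ == "__main__":
    toy_scene = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    for line in tdi_scan(toy_scene, n_stages=4):
        print(line)
```

In this sketch the number of output lines equals the number of scene lines regardless of `n_stages`, which reflects acquisition independent of the array length, and each output value is the scene value multiplied by `n_stages`, which reflects the sensitivity gained over a single-row line scan.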
The task of converting relative band positions into nucleotide sequence is conceptually simple; however, the 1-3% error rate of human readers indicates that reading is more complex in practice. Band amplitudes and positions vary due to enzyme behavior and other biochemical factors, as well as instrumentation and handling factors, such as uneven temperature distribution. Band positions as a function of fragment size typically follow either quasi-logarithmic or constant spacing rules, depending on the instrumentation, but spatial jitter and position anomalies can be large enough to superimpose adjacent bands. Interlane band amplitudes vary, and intralane band amplitudes change both locally and along the length of the lane. Across a given electrophoretic gel, bands change width and may be tilted or take on complex shapes. Automated sequence readers must be able to deal with all this variation.