A. Field of the Invention
This invention relates to the field of signal detection and analysis of chromatographic migration patterns as commonly applied to mixtures of molecules. More specifically, this invention relates to a method and apparatus for signal detection and analysis of chromatographic migration patterns as applied to the determination of DNA sequences.
B. Description of Related Art
The ability to efficiently and accurately detect and analyze information-containing signals in chromatographic data is important for handling large amounts of data. Such an ability is particularly important for projects such as the Human Genome Project, where large amounts of information will be generated that must be analyzed and integrated to produce a representative sequence of an entire human genome. Numerous methods have been developed to expedite the analysis of DNA sequence information. For example, a U.S. patent to Clark Tibbetts (U.S. Pat. No. 5,365,455) discloses a method for the automated processing of DNA sequence data. This patent is incorporated by reference herein in its entirety. The Tibbetts method derives information from informative variables obtained from the input data set. Such informative variables may include the relative intensities of adjacent signals, the relative signal spacing, and pattern recognition factors.
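By way of illustration only, two of the informative variables named above, the relative intensity of adjacent signals and the relative signal spacing, can be sketched as follows. This is a minimal sketch under stated assumptions and is not the procedure of the Tibbetts patent: the function name `informative_variables`, the representation of peaks as parallel lists of positions and intensities, and the normalization of spacing by the mean spacing are all hypothetical choices made for the example.

```python
def informative_variables(positions, intensities):
    """Return one (intensity_ratio, relative_spacing) pair per adjacent peak pair.

    Hypothetical helper for illustration; assumes peaks are already located,
    sorted by position, and have non-zero intensities.
    """
    if len(positions) != len(intensities):
        raise ValueError("positions and intensities must have equal length")
    if len(positions) < 2:
        return []  # no adjacent pairs to compare

    ratios = []
    spacings = []
    for i in range(1, len(positions)):
        # Intensity of each peak relative to its predecessor.
        ratios.append(intensities[i] / intensities[i - 1])
        # Raw distance between adjacent peaks along the migration axis.
        spacings.append(positions[i] - positions[i - 1])

    # Assumption: spacing is expressed relative to the mean spacing.
    mean_spacing = sum(spacings) / len(spacings)
    return [(r, s / mean_spacing) for r, s in zip(ratios, spacings)]


features = informative_variables([10, 20, 32], [4.0, 2.0, 3.0])
```

Here the second peak has half the intensity of the first, and the second gap is wider than the first, so the relative spacing of the second pair is greater than one.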
Tibbetts' method is limited, however, by the quality of the chromatographic data. The method relies to a certain extent on the reproducibility of chromatographic data to train the base identification ("calling") system. The apparatus generating the chromatographic data, therefore, needs to be consistent from run to run to avoid retraining the algorithm. Because chromatographic data frequently contain background noise and migration aberrations that obscure information-containing signals, analyses based on signal spacing may produce errors in signal identification. Similarly, because signal intensity often varies in an unpredictable manner, signal identification based on intensity may also result in significant identification errors.
A U.S. patent to Thomas Stockham and Jeff Ives (U.S. Pat. No. 5,273,632) discloses an alternate method for base identification using blind deconvolution ("BD"). This patent is incorporated by reference herein in its entirety. The method of Stockham and Ives uses blind deconvolution to deblur information-containing signals in chromatographic data. This method, however, is significantly limited in several respects. First, it relies on data derived from scanned autoradiogram images. Second, the method requires user input of the BD filter bandwidth and programmer alterations to various thresholds. Third, the method does not adequately deal with lane-to-lane mobility differences. Fourth, the insertion/deletion and correction logic is too simple. Fifth, the putative peak detection is based on thresholds and therefore can miss bands whose amplitudes drop below the threshold. Sixth, the method lacks the ability to align and merge adjacent sample segments. Finally, the method lacks band quality measures useful in automatic data routing and/or sequence assembly.
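The fifth limitation, missed detections under a fixed amplitude threshold, can be illustrated with a short example. This is a hedged sketch only and does not reproduce the Stockham and Ives implementation: the function names `local_maxima` and `threshold_peaks`, the synthetic trace, and the threshold value are assumptions chosen for the illustration.

```python
def local_maxima(trace):
    """Indices of interior samples that rise above the left neighbor
    and are at least as high as the right neighbor."""
    return [
        i
        for i in range(1, len(trace) - 1)
        if trace[i - 1] < trace[i] >= trace[i + 1]
    ]


def threshold_peaks(trace, threshold):
    """Local maxima whose amplitude exceeds a fixed threshold."""
    return [i for i in local_maxima(trace) if trace[i] > threshold]


# Synthetic trace with three bands; the middle band is weak.
trace = [0, 5, 1, 0, 2, 0, 6, 1]

all_bands = local_maxima(trace)        # [1, 4, 6]
detected = threshold_peaks(trace, 3)   # [1, 6]
missed = [i for i in all_bands if i not in detected]  # [4]
```

The weak band at index 4 is a genuine local maximum but falls below the threshold of 3, so a purely threshold-based detector reports two bands where three are present.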