Liquid chromatography-mass spectrometry (LC-MS) is a well-known combined analytical technique for separation and identification of chemical mixtures. Components of a mixture pass through a chromatographic column at different rates, and the eluent is subjected to mass spectrometric analysis at known time intervals. Data are acquired as a series of time-dependent mass spectra, i.e., ion intensity at varying mass-to-charge ratios (m/z).
LC-MS data are typically reported by the mass spectrometer as a total ion current (TIC) chromatogram, i.e., the summed intensity of all ions detected in each scan, plotted against retention time. A TIC chromatogram of a proteolytic digest of human serum is shown in FIG. 1, with peaks representing separated components of the mixture eluting at the indicated retention times. Mass spectra corresponding to identified chromatographic peaks can provide chemical structure information about the peak constituents.
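As a minimal sketch of the data layout just described, assume (hypothetically) that an LC-MS run is stored as a grid of intensities with one row per scan (retention time) and one column per m/z bin, so that each row is one mass spectrum. The TIC chromatogram is then the sum of each row. The function name and the example values are illustrative only.

```python
from typing import List

# Hypothetical layout: rows = scans (retention times), columns = m/z bins.
Grid = List[List[float]]

def total_ion_current(intensity: Grid) -> List[float]:
    """Return the TIC chromatogram: summed ion intensity for each scan."""
    return [sum(scan) for scan in intensity]

if __name__ == "__main__":
    # 4 scans x 3 m/z bins (made-up values)
    data = [
        [1.0, 0.0, 2.0],
        [3.0, 5.0, 1.0],
        [2.0, 9.0, 0.0],
        [1.0, 1.0, 1.0],
    ]
    print(total_ion_current(data))  # [3.0, 9.0, 11.0, 3.0]
```

Summing across m/z discards the mass dimension entirely, which is why a TIC trace alone cannot distinguish co-eluting components.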
LC-MS has traditionally been used to study relatively simple samples: few samples, each available in large volume and containing few components, so that the resulting spectra contain few peaks. Recently, the method has been applied to proteomic and metabolomic profiling of complex biological mixtures. In such studies, many samples, each containing a large number of components, are analyzed rapidly, and large amounts of data are collected for mining and statistical analysis. While mass spectra acquired in traditional studies can be interpreted manually, high-throughput studies require automated peak selection. Because spectra of complex biological samples tend to contain many overlapping peaks and to be very noisy, accurate automatic peak picking is difficult.
State-of-the-art LC-MS instruments provide a basic peak picking function for both chromatograms and mass spectra. A noise threshold level is determined automatically, and local maxima in clusters of points above the threshold are identified as peaks. Alternatively, the operator can specify a threshold above which the system designates local maxima as peaks. In other methods, the base peak, i.e., the highest peak in the chromatogram, is identified, and all points whose intensities exceed a preset fraction of the base peak intensity are identified as peaks. In practice, however, although these functions pick peaks automatically, they are not intended for fully automated analysis but rather serve as an aid to the operator in analyzing and interpreting the data. Note also that LC-MS data are two-dimensional; that is, each discrete data point (an intensity) is obtained at a particular combination of two independent variables, retention time and mass-to-charge ratio (m/z). Commercially available peak picking methods are applied to one-dimensional data only, i.e., individual mass spectra or chromatograms. For example, Waters's MASSLYNX™ and Thermo Finnigan's XCALIBUR™ are LC-MS software packages that have a peak selection feature. FIG. 2 is an unprocessed base peak trace of the TIC chromatogram of FIG. 1 showing peaks selected by the XCALIBUR™ LC-MS software package. The peak selection features of both software packages, however, appear to locate peaks along the time axis only.
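The two one-dimensional strategies described above can be sketched as follows. How commercial instruments estimate the automatic noise threshold is not stated here, so `estimate_noise_threshold` is a hypothetical stand-in (median plus k times the median absolute deviation); `pick_peaks` reports the highest point in each cluster of points above the threshold, and `points_above_base_fraction` implements the base-peak rule literally, returning every point above the preset fraction. All names and the value of `k` are assumptions for illustration.

```python
from statistics import median
from typing import List

def estimate_noise_threshold(signal: List[float], k: float = 3.0) -> float:
    """Hypothetical automatic threshold: median + k * median absolute deviation."""
    center = median(signal)
    mad = median([abs(y - center) for y in signal])
    return center + k * mad

def pick_peaks(signal: List[float], threshold: float) -> List[int]:
    """Return the index of the highest point in each contiguous run above threshold."""
    peaks: List[int] = []
    start = None
    # A trailing -inf sentinel closes a run that reaches the end of the signal.
    for i, y in enumerate(list(signal) + [float("-inf")]):
        if y > threshold and start is None:
            start = i
        elif y <= threshold and start is not None:
            peaks.append(max(range(start, i), key=lambda j: signal[j]))
            start = None
    return peaks

def points_above_base_fraction(signal: List[float], fraction: float) -> List[int]:
    """Base-peak rule: every point above `fraction` of the highest point."""
    cutoff = fraction * max(signal)
    return [i for i, y in enumerate(signal) if y > cutoff]

if __name__ == "__main__":
    trace = [1, 2, 1, 8, 10, 7, 1, 1, 6, 1]  # made-up chromatogram
    thr = estimate_noise_threshold(trace)     # 3.0
    print(pick_peaks(trace, thr))             # [4, 8]
    print(points_above_base_fraction(trace, 0.5))  # [3, 4, 5, 8]
```

A robust median-based estimate is used in this sketch so that the peaks themselves do not inflate the noise estimate; either way, both functions operate on a single trace and never see the other dimension of the data.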
A method for filtering and recognizing peaks in spectrometry data is disclosed in U.S. Pat. No. 5,995,989, issued to Gedcke et al. An average background signal level and an average deviation from the background are computed and used to define a local threshold value for each point. Points exceeding the threshold are assumed to be peaks or near peaks. This method was developed for mass spectra and therefore provides only a one-dimensional peak recognition algorithm. It can be applied to higher-dimensional data such as LC-MS data, e.g., by selecting peaks in each mass spectrum and then combining the resulting spectra into a total ion current chromatogram, but the analysis itself remains one-dimensional. Such a method is limited because it does not take advantage of the information provided by the chromatographic dimension. That is, what appears to be a peak in a single mass spectrum may be below the noise threshold in the corresponding mass chromatogram (ion abundance versus retention time for a particular m/z value).
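The general idea of a per-point local threshold built from an average background and an average deviation can be sketched as below. This is not the patented procedure; the sliding-window definition of "local", the exclusion of the centre point, and the parameters `half_width` and `k` are all hypothetical choices made for illustration.

```python
from typing import List

def local_thresholds(signal: List[float], half_width: int = 3, k: float = 3.0) -> List[float]:
    """Per-point threshold = local background + k * local average deviation.

    Background: mean of the neighbours within `half_width` points, with the
    point itself excluded so a peak does not raise its own threshold.
    Average deviation: mean absolute difference of those neighbours from the
    background. `half_width` and `k` are hypothetical tuning parameters.
    """
    n = len(signal)
    if n < 2 or half_width < 1:
        raise ValueError("need at least two points and half_width >= 1")
    thresholds = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        window = [signal[j] for j in range(lo, hi) if j != i]
        background = sum(window) / len(window)
        avg_dev = sum(abs(y - background) for y in window) / len(window)
        thresholds.append(background + k * avg_dev)
    return thresholds

def flag_points(signal: List[float], half_width: int = 3, k: float = 3.0) -> List[int]:
    """Indices of points that exceed their own local threshold."""
    thresholds = local_thresholds(signal, half_width, k)
    return [i for i, (y, t) in enumerate(zip(signal, thresholds)) if y > t]

if __name__ == "__main__":
    spectrum = [1, 1, 1, 1, 10, 1, 1, 1, 1]  # made-up mass spectrum
    print(flag_points(spectrum))  # [4]
```

Applied to LC-MS data, such a function would be run once per mass spectrum, so a point's neighbours along the retention-time axis never influence its threshold.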
More extensive, multidimensional peak picking algorithms have been developed for nuclear magnetic resonance (NMR) spectroscopy, in which manual peak selection is more time-consuming and therefore provides a greater incentive for automation. For example, an automated peak picking algorithm for multidimensional NMR spectra is disclosed in R. Koradi et al., “Automated Peak Picking and Peak Integration in Macromolecular NMR Spectra Using AUTOPSY,” J. Magn. Reson., 135: 288-297 (1998). In one feature of the AUTOPSY (automated peak picking for NMR spectroscopy) algorithm, a different local noise level is defined for each multidimensional data point, and the point is retained only if its value exceeds that local noise level. A given point's local noise level is a function of the average noise level of all one-dimensional slices passing through the point.
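A simplified two-dimensional sketch of the slice-averaging idea follows. It does not reproduce AUTOPSY's actual noise estimator: the per-slice estimate `slice_noise` (median absolute intensity), the use of the plain average of the row and column noise levels as "a function of the average", and the multiplier `k` are hypothetical simplifications.

```python
from statistics import median
from typing import List, Tuple

Grid = List[List[float]]  # rows = retention-time scans, columns = m/z bins

def slice_noise(values: List[float]) -> float:
    """Hypothetical per-slice noise estimate: median absolute intensity."""
    return median([abs(v) for v in values])

def local_noise_levels(grid: Grid) -> Grid:
    """Noise level of each point = mean of the noise levels of the two
    one-dimensional slices (its row and its column) passing through it."""
    n_rows, n_cols = len(grid), len(grid[0])
    row_noise = [slice_noise(row) for row in grid]
    col_noise = [slice_noise([row[j] for row in grid]) for j in range(n_cols)]
    return [[(row_noise[i] + col_noise[j]) / 2 for j in range(n_cols)]
            for i in range(n_rows)]

def retained_points(grid: Grid, k: float = 3.0) -> List[Tuple[int, int]]:
    """Keep (row, column) points whose value exceeds k times their local noise level."""
    noise = local_noise_levels(grid)
    return [(i, j) for i, row in enumerate(grid)
            for j, v in enumerate(row) if v > k * noise[i][j]]

if __name__ == "__main__":
    grid = [[1.0, 1.0, 1.0],
            [1.0, 20.0, 1.0],
            [1.0, 1.0, 1.0]]
    print(retained_points(grid))  # [(1, 1)]
```

Because the row and column estimates enter with equal weight, a quiet slice in one dimension can offset a noisy slice in the other.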
Existing multidimensional peak picking algorithms, such as AUTOPSY, are generally not effective enough at selecting peaks in LC-MS data to allow fully automated peak detection. Moreover, several characteristics of LC-MS data make it poorly suited to such analysis. For example, LC-MS noise arises from a variety of unrelated and, in some cases, poorly understood sources, and is therefore difficult to filter effectively using methods developed for noise with well-known distributions, such as that found in NMR data. In addition, it is not uncommon for the mass chromatogram at one mass-to-charge ratio to be very noisy while the mass spectrum at a given retention time shows little noise. Using the AUTOPSY algorithm, however, a point at the intersection of the noisy m/z and the low-noise retention time has a threshold level determined equally by the noise levels of the two one-dimensional slices. Because the quiet retention-time slice pulls this average down, the resulting level may be too low to exclude all of the noise in the noisy m/z channel. This algorithm and other available methods are therefore not optimal for peak picking in LC-MS data.
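A short worked example illustrates the problem. Assume, as a simplification, that the threshold at the point (t_i, m_j) is k times the equally weighted mean of the noise level sigma_t(i) of the mass spectrum at retention time t_i and the noise level sigma_m(j) of the mass chromatogram at m/z = m_j. The values of k and of both noise levels below are illustrative, not measured.

```latex
\begin{aligned}
T_{ij} &= k\,\frac{\sigma_t(i) + \sigma_m(j)}{2} \\
       &= 3 \cdot \frac{1 + 10}{2} = 16.5
          \qquad (k = 3,\ \sigma_t(i) = 1,\ \sigma_m(j) = 10) \\
k\,\sigma_m(j) &= 3 \cdot 10 = 30
\end{aligned}
```

Under these assumptions, any noise excursion in the noisy m/z channel with an intensity between 16.5 and 30, i.e., between about 1.65 and 3 times that channel's own noise level, would exceed the averaged threshold even though a threshold based on the noisy channel alone would reject it.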
There is still a need, therefore, for an automated peak picking algorithm for LC-MS and other multidimensional data.