The present application relates generally to chemical analysis, and more specifically to a system and method of analyzing a multi-dimensional data set having a first dimension corresponding to compound separation and a second dimension corresponding to compound spectra for characterizing compounds present in a sample mixture.
Sample analysis techniques are known that may be employed for identifying and quantitating one or more compounds present in a sample mixture. For purposes of illustration, a conventional system for performing such sample analysis includes a compound-separating unit for chromatographically or electrophoretically separating constituent compounds in a mixture, and a compound-identifying unit such as a mass spectrometer for identifying and quantitating one or more of the separated compounds. The compound-separating unit typically represents the separated compounds in a chromatogram or an electropherogram as respective peaks with associated elution times. The mass spectrometer typically provides mass-to-charge ratios (m/z) of the compound ions for the respective peaks to aid in identifying and quantitating the constituent compounds in the mixture.
One drawback of conventional systems for analyzing sample mixtures is that the analysis of the data generated by the system often becomes the limiting step, especially when the data are generated from complex sample mixtures. Another drawback of conventional sample analysis techniques is that noise in the data frequently makes the detection and identification of peaks, particularly low-intensity peaks, less reliable. Such noise may comprise chemical noise and/or random noise having a magnitude high enough to reduce significantly the peak signal-to-noise ratio (S/N), thereby making the detection of low-intensity peaks problematic.
For example, a peak picking algorithm (the CODA algorithm) is known in which a mass chromatographic quality index (MSQ) is calculated as the inner product of an extracted ion chromatogram and its smoothed and mean-subtracted version. The higher the S/N in the initial chromatogram, the more closely it resembles its smoothed version and the higher the MSQ. Only chromatograms with a high MSQ are selected and combined to produce a Total Ion Chromatogram (TIC) with reduced noise and background. The CODA algorithm thus provides a peak selection technique in which only m/z values corresponding to an MSQ value greater than a predetermined threshold are selected. However, the CODA algorithm does not use a priori information about the shape and width of chromatographic and MS peaks, and is typically not robust to chemical and random noise.
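By way of illustration only, the quality-index calculation and threshold selection described above may be sketched as follows. The sketch is not taken from any particular CODA implementation. The function names, the centered moving-average smoother, the window size, and the scaling of both vectors to unit length (so that the index cannot exceed 1) are assumptions made for this example.

```python
import math


def moving_average(signal, window):
    """Centered moving average; the window is truncated at the edges (assumed smoother)."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


def unit_norm(vector):
    """Scale a vector to unit Euclidean length (all-zero vectors are returned as zeros)."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else [0.0] * len(vector)


def quality_index(chromatogram, window=3):
    """Inner product of a chromatogram and its smoothed, mean-subtracted version.

    Both vectors are scaled to unit length first (an assumption of this sketch).
    """
    smoothed = moving_average(chromatogram, window)
    mean = sum(smoothed) / len(smoothed)
    centered = [s - mean for s in smoothed]
    a = unit_norm(chromatogram)
    b = unit_norm(centered)
    return sum(p * q for p, q in zip(a, b))


def select_mz(chromatograms, threshold, window=3):
    """Return the m/z keys whose quality index exceeds the threshold."""
    return [mz for mz, trace in sorted(chromatograms.items())
            if quality_index(trace, window) > threshold]


if __name__ == "__main__":
    # Hypothetical data: one smooth peak and one alternating, noise-like trace.
    data = {
        500.0: [0, 0, 1, 4, 9, 4, 1, 0, 0],
        600.0: [3, 0, 3, 0, 3, 0, 3, 0, 3],
    }
    print(select_mz(data, threshold=0.5))
```

In this sketch the smooth peak scores well above the alternating trace. The index is computed without reference to any expected peak shape or width, which is the limitation noted above.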
The application of Sequential Paired Covariance (SPC) to the “de-noising” of CE-ESI-TOF data is also known. Electropherograms are reconstructed by taking the intensity of the signal at each point to be equal to the covariance of two adjacent spectra. The covariance of the two adjacent spectra is employed as a measure of their similarity, so that noise that is uncorrelated between successive spectra is suppressed. However, such de-noising by SPC can significantly alter the data. For example, information related to the position of peaks in the m/z dimension may be lost, and therefore the spectra may have to be re-analyzed.
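For purposes of illustration, the reconstruction of a trace from the covariance of adjacent spectra may be sketched as follows. This is a minimal sketch under stated assumptions, not a reproduction of any published SPC implementation. The function names and the use of the sample covariance (divisor n - 1) are assumptions, and each spectrum is assumed to be a list of intensities on a common m/z grid with at least two channels.

```python
def covariance(a, b):
    """Sample covariance of two equal-length intensity vectors (divisor n - 1)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def sequential_paired_covariance(spectra):
    """Return one value per adjacent pair of spectra.

    spectra: list of spectra ordered by acquisition time, each a list of
    intensities on the same m/z grid (assumed layout for this sketch).
    """
    return [covariance(spectra[i], spectra[i + 1])
            for i in range(len(spectra) - 1)]


if __name__ == "__main__":
    # Hypothetical data: isolated spikes in different channels, plus one
    # signal that persists in the same channel across two spectra.
    spectra = [
        [0, 5, 0, 0],
        [0, 0, 0, 4],
        [0, 10, 0, 0],
        [0, 12, 0, 0],
        [3, 0, 0, 0],
    ]
    print(sequential_paired_covariance(spectra))
```

The result is a one-dimensional trace with one value per pair of spectra. The m/z channel that produced a large covariance is not retained in the output, which illustrates why the spectra may have to be re-analyzed.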
Moreover, a Windowed Mass Selection Method (WMSM) is known in which a width for the window in the chromatographic dimension is specified (in terms of the number of spectra N) from the analysis of extracted ion chromatograms. For each extracted ion chromatogram, its mean value is calculated and subtracted, so that in the resultant function the chromatographic peaks are positive, while the noise may be both positive and negative. For each m/z value and for each spectrum, the product of N values of signal intensities (for the given value of m/z) is calculated. If there is at least one zero value inside the specified window, then the product is equal to zero. Accordingly, noisy regions are set to zero in the resultant signal matrix. A second window, typically much wider than the first window, is also specified, and the product of signal intensities inside this window is calculated. If the product for the second window is not equal to zero, then the resultant signal intensity is set to zero. Peaks that are much wider than the expected chromatographic peak are thus eliminated. However, the WMSM technique is generally not very robust to the non-uniformities associated with limited ion statistics. For example, one missing data point (e.g., a negative spike) in a good chromatographic peak may eliminate this peak from the resultant signal.
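By way of example, the two-window product test described above may be sketched for a single extracted ion chromatogram as follows. This is an illustrative sketch only. The function names, the clipping of negative mean-subtracted values to zero, the alignment of each window to start at the current spectrum, and the zeroing of every point covered by a non-zero wide window are assumptions of this example rather than details given above.

```python
def window_product(values, start, width):
    """Product of `width` consecutive values beginning at index `start`."""
    product = 1.0
    for v in values[start:start + width]:
        product *= v
    return product


def windowed_mass_selection(chromatogram, narrow, wide):
    """Apply a narrow and a wide product window to one chromatogram."""
    mean = sum(chromatogram) / len(chromatogram)
    # Assumption: values at or below the mean are clipped to zero, so that
    # a single low point forces any window product containing it to zero.
    clipped = [max(v - mean, 0.0) for v in chromatogram]
    n = len(clipped)

    # Narrow window: the product is non-zero only where N consecutive
    # points all rise above the mean.
    result = [0.0] * n
    for i in range(n - narrow + 1):
        result[i] = window_product(clipped, i, narrow)

    # Wide window: a non-zero product indicates a feature much wider than
    # the expected peak, so the points it covers are set to zero.
    for i in range(n - wide + 1):
        if window_product(clipped, i, wide) != 0.0:
            for j in range(i, i + wide):
                result[j] = 0.0
    return result


if __name__ == "__main__":
    # Hypothetical traces: a narrow peak, the same peak with one missing
    # point, and a broad plateau.
    good = [0, 0, 0, 5, 8, 5, 0, 0, 0, 0, 0, 0]
    spiked = [0, 0, 0, 5, 0, 5, 0, 0, 0, 0, 0, 0]
    broad = [0, 0, 6, 6, 6, 6, 6, 6, 6, 6, 0, 0]
    for trace in (good, spiked, broad):
        print(windowed_mass_selection(trace, narrow=3, wide=6))
```

In this sketch the narrow peak survives, the broad plateau is removed by the wide window, and the peak with a single missing point is removed entirely, which corresponds to the lack of robustness noted above.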
It would therefore be desirable to have an improved system and method of analyzing sample data for characterizing compounds in a sample mixture. Such an improved system and method would avoid the drawbacks of the above-described conventional sample analysis techniques.