In an analysis such as LC/MS, GC/MS or CE/MS, in which liquid chromatography (LC), gas chromatography (GC), capillary electrophoresis (CE) or a similar technique for separating the components of a sample is combined with mass spectrometry (MS), the many components contained in a sample are temporally separated, and mass spectrometric data are obtained for each component. These techniques make it possible to efficiently analyze a sample in which various kinds of compounds are mixed, and in recent years they have therefore been applied to a wide range of fields. However, the amount of data obtained by the measurement is considerably large. In particular, when the results of analyses of a large number of similar samples are compared, a large amount of mass spectrometric data is obtained for each sample, so the total amount of data to be processed becomes so large that a comprehensive analysis is difficult. One conventional approach to this difficulty is multivariate analysis, such as discriminant analysis, principal component analysis or cluster analysis, each of which can analyze a large amount of data relatively easily.
For example, Non-Patent Document 1, Patent Document 1 and other documents disclose a technique in which mass spectrum data obtained for a plurality of samples are processed by a principal component analysis and the results are presented using charts called the “scores plot” and “loadings plot.” Commonly known general-purpose software products for conducting a multivariate analysis of mass spectrometric data in this manner include SIMCA-P+ produced by Umetrics AB, Sweden, and Pirouette® produced by Infometrix, Inc., USA, both of which are readily available. However, for the measurement data to be read and processed by these multivariate analysis software products, the data to be analyzed (e.g. mass spectrum data or chromatogram data in the case of LC/MS) must first be compiled into a table format, i.e. a set of numerical data arrayed in a one-dimensional or two-dimensional (rows and columns) form.
Conventionally, in an analysis using an infrared (IR) spectrometer or a nuclear magnetic resonance (NMR) spectrometer, it is common for the data collected from a large number of samples to be processed and evaluated by a multivariate analysis. This is because, in the case of IR or NMR, the data obtained by measuring a sample are simpler than those obtained by LC/MS or GC/MS. The result of an analysis by IR or NMR can be presented by one graph, i.e. one-dimensional numerical data representing the strength of a signal with respect to a certain physical quantity (the wavelength for IR or the chemical shift for NMR). Accordingly, when the results of analyses of a plurality of samples must be compared, the entire measurement data can be compiled into a two-dimensional table of signal strengths arranged in two directions, one direction corresponding to a variable that identifies each sample (a sequence number or similar value) and the other direction corresponding to the variable representing the aforementioned physical quantity.
By contrast, the measurement data obtained by an LC/MS, GC/MS or similar system are a collection of signal strengths obtained along two directions corresponding to two independent separation factors, i.e. time and mass-to-charge ratio (m/z). This means that the data for a single sample are already in a two-dimensionally arrayed form. Therefore, when the results of the analyses of a plurality of samples must be compared, it is necessary to convert each two-dimensional array of data into a one-dimensional array and then compile the measurement data of the plurality of samples into one table.
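The conversion described above can be sketched in a few lines. The following is only an illustration of the general idea, not a method taken from the cited documents: it assumes every sample has already been sampled on the same time × m/z grid, and the names `flatten_sample`, `build_table` and the sample values are hypothetical.

```python
def flatten_sample(intensity):
    """Concatenate the rows (scans) of a time x m/z grid into one flat list."""
    return [value for scan in intensity for value in scan]


def build_table(samples):
    """Compile several samples into one table: one row per sample.

    Assumption: all samples share the same grid shape, so that column k
    always refers to the same (time, m/z) cell.
    """
    shapes = {tuple(len(scan) for scan in grid) for grid in samples.values()}
    if len(shapes) != 1:
        raise ValueError("all samples must use the same time x m/z grid")
    names = sorted(samples)
    return names, [flatten_sample(samples[name]) for name in names]


# Two made-up samples: 3 scans (time points) x 2 m/z bins each.
samples = {
    "sample_1": [[1.0, 0.0], [5.0, 2.0], [0.5, 0.0]],
    "sample_2": [[0.8, 0.1], [4.0, 3.0], [0.4, 0.2]],
}
names, table = build_table(samples)
for name, row in zip(names, table):
    print(name, row)
```

Each row of `table` now has the one-dimensional form that a multivariate analysis tool expects, at the cost of a column count equal to the number of time points multiplied by the number of m/z bins.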
One of the simplest methods for converting the two-dimensional array of data having the dimensions of time and mass-to-charge ratio into a one-dimensional array is to select one specific mass-to-charge ratio from the plurality of mass-to-charge ratios. Another is to total the signal strengths along the dimension of mass-to-charge ratio. Both methods effectively fix the variable corresponding to the dimension of mass-to-charge ratio to one value, which means removing that dimension. Selecting one specific mass-to-charge ratio corresponds to selecting one extracted ion chromatogram (XIC) from LC/MS (or GC/MS) data. Totaling the signal strengths over the entire range of mass-to-charge ratios corresponds to obtaining a total ion current chromatogram (TIC) from LC/MS data. These methods have three advantages: the uncertainty that depends on internal parameters used in the data processing operation for the conversion into a one-dimensionally arrayed form (as will be described later) is reduced; the process is so simple that it puts only a light load on the hardware system; and the processing time is so short that the throughput is high.
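As a minimal sketch of these two reductions, assume the data for one sample are held as a list of scans, each scan being a list of signal strengths over a fixed m/z axis. The function names, the tolerance parameter `tol` and the numbers are assumptions made for the example.

```python
def total_ion_chromatogram(intensity):
    """TIC: sum the signal strengths over all m/z values at each time point."""
    return [sum(scan) for scan in intensity]


def extracted_ion_chromatogram(intensity, mz_axis, target_mz, tol=0.5):
    """XIC: keep only the m/z bins within `tol` of `target_mz`."""
    cols = [j for j, mz in enumerate(mz_axis) if abs(mz - target_mz) <= tol]
    return [sum(scan[j] for j in cols) for scan in intensity]


# Hypothetical data: 3 time points x 3 m/z bins.
mz_axis = [100.0, 150.0, 200.0]
intensity = [
    [0.0, 1.0, 0.0],
    [2.0, 6.0, 1.0],
    [1.0, 3.0, 4.0],
]

tic = total_ion_chromatogram(intensity)
xic = extracted_ion_chromatogram(intensity, mz_axis, target_mz=150.0)
print(tic)  # [1.0, 9.0, 8.0]
print(xic)  # [1.0, 6.0, 3.0]
```

Neither function needs peak detection or tuning parameters beyond the choice of m/z window, which is why the processing load is light.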
However, in the case of the TIC, the information in the dimension of mass-to-charge ratio is entirely lost. In the case of the XIC, although the information of one mass-to-charge ratio is retained, the information on the other mass-to-charge ratios is entirely lost. In either case, the obtained result is substantially deficient in the information in the dimension corresponding to mass-to-charge ratio. If the lost information includes something that characterizes the difference among the plurality of samples, a multivariate analysis cannot yield appropriate information for evaluating the similarity or difference of those samples, and the samples cannot be correctly compared.
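A small constructed example makes this loss concrete. The two samples below are invented so that they differ only in how the signal is distributed between the first two m/z bins; their TICs, and their XICs for the third bin, are nevertheless identical.

```python
def tic(intensity):
    """Total ion current chromatogram: sum over m/z at each time point."""
    return [sum(scan) for scan in intensity]


def xic(intensity, column):
    """Extracted ion chromatogram for a single m/z bin (by column index)."""
    return [scan[column] for scan in intensity]


# Hypothetical samples: 2 time points x 3 m/z bins.
sample_a = [[4.0, 0.0, 1.0], [2.0, 0.0, 1.0]]
sample_b = [[0.0, 4.0, 1.0], [0.0, 2.0, 1.0]]

print(sample_a != sample_b)                # True: the raw data differ
print(tic(sample_a) == tic(sample_b))      # True: the TIC hides the difference
print(xic(sample_a, 2) == xic(sample_b, 2))  # True: so does the XIC of bin 2
print(xic(sample_a, 0) == xic(sample_b, 0))  # False: only bins 0 and 1 reveal it
```

Whether an XIC exposes the difference thus depends entirely on which m/z value happens to be selected.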
On the other hand, Non-Patent Documents 2 and 3 disclose a technique in which a collection of data obtained by an LC/MS is subjected to a complex data processing operation, including the steps of peak detection and selection, noise removal and strength calculation (e.g. normalization), to remove and/or integrate unnecessary data so as to convert a two-dimensional array of data into a one-dimensional form. The measurement data obtained for each of the samples are then compiled into a two-dimensional table and subjected to a principal component analysis. Phenomenome Profiler™, a set of software tools for metabolomics analyses provided by Phenomenome Discoveries Inc., Canada, has the function of compiling a collection of data obtained by an LC/MS capable of MSn analyses into a two-dimensional table format by performing a data conversion process including the steps of peak detection, smoothing, calibration and so on.
However, such a complex data processing operation puts a heavy load on the hardware system and therefore requires high-performance CPUs and large-capacity random access memory. It also lowers the throughput of the process. Moreover, the operation relies on preset operation parameters, and the choice of these parameters can cause a significant difference in the result of the multivariate analysis. Peak detection and similar processing also discard part of the original information, which may prevent the difference between the samples from being correctly reflected in the results of the multivariate analysis. For these reasons, in some cases it is impossible to correctly compare the samples despite the complicated data processing.
Furthermore, in the LC, GC or similar component separation technique, the point in time at which the same substance is eluted easily changes depending on the measurement conditions (separating conditions) or the state of the system. It is often the case that, although the measurement conditions are exactly the same, the elution time of the same substance varies when the measurement is actually performed a number of times. Such a variation in the elution time prevents correct comparison of the results of measurements of different samples. Therefore, in general, the time axes of the chromatogram data are adjusted so that the elution times of the same substance will be aligned with each other. This time-axis adjustment is achieved by shifting the chromatogram data to be adjusted along the time axis and/or by expanding or contracting the time axis. If a head or tail section of the chromatogram data is shifted as a result of such an adjustment, an instance of missing data occurs.
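The effect of such an adjustment on the ends of the data can be sketched as follows. This is a simplified illustration under stated assumptions, not the alignment procedure of any particular system: the chromatogram is sampled at integer time indices, the adjustment is the linear map t' = scale × t + shift, values are obtained by linear interpolation, and grid points that fall outside the original record are marked `None`. The name `warp_chromatogram` and its parameters are hypothetical.

```python
def warp_chromatogram(signal, shift=0.0, scale=1.0):
    """Resample `signal` on its original grid after t' = scale * t + shift.

    Grid points whose source position lies outside the recorded range
    have no data and are returned as None.
    """
    n = len(signal)
    out = []
    for i in range(n):
        src = (i - shift) / scale          # position in the original record
        if src < 0 or src > n - 1:
            out.append(None)               # missing data after adjustment
            continue
        lo = int(src)
        hi = min(lo + 1, n - 1)
        frac = src - lo
        out.append(signal[lo] * (1 - frac) + signal[hi] * frac)
    return out


signal = [0.0, 1.0, 4.0, 1.0, 0.0]

shifted = warp_chromatogram(signal, shift=2)       # move 2 points later
contracted = warp_chromatogram(signal, scale=0.5)  # compress the time axis
print(shifted)     # [None, None, 0.0, 1.0, 4.0]
print(contracted)  # [0.0, 4.0, 0.0, None, None]
```

A shift leaves a gap at one end of the record, while a contraction leaves a gap at the tail; in both cases the gap is created by the adjustment itself rather than by the sample.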
FIG. 9 is a conceptual diagram schematically illustrating how data come to be missing. In FIG. 9(a), the two chromatogram data A1 and A2 have their head and tail sections aligned, and hence there is no missing data. In FIG. 9(b), since one of the data is shifted, the head section of A1 does not have a counterpart in A2, while the tail section of A2 does not have a counterpart in A1. In FIG. 9(c), since the time axis of one of the data is expanded and that of the other is contracted, neither the head nor the tail section of A1 has a counterpart in A2. Performing a multivariate analysis or similar processing with such missing data left as they are may result in the incorrect recognition that there is a difference in the head or tail section of the data between the two samples.
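The situation of FIG. 9(b) can be imitated with two short made-up records in which `None` marks a time point that has no data after alignment. The sketch only locates the unmatched sections and shows how a naive zero-filled comparison reports a spurious difference; it is not offered as a remedy, and the names and values are assumptions.

```python
def unmatched_indices(a, b):
    """Return the indices present only in `a` and only in `b`."""
    only_a = [i for i, (x, y) in enumerate(zip(a, b))
              if x is not None and y is None]
    only_b = [i for i, (x, y) in enumerate(zip(a, b))
              if x is None and y is not None]
    return only_a, only_b


def zero_filled_difference(a, b):
    """Naive comparison that treats every missing value as zero."""
    return [(x or 0.0) - (y or 0.0) for x, y in zip(a, b)]


# A1 covers the head but not the tail; A2 covers the tail but not the head.
a1 = [1.0, 2.0, 3.0, 2.0, None]
a2 = [None, 2.0, 3.0, 2.0, 1.0]

only_a1, only_a2 = unmatched_indices(a1, a2)
print(only_a1, only_a2)                # [0] [4]
print(zero_filled_difference(a1, a2))  # [1.0, 0.0, 0.0, 0.0, -1.0]
```

The two records agree wherever both contain data, yet the zero-filled comparison reports nonzero differences at the head and tail, which is the kind of false difference the paragraph above warns about.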