In recent years, the identification of microorganisms (bacteria and fungi) using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) has spread rapidly because of its cost efficiency and analytical speed (for example, see Non Patent Literature 1 or Patent Literature 1). In particular, identifying microorganisms at the species level or below (e.g. at the strain level) yields information that is extremely useful in medical areas, e.g. for assessing the pathogenicity of microorganisms or identifying the source of an infection. Such analyses require finding a “biomarker”, i.e. a mass spectrum peak whose expression differs between microorganisms belonging to different groups. In the following descriptions, a biomarker is simply called a “marker”, and a peak on a mass spectrum which can be used as a marker is called a “marker peak”.
To find a marker peak, it is normally necessary to measure, with a mass spectrometer, each of the samples derived from a plurality of groups, and to perform a difference analysis on the obtained data, i.e. a statistical analysis of the difference in peak intensity between the groups. A conventional and commonly known procedure for searching for a marker peak for identifying microorganisms by difference analysis is outlined as follows.
[Step A1] Samples are prepared so that each sample belongs to one of a plurality of groups that differ from each other in the species/strain of the microorganism or in the culture conditions, with each group including Ng samples. The total number of groups is NG. The total number of samples is Ns (i.e. Ns = NG × Ng).
[Step A2] A mass spectrum is acquired for each sample by performing a mass spectrometric analysis for the sample. The total number of mass spectra is the same as that of the samples, Ns.
[Step A3] Peak detection is performed on each of the Ns mass spectra to collect peak information, i.e. the mass-to-charge-ratio value and signal-intensity value of each peak. Then, for each sample, the peak information of the sample is organized into a peak list. A peak list is a collection of the mass-to-charge-ratio values and corresponding signal-intensity values of the peaks, sorted by mass-to-charge-ratio value. The total number of peak lists is also Ns.
[Step A4] Treating the peak list of each sample as one column vector, a peak matrix Mp is created by arranging these column vectors side by side in the row direction, i.e. in the horizontal direction, in such a manner that the signal-intensity values corresponding to the same mass-to-charge-ratio value are placed in the same row. In this peak matrix, the peak lists are sorted by group. The number of columns of the peak matrix Mp equals the total number of samples, Ns. The number of rows equals the total number of distinct peaks observed across all samples, Np (it should be noted that two or more peaks whose mass-to-charge-ratio values fall within a specific threshold of each other are regarded as the same peak and counted as one peak). FIG. 3B shows one example of the peak matrix.
[Step A5] Each row of the peak matrix Mp contains Ns signal-intensity values. Each of those values belongs to one of the NG groups. A univariate analysis is performed on those signal-intensity values to test whether or not there is a difference between the groups, and a p-value is calculated for each row. For the univariate analysis, the t-test or U test is commonly used for NG=2, while analysis of variance (ANOVA) is commonly used for NG≥3.
[Step A6] The p-values calculated in Step A5 are each compared with a previously determined significance level α to select the rows (i.e. peaks) which show a significant difference. Each peak corresponding to a row which shows a significant difference is listed as a candidate marker peak.
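As an illustration only, Steps A4 to A6 can be sketched in Python for the case NG=2. The sketch is not taken from the cited literature. The helper names (build_peak_matrix, u_test_p), the peak lists, the matching threshold tol, the significance level of 0.05, and the rule of filling an absent peak with zero intensity are all assumptions made for the example. The U test is computed here by exact enumeration of all group assignments, which is practical only for small Ng.

```python
from itertools import combinations
from statistics import mean


def build_peak_matrix(peak_lists, tol):
    """Step A4 (hypothetical helper): rows are peaks, columns are samples."""
    n_samples = len(peak_lists)
    # Pool every peak as (m/z, sample index, intensity) and sort by m/z.
    pooled = sorted((mz, s, inten)
                    for s, peaks in enumerate(peak_lists)
                    for mz, inten in peaks)
    mz_axis, matrix = [], []
    for mz, s, inten in pooled:
        # Assumption: a peak joins the current row if it lies within `tol`
        # of the lowest m/z in that row; otherwise it starts a new row.
        if not mz_axis or mz - mz_axis[-1] > tol:
            mz_axis.append(mz)
            matrix.append([0.0] * n_samples)  # assumption: absent peak = 0.0
        matrix[-1][s] = max(matrix[-1][s], inten)
    return mz_axis, matrix


def u_test_p(x, y):
    """Step A5 (hypothetical helper): two-sided exact U test p-value."""
    pooled = x + y
    order = sorted(pooled)
    # Midranks, so that tied intensities share the average of their ranks.
    rank = {v: mean(i + 1 for i, w in enumerate(order) if w == v)
            for v in set(pooled)}
    ranks = [rank[v] for v in pooled]
    n, total = len(x), len(pooled)
    expected = n * (total + 1) / 2  # mean rank sum of x under no difference
    observed = abs(sum(ranks[:n]) - expected)
    # Enumerate every way of assigning n of the pooled values to group x.
    devs = [abs(sum(ranks[i] for i in idx) - expected)
            for idx in combinations(range(total), n)]
    return sum(d >= observed - 1e-12 for d in devs) / len(devs)


# Made-up peak lists as (m/z, intensity); columns are sorted by group.
peak_lists = [
    [(3000.1, 50.0), (5000.2, 90.0)],
    [(3000.3, 55.0), (5000.0, 95.0)],
    [(2999.9, 48.0), (5000.4, 88.0)],
    [(3000.0, 52.0), (5000.1, 92.0)],
    [(3000.2, 51.0), (5000.3, 20.0), (7000.0, 40.0)],
    [(3000.4, 49.0), (5000.2, 25.0), (7000.3, 42.0)],
    [(3000.1, 54.0), (5000.0, 22.0), (7000.1, 38.0)],
    [(2999.8, 53.0), (5000.5, 18.0), (7000.2, 45.0)],
]
groups = ["A"] * 4 + ["B"] * 4  # NG = 2, Ng = 4, Ns = 8
alpha = 0.05                    # assumed significance level
tol = 1.0                       # assumed m/z matching threshold

mz_axis, matrix = build_peak_matrix(peak_lists, tol)

# Step A6: keep the rows whose p-value is below the significance level.
candidates = []
for mz, row in zip(mz_axis, matrix):
    x = [v for v, g in zip(row, groups) if g == "A"]
    y = [v for v, g in zip(row, groups) if g == "B"]
    if u_test_p(x, y) < alpha:
        candidates.append(mz)

print(candidates)  # [5000.0, 7000.0]
```

In this made-up data set the peak near m/z 3000 has similar intensities in both groups and is not selected, while the peaks near m/z 5000 and 7000 separate the groups completely. With Ng=4 per group, the smallest two-sided p-value the exact U test can produce is 2/70 (about 0.029), so a stricter significance level would require more samples per group.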
The candidate marker peaks are then narrowed down further from viewpoints unrelated to mass spectrometry, more specifically through consideration of biological mechanisms, validity checks based on additional experiments, or other processes. Each candidate with sufficient grounds for selection is judged to be a marker peak.