Well-known methods for identifying an unknown compound in a sample using a gas chromatograph mass spectrometer (GC-MS) or a liquid chromatograph mass spectrometer (LC-MS) include a database search using a database (which may also be called a library) in which mass spectra (including MSn spectra, where n is an integer of two or more) corresponding to a large number of known compounds are collected. Such databases range from general-purpose databases that exhaustively collect the mass spectra of general compounds, such as the NIST database compiled by the National Institute of Standards and Technology (NIST) in the United States and the Wiley database compiled by the publisher John Wiley & Sons, Inc., to specialized databases covering compounds in specific fields or compounds for specific purposes, such as databases for agricultural chemicals, pharmaceuticals, and metabolites (see Patent Literature 1, etc.).
Such mass spectral databases are generally created from data obtained by actually measuring standard preparations of the target compounds with a measuring instrument. Mass spectra collected in this way usually include unnecessary elements, such as noise, that arise from various factors. For example, in an LC-MS, ions originating from impurities in the mobile phase used in the LC may appear in a mass spectrum as unnecessary elements, as may ions originating from impurities released from the column. Although a mass spectrum may thus include unnecessary elements, editing the obtained mass spectrum before storing it in a database is undesirable in terms of the reliability of analysis. Therefore, such a mass spectrum is usually stored in the database as is, even when it includes unnecessary elements.
Although a variety of algorithms exist for searching a database based on mass spectra, they commonly include two steps: first, extracting from the database, as compound candidates, a plurality of compounds whose mass spectra have spectral patterns similar to some extent to that of the unknown compound; and second, calculating for each compound candidate a score representing a strict degree of match in spectral pattern, so that the compound candidates are presented to the user in descending order of score. In such data processing, unnecessary elements included in a mass spectrum in the database can cause a false positive or a false negative, which may decrease the accuracy of the search.
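The two-step search described above can be illustrated with a minimal Python sketch. It is not the algorithm of any particular database product. The function names (`cosine_score`, `top_peaks`, `search`), the pre-filter rule (sharing at least `min_shared` of the `n_top` most intense peaks), the use of cosine similarity as the score, and all spectra are assumptions made up for illustration.

```python
import math

# Spectra are dicts mapping nominal m/z (int) to intensity. All values are invented.

def cosine_score(a, b):
    """Cosine similarity between two spectra (0.0 if either spectrum is empty)."""
    dot = sum(a[mz] * b[mz] for mz in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_peaks(spectrum, n):
    """Return the m/z values of the n most intense peaks."""
    return set(sorted(spectrum, key=spectrum.get, reverse=True)[:n])

def search(query, library, n_top=3, min_shared=2):
    """Step 1: coarse pre-filter. Step 2: strict scoring and ranking."""
    q_top = top_peaks(query, n_top)
    # Step 1: keep compounds sharing enough of the most intense peaks with the query.
    candidates = [
        name for name, spec in library.items()
        if len(q_top & top_peaks(spec, n_top)) >= min_shared
    ]
    # Step 2: score each candidate and sort in descending order of score.
    scored = [(name, cosine_score(query, library[name])) for name in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical library and query spectrum.
library = {
    "compound_A": {43: 100, 58: 40, 71: 25},
    "compound_B": {43: 100, 58: 35, 85: 60},
    "compound_C": {77: 100, 105: 80, 51: 30},
}
query = {43: 100, 58: 42, 71: 20}

results = search(query, library)
for name, score in results:
    print(f"{name}: {score:.4f}")
```

In this sketch, compound_C is rejected by the pre-filter, and compound_A is ranked above compound_B. A spurious intense peak in a library spectrum would change both the pre-filter result and the score, which is how unnecessary elements can produce false positives or false negatives.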
The mass spectra of similar compounds having a common main skeleton share a spectral pattern originating from that skeleton, which makes the mass spectra significantly similar to one another. When a database collects a large number of such similar compounds and the unknown compound is one of them, a large number of compounds having mutually similar mass spectra are extracted as compound candidates. Because the spectral patterns of the compound candidates extracted in this manner have many portions in common, the calculated scores hardly differ from one another, which makes it difficult to determine which compound candidate is the correct compound even when the scores are compared. This also reduces the likelihood that the correct compound is ranked highly. Furthermore, in identifying a compound using mass spectra, an analysis operator often determines the final result by visually confirming the match between mass spectra; if too many compound candidates are extracted, this visual confirmation places a heavy workload on the operator, and operation errors such as oversights are likely to occur.
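The score-clustering problem described above can be reproduced with a small Python sketch. The spectra are invented, the compound names are hypothetical, and cosine similarity is assumed as the score only for illustration. Three analogs share the same intense skeleton peaks and differ only in one weak peak.

```python
import math

def cosine_score(a, b):
    """Cosine similarity between two spectra given as {m/z: intensity} dicts."""
    dot = sum(a[mz] * b[mz] for mz in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Intense peaks assumed to originate from the common main skeleton (invented values).
skeleton = {91: 100, 119: 80, 147: 60}

# Hypothetical analogs: the same skeleton peaks plus one weak, compound-specific peak.
analogs = {
    "analog_1": {**skeleton, 161: 10},
    "analog_2": {**skeleton, 175: 10},
    "analog_3": {**skeleton, 189: 10},
}

# The unknown compound is assumed to be analog_2, measured with slight intensity variation.
query = {**skeleton, 175: 9}

scores = {name: cosine_score(query, spec) for name, spec in analogs.items()}
spread = max(scores.values()) - min(scores.values())

for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.6f}")
print(f"spread between best and worst score: {spread:.6f}")
```

With these values, every analog scores above 0.99 and the best and worst scores differ by less than 0.01, because the shared skeleton peaks dominate the calculation. The correct compound ranks first here, but the margin is so small that modest noise in either spectrum could change the order.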