Mass spectrometry imaging (“MS imaging”) is a method for investigating the distribution of substances having a specific mass/charge ratio by performing mass spectrometry at each of a plurality of micro-regions within a two-dimensional area of a specimen, such as a section of biological tissue. Much hope is pinned on the use of MS imaging in fields such as drug discovery, the search for biomarkers, and the determination of the causes of various illnesses and diseases. Mass spectrometers used for MS imaging are generally referred to as imaging mass spectrometers, because a microscopic image is ordinarily first acquired over an arbitrary range of the specimen, the region to be analyzed is determined on the basis of that image, and imaging mass spectrometry is then performed over the region so determined. For example, Non-Patent Literature 1 discloses the configuration of typical imaging mass spectrometers and examples of analyses performed with them.
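As a minimal sketch of the idea, assuming each pixel's spectrum is held as a list of (m/z, intensity) peaks keyed by its (x, y) position, the distribution of a substance with a given mass/charge ratio can be rendered by summing, at every pixel, the intensities of the peaks that fall within a tolerance window around the target value. The function name `ion_image`, the data layout, and the tolerance parameter are hypothetical choices for illustration, not part of any specific instrument's software.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: each pixel's spectrum is a list of (m/z, intensity) peaks.
Spectrum = List[Tuple[float, float]]


def ion_image(data: Dict[Tuple[int, int], Spectrum], width: int, height: int,
              target_mz: float, tolerance: float) -> List[List[float]]:
    """Return a height x width grid holding, for each pixel (x, y), the summed
    intensity of peaks within target_mz +/- tolerance (0.0 if none match)."""
    image = [[0.0] * width for _ in range(height)]
    for (x, y), spectrum in data.items():
        image[y][x] = sum((i for mz, i in spectrum if abs(mz - target_mz) <= tolerance), 0.0)
    return image


# Made-up peaks on a 2 x 2 pixel grid, purely for illustration.
data = {
    (0, 0): [(760.58, 120.0), (782.57, 30.0)],
    (1, 0): [(760.59, 40.0)],
    (0, 1): [(782.56, 90.0)],
    (1, 1): [],
}
print(ion_image(data, 2, 2, 760.585, 0.01))  # [[120.0, 40.0], [0.0, 0.0]]
```

Each cell of the returned grid corresponds to one measurement point; mapping its value to a color or gray level yields the ion image for that mass/charge ratio.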
With imaging mass spectrometers, mass spectrometry data (MS spectrum data and MS^n spectrum data) is obtained from each of numerous measurement points (micro-regions) within a two-dimensional area of a specimen. The amount of mass spectrometry data obtained from any one measurement point increases as the mass resolution is increased. Furthermore, since the spacing between measurement points on the specimen governs the spatial resolution, increasing the spatial resolution to obtain finer MS imaging images increases the number of measurement points, which in turn increases the amount of mass spectrometry data obtained from the two-dimensional area being measured. As a result, the MS imaging data usually becomes voluminous, and finding meaningful information in it requires much time and labor on the part of the person performing the analysis. Because the intensity at a specific mass/charge ratio at each measurement point is generally represented in an MS imaging image as the color or shading of an individual pixel, the term “pixel” is used hereinafter to refer to a measurement point.
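The scaling described above can be made concrete with a little arithmetic. The sketch below assumes a rectangular area sampled on a square grid whose pitch equals the distance between measurement points, a fixed number of stored points per spectrum (which grows with mass resolution), and a hypothetical storage size of 8 bytes per point; none of these figures come from the source, and the function names are illustrative.

```python
def pixel_count(width_um: float, height_um: float, pitch_um: float) -> int:
    """Number of measurement points when the area is sampled on a square grid
    whose step (pitch) is the distance between neighbouring points."""
    return int(width_um // pitch_um) * int(height_um // pitch_um)


def data_volume_bytes(width_um: float, height_um: float, pitch_um: float,
                      points_per_spectrum: int, bytes_per_point: int = 8) -> int:
    """Total raw size if every pixel stores one spectrum of points_per_spectrum
    samples; bytes_per_point = 8 is an assumed storage size, not a standard."""
    return pixel_count(width_um, height_um, pitch_um) * points_per_spectrum * bytes_per_point


# Halving the pitch (finer spatial resolution) quadruples the pixel count.
print(pixel_count(10_000, 10_000, 50))  # 40000
print(pixel_count(10_000, 10_000, 25))  # 160000
```

For a 10 mm × 10 mm area, a 50 µm pitch gives 200 × 200 = 40,000 pixels and a 25 µm pitch gives 160,000, so halving the pitch quadruples the data volume for the same spectrum size, while doubling the points per spectrum doubles it again.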
To solve the above-described problem, attempts have been made to extract meaningful information from MS imaging data by computer-aided statistical analysis. Specifically, support vector machines (SVM) and cluster analysis, a multivariate analysis technique, have been applied to MS imaging data obtained from biological specimens containing both cancerous and normal tissue to determine whether the two can be distinguished within a specimen (see Non-Patent Literatures 2 and 3).
In cluster analysis, the shape of the mass spectrum at each pixel is used as the basis for categorizing, that is, clustering, each pixel as representing cancerous tissue, a normal site, or some other site. A more detailed categorization according to the condition of the tissue or site becomes possible by increasing the number of divisions, that is, by setting the total number of clusters to a large value. The downside is that the categorization then becomes more susceptible to the effects of, for example, foreign components: slight variations in such components may cause the same tissue to be assigned to different clusters, making the categorization inaccurate. A known cluster analysis algorithm automatically calculates the total cluster count estimated to be most appropriate, based on, for example, the k-means method, and uses that count to categorize each pixel.
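The section does not say how the “most appropriate” cluster count is estimated. One common approach, assumed here purely for illustration, is to run k-means for several candidate counts and keep the one with the highest mean silhouette score. The sketch treats each pixel's spectrum as a feature vector (for example, intensities at a few selected mass/charge ratios, which is itself an assumption) and uses a deterministic farthest-point initialization so results are reproducible; all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dist(a: Vector, b: Vector) -> float:
    """Euclidean distance between two spectra treated as feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[Vector], k: int, iters: int = 100) -> List[int]:
    """Lloyd's k-means with deterministic farthest-point initialisation.
    Returns one cluster label per point."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        far = max(points, key=lambda p: min(dist(p, c) for c in centers))
        centers.append(list(far))
    labels: List[int] = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous center if a cluster empties
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(points: List[Vector], labels: List[int]) -> float:
    """Mean silhouette score; singletons score 0, fewer than 2 clusters gives 0."""
    clusters = set(labels)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for i, p in enumerate(points):
        same = [dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not same:
            continue  # singleton cluster contributes 0
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def choose_k(points: List[Vector], k_range: range = range(2, 6)) -> Tuple[int, List[int]]:
    """Run k-means for each candidate k and keep the labelling with the best silhouette."""
    best_k, best_labels, best_score = 0, [], -2.0
    for k in k_range:
        labels = kmeans(points, k)
        score = silhouette(points, labels)
        if score > best_score:
            best_k, best_labels, best_score = k, labels, score
    return best_k, best_labels


# Toy 2-channel "spectra": three well-separated groups of three pixels each.
points = [[0, 0], [0.1, 0], [0, 0.1],
          [5, 5], [5.1, 5], [5, 5.1],
          [10, 0], [10.1, 0], [10, 0.1]]
k, labels = choose_k(points)
print(k)  # 3
```

On these toy vectors the groups are far apart, so the silhouette criterion clearly prefers three clusters. With real tissue spectra that share many substances, the scores for neighbouring counts may differ only slightly, which is consistent with the reliability concern about automatically calculated cluster counts.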
However, when normal tissue and pathological tissue present in a biological specimen must be distinguished from each other, the difference between their mass spectra is often unclear, because many substances are commonly present in both. As a result, the automatically calculated total cluster count described above is not very reliable, and the categorization result usually serves as no more than a rough guideline; ultimately, a person has to judge whether it is correct. If the categorization is inappropriate, the person has to specify the total cluster count manually and repeat the analysis. Furthermore, this judgment depends heavily on the experience and skill of the person performing the analysis, so the final result can vary from one analyst to another.
On the other hand, when SVM is used, reference data (referred to as “training data”) representative of normal sites and cancerous tissue is stored in memory in advance, and individual pixels are categorized on the basis of that reference data. This requires the person performing the analysis to select training data from specific sites of interest representing cancerous tissue and the like, and selecting correct training data is not easy. Also, as with cluster analysis, the correctness of the categorization result ultimately has to be judged by the person performing the analysis, so the accuracy of the result again depends largely on that person's skill and experience.
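A minimal sketch of the SVM approach, assuming each training spectrum has been reduced to a short feature vector and labeled +1 (for example, cancerous) or -1 (for example, normal), is a linear SVM fitted by Pegasos-style stochastic subgradient descent on the regularized hinge loss. This solver is chosen only to keep the example dependency-free; the cited studies may have used different kernels or software, and in practice a library class such as scikit-learn's `sklearn.svm.SVC` would typically be used. The names `train_linear_svm` and `predict` are hypothetical.

```python
from typing import List, Sequence


def train_linear_svm(X: List[Sequence[float]], y: List[int],
                     lam: float = 0.01, epochs: int = 200) -> List[float]:
    """Fit a linear SVM (hinge loss + L2 penalty lam) by Pegasos-style
    stochastic subgradient descent. Labels must be +1 or -1. A constant 1.0
    feature is appended so that the last weight acts as a (regularised) bias."""
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(data, y):  # fixed order keeps the sketch deterministic
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: gradient of the L2 term
            if margin < 1.0:  # hinge loss active: step toward classifying xi correctly
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def predict(w: List[float], x: Sequence[float]) -> int:
    """Return +1 or -1 according to the sign of the decision function."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Toy training data: +1 ("cancerous") pixels are strong in channel 0,
# -1 ("normal") pixels are strong in channel 1. Values are made up.
X = [[1.0, 0.1], [0.9, 0.2], [1.1, 0.0], [0.1, 1.0], [0.2, 0.9], [0.0, 1.1]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print(predict(w, [0.95, 0.15]), predict(w, [0.15, 0.95]))  # 1 -1
```

The classifier can only be as good as the labeled vectors it is trained on, which is why the choice of training data described above weighs so heavily on the result.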
Non-Patent Literature 1: Harada and 8 others, “Analysis of Biological Tissues Using Imaging Mass Spectrometer,” Shimadzu Hyouron, Vol. 64, 3rd and 4th issues, published Apr. 24, 2008, pp. 139-145
Non-Patent Literature 2: Gregor McCombie and 3 others, “Spatial and Spectral Correlations in MALDI Mass Spectrometry Images by Clustering and Multivariate Analysis,” Analytical Chemistry, 2005, Vol. 77, pp. 6118-6124
Non-Patent Literature 3: Kristina Schwamborn and 5 others, “Identifying Prostate Carcinoma by MALDI-Imaging,” International Journal of Molecular Medicine, 2007, Vol. 20, pp. 155-159