There are cases in which the opinions of many customers are collected from a market or a call center, similar opinions are aggregated into groups by clustering, and the contents are analyzed group by group. Likewise, when requests are gathered for a large-scale project, similar requests are collected and clustered, and needs are extracted for each group. For example, the clustering can be carried out by calculating a similarity degree between every pair of documents, based on the appearance frequency of each word contained in the documents after morpheme analysis. Patent Literature 1, for example, describes a method of expressing each document as a vector of word appearance frequencies and calculating the similarity degree between documents as a cosine similarity. Various clustering methods are described in Non-Patent Literature 1.
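As an illustration of the word-frequency approach outlined above, the following is a minimal sketch of a cosine similarity between two documents. It is not the implementation of Patent Literature 1. The function names `term_frequencies` and `cosine_similarity` are hypothetical, and a lowercase whitespace split is assumed as a stand-in for morpheme analysis.

```python
import math
from collections import Counter


def term_frequencies(text):
    # Assumption: a lowercase whitespace split stands in for morpheme analysis.
    return Counter(text.lower().split())


def cosine_similarity(doc_a, doc_b):
    # Express each document as a vector of word appearance frequencies.
    freq_a = term_frequencies(doc_a)
    freq_b = term_frequencies(doc_b)

    # Only words present in both documents contribute to the dot product.
    shared_words = freq_a.keys() & freq_b.keys()
    dot = sum(freq_a[w] * freq_b[w] for w in shared_words)

    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))

    # An empty document has no direction; treat its similarity as zero.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A clustering step would typically compute this value for every pair of documents and group the documents whose similarity degree exceeds a chosen threshold; the threshold and the grouping strategy are left open here.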
Patent Literature 2 describes an apparatus that, when customer opinions from a market or a call center are analyzed, calculates the importance of customer needs by using preset key words and evaluation values. Patent Literature 3 describes an apparatus that refers to a sentence/tag determination table to extract classification object sentences based on key words, refers to a terminology pattern description table to extract terms, eliminates differences in expression among the words extracted through morpheme analysis, refers to a classification pattern description table to generate a classification pattern, and carries out a classifying process based on the classification pattern.
Moreover, as a technique that does not use morpheme analysis, Non-Patent Literature 2 describes a method of calculating a similarity degree between objects based on Kolmogorov complexity. With this method, a similarity degree can be calculated between objects such as document data, image data, and time-series data.
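Kolmogorov complexity itself is not computable, so practical methods of this kind usually substitute the length of the data after compression. The following is a minimal sketch under that assumption, using the normalized compression distance with zlib as the compressor. Whether Non-Patent Literature 2 uses exactly this formula is not stated here, and the names `compressed_size` and `ncd` are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    # Assumption: the zlib-compressed length approximates Kolmogorov complexity.
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    # Normalized compression distance: smaller values mean more similar objects.
    size_x = compressed_size(x)
    size_y = compressed_size(y)
    size_xy = compressed_size(x + y)  # complexity of the concatenation
    return (size_xy - min(size_x, size_y)) / max(size_x, size_y)
```

Because the function operates on raw bytes, the same call applies to document data, image data, or time-series data without any word-level analysis. A smaller distance corresponds to a higher similarity degree.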