In general, acquiring useful knowledge through text mining requires analysis from a variety of perspectives. For example, in text mining, the target text data is clustered based on a certain perspective, and it is determined whether or not the content of the text in each portion produced by the clustering is characteristic. If the determination indicates that there is a characteristic portion, this can lead to the discovery of useful knowledge.
Patent Document 1 discloses a conventional text mining system for performing such text mining. The text mining system disclosed in Patent Document 1 uses data composed of a plurality of records as analysis target data. Each of the records in the analysis target data includes attribute values and text data.
Once an analyst designates a certain attribute (for example, a job category), the text mining system disclosed in Patent Document 1 first extracts, for each attribute value of the designated attribute (for example, student, employee, etc.), the records having that attribute value from the analysis target data. Here, the records extracted for one attribute value are referred to as a “subset”.
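The subset extraction described above can be illustrated with a minimal sketch. The record layout, the attribute name `job`, and the function name `extract_subsets` are assumptions made for illustration only; Patent Document 1 does not prescribe any particular data format or implementation.

```python
from collections import defaultdict

# Hypothetical analysis target data: each record holds attribute values and text data.
records = [
    {"job": "student", "text": "The price is too high"},
    {"job": "employee", "text": "Battery life is short"},
    {"job": "student", "text": "The design is nice"},
]


def extract_subsets(records, attribute):
    """Group records by their value of the designated attribute.

    Returns a dict mapping each attribute value to its subset (a list of records).
    """
    subsets = defaultdict(list)
    for record in records:
        subsets[record[attribute]].append(record)
    return dict(subsets)


# Designating the attribute "job" yields one subset per attribute value.
subsets = extract_subsets(records, "job")
```

In this sketch, each key of `subsets` is one attribute value and each value is the corresponding subset of records.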
Next, the text mining system disclosed in Patent Document 1 generates a plurality of text groups by applying text classification to the text data in the analysis target data. Thereafter, for each one of the attribute values, the text mining system disclosed in Patent Document 1 expresses the association between the corresponding subset and the text groups as an index, and displays information indicating that association.
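As a rough illustration of this step, the following sketch classifies text into groups and computes, for each attribute value, an index of association with each text group. Patent Document 1 is not quoted here as specifying the classification method or the index, so both are stand-ins chosen for illustration: the classification is a simple keyword match, and the index is the fraction of a subset's records that fall into each text group. All names and data are hypothetical.

```python
from collections import Counter

# Hypothetical records: (attribute value, text data).
records = [
    ("student", "the price is too high"),
    ("student", "price could be lower"),
    ("employee", "battery drains fast"),
    ("employee", "the price is fine"),
]

# Stand-in text classification: a text joins the group of the first keyword it contains.
KEYWORDS = {"price": "price", "battery": "battery"}


def classify(text):
    """Assign a text to a text group by keyword match (illustrative only)."""
    words = text.split()
    for keyword, group in KEYWORDS.items():
        if keyword in words:
            return group
    return "other"


def association(records):
    """For each attribute value, return the share of its subset in each text group."""
    counts = {}
    for value, text in records:
        counts.setdefault(value, Counter())[classify(text)] += 1
    return {
        value: {group: n / sum(c.values()) for group, n in c.items()}
        for value, c in counts.items()
    }


table = association(records)
```

The resulting `table` maps each attribute value to its per-group shares, which is the kind of information that could then be displayed to the analyst.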
That is to say, according to the text mining system disclosed in Patent Document 1, by designating an attribute as a perspective of analysis, the analyst can review the association with the text groups for each one of the attribute values of that attribute. In other words, with use of such a text mining system, the analyst can set a commonly known perspective or a perspective inferred from the analyst's own experience or intuition, and conduct analysis based on the set perspectives.