One technology for analyzing a large amount of text is text mining technology. Text mining technology analyzes a feature or a tendency of a text set. A system to which the text mining technology is applied (hereinafter referred to as a text mining system) calculates a feature degree of each element, such as a word or a phrase, in each text within a text set, and identifies a distinctive element in the text set on the basis of the feature degree.
Here, the text set whose feature or tendency is to be examined is referred to as the “text set of interest” in the description below. The text mining system uses, for example, the frequency at which each element appears in the text as the feature degree of each element. In this case, an element which frequently appears in the text set of interest is identified as a distinctive element in the text set of interest. Alternatively, the text mining system uses, for example, a statistical criterion as the feature degree. In this case, the text mining system can identify a meaningful element in the text set of interest.
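The frequency-based feature degree described above can be sketched as follows. This is a minimal illustration only: the function names, the whitespace tokenization, and the sample texts are assumptions made for this example and are not taken from any cited document.

```python
from collections import Counter

def frequency_feature_degrees(texts):
    """Use the number of appearances of each word as its feature degree."""
    counts = Counter()
    for text in texts:
        # Assumption: elements are lowercase, whitespace-separated words.
        counts.update(text.lower().split())
    return counts

def distinctive_elements(texts, top_n=3):
    """Return the top_n elements with the highest feature degree."""
    degrees = frequency_feature_degrees(texts)
    return [word for word, _ in degrees.most_common(top_n)]

# Hypothetical text set of interest.
texts_of_interest = [
    "the printer does not print",
    "printer shows paper jam",
    "paper jam in the printer",
]

degrees = frequency_feature_degrees(texts_of_interest)
top = distinctive_elements(texts_of_interest, top_n=1)
```

With a raw frequency, very common function words can also rank highly, which is one reason a statistical criterion may be used as the feature degree instead.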
One such text mining technology is described in Non-Patent Document 1. Non-Patent Document 1 discloses a technology which, when an input text set can be divided into two or more categories and one of the categories is designated as a focused category, identifies an element, such as a distinctive word or phrase, in the text of the focused category. In other words, the text mining system to which the technology described in Non-Patent Document 1 is applied regards the set of texts belonging to the focused category as the text set of interest and identifies a distinctive element of that text set of interest.
A specific method for determining a distinctive element is described below. Initially, the text mining system described in Non-Patent Document 1 calculates the number of appearances of each element in the text of the focused category, and the number of appearances of each element in the text of the categories other than the focused category. Then, the text mining system calculates a given statistical amount for each element. The given statistical amount is, for example, SC (Stochastic Complexity) or ESC (Extended Stochastic Complexity), and becomes higher as the number of appearances in the text of the focused category increases and as the number of appearances in the text of the categories other than the focused category decreases. Then, the text mining system treats the statistical amount as the feature degree of each element in the focused category, and identifies an element with a large statistical amount as a distinctive element of the focused category.
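The two-step procedure above (count appearances inside and outside the focused category, then score each element) can be sketched as follows. Note that this sketch does not implement SC or ESC. It substitutes a smoothed log-ratio of relative frequencies, chosen only because it shares the stated property of rising with appearances in the focused category and falling with appearances elsewhere. The function names, the smoothing parameter `alpha`, and the sample texts are assumptions for illustration.

```python
import math
from collections import Counter

def count_elements(texts):
    """Count appearances of each whitespace-separated word."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts

def log_ratio_feature_degrees(focused_texts, other_texts, alpha=1.0):
    """Stand-in statistic (not SC/ESC): log of the ratio between the smoothed
    relative frequency in the focused category and in the other categories."""
    focused = count_elements(focused_texts)
    other = count_elements(other_texts)
    vocab = set(focused) | set(other)
    n_focused = sum(focused.values())
    n_other = sum(other.values())
    size = len(vocab)
    degrees = {}
    for word in vocab:
        # Additive smoothing keeps the ratio finite for unseen words.
        p_focused = (focused[word] + alpha) / (n_focused + alpha * size)
        p_other = (other[word] + alpha) / (n_other + alpha * size)
        degrees[word] = math.log(p_focused / p_other)
    return degrees

# Hypothetical texts for the focused category and the remaining categories.
focused_texts = ["refund refund please", "refund order"]
other_texts = ["order status please", "order shipped"]

degrees = log_ratio_feature_degrees(focused_texts, other_texts)
most_distinctive = max(degrees, key=degrees.get)
```

An element that appears mostly in the focused category receives a positive score, and an element that appears mostly in the other categories receives a negative score.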
When analyzing a text set including plural topics using the text mining system, an analyst may target a specific topic (hereinafter referred to as an “analysis target topic”) and perform text mining. FIG. 17 is an explanatory diagram illustrating telephone call text made from a dialog between a client and an operator in a call center. The telephone call text shown in FIG. 17 includes plural topics, such as “opening”, “client identification”, “inquiry contents”, “procedure”, and “contact method”. For example, in order to analyze the inquiry contents in a set of such telephone call texts, the analyst may perform text mining targeting the topic “inquiry contents”.
In this case, the analyst first has to identify the part corresponding to the analysis target topic in each text of the input text set. A general topic analyzing system for identifying a part corresponding to the analysis target topic is described in Non-Patent Document 2. The topic analyzing system described in Non-Patent Document 2 divides text including plural topics into segments each having the same topic, and allocates a topic to each segment using a model built from the appearance degree of words corresponding to the topic. Using this system, the analyst classifies each text into a part corresponding to the analysis target topic and a part not corresponding thereto. The analyst then applies a general text mining technology to the part classified as corresponding to the analysis target topic. As a result, it becomes possible to analyze the telephone call text shown in FIG. 17.
The text analyzing method is concretely described using FIG. 17. When performing text mining targeting the topic “inquiry contents”, the analyst initially applies the topic analyzing system described in Non-Patent Document 2 to each inputted telephone call text and identifies the part corresponding to the topic “inquiry contents”. As shown in FIG. 17, the inputted telephone call text is divided into utterances, and a topic and an identifier identifying the utterance (speech index) are given to each utterance. After identifying the topics using the topic analyzing system, the analyst classifies the utterances into the part indicated by the speech indices “6” to “15”, the topic of which is “inquiry contents”, and the other part. By performing the text mining on the telephone call text classified in this way, the analyst can analyze the inquiry contents.
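The classification of utterances into the part corresponding to the analysis target topic and the other part can be sketched as follows. The `Utterance` record, the `split_by_topic` function, and the sample call are hypothetical; they do not reproduce the contents of FIG. 17, and the topic labels are assumed to have already been assigned by a topic analyzing system.

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    index: int   # speech index of the utterance
    topic: str   # topic label assigned by a topic analyzing system
    text: str    # transcribed utterance

def split_by_topic(utterances, target_topic):
    """Classify utterances into the target-topic part and the other part."""
    target = [u for u in utterances if u.topic == target_topic]
    rest = [u for u in utterances if u.topic != target_topic]
    return target, rest

# Hypothetical call with topics already assigned.
call = [
    Utterance(1, "opening", "thank you for calling"),
    Utterance(2, "client identification", "may i have your name"),
    Utterance(3, "inquiry contents", "my router keeps restarting"),
    Utterance(4, "inquiry contents", "it started yesterday"),
    Utterance(5, "procedure", "please reset the router"),
]

target, rest = split_by_topic(call, "inquiry contents")
target_indices = [u.index for u in target]
rest_indices = [u.index for u in rest]
```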
The text mining method which is applied after the topic is identified is further described. After the topic is identified, the analyst can classify each text into a part corresponding to the analysis target topic and a part not corresponding thereto. How the analyst utilizes these parts for the text mining differs depending on the text mining technology utilized and on the requirements of the analyst. A method for performing the text mining using the text mining system described in Non-Patent Document 1 is explained here. When the analysis target topic is targeted for the text mining, the text mining system described in Non-Patent Document 1 can perform two types of text mining.
The first type of text mining is a method in which the analysis target is limited to the part of the text corresponding to the analysis target topic. In other words, after identifying the part corresponding to the analysis target topic in each text of the inputted text set, the text mining system described in Non-Patent Document 1 eliminates the part which does not correspond to the analysis target topic from the analysis target. The text mining system performs the text mining only on the part corresponding to the analysis target topic.
Suppose, for example, that the telephone call text set in the call center shown in FIG. 17 is the analysis target, and the analyst is interested only in the inquiry contents. In this case, the text mining system described in Non-Patent Document 1 targets only the part corresponding to the analysis target topic “inquiry contents” for the text mining. In other words, the text mining system described in Non-Patent Document 1 does not perform the text mining on the whole telephone call text shown in FIG. 17, but performs the text mining only on the part indicated by the speech indices “6” to “15”, that is, “inquiry contents”. Based on this, for example, when the text set of interest is assumed to be the “text set served by the operator A”, the analyst can analyze which elements related to the inquiry contents are distinctive in the telephone call text of the operator A compared with the telephone call text of the other operators.
The second type of text mining is a method for analyzing a distinctive element in the part corresponding to the analysis target topic, in which the parts that do not correspond to the analysis target topic are also used for the analysis. In other words, after identifying the part corresponding to the analysis target topic in each text of the inputted text set, this text mining regards the text set composed of the parts corresponding to the analysis target topic as the text set of interest. Based on this, for example, when the set of the telephone call text in the call center shown in FIG. 17 is the analysis target, the analyst can analyze which elements of the part corresponding to the analysis target topic “inquiry contents” are distinctive compared with the parts corresponding to the other topics.
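The difference between the two types of text mining lies in how the text set of interest and the comparison set are formed. The following sketch shows only that set construction, under assumed data: the record layout `(operator, topic, text)`, the function names, and the sample records are hypothetical. Any feature degree, such as a frequency or a statistical criterion, could then be computed from the two resulting counts.

```python
from collections import Counter

TARGET = "inquiry contents"

# Hypothetical records: (operator, topic, utterance text).
records = [
    ("A", "opening", "hello thank you for calling"),
    ("A", "inquiry contents", "billing error on invoice"),
    ("B", "opening", "hello thank you for calling"),
    ("B", "inquiry contents", "password reset request"),
]

def word_counts(rows):
    """Count whitespace-separated words over the given records."""
    counts = Counter()
    for _operator, _topic, text in rows:
        counts.update(text.lower().split())
    return counts

def first_type(rows, operator):
    """Type 1: drop off-topic parts, then contrast one operator with the rest."""
    on_topic = [r for r in rows if r[1] == TARGET]
    interest = [r for r in on_topic if r[0] == operator]
    comparison = [r for r in on_topic if r[0] != operator]
    return word_counts(interest), word_counts(comparison)

def second_type(rows):
    """Type 2: on-topic parts are the set of interest; other topics are the comparison set."""
    interest = [r for r in rows if r[1] == TARGET]
    comparison = [r for r in rows if r[1] != TARGET]
    return word_counts(interest), word_counts(comparison)

t1_interest, t1_comparison = first_type(records, "A")
t2_interest, t2_comparison = second_type(records)
```

In the first type, words from the “opening” part never enter either count. In the second type, they form the comparison set against which the “inquiry contents” part is contrasted.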
Further, Patent Document 1 describes a mining device which extracts distinctive expressions of a text set. Patent Document 2 describes a technology which calculates a value of a relatedness degree by increasing the relatedness degree of a keyword corresponding to a specific name.
[Patent Document 1] Japanese Patent Application Laid-Open No. 2006-031198 A (paragraph 0020)
[Patent Document 2] Japanese Patent Application Laid-Open No. 2003-016106 A (paragraphs 0009, 0033, 0034)
[Non-Patent Document 1] Hang Li and Kenji Yamanishi, “Mining from open answers in questionnaire data”, In Proceedings of KDD-01, pp. 443-449, 2001.
[Non-Patent Document 2] Rui Amaral and Isabel Trancoso, “Topic Detection in Read Documents”, In Proceedings of 4th European Conference on Research and Advanced Technology for Digital Libraries, pp. 315-318, 2000.