A useful application of text mining is predicting an event that occurs through a process recorded in a text, using the appearance frequencies of keywords in the text and the distribution of the appearance frequency of each keyword. As an example, consider receiving a reservation for a rental car over the telephone. If the text recording the telephone conversation includes certain keywords a large number of times, it is possible to judge whether or not the conversation resulted in an agreement on the reservation. When reservations are received thereafter, this makes it possible to learn what types of keywords are needed in conversations to improve the reservation success rate, or what types of keywords are effective for what types of customers. That insight can then be used to implement a business strategy.
This technology is described, for instance, in the following:
T. Hisamitsu and Y. Niwa, “A Measure of Term Representativeness Based on the Number of Co-occurring Salient Words”, Proceedings of the 19th International Conference on Computational Linguistics (COLING), pp. 1-7, 2002;
W. Morgan, P-C. Chang, S. Gupta and J. M. Brenier, “Automatically Detecting Action Items in Audio Meeting Recordings”, 7th SIGdial Workshop on Discourse and Dialogue, pp. 96-103, 2006; and
G. Zweig et al., “Automatic Analysis of Call-center Conversations”, ICASSP, 2006.
These will be described later.
A text targeted for text mining includes a wide variety of keywords. Consequently, even if the appearance frequencies of all the keywords are calculated, the sheer volume of information may prevent useful insight from being obtained. To obtain useful information efficiently by text mining, it is therefore desirable to categorize the keywords and to calculate the appearance frequency or appearance distribution of the keywords in each category. For example, in the case of a call center that receives telephone inquiries about products in the manufacturing industry, a category of product failure and a plurality of keywords belonging to that category are set in advance, and the appearance frequency of the keywords in the category is used for analysis. Once the category and the keywords belonging to it are determined, a text can be automatically analyzed up to a certain point to find what event relates to each keyword (refer to “A Measure of Term Representativeness Based on the Number of Co-occurring Salient Words”).
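The category-based counting described above can be illustrated with a minimal Python sketch. It is not taken from the cited references. The category names, the keyword lists, the sample record, and the function name `category_frequencies` are all hypothetical, chosen only to show how the appearance frequency of predefined keywords might be aggregated per category.

```python
import re
from collections import Counter

# Hypothetical categories and keywords, assumed to be set in advance by an analyst.
CATEGORIES = {
    "product_failure": {"broken", "error", "crash"},
    "billing": {"invoice", "refund", "charge"},
}


def category_frequencies(text, categories=CATEGORIES):
    """Return the total appearance frequency of each category's keywords in text."""
    # Simple lowercase word tokenization; real conversation records would
    # likely need language-specific tokenization and normalization.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return {
        name: sum(counts[keyword] for keyword in keywords)
        for name, keywords in categories.items()
    }


# Hypothetical call-center record used only for illustration.
record = (
    "The screen is broken and I see an error. "
    "Please refund the charge; the error keeps coming back."
)
print(category_frequencies(record))
```

Comparing such per-category counts across records with different outcomes is one simple way the frequencies could feed a later analysis, although that step is outside this sketch.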
Conventionally, a category and the keywords belonging to that category in a text to be analyzed are carefully examined, discussed and determined by text-analysis experts. This approach is effective when the text to be analyzed follows a predetermined form, such as a summary of a conversation. However, such a summary has to be created manually by, for example, an operator at a call center, and thus requires time and cost. Accordingly, if the conversation record itself can be analyzed as the target text of a text mining process, that time and cost can be reduced.
However, a conversation record itself includes not only the essential contents leading up to an event occurrence but also various other pieces of information, such as greetings, repeated questions and misstatements. It is therefore not easy, even for text-analysis experts, to find among those pieces of information the keywords that contribute to the analysis. Moreover, although a conversation record that includes one event occurrence may have many similarities to a conversation record that includes a different event occurrence, only a slight difference may determine which event occurs. This makes it even more difficult to find keywords useful for the analysis. If such keywords cannot be found, a category to which the keywords belong cannot be effectively determined.
“Automatically Detecting Action Items in Audio Meeting Recordings” and “Automatic Analysis of Call-center Conversations” are cited as reference techniques. These techniques aim to find, in texts, the characteristic parts that determine an event occurrence through a process recorded in the texts. Their basic idea is to learn, from learning data, the characteristics of the parts that determine an event occurrence in texts. The learning data are data in which certain parts of the texts are associated in advance with the characteristic parts that determine the event occurrence in the texts. From the learning data, the characteristic parts themselves, the words before and after the characteristic parts, the appearance frequencies of parts of speech, the pitch of the corresponding voice, and the like are learned. Using the result of the learning, a newly inputted text is searched for parts that are similar to the learned characteristics, and the parts found are outputted as parts that contribute to the analysis. These techniques presuppose the existence of learning data in which the characteristic parts have been manually determined. Consequently, experts must still spend enormous amounts of time and cost to prepare such learning data appropriately and in sufficient quantity.
Hence, an object of the present invention is to provide a system, a program and a method that can solve the above problems. The object is achieved by the combinations of the characteristics described in the independent claims in the scope of claims. Furthermore, the dependent claims stipulate further useful concrete examples of the present invention.