In recent years, text mining has been attracting attention as a technology for extracting useful information from huge amounts of text data. Text mining is the process of dividing a collection of non-standardized text into words or phrases by means of natural language analysis methods and extracting feature words. The frequencies of appearance of the feature words and their correlations are then analyzed to provide the analyst with useful information. Text mining enables analysis of amounts of text data too large to be analyzed manually.
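The process described above can be illustrated with a minimal sketch. It assumes a regular-expression tokenizer and a small stop-word list in place of real natural language analysis, and it treats every remaining word as a feature word. The names `STOPWORDS`, `extract_words`, and `mine` are hypothetical and are not taken from the cited literature.

```python
import re
from collections import Counter
from itertools import combinations

# Assumption: a tiny illustrative stop-word list, not a real linguistic resource.
STOPWORDS = {"the", "a", "is", "to", "and", "of", "it"}

def extract_words(text):
    """Split text into lowercase words and drop stop words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]

def mine(documents):
    """Return document frequencies of words and co-occurrence counts of word pairs."""
    freq = Counter()
    cooc = Counter()
    for doc in documents:
        words = set(extract_words(doc))  # count each word once per document
        freq.update(words)
        # Each unordered pair of words in the same document counts as one co-occurrence.
        cooc.update(combinations(sorted(words), 2))
    return freq, cooc

docs = [
    "The battery drains quickly.",
    "Battery is slow to charge.",
    "The screen is bright and the battery is fine.",
]
freq, cooc = mine(docs)
print(freq.most_common(3))
print(cooc[("battery", "screen")])
```

Here the document frequency stands in for "frequency of appearance" and the pair counts stand in for "correlation"; a practical system would likely use morphological analysis and a statistical association measure instead.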
One exemplary application area for such text mining is free-response format questionnaires. In this case, text mining is performed on text data obtained by typing responses to a questionnaire or recognizing characters therein (see PTLs 1 and 2 and NPL 1, for example). Using the results of the text mining, the analyst is able to perform various analyses and verification of hypotheses.
Another exemplary application area for text mining is company call centers. Call centers accumulate a huge volume of audio obtained by recording calls between customers and operators, and a huge amount of memos created by operators with key entry or the like when answering calls. Such information has become an important knowledge source in recent years for companies to get to know consumer needs, what should be improved in their own products and services, and so on.
Text mining, when applied to call centers, is performed on either text data obtained by speech recognition of calls (speech-recognized text data) or text data obtained from call memos created by operators (call memo text data). Which text data is to undergo text mining is determined depending on the viewpoint of the analysis required by the analyst.
For example, the speech-recognized text data covers all calls between operators and consumers. Thus, when the purpose is to extract consumer requests for products and services, text mining is performed on the speech-recognized text data because in that case the utterances of all consumers need to be covered.
Meanwhile, the call memo text data covers a narrower range, but it includes matters that operators judged to be important during calls, as well as matters that operators, taking cues from the contents of calls, recognized or judged as necessary to record. Accordingly, text mining is performed on the call memo text data when the analysis needs to focus on information added by operators, for example when the information to be extracted is decision know-how of experienced operators that should be shared with other operators, or erroneous decisions made by newly-hired operators.
The speech-recognized text data, however, contains recognition errors in most cases. For this reason, when performing text mining on the speech-recognized text data, feature words may not be extracted precisely due to the influence of possible recognition errors. In order to solve this problem, it has been proposed (see PTL 3, for example) that text mining be performed using speech-recognized text data in which confidence has been assigned to each word candidate obtained by speech recognition (see NPL 2, for example). In the text mining described in PTL 3, correction based on the confidence is performed when the number of extracted feature words is counted, and accordingly the influence of recognition errors is reduced.
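As a rough illustration of confidence-based correction during counting, the sketch below adds each word candidate's confidence to its count instead of adding 1. This is only one simple possibility; the specific correction used in PTL 3 is not reproduced here. The function name `weighted_counts`, the `min_confidence` parameter, and the sample data are hypothetical.

```python
from collections import defaultdict

def weighted_counts(recognized_words, min_confidence=0.0):
    """Sum confidences per word; candidates below min_confidence are ignored.

    recognized_words: iterable of (word, confidence) pairs, confidence in [0, 1].
    """
    counts = defaultdict(float)
    for word, confidence in recognized_words:
        if confidence >= min_confidence:
            counts[word] += confidence
    return dict(counts)

# Assumed sample recognizer output: "fund" is a likely misrecognition of "refund".
recognized = [("refund", 0.9), ("refund", 0.8), ("fund", 0.2), ("delivery", 0.5)]

counts = weighted_counts(recognized)
filtered = weighted_counts(recognized, min_confidence=0.5)
print(counts)
print(filtered)
```

With this weighting, a low-confidence candidate contributes little to the total, so a word that appears mainly through recognition errors is less likely to be extracted as a feature word.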
Now, the speech-recognized text data and the call memo text data mentioned in the above example of a call center are information obtained from the same event (a telephone call) via different channels. That is, the two pieces of information differ in channel but share the same information source. Accordingly, it is conceivable that if text mining is performed by making use of the characteristics of both pieces of information and using them complementarily, more complex analysis would be possible than in the case where text mining is performed on only one of the text data pieces, or simply on each text data piece separately.
Specifically, the speech-recognized text data is first divided into portions that are common to the call memo text data, and portions that are inherent in call audio and are not described in the call memo text data. Similarly, the call memo text data is divided into portions common to the speech-recognized text data and portions that are inherent in call memos and not described in the speech-recognized text data.
Then, text mining is performed on the portions of the speech-recognized text data that are inherent in call audio. This text mining puts emphasis on the analysis of information that appears in call audio but is not included in the description of call memos. Through this analysis, information that should have been recorded as call memos but has been left out is extracted. Such extracted information can be used to improve description guidelines for creating call memos.
Subsequently, text mining is performed on the portions of the call memo text data that are inherent in call memos. This text mining puts emphasis on the analysis of information that appears in call memos but does not appear in the speech-recognized text data of call audio. Through this analysis, decision know-how of experienced operators is extracted more reliably than in the above-described case where text mining is performed on the call memo text data only. Such extracted decision know-how can be utilized as educational materials for newly-hired operators.
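The division into common and inherent portions, followed by mining of each inherent portion, might be sketched as follows. The sketch is deliberately naive: it assumes both text data pieces are already split into words and treats a word as "common" only when the identical word appears in the other piece. The text above does not specify how the division is made, and the name `split_portions` and the sample data are hypothetical.

```python
from collections import Counter

def split_portions(words_a, words_b):
    """Split words_a into words also found in words_b and words found only in words_a."""
    vocab_b = set(words_b)
    common = [w for w in words_a if w in vocab_b]
    inherent = [w for w in words_a if w not in vocab_b]
    return common, inherent

# Assumed sample data for a single call.
speech = ["customer", "asked", "refund", "shipping", "delay", "shipping"]
memo = ["refund", "approved", "escalate"]

speech_common, speech_inherent = split_portions(speech, memo)
memo_common, memo_inherent = split_portions(memo, speech)

# Mining each inherent portion, reduced here to simple word counts.
audio_only = Counter(speech_inherent)  # candidates for content missing from memos
memo_only = Counter(memo_inherent)     # candidates for operator decision know-how
print(audio_only.most_common(1))
print(memo_only)
```

Exact word matching ignores paraphrases and recognition errors, so a practical division would need a more tolerant notion of correspondence between the two text data pieces.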
The above text mining performed on a plurality of text data pieces obtained from the same event via different channels (hereinafter referred to as “cross-channel text mining”) can also be used in other examples.
For instance, cross-channel text mining is usable in cases where the perception of a company is to be analyzed from reported content, and where conversations in communication settings such as meetings are to be analyzed. In the former case, text mining is performed on speech-recognized text data generated from the utterances of announcers or the like and on text data such as speech drafts or newspaper articles. In the latter case, text mining is performed on speech-recognized text data obtained by speech recognition of conversations among participants and on text data such as documents referred to by participants in situ, memos created by participants, and minutes of meetings.
Also, in cross-channel text mining, a target for mining does not necessarily need to be speech-recognized text data or text data created with key entry. A target for mining may, for example, be character-recognized text data obtained by character recognition of questionnaires, minutes of meetings or the like as mentioned above (see NPL 3).
Furthermore, it is important, when performing cross-channel text mining, to clearly divide common portions and inherent portions of one text data piece relative to another text data piece. This is because analysis accuracy will decrease significantly if such division is unclear.
Citation List
Patent Literature
PTL 1: JP 2001-101194A
PTL 2: JP 2004-164079A
PTL 3: JP 2008-039983A
Non Patent Literature
NPL 1: H. Li and K. Yamanishi, “Mining from Open Answers in Questionnaire Data”, In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 443-449, 2001
NPL 2: Frank Wessel, et al., “Confidence Measures for Large Vocabulary Continuous Speech Recognition”, IEEE Trans. Speech and Audio Processing, vol. 9, No. 3, March 2001, pp. 288-298
NPL 3: John F. Pitrelli, Michael P. Perrone, “Confidence-Scoring Post-Processing for Off-Line Handwritten-Character Recognition Verification”, In Proceedings of the Seventh International Conference on Document Analysis and Recognition (ICDAR), vol. 1, August 2003, pp. 278-282