1. Field of the Invention
The present invention relates to a system, a method, and a program for processing voice data for a conversation between two persons. For example, the present invention relates to a system, a method, and a program for classifying a conversation between an agent and a customer at a call center.
2. Description of Related Art
At a call center, multi-person or two-way conversations occur over the phone between a call center agent (referred to as an agent) and a customer. In order to improve customer service, a supervisor of the call center extracts, from these conversations, those that include characteristic dialogues. Examples include a conversation in which the customer becomes angry or a conversation in which the agent makes an improper statement.
As an example, the supervisor of the call center can record the speakers' voices (referred to as voice, voices, or voice data) in a conversation between the agent and the customer in order to listen to the recorded voices, enabling the extraction of the conversation including the characteristic dialogue. Further, as another example, the supervisor of the call center can use a speech recognizer to convert voices in a conversation to text and read the converted text, enabling the extraction of the conversation including the characteristic dialogue.
However, large volumes of conversations occur at the call center. Therefore, it is difficult for the supervisor of the call center to listen to all the voices in many recorded conversations. Further, speech recognizers do not work as accurately on voice over a phone as they do on non-phone speech. Thus, it is difficult for the supervisor of the call center to have the speech recognizer convert all the voices in all the conversations to text with precision.
As mentioned above, it is very difficult to confirm all the conversations by either listening to the recorded voice or reading the text recognized by speech recognition. Thus, a conversation having a high probability of including a characteristic dialogue cannot easily be extracted from all the conversations at the call center.
As methods of analyzing a conversation between an agent at the call center and a customer without using speech recognition, the techniques described, for example, in Patent Documents 1 to 3 are known. Patent Document 1 describes a technique for setting a flag when the ratio of a voice period of time to a non-voice period of time between an operator and a customer is larger than a predetermined value (paragraph [0058]). According to this technique, a warning can be given if the speech periods of the operator and the customer differ notably.
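By way of illustration only, a ratio check of the kind attributed to Patent Document 1 might be sketched as follows. The function name `speech_ratio_flag`, its parameters, the default threshold of 3.0, and the choice to compare the two speakers' total speech times are all assumptions made for this sketch; Patent Document 1 itself may define the ratio differently.

```python
def speech_ratio_flag(operator_voice_s: float,
                      customer_voice_s: float,
                      threshold: float = 3.0) -> bool:
    """Return True when one speaker's total speech time exceeds the
    other's by more than `threshold` times.

    Hypothetical sketch: the inputs are total speech durations in
    seconds, and the threshold is an assumed tuning value.
    """
    shorter = min(operator_voice_s, customer_voice_s)
    longer = max(operator_voice_s, customer_voice_s)
    if shorter == 0:
        # One side never spoke: flag only if the other side spoke at all.
        return longer > 0
    return longer / shorter > threshold


# Example: the operator spoke for 300 s and the customer for 60 s.
flagged = speech_ratio_flag(300.0, 60.0)
```

A check of this kind looks only at one aggregate figure per conversation, which is consistent with the limitation noted for these techniques.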
Patent Document 2 describes a technique for detecting, for quantitative evaluation of proper speech, a state in which the sound pressure level of a conversation remains equal to or less than a reference value for a predetermined period of time. Patent Document 3 describes a technique for estimating a customer's psychological state from a silent period and the number of suspended states. However, the techniques described in Patent Documents 1 to 3 cannot observe the entire conversation between the agent at the call center and the customer to extract a conversation having a high probability of including a characteristic dialogue.
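As a non-limiting illustration, detection of a low-sound-pressure state of the kind attributed to Patent Document 2 might be sketched as follows. The function name `find_quiet_periods`, the per-frame level representation, and the frame length are assumptions introduced for this sketch and are not taken from Patent Document 2.

```python
from typing import List, Sequence, Tuple


def find_quiet_periods(levels: Sequence[float],
                       reference: float,
                       min_duration_s: float,
                       frame_s: float = 0.1) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds of runs in which the level
    stays at or below `reference` for at least `min_duration_s`.

    Hypothetical sketch: `levels` holds one sound pressure level per
    fixed-length frame of `frame_s` seconds.
    """
    periods: List[Tuple[float, float]] = []
    start = None  # index of the first frame of the current quiet run
    for i, level in enumerate(levels):
        if level <= reference:
            if start is None:
                start = i
        else:
            if start is not None and (i - start) * frame_s >= min_duration_s:
                periods.append((start * frame_s, i * frame_s))
            start = None
    # Close a quiet run that lasts until the end of the recording.
    if start is not None and (len(levels) - start) * frame_s >= min_duration_s:
        periods.append((start * frame_s, len(levels) * frame_s))
    return periods


# Example: 0.5 s frames, reference level 40, minimum duration 1.0 s.
quiet = find_quiet_periods([60, 30, 30, 30, 60, 30, 60], 40, 1.0, frame_s=0.5)
```

In this example only the three-frame run qualifies; the single quiet frame near the end is shorter than the assumed minimum duration and is ignored.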
[Patent Document 1] Japanese Patent Application Laid-Open No. 2007-33754
[Patent Document 2] Japanese Patent Application Laid-Open No. 2006-267465
[Patent Document 3] Japanese Patent Application Laid-Open No. 2002-51153
[Non-Patent Document 1] Etienne Marcheret et al., “The IBM RT06s Evaluation System for Speech Activity Detection in CHIL Seminars,” In Proc. MLMI, Springer Berlin/Heidelberg, 2006