The present invention relates to a dialog processing system, a dialog processing method and a computer program for extracting mandatory utterances in a specific field from business conversation data, particularly, mandatory utterances for the sake of compliance in sales transactions and the like.
Recently, demand for telesales through call centers has been steadily increasing, for example, for stock exchange transactions, bank account management, insurance contracts, telephone shopping and the like. Although a transaction through a telephone is simple and convenient for customers, the transaction also has many problems caused by the absence of a medium, such as a written document, which provides authentication information for certifying the transaction. For example, in the case of an insurance contract or the like, problems may occur at the time of payment of an insurance claim unless the mandatory questions are reliably confirmed. In addition, regarding stock exchange transactions, if there is any difference between the order contents actually heard by an agent (a staff member responsible for answering incoming calls from customers in a call center) during a telephone conversation and the contents inputted to an order system by the agent, the difference will result in an erroneous order.
To avoid such problems, compliance checks have been increasingly required for transactions and the like conducted through a telephone. Specifically, work is required to check whether or not agents make mandatory utterances in a specific field, particularly, mandatory utterances for the sake of compliance in conversations of sales transactions (reconfirmation of ordered items, confirmation of contract, explanations of product risk and the like).
In the checking work, recorded conversations are checked as to whether or not agents make mandatory utterances. However, it is extremely difficult to monitor all the conversations. This is because, for example, only a few out of a hundred agents serve as managers in charge of the checking work. For this reason, managers currently monitor only a small amount of data manually, mainly data sampled from recorded conversations and conversations of agents on a black list.
To improve the current situation, an attempt was made to use a speech recognition technology to check whether mandatory utterances are made in conversations. To perform this checking, a speech recognition system must learn the utterance portions of mandatory information in conversations, and these utterance portions must be manually labeled beforehand. Moreover, proper transcription data needs to be prepared to improve the recognition rate. Since the contents of utterances regarding mandatory information vary from industry to industry or from company to company, the manual work described above is required every time the target data is changed. Moreover, in manual labeling of utterance portions of mandatory information, the labeled range may vary because the work is manual.
To automate the manual labeling, for example, a method has been disclosed for adding annotations to voice data to be processed through speech recognition, the annotations being based on results of conversations by speakers at a call center. In this method, a specified speaker repeats conversations made by unspecified speakers, and the speech recognition is thereby performed. Then, the results of the speech recognition are utilized for retrieval of sound clips or data mining (for example, Japanese Patent Application Laid-Open Publication No. 2003-316372).
The method disclosed in Japanese Patent Application Laid-Open Publication No. 2003-316372 corresponds to labeling of a specific utterance in a conversation. One similar method is called dialog act classification, which has heretofore been performed to attach any one of a set of labels (questions, proposals or requests) to each utterance in a conversation (for example, Stolcke et al. (1998) Dialog Act Modeling for Conversational Speech (AAAI Symposium pp. 98-105, 1998)). The dialog act classification performed heretofore is designed for an application such as an interactive voice response system used for ticket reservation and the like.
Moreover, a technique has been presented for annotating only specific utterances in a conversation, rather than labeling all the utterances in the conversation (for example, Morgan et al. (2006) Automatically Detecting Action Items in Audio Meeting Recordings (SIGdial Workshop pp. 96-103, 2006)). In this technique, discussions in a meeting are monitored to extract utterances regarding action items (decided items in the meeting).
However, even with the method disclosed in Japanese Patent Application Laid-Open Publication No. 2003-316372, since the specified speaker selectively repeats the conversations of the unspecified speakers, the repeated conversation depends on selection by the specified speaker. Accordingly, it is undeniable that a variation may occur in the result of adding annotations. Moreover, the technique disclosed in Stolcke et al. (1998) Dialog Act Modeling for Conversational Speech (AAAI Symposium pp. 98-105, 1998) is for giving appropriate responses by classifying utterances of a user in a specific situation, and requires that learning data for setting labels or classifying utterances be created from data and response scenarios corresponding to a specific use situation. Furthermore, for the extraction of the action items in Morgan et al. (2006) Automatically Detecting Action Items in Audio Meeting Recordings (SIGdial Workshop pp. 96-103, 2006) as well, an extraction module is constructed by use of feature quantities in previously given correct data. Providing the correct data allows the extraction module to use features obtained from the correct data. Accordingly, the correct data must be newly prepared manually and then be learned every time the data or the field of application is changed.