1. The Field of the Invention
The present invention relates to systems and methods for monitoring speech data labelers. More particularly, the present invention relates to systems and methods for using an interactively generated annotation guide to train and test speech labelers.
2. Introduction
Dialog applications are often used to automate the process of receiving and responding to customer inquiries. Dialog applications use a combination of voice recognition modules, language understanding modules, and text-to-speech systems to appropriately respond to speech input received from a user or a customer. Billing inquiries, information queries, customer complaints, and general questions are examples of the speech input that is received by dialog applications. The response of the dialog application to a particular speech input depends on the logic of the dialog application.
The development of a successful dialog application, however, is a time-consuming process and requires a significant amount of manual labor because of the nature of the tasks being performed. One of the tasks performed in the development of a dialog application is the generation of an annotation guide that is used to annotate or label raw speech data. The annotation guide is generally created by a user experience person (or other user) who is familiar with the purposes and goals of the dialog application. Becoming familiar with the purposes and goals of the dialog application is also a labor-intensive process.
Currently, the generation of an annotation guide requires the user experience person to examine the raw speech data and create the categories, call types, and examples that are usually included in the annotation guide. The annotation guide aids the development of a dialog application because the annotation guide is used by labelers to classify the raw speech data with the call types defined by the user experience person in the annotation guide.
After the annotation guide is developed, labelers begin using the annotation guide to label the speech data. Because the speech data may contain thousands of different utterances, labeling the speech data using the annotation guide is a labor-intensive process that is usually performed by more than one labeler. Unfortunately, human labelers do not always interpret the annotation guide in the same way or they may not understand the contents of the annotation guide. As a result, one labeler may classify a particular utterance as being of a particular call type while another labeler may classify the same utterance as being of a different call type. Labeling problems become more pronounced when labelers attempt to label utterances that do not clearly fit in a particular call type.
For example, an annotation guide may describe a Pay_Bill call type used to label utterances that suggest the customer wants to pay his or her bill. The following utterances from raw speech data, for instance, should be labeled with the Pay_Bill call type:
    I want to pay a bill; and
    I got mail and I have my credit card ready.
The second example of “I got mail and I have my credit card ready” is a marginal example that is more difficult to classify than the first example of “I want to pay a bill.” It is possible that one labeler will correctly label the second example with the Pay_Bill call type while another labeler will incorrectly label the second example with a different call type.
The likelihood of a particular utterance being labeled incorrectly increases if the labeler is not trained or tested. Currently, speech labelers (annotators) manually use the annotation guide to label the speech data, and this process is error-prone. The performance of the labelers cannot be tracked, and it is difficult to determine whether similar utterances are being classified in the same way by different labelers.
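By way of illustration only, the labeling inconsistency described above can be made concrete with a short sketch. The sketch assumes that each labeler's output is available as a mapping from utterance to call type. The function name find_disagreements, the labeler names, and the "Other" call type are hypothetical and are not drawn from any particular system.

```python
from collections import defaultdict


def find_disagreements(labels_by_labeler):
    """Return the utterances that received more than one distinct call type.

    labels_by_labeler maps a labeler name to a dict of {utterance: call_type}.
    This data layout is an assumption made for illustration.
    """
    call_types = defaultdict(set)
    for labels in labels_by_labeler.values():
        for utterance, call_type in labels.items():
            call_types[utterance].add(call_type)
    # Keep only utterances on which the labelers did not agree.
    return {u: sorted(types) for u, types in call_types.items() if len(types) > 1}


# Hypothetical labels; "Other" is an invented call type used only as an example.
labels = {
    "labeler_a": {
        "I want to pay a bill": "Pay_Bill",
        "I got mail and I have my credit card ready": "Pay_Bill",
    },
    "labeler_b": {
        "I want to pay a bill": "Pay_Bill",
        "I got mail and I have my credit card ready": "Other",
    },
}

disagreements = find_disagreements(labels)
print(disagreements)
```

In this sketch, only the marginal utterance is reported, because the two hypothetical labelers assigned it different call types while agreeing on the clear example.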
The ability to properly label the raw speech data ultimately has a significant impact on whether the dialog application can respond to speech input appropriately. If incorrectly labeled or annotated speech data is used to train portions of the dialog application such as the natural language understanding modules, the dialog application will not function properly and will frustrate customers. There is therefore a need for systems and methods to train and test the labelers to help ensure that the utterances in the speech data are being labeled appropriately.