1. Field of the Invention
The present invention relates to speech recognition and more specifically to call classification of speech for spoken language systems.
2. Introduction
An objective of spoken dialog systems is to identify a speaker's intent, expressed in natural language, and to take actions that satisfy that intent. In a natural spoken dialog system, a speaker's utterance may be recognized using an automatic speech recognizer (ASR). The speaker's intent may be identified from the recognized sequence of words in the utterance by using a spoken language understanding (SLU) component. For goal-oriented call routing systems, identifying the speaker's intent may be framed as a classification problem. As a call classification example, consider the utterance “I would like to know my account balance,” in a customer care application. Assuming that the utterance is recognized correctly, the corresponding intent, or call-type, would be Request(Account Balance), and the action would be either obtaining the account number and reporting the balance to the user or routing the call to the Billing Department.
When statistical classifiers are used in such systems, they may be trained using large amounts of task data, which is usually transcribed and then labeled by humans (that is, each utterance is assigned one or more predefined call-types). This is a very expensive and labor-intensive process. The bottleneck in building an accurate statistical system is the time spent labeling to obtain high-quality labeled data.
Typically, examples to be labeled are chosen randomly so that the training data matches the test set. In the machine learning literature, learning from randomly selected examples is called passive learning. Recently, a new set of learning algorithms, in which the learner itself acts on the examples to be labeled, has been proposed. These algorithms are called active learning algorithms. Using active learning, it is possible to achieve better performance using a subset of the training data.
The goal of active learning is to reduce the number of training examples to be labeled by selectively sampling a subset of the unlabeled data. This may be done by inspecting the unlabeled examples and selecting, for human labeling, the ones that are most informative with respect to a given cost function. In other words, the goal of active learning algorithms is to select the examples that will result in the largest increase in performance, and thereby reduce the human labeling effort. Selectively sampling utterances assumes that there is a pool of candidate utterances to label that is much larger than the labelers can handle. In a deployed natural dialog system, this is indeed the case: a constant stream of raw data is collected from the field to continuously improve the performance of the system. The aim of active learning is then to derive a smaller subset of all utterances collected from the field for human labeling.
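The selective sampling described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the cost function is the classifier's highest call-type score (a least-confidence criterion); the text above does not prescribe a particular cost function. The names select_for_labeling and toy_scorer, the score values, and the sample utterances are hypothetical and stand in for a trained call classifier and field data.

```python
from typing import Callable, Dict, List, Tuple

Scorer = Callable[[str], Dict[str, float]]


def select_for_labeling(
    pool: List[str], score_call_types: Scorer, budget: int
) -> Tuple[List[str], List[str]]:
    """Split a pool of unlabeled utterances into (selected, remaining).

    Assumption: an utterance is treated as more informative when the
    classifier's top call-type score is lower (least-confidence sampling).
    """
    scored = []
    for utterance in pool:
        scores = score_call_types(utterance)
        top_score = max(scores.values()) if scores else 0.0
        scored.append((top_score, utterance))
    # Stable sort: least confident first, ties keep their pool order.
    scored.sort(key=lambda pair: pair[0])
    selected = [utterance for _, utterance in scored[:budget]]
    remaining = [utterance for _, utterance in scored[budget:]]
    return selected, remaining


def toy_scorer(utterance: str) -> Dict[str, float]:
    """Hypothetical stand-in for a trained call classifier."""
    if "balance" in utterance:
        return {"Request(Account Balance)": 0.9, "Other": 0.1}
    return {"Request(Account Balance)": 0.5, "Other": 0.5}


pool = [
    "I would like to know my account balance",
    "um yes I have a question",
    "what is my balance",
    "hello",
]
selected, remaining = select_for_labeling(pool, toy_scorer, budget=2)
```

In this sketch, the two utterances the toy scorer is least sure about are sent to the human labelers, and the confidently classified utterances are left in the unlabeled remainder.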
A complementary problem involves knowing how to intelligently exploit the remaining set of utterances that are not labeled by a human. Techniques for building better call classification systems in a shorter time frame are desired.