1. Field of the Invention
The present invention relates to spoken dialog systems and more specifically to methods of designing and implementing labeling guides associated with spoken dialog services.
2. Introduction
Building spoken dialog systems is a complicated and time-consuming process. The various modules necessary to carry on a meaningful conversation between a person and a spoken dialog system include an automatic speech recognition module, a spoken language understanding module, a dialog manager, and a text-to-speech language generation module. When a spoken dialog system for a particular domain is developed, the developers must train the various components to recognize and interact appropriately for the particular domain. For example, if the domain relates to an airline reservation service, each module must be trained to recognize and expect input from users related to air travel and reservations. The present invention relates to the process of training the spoken language understanding module of a spoken dialog system.
Most spoken language understanding (SLU) modules need some kind of internal representation of meaning that enables them to appropriately interpret and identify the meaning or intent of user input. The internal representation is typically organized into semantic classes. For example, to represent an entity type such as a person in a dialog, the person can be referred to in terms of her name (Betty), a pronoun (her or she) or her relationship to others (Joe's manager). Thus three semantic classes can be derived from this entity type. For each spoken dialog application, the organization of the semantic classes for the potential entity types encountered in dialogs must be designed, and then a large amount of training data is needed to build the semantic classifier models. For more information on semantic representations, see Huang, Acero and Hon, Spoken Language Processing, Prentice Hall, 2001, pages 867-880.
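By way of illustration only, the person example above can be sketched as a small data structure in which one entity type gives rise to three semantic classes. The names PersonReference and Mention are hypothetical and are not drawn from any particular SLU implementation; the sketch merely assumes that each mention of an entity is tagged with exactly one class.

```python
from dataclasses import dataclass
from enum import Enum


class PersonReference(Enum):
    """Hypothetical semantic classes derived from the 'person' entity type."""
    NAME = "name"                  # e.g. "Betty"
    PRONOUN = "pronoun"            # e.g. "her", "she"
    RELATIONSHIP = "relationship"  # e.g. "Joe's manager"


@dataclass(frozen=True)
class Mention:
    """A piece of text paired with the semantic class assigned to it."""
    text: str
    semantic_class: PersonReference


# The three ways of referring to the same person given in the text above.
mentions = [
    Mention("Betty", PersonReference.NAME),
    Mention("she", PersonReference.PRONOUN),
    Mention("Joe's manager", PersonReference.RELATIONSHIP),
]

# Collect the distinct semantic classes that appear across the mentions.
classes_used = {m.semantic_class for m in mentions}
```

In a deployed application the set of classes would be designed per domain, which is why the design step described above must precede any collection of training data.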
Producing the training data is a difficult and time-consuming process and is pivotal for the success of the application. Generating the training data requires recording a large number of user utterances, transcribing them and then labeling each one with the appropriate semantic class or classes. Before labeling can be done, however, a person must design the set of semantic labels needed for the application. FIG. 1 illustrates a known process of generating a labeling guide. The set of semantic labels or tags used for the labeling guide is designed in step 102. The meaning of each semantic label is documented along with both positive and negative examples. The documentation is organized into a detailed labeling guide (104) that is then provided to labelers to follow during an implementation phase (106). Trained labelers then carry out the manual task of labeling the data. For every application, this process must be started from scratch (108), and labelers must be retrained.
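The known process of FIG. 1 may be pictured, purely as a sketch, as a guide that maps each semantic label to its documented meaning and its positive and negative examples, against which labeled utterances can be checked. The class names, the function find_undefined_labels, the label names and the example utterances below are all hypothetical, chosen only to echo the airline reservation domain mentioned earlier.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LabelDefinition:
    """One entry of a labeling guide: a label, its meaning, and examples."""
    name: str
    meaning: str
    positive_examples: List[str] = field(default_factory=list)
    negative_examples: List[str] = field(default_factory=list)


@dataclass
class LabeledUtterance:
    """A transcribed user utterance with the labels a labeler assigned."""
    transcription: str
    labels: List[str]


def find_undefined_labels(utterance: LabeledUtterance,
                          guide: Dict[str, LabelDefinition]) -> List[str]:
    """Return the labels on the utterance that the guide does not define."""
    return [label for label in utterance.labels if label not in guide]


# Hypothetical guide for an airline reservation application.
guide = {
    "BookFlight": LabelDefinition(
        name="BookFlight",
        meaning="The user wants to make a new reservation.",
        positive_examples=["I need a flight to Boston"],
        negative_examples=["I need to cancel my flight to Boston"],
    ),
    "CancelReservation": LabelDefinition(
        name="CancelReservation",
        meaning="The user wants to cancel an existing reservation.",
        positive_examples=["please cancel my reservation"],
        negative_examples=["can I change my seat"],
    ),
}

ok = LabeledUtterance("I'd like to fly to Denver", ["BookFlight"])
bad = LabeledUtterance("I want my money back", ["Refund"])

undefined_for_ok = find_undefined_labels(ok, guide)
undefined_for_bad = find_undefined_labels(bad, guide)
```

Because the guide in this sketch is specific to one application, a second application would need its own set of definitions, which mirrors the restart at step 108.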
The typical process of generating training data, designing semantic labels and manually labeling the training data is very expensive. The process also introduces the opportunity for labeling errors, at least in the early phase of the learning cycle. Because of the highly specialized nature of each spoken dialog system, the data labeled for one application cannot be used for any other application, and if the functionality of the application needs to be extended or modified, new labels must be designed and the data must be labeled again. The ultimate goal of the semantic labeling process is to train the SLU module to determine the appropriate action or responsive statement based on the received user utterance.