Conventional methods for constructing and training statistical models for recognition and understanding involve collecting and annotating large speech corpora for a task. The speech is manually transcribed, and each utterance is then semantically labeled. The resulting database is used to train stochastic language models for recognition and understanding, and these models are further adapted for different dialog states. Examples of such methods are shown in U.S. Pat. Nos. 5,675,707, 5,860,063, 6,044,337, 6,192,110, and 6,173,261, each of which is incorporated by reference herein in its entirety.
This transcription and labeling process is a major bottleneck in developing new applications and refining existing ones. For incremental training of a deployed automated dialog system, current technology would potentially require transcribing millions of transactions, a process that is both time-consuming and prohibitively expensive.