Conventional methods for constructing spoken language systems involve collecting and annotating large speech corpora for a task. This speech is manually transcribed and each utterance is then semantically labeled. The resultant database is exploited to train stochastic language models for recognition and understanding. These models are further adapted for different dialog states. Examples of such methods are shown in U.S. Pat. Nos. 5,675,707, 5,860,063 and 6,044,337, and U.S. patent application Ser. No. 08/943,944, filed Oct. 3, 1997, and Ser. No. 09/217,635, filed Dec. 21, 1998, each of which is incorporated by reference herein in its entirety.
This transcription and labeling process is a major bottleneck both in developing new applications and in refining existing ones. For incremental training of a deployed natural spoken dialog system, current technology could require transcribing millions of transactions. This process is both time-consuming and prohibitively expensive.