1. Technical Field
The present disclosure relates to dialog management and more specifically to incorporating incremental speech recognition output into a dialog management system.
2. Introduction
Speech can provide a natural human-machine interface. Speech interaction relies on turn-taking behavior, and appropriate turn-taking is difficult for spoken dialog systems. Some speech interface systems use a straightforward approach to turn-taking. The dialog system plays a prompt, and upon detecting any user speech, the dialog system switches off the prompt and waits to reply until the user stops speaking. This simplistic approach starts and stops the system prompt mechanically, without considering what the user is saying. Turn-taking errors are a common cause of failed interactions with spoken dialog systems.
Incremental speech recognition can enable the dialog system to reason about what the user has said while the user is still speaking. However, the partial speech recognition results are often unstable and highly inaccurate. Instability refers to the tendency of the partial recognition result to revise itself as more speech is decoded. For example, “traveling from” would be a revision of “traveling to”. Inaccuracy refers to partial recognition results that are sometimes semantically incomplete and may not reflect what the user intends to say. Conventional dialog management systems cannot handle such revisions because the dialog state transitions once the system reacts to an utterance, which leads to low recognition accuracy and improper turn-taking.
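By way of illustration only, the notion of instability described above can be sketched in a few lines of Python. The sketch assumes that partial results arrive as plain text strings and treats a partial result as a revision whenever it changes or drops a word of the preceding partial result rather than only appending to it. The function name `count_revisions` and the sample sequence, including the word "Boston", are hypothetical and are not part of any particular recognizer.

```python
def count_revisions(partials):
    """Count partial results that revise, rather than extend, the previous one.

    Assumption: each partial result is a whitespace-separated string.
    """
    revisions = 0
    previous = []
    for text in partials:
        words = text.split()
        # An extension keeps every earlier word in place and only appends.
        # Anything else (a changed or dropped word) is counted as a revision.
        if words[:len(previous)] != previous:
            revisions += 1
        previous = words
    return revisions


# Hypothetical stream of partial results for a single utterance.
partials = [
    "traveling",
    "traveling to",
    "traveling from",         # revises "to" into "from"
    "traveling from Boston",  # extends the previous partial
]
print(count_revisions(partials))
```

In this hypothetical stream only the third partial result counts as a revision. A system that had already reacted to "traveling to" would have acted on a hypothesis that the recognizer later withdrew.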