Systems with spoken dialogue interfaces have gained increasing acceptance in a wide range of applications. However, such systems may use restricted language and scripted dialogue interactions. In particular, spoken language dialogue systems may involve narrowly focused language understanding and simple models of dialogue interaction. Real human dialogue, by contrast, may be highly context- and situation-dependent, full of ill-formed utterances and sentence fragments, and highly interactive and collaborative. For example, speakers may interrupt each other, finish each other's sentences, and jointly build contributions to the shared context.
Understanding language and modeling natural dialogue may be important in building friendly spoken-language interfaces, and may be critically important in settings where the user is focused on external tasks, such as flying a helicopter or driving a car. In such scenarios, users may not be able to plan their utterances ahead of time or “hold that thought” until an appropriate time. Instead, users may need to be able to interrupt the dialogue system and issue instructions that build on the context and situation. Conversely, the dialogue system must interpret these contributions in context and should interrupt the user only when appropriate (for example, in critical situations), and any questions from the system should be as focused as possible. Accordingly, speech interfaces in highly stressed or cognitively overloaded domains, i.e., those involving a user concentrating on other tasks, may require a more flexible dialogue with robust, wide-coverage language understanding.
In the automotive industry, for example, dialogue systems may offer command and control for devices. However, these systems may rely on keyword spotting techniques and finite state techniques for language understanding and dialogue management. These systems may also encounter difficulties associated with updating to a new database or porting to a new device or application. Accordingly, due to the limitations of the technologies used by these systems, only constrained spoken language expressions may be handled. Furthermore, frequently occurring language phenomena, such as pronouns, ambiguity, and revisions, may not be processed properly.
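By way of illustration only, the following minimal sketch shows how a keyword spotting front end combined with a finite state dialogue manager might behave, and why pronouns and revisions may be mishandled. The command table, state names, and function names (COMMANDS, TRANSITIONS, spot_keywords, step) are hypothetical and are not taken from any particular system.

```python
# Hypothetical command table: a set of keywords mapped to a device action.
COMMANDS = {
    ("radio", "on"): "RADIO_ON",
    ("radio", "off"): "RADIO_OFF",
    ("volume", "up"): "VOLUME_UP",
    ("volume", "down"): "VOLUME_DOWN",
}

# Hypothetical finite state dialogue manager: (state, action) -> next state.
TRANSITIONS = {
    ("IDLE", "RADIO_ON"): "RADIO_PLAYING",
    ("RADIO_PLAYING", "RADIO_OFF"): "IDLE",
    ("RADIO_PLAYING", "VOLUME_UP"): "RADIO_PLAYING",
    ("RADIO_PLAYING", "VOLUME_DOWN"): "RADIO_PLAYING",
}


def spot_keywords(utterance):
    """Return the first action whose keywords all occur in the utterance."""
    words = set(utterance.lower().replace(",", " ").split())
    for keywords, action in COMMANDS.items():
        if all(keyword in words for keyword in keywords):
            return action
    return None  # no keyword set matched


def step(state, utterance):
    """Advance the dialogue by one utterance; return (new_state, action)."""
    action = spot_keywords(utterance)
    next_state = TRANSITIONS.get((state, action))
    if next_state is None:
        return state, None  # utterance rejected; state unchanged
    return next_state, action


state, action = step("IDLE", "please turn the radio on")
# A pronoun hides the keyword "volume", so no command is spotted.
pronoun_result = step(state, "turn it down")
# A revision is ignored: the first matching keyword set is used.
revision_result = step(state, "turn the volume up, no, down")
```

In this sketch, the constrained expression "please turn the radio on" is handled, whereas "turn it down" is rejected because the pronoun replaces the expected keyword, and the revised utterance yields the action the speaker retracted, since the matcher has no model of context or self-correction.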