In an interactive system, a dialog is a series of inquiries and responses between a computer and a user that allows the computer to obtain information from, and deliver information to, the user. Many techniques currently exist for specifying dialog control logic in voice and multimodal dialog systems. At the lowest level of representational sophistication are finite-state scripts that explicitly enumerate the various states and transitions. At a higher level of complexity are frame-based techniques. The metaphor behind frame-based dialog control logic is that of form-filling: the system requires certain pieces of information from the user in order to accomplish some domain-specific task (such as booking a flight, finding a restaurant, or finding out whom the user wants to call on the phone). The advantage of frame-based techniques over finite-state scripts is that they enable a dialog designer to create a relatively complex dialog in a more compact format. A frame compactly represents a large number of states by eliminating much of the explicit process logic that finite-state machines require. This is because the fields of a frame can typically be filled in any order, and an interaction manager (IM) uses the current completion state of a frame to decide what remaining information it needs to obtain from the user. This typically leads to a more flexible, mixed-initiative interaction than that obtained from a finite-state script.
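To illustrate how a frame's completion state can stand in for explicit transitions, the following is a minimal Python sketch. It assumes a frame modeled as an ordered mapping from field names to values, with None meaning "unfilled". The `Frame` class, the `next_prompt` function, and the flight-booking field names are hypothetical and are not taken from VoiceXML or MPD-FL.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Frame:
    """Hypothetical minimal frame: named fields, each unfilled (None) or filled."""
    name: str
    fields: Dict[str, Optional[str]] = field(default_factory=dict)

    def missing(self) -> List[str]:
        """Unfilled field names, in declaration order (dicts preserve insertion order)."""
        return [name for name, value in self.fields.items() if value is None]

    def is_complete(self) -> bool:
        return not self.missing()


def next_prompt(frame: Frame) -> Optional[str]:
    """Interaction-manager step: decide what to ask from the frame's completion state.

    No explicit transition table is needed; the set of unfilled fields
    implicitly encodes the dialog state.
    """
    missing = frame.missing()
    if not missing:
        return None  # frame complete: nothing left to ask
    return f"Please tell me the {missing[0]}."


booking = Frame("book_flight", {"origin": None, "destination": None, "date": None})
booking.fields["destination"] = "Boston"  # the user volunteered the destination first
print(next_prompt(booking))               # prompts for the origin next
```

Because the fields can be filled in any order, even three independent fields correspond to 2^3 = 8 completion states, each of which a finite-state script would have to enumerate explicitly; in this sketch they are all handled by the single `missing()` check.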
A dialog system typically prompts the user for discrete pieces of information in a predetermined order, such as a credit card number followed by an expiration date. For the user, this can become quite cumbersome, especially when she is accustomed to providing multiple pieces of information in succession without the interruption of intermediate prompts. In addition, the caller may wish to provide the pieces of information in a different order than the one specified by the application. Mixed-initiative dialogs address both of these issues by allowing the flow of the call to be directed by the user as well as by the application.
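The mixed-initiative behavior can be sketched as follows, assuming a toy recognizer in which regular expressions stand in for the speech grammars a real system would use. `FIELD_PATTERNS`, `fill_from_utterance`, and the field formats (a 16-digit card number and an MM/YY expiration date) are hypothetical. A single utterance fills every empty field it mentions, regardless of the order in which the user says them.

```python
import re
from typing import Dict, List, Optional

# Hypothetical toy "grammars": regular expressions standing in for the speech
# grammars a real system would use to recognize each field's value.
FIELD_PATTERNS = {
    "card_number": re.compile(r"\b(\d{16})\b"),      # assumed 16-digit card number
    "expiration": re.compile(r"\b(\d{2}/\d{2})\b"),  # assumed MM/YY format
}


def fill_from_utterance(frame: Dict[str, Optional[str]], utterance: str) -> List[str]:
    """Fill every still-empty field that the utterance mentions.

    The user may mention fields in any order; the returned list names the
    fields filled by this utterance, in the order of FIELD_PATTERNS.
    """
    filled = []
    for name, pattern in FIELD_PATTERNS.items():
        if frame.get(name) is None:
            match = pattern.search(utterance)
            if match:
                frame[name] = match.group(1)
                filled.append(name)
    return filled


payment = {"card_number": None, "expiration": None}
# One user turn, expiration date first, with no intermediate prompt.
filled = fill_from_utterance(payment, "it expires 09/27 and the number is 4111111111111111")
print(filled, payment)
```

A strictly prompt-driven application would need two turns for this exchange; here one user turn completes the frame, and the application would only need to prompt for fields the user leaves out.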
Frame-based techniques are not the most powerful means of specifying dialog control logic. Other techniques, such as plan-based and agent-based models, are more powerful. However, frame-based techniques have the advantage of simplicity compared to these approaches, and are thus appropriate for specifying certain limited kinds of dialogs. Another reason for the current popularity of frames is the existence of the World Wide Web Consortium (W3C) Voice Extensible Markup Language 2.0 (VoiceXML 2.0) standard, which adopts a frame-based approach. In the VoiceXML standard, frames come in two varieties: “forms” and “menus”. An example of a VoiceXML form 100 is shown in FIG. 1. An example of a frame specified in the Motorola Portable Dialog Frame Language (MPD-FL) is shown in FIG. 2.
The embodiments presented below will be described in terms of frames of the sort shown in FIG. 1 and in FIG. 2.
FIG. 1 demonstrates that a form consists of several sub-parts, including a set of “fields”. These fields are analogous to variables with mutable state. The goal of a dialog is to fill all of these fields by interacting with a user. How this interaction proceeds is determined by the structure of the form and by the “Form Interpretation Algorithm” (FIA) used to interpret the form. In VoiceXML, for example, the FIA by default visits each field in the order in which it appears in the form. In FIG. 1, this means (1) color, (2) size, and (3) quantity. If a dialog designer wants to change the order in which the fields are visited, he can do one of two things: re-arrange the order of the fields, or specify explicit control logic in the form. Taking the latter option, the designer could, for example, specify a construct on the “color” field indicating that if “color” has the value “red”, then the FIA should skip “size” and go directly to “quantity”. This is done using an ‘if-then’ construct supplied by VoiceXML. The same mechanism must be used if the designer wants to bypass the FIA. If, for example, only 2 out of 3 fields need to be completed, the designer would need to add explicit control logic to make the FIA exit the frame when this contingency holds.
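The FIA behavior described above, together with the two kinds of hand-written control logic, can be approximated by the simplified Python sketch below. It is not the normative VoiceXML FIA: `select_next_field`, `run`, the per-field guard predicates, and the form-level `done` predicate are hypothetical names chosen for illustration. By default, fields are visited in document order; a guard skips “size” when “color” is “red”; and an alternative exit predicate leaves the form once any two of the three fields are filled.

```python
from typing import Callable, Dict, List, Optional

Form = Dict[str, Optional[str]]
Predicate = Callable[[Form], bool]

ORDER = ["color", "size", "quantity"]  # document order of the fields in the form


def select_next_field(form: Form, guards: Dict[str, Predicate],
                      done: Predicate) -> Optional[str]:
    """Simplified FIA select step (a sketch, not the normative VoiceXML algorithm).

    Returns the first field, in document order, that is unfilled and whose guard
    (if any) is true, or None when the form should be exited.
    """
    if done(form):
        return None
    for name in ORDER:
        guard = guards.get(name)
        if form[name] is None and (guard is None or guard(form)):
            return name
    return None


def run(answers: Dict[str, str], guards: Dict[str, Predicate], done: Predicate) -> List[str]:
    """Drive the loop with canned user answers; return the fields visited, in order."""
    form: Form = {name: None for name in ORDER}
    visited = []
    while True:
        name = select_next_field(form, guards, done)
        if name is None:
            return visited
        visited.append(name)
        form[name] = answers[name]  # stand-in for prompting and recognizing


def all_filled(form: Form) -> bool:
    return all(value is not None for value in form.values())


def two_of_three(form: Form) -> bool:
    # Explicit exit logic: only 2 of the 3 fields need to be completed.
    return sum(value is not None for value in form.values()) >= 2


# Explicit skip logic: if "color" is "red", skip "size" and go to "quantity".
SKIP_SIZE_IF_RED = {"size": lambda form: form["color"] != "red"}

answers = {"color": "red", "size": "M", "quantity": "2"}
print(run(answers, {}, all_filled))                # default document order
print(run(answers, SKIP_SIZE_IF_RED, all_filled))  # "size" skipped
print(run(answers, {}, two_of_three))              # exits after two fields
```

Each additional contingency in this sketch is another hand-written predicate that the designer must write and maintain, and with many fields these predicates and their interactions accumulate.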
When a frame is large, that is, when it contains a relatively large number of fields, the explicit control logic needed for the cases described above can become considerably complex, reducing the utility of a frame-based language for specifying dialogs and degrading maintainability and extensibility. To the extent that explicit control logic is used, the frame-based language becomes less declarative, more procedural, and closer to a lower-level finite-state script.