The present invention relates to methods and systems for defining and handling user/computer interactions. In particular, the present invention relates to dialog systems.
Nearly all modern computer interfaces are based on computer-driven interactions in which the user must follow an execution flow set by the computer or learn one or more commands exposed by the computer.
In other words, most computer interfaces do not adapt to the manner in which the user wishes to interact with the computer, but instead force the user to interact through a specific set of interfaces.
New research, however, has focused on the idea of having a computer/user interface that is based on a dialog metaphor in which both the user and the computer system can lead or follow the dialog. Under this metaphor, the user can provide an initial question or command and the computer system can then identify ambiguity in the question or command and ask refining questions to identify a proper course of action. Note that during the refinement, the user is free to change the dialog and lead it in a new direction. Thus, the computer system must be adaptive and react to these changes in the dialog. The system must be able to recognize the information that the user has provided to the system and derive a user intention from that information. In addition, the system must be able to convert the user intention into an appropriate action, such as asking a follow-up question or sending an e-mail message.
Note that the selection of the proper action is critical in that the quality of the user experience is dictated in large part by the number of questions that the system asks the user and, consequently, the amount of time it takes for the user to reach their goal.
In the past, such dialog systems have been created through a combination of technologies. Typically, a stochastic model would be used to identify what the user has said. Such models provide probabilities for each of a set of hypothesis phrases.
The hypothesis with the highest probability is then selected as the most likely phrase spoken by the user.
This most likely phrase is provided to a natural language parsing algorithm, which applies a set of natural language rules to identify the syntactic and semantic structure of the identified phrase.
The semantic structure is then passed to a plan-based system that applies a different set of rules based on the semantic meaning and the past dialog statements made by the user and the computer. Based on the execution of these rules, the dialog system selects an action that is to be taken.
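The conventional three-stage arrangement described above can be illustrated with a minimal sketch. Every name in it (Hypothesis, pick_most_likely, parse, select_action, run_pipeline), the toy keyword rules, and the sample phrases are hypothetical and chosen only for illustration; they are not taken from any particular prior system. The sketch assumes that the recognizer has already produced a list of scored hypothesis phrases.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    phrase: str         # candidate phrase proposed by the recognizer
    probability: float  # probability assigned by the stochastic model


def pick_most_likely(hypotheses):
    """Stage 1: keep only the hypothesis with the highest probability."""
    return max(hypotheses, key=lambda h: h.probability)


def parse(phrase):
    """Stage 2: toy rule-based parser standing in for natural language rules."""
    words = phrase.lower().split()
    semantics = {"intent": None, "recipient": None}
    if "send" in words and any(w in words for w in ("mail", "email", "e-mail")):
        semantics["intent"] = "send_email"
    if "to" in words:
        idx = words.index("to")
        if idx + 1 < len(words):
            semantics["recipient"] = words[idx + 1]
    return semantics


def select_action(semantics, history):
    """Stage 3: plan-style rules over the semantics and the past dialog turns."""
    if semantics["intent"] is None:
        return "ask: What would you like to do?"
    recipient = semantics["recipient"]
    if recipient is None:
        # Fall back to the most recent recipient mentioned earlier in the dialog.
        for past in reversed(history):
            if past["recipient"] is not None:
                recipient = past["recipient"]
                break
    if recipient is None:
        return "ask: Who should receive the e-mail?"
    return f"send_email(to={recipient})"


def run_pipeline(hypotheses, history):
    """Chain the three stages; each stage sees only the previous stage's output."""
    best = pick_most_likely(hypotheses)
    semantics = parse(best.phrase)
    action = select_action(semantics, history)
    history.append(semantics)
    return action
```

In this sketch, the probabilities of the competing hypotheses are discarded after the first stage, so the rule-based stages that follow cannot take the recognizer's uncertainty into account.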
Some systems have attempted to use stochastic models in the conversion from what was said to the semantic meaning of what was said. For example, in “The Thoughtful Elephant: Strategies for Spoken Dialog Systems”, E. Souvignier et al., IEEE Transactions on Speech and Audio Processing, Vol. 8, No. 1 (January 2000), a stochastic model is applied both to the step of identifying what has been said and to the step of converting what has been said into a semantic meaning.
Other systems have used stochastic models to determine what action to take given a semantic meaning. For example, in “A Stochastic Model of Human-Machine Interaction for Learning Dialog Strategies”, Levin et al., IEEE Transactions on Speech and Audio Processing, Vol. 8, No. 1, pg. 11-23 (January 2000), a stochastic model is used in the conversion from a semantic meaning to an action.
Although stochastic models have been used in each of the stages separately, no system has been provided that uses stochastic models in all of the stages of a dialog system, with those models designed to optimize the same objective function. Because of this, the sub-systems in these dialog systems do not integrate naturally with each other.
Another problem with current dialog systems is that they are not well suited for distributed computing environments with less-than-perfect quality of service. Telephone-based dialog systems, for example, rely heavily on the telephone links. A severance in the phone connection generally leads to the loss of dialog context and interaction contents. As a result, the dialog technologies developed for phone-based systems cannot be applied directly to Internet environments where the interlocutors do not always maintain a sustained connection. In addition, existing dialog systems typically force the user into a fixed interface on a single device that limits the way in which the user may drive the dialog. For example, current dialog systems typically require the user to use an Internet browser or a telephone, and do not allow a user to switch dynamically to a phone interface or a hand-held interface, or vice versa, in the middle of the interaction. As such, these systems do not provide as much user control as would be desired.