1. Field of the Invention
The present invention relates to speech recognition and synthesis systems and, more particularly, to a natural language task-oriented dialog manager and method for providing a more versatile interface for interacting with users.
2. Description of the Related Art
Conversational systems are generally task-oriented. This means that their role is to help users achieve a specific goal in a particular application domain. A weather inquiry conversational system will provide users with weather reports and forecasts for a specific geographic location, but will not be able to conduct philosophical debates with the user. Moreover, a weather inquiry system is not expected to understand users' queries about stock prices, let alone answer them. That is, these systems are domain-specific.
Even as these systems are task-oriented and domain-specific, they can be quite flexible within their domains. They are expected to handle queries in the domain expressed freely in natural language. The input and output could be either text-oriented or speech-oriented. Speech-oriented systems have a speech recognition subsystem (speech-to-text system) and a speech synthesis subsystem (text-to-speech system).
Both mixed-initiative and machine-initiative approaches to task-oriented dialog management can be found in the prior art. There are two principal ways of implementing dialog managers.
One way is to define a finite number of states that the dialog can occupy, and to describe which actions by the user or the computer in a given state will change the dialog state to another. All the actions for state transitions are predefined for every pair of states. The designer of the application decides what these states are, which user actions cause the transitions, and how the computer responds upon each state transition. The dialog manager's behavior is completely specified by a state table. Developing applications using this strategy is laborious and complicated, and may be untenable for all but the simplest applications. Since the dialog manager is virtually the same as the state table, the dialog manager itself is not portable across applications. Such state-table based dialog managers are typically machine-initiative dialog managers, which direct the dialog.
An example of a state-based dialog system is described in U.S. Pat. No. 5,577,165, "Speech Dialogue System for Facilitating Improved Human-Computer Interaction", to Takebayashi et al., issued Nov. 19, 1996.
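The state-table strategy described above can be illustrated with a minimal sketch. The states, user actions, and prompts below are hypothetical, chosen only for a toy weather dialog; they are not taken from the cited patent or from any other system.

```python
# Hypothetical state table for a toy weather dialog. Every (state, user action)
# pair that the designer wants to support must be listed explicitly.
STATE_TABLE = {
    ("ask_city", "city_given"): ("ask_date", "Which day?"),
    ("ask_city", "unrecognized"): ("ask_city", "Which city?"),
    ("ask_date", "date_given"): ("report", "Here is the forecast."),
    ("ask_date", "unrecognized"): ("ask_date", "Which day?"),
}


def step(state, user_action):
    """Look up the next state and machine prompt for a (state, action) pair."""
    try:
        return STATE_TABLE[(state, user_action)]
    except KeyError:
        raise ValueError(f"no transition defined for {(state, user_action)!r}")


def run(actions, start="ask_city"):
    """Drive the dialog through a sequence of user actions; return the trace."""
    state, trace = start, []
    for action in actions:
        state, prompt = step(state, action)
        trace.append((state, prompt))
    return trace
```

Because the table is the whole of the manager's behavior, moving to a new application means rewriting the table, which is the portability problem noted above.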
There have been recent improvements in state-based dialog management. The publication "AMICA: the AT&T Mixed Initiative Conversational Architecture" by R. Pieraccini, E. Levin, and W. Eckert in the Proceedings of Eurospeech-97, Rhodes, Greece, 1997, vol 4, pp. 1875-1878 (Pieraccini et al.), describes a dialog system architecture that is based on a dialog state space, an action set, and a dialog strategy. The dialog state corresponds to all the information available at a certain time during the course of the dialogue. The action set is the set of all actions that the dialog system can perform (such as asking the user for input, providing the user with some output, and performing data retrieval from a database). The dialog strategy in Pieraccini et al. specifies the action to be performed next for each state reached. The implementation of the strategy is represented by a recursive transition network whose arcs represent conditions on the state, and whose nodes represent the actions. The dialogue system operates in the following manner: based on the current state, identify a particular node in the network; invoke the action associated with the node (the action updates the dialog state); and move to the next state depending on the state resulting from applying the action. The design of the dialog strategy can pose an optimization problem which can be solved by a reinforcement learning approach as described in "Using Markov Decision Process for Learning Dialogue Strategies", by E. Levin, R. Pieraccini, and W. Eckert, in the Proceedings of the International Conference on Acoustics, Speech, Signal Processing, Seattle, May 1998, vol 1, pp. 201-204.
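The operating cycle described above (select an action from the current state, apply it, continue from the updated state) can be sketched as follows. This is a simplified illustration only: the strategy is an ordered list of condition and action pairs rather than a recursive transition network, and the action names, the state keys, and the simulated user input queue are all assumptions made for the example.

```python
# Each action updates the dialog state in place. User input is simulated by a
# hypothetical "pending_input" queue held in the state.
def ask_origin(state):
    state["origin"] = state["pending_input"].pop(0)


def ask_destination(state):
    state["destination"] = state["pending_input"].pop(0)


def retrieve(state):
    state["result"] = f"flights {state['origin']} -> {state['destination']}"


# Simplified strategy: the first condition that holds selects the action.
STRATEGY = [
    (lambda s: "origin" not in s, ask_origin),
    (lambda s: "destination" not in s, ask_destination),
    (lambda s: "result" not in s, retrieve),
]


def select_action(state):
    """Return the action whose condition holds for the state, or None."""
    for condition, action in STRATEGY:
        if condition(state):
            return action
    return None


def run_dialog(state):
    """Apply actions until no condition holds; return the action names used."""
    performed = []
    action = select_action(state)
    while action is not None:
        action(state)
        performed.append(action.__name__)
        action = select_action(state)
    return performed
```

In this sketch the strategy is fixed by hand; the reinforcement learning work cited above treats the choice of strategy as something to be optimized instead.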
Another approach to dialog management is based on "frames". Frames are the electronic equivalent of the forms that are filled out in any bureaucratic process. These "electronic forms" ("forms" hereinafter) have slots that correspond to information supplied by the user. For example, an airline travel form will have a slot for departure date, a slot for departure location and a slot for arrival location. There may be additional slots for optional information such as airline name. When the user provides information, it is used to fill the slots in the form. The information corresponding to the slots can be provided by the user in any order. If any vital slot is unfilled, the machine will ask the user for the value of that field (slot). In this way, mixed-initiative is achieved. A first example of such a system is "GUS: A Frame-driven Dialog Manager" by Bobrow et al., published in Artificial Intelligence, vol 8 (1977), pp. 155-173. This work describes dialog management of just one task: booking an airline ticket. This means there is only one active form.
Another recent example is "A Form-based Dialog Manager for Spoken Language Applications" by D. Goddeau, H. Meng, J. Polifroni, S. Seneff, and S. Busayapongchai, in Proceedings of the International Conference on Spoken Language Processing, Philadelphia, 1996, pp. 701-704. This work describes dialog management in the domain of used car price quotes. Again, there is only one task and one active form corresponding to that task.
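The slot-filling behavior of a form-based dialog manager can be sketched as follows, using the airline travel form described above. The class name `Form`, its method names, and the prompt wording are hypothetical and are not drawn from GUS or from the Goddeau et al. system.

```python
from dataclasses import dataclass, field


@dataclass
class Form:
    """A single form with required ("vital") and optional slots."""
    required: tuple
    optional: tuple = ()
    slots: dict = field(default_factory=dict)

    def fill(self, **values):
        """Fill any subset of slots, in any order."""
        for name, value in values.items():
            if name not in self.required + self.optional:
                raise KeyError(f"unknown slot: {name}")
            self.slots[name] = value

    def next_prompt(self):
        """Return a question for the first unfilled required slot, or None."""
        for name in self.required:
            if name not in self.slots:
                return f"Please give the {name.replace('_', ' ')}."
        return None
```

A brief usage example: the user volunteers two slots out of order, and the machine takes the initiative only for the slot that is still missing.

```python
form = Form(
    required=("departure_date", "departure_location", "arrival_location"),
    optional=("airline_name",),
)
form.fill(arrival_location="Denver", departure_date="May 3")
print(form.next_prompt())
```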
Yet another approach to dialog processing is the information-based approach of "Dialogue Strategies Guiding Users to their Communicative Goals", by Matthias Denecke and Alex Waibel, published in Proceedings of Eurospeech-97, Rhodes, Greece, 1997, pp. 1339-1342. By information-based approach, it is meant that the specificity of the information comprising results from database retrieval determines the actions to be undertaken by the dialogue system. They represent each of the user's communicative goals by a typed feature structure (a domain-specific object) in which the feature values impose lower bounds on the data fields required for the goal. The main goal of this approach is to generate clarification dialogues by determining which questions to ask the user in case the user does not specify all the necessary information for a goal. The sequence of questions asked is expected to elicit answers from users that fill an initially deficient feature structure step by step, thus generating a feature structure that meets the information lower bound of a communicative goal.
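The idea of a lower bound that drives clarification questions can be illustrated with a rough sketch. It is a loose simplification under stated assumptions: nested dictionaries stand in for typed feature structures, a leaf value of `True` in the lower bound merely marks a feature as required, and the answers are supplied from a prepared table rather than from a user. It does not reproduce the formalism of Denecke and Waibel.

```python
def missing_paths(lower_bound, structure, prefix=()):
    """Return feature paths required by lower_bound but absent from structure."""
    missing = []
    for feature, required in lower_bound.items():
        path = prefix + (feature,)
        value = structure.get(feature)
        if isinstance(required, dict):
            # Descend into the nested structure, treating a gap as empty.
            sub = value if isinstance(value, dict) else {}
            missing.extend(missing_paths(required, sub, path))
        elif value is None:
            missing.append(path)
    return missing


def clarify(lower_bound, structure, answers):
    """Ask about one missing path at a time until the lower bound is met.

    `answers` is a hypothetical stand-in for the user, keyed by feature path.
    Returns the questions asked, in order, as dotted path names.
    """
    asked = []
    while True:
        gaps = missing_paths(lower_bound, structure)
        if not gaps:
            return asked
        path = gaps[0]
        asked.append(".".join(path))
        node = structure
        for feature in path[:-1]:
            node = node.setdefault(feature, {})
        node[path[-1]] = answers[path]
```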
Poor recognition, silence, requests for help, cancellation, list-navigation, and requests for repetition of the last response are some issues of dialog management that are common to all applications and domains. Processing these events or requests is referred to as domain-independent processing.
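One way to picture domain-independent processing is as a thin layer that intercepts the common events before any domain logic runs. The sketch below is illustrative only: the class name `DialogShell`, the event names, and the response strings are assumptions, and list navigation is omitted for brevity.

```python
class DialogShell:
    """Handles events common to all domains; delegates the rest."""

    # Hypothetical canned responses for domain-independent events.
    GENERIC_RESPONSES = {
        "help": "You can ask a question or say 'cancel'.",
        "cancel": "Cancelled.",
        "silence": "Are you still there?",
        "poor_recognition": "Sorry, I did not catch that.",
    }

    def __init__(self, domain_handler):
        # domain_handler(event, payload) supplies the domain-specific reply.
        self.domain_handler = domain_handler
        self.last_response = None

    def handle(self, event, payload=None):
        if event == "repeat":
            # Repeat the previous response without recording a new one.
            return self.last_response or "There is nothing to repeat yet."
        if event in self.GENERIC_RESPONSES:
            response = self.GENERIC_RESPONSES[event]
        else:
            response = self.domain_handler(event, payload)
        self.last_response = response
        return response
```

Because the shell knows nothing about weather, travel, or any other domain, the same layer could in principle be reused with a different `domain_handler`.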
Therefore, a need exists for a dialog manager which is more versatile in interacting with a user. A further need exists for a dialog management system which responds to information on a wide range of topics in natural language and is easily adaptable to new tasks. A still further need exists for a method of interacting with a single user on a plurality of topics.