1. Field of the Invention
The present invention relates to speech technology and more specifically to a system and method of developing a general dialog principle from conception to implementation as part of a dialog manager library of dialog motivators.
2. Discussion of Related Art
In the process of carrying on an intelligent conversation between a human user and a computer, the computer must perform numerous complicated processes. Those of skill in the art understand the basic modules necessary for receiving voice signals from the user, processing those signals and formulating a response from the computer. In a typical dialog system, an automatic speech recognition (ASR) module converts the user speech into text. A spoken language understanding (SLU) module receives the ASR text and seeks to determine the meaning of the text. A dialog manager (DM) receives the meaning of the user speech and formulates an appropriate response. The text comprising the computer response is converted to audible synthetic speech via a text-to-speech (TTS) module.
This disclosure relates to technologies associated with the DM. Spoken dialog systems differ in their dialog management strategies, namely in how they represent and manipulate task knowledge and in how much initiative they take in managing the user-computer spoken dialog. For example, M. McTear discusses dialog management technology in M. McTear, “Spoken Dialogue Technology: Enabling the Conversational User Interface”, ACM Computing Surveys, 2001, incorporated herein by reference.
In some systems, dialog grammars are used. Dialog grammars are constrained and well-understood formalisms, such as finite-state machines, that express sequencing regularities in dialogs. As with most grammar systems, dialog-act types such as explain, complain, request, etc. are treated as categories, and the categories are used as terminals of the dialog grammar. Using dialog grammars gives the system, at each stage of the spoken dialog, a basis for setting expectations, which may correspond to activating state-dependent language models. Further, dialog grammars provide a basis for setting thresholds for rejection and for requests for clarification.
FIG. 1 illustrates a finite-state dialog grammar for an airline reservation system. In this example, the interactions are controlled based on bare information items. See Heeman, P. A., et al., “Beyond Structured Dialogues: Factoring Out Grounding,” Proc. of the Int. Conf. on Spoken Language Processing, 1998, Sydney, Australia. The interaction follows a basic question-and-answer form: each question addresses one topic, the user is expected to answer on that topic, and a confirmation question follows to catch any problems. As shown in FIG. 1, the system asks “where do you want to leave from?” (10). After the user's response, the system confirms by asking “did you say <FROM>?” (12). If the user answers “no,” the system returns and repeats the question (10). If the system's interpretation is correct, the system next asks “where do you want to go to?” (14). After the user's response, the system confirms by asking “did you say <TO>?” (16). If the system was incorrect, the question (14) is asked again. If correct, the system proceeds to ask “when did you want to leave?” (18). After the user responds, the system confirms by asking “did you say <TIME>?” (20). If the system was incorrect, it asks the question (18) again. If correct, the system proceeds to ask “is it a one-way trip?” and so on.
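By way of illustration only, the question and confirmation loop of FIG. 1 may be sketched as a small finite-state routine. The sketch below is a minimal example under stated assumptions, not an implementation taken from the cited reference: the names `QUESTIONS`, `run_dialog` and `scripted_user` are hypothetical, the user is simulated by canned replies, and only the three questions (10), (14) and (18) are modeled.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical slot names and prompts, mirroring questions (10), (14), (18) of FIG. 1.
QUESTIONS: List[Tuple[str, str]] = [
    ("FROM", "Where do you want to leave from?"),
    ("TO", "Where do you want to go to?"),
    ("TIME", "When did you want to leave?"),
]


def run_dialog(ask: Callable[[str], str]) -> Dict[str, str]:
    """Walk the question/confirmation states in a fixed, scripted order."""
    slots: Dict[str, str] = {}
    for slot, question in QUESTIONS:
        while True:
            answer = ask(question)
            reply = ask(f"Did you say {answer}?")
            if reply.strip().lower() == "yes":
                slots[slot] = answer  # confirmed: advance to the next question
                break
            # Any other reply returns to the same question state.
    return slots


def scripted_user(replies: List[str]) -> Callable[[str], str]:
    """Simulated user (an assumption for testing) that returns canned replies in order."""
    remaining = iter(replies)
    return lambda prompt: next(remaining)
```

Because the routine asks exactly one item per state, a reply carrying several items at once (for example an origin and a destination together) is treated as a single slot value.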
The above-mentioned finite-state dialog manager provides some advantages in spoken dialog systems. Such a system is easily programmable, but it also increases the challenges of dealing with user deviations from the scripted dialog. For example, if a user provides too much information after the first question, such as the user's name, the times the user wants to leave and return, and where the user wants to go, the dialog management grammar in FIG. 1 cannot handle the information. In general, the dialog grammar approach has many disadvantages, such as: scripted and inflexible interaction as experienced by the user; difficulty with non-standard language such as irony; information may be spread over several utterances, which can confuse the grammar; and, as mentioned above, a single utterance may include several pieces of information, which complicates the grammar.
Some more sophisticated approaches are being implemented to address the deficiencies of the dialog grammars. For example, enhancements to the hand-built finite-state dialog grammars include adding statistical knowledge based on realistic data to the dialog grammars. Statistical learning methods, such as CART, n-grams or neural networks, can improve the understanding of the associations between utterances and states in the training data. See Andernach, T., M. Poel, and E. Salomons, “Finding Classes of Dialogue Utterances with Kohonen Networks,” Proc. of the NLP Workshop of the European Conf. on Machine Learning (ECML), 1997, Prague, Czech Republic. Even with these enhancements, finite-state-based dialog managers lack the scalability and maintainability demanded by customers today.
Another approach to dialog management is the plan-based approach. This concept seeks to overcome the weaknesses of the dialog grammar approach by taking advantage of the observation that humans plan their actions to achieve goals. The correspondence between plans and goals drives assumptions to infer goals and construct and activate plans. Therefore, the underlying concept for plan-based dialog managers is intelligent inference using the behavior of the user and the knowledge of the domain that are programmed into a set of logical rules. The system gathers facts from the user that trigger rules that generate more facts, and the human-computer interaction progresses.
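The way facts trigger rules that in turn generate more facts may be illustrated with a simple forward-chaining loop. This is only a minimal sketch under stated assumptions: the fact names, the contents of `RULES` and the function `forward_chain` are hypothetical and are not drawn from any particular plan-based system.

```python
from typing import FrozenSet, List, Set, Tuple

# A rule is a pair: (facts that must all hold, fact that is then concluded).
Rule = Tuple[FrozenSet[str], str]

# Hypothetical domain rules for an airline reservation task.
RULES: List[Rule] = [
    (frozenset({"wants_round_trip", "origin_known"}), "return_destination_known"),
    (frozenset({"origin_known", "destination_known"}), "can_search_flights"),
    (frozenset({"can_search_flights", "time_known"}), "can_offer_itinerary"),
]


def forward_chain(facts: Set[str], rules: List[Rule]) -> Set[str]:
    """Apply rules repeatedly until no rule adds a new fact."""
    derived = set(facts)  # copy, so the caller's set is left unchanged
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived
```

In this sketch, facts gathered from the user (for example `origin_known`) are added to the set, and each newly derived fact may enable further rules on the next pass.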
In terms of scalability, the plan-based approach is one embodiment of a state machine for which different discourse semantics are regarded as states. In plan-based systems, however, the states are generated dynamically and not limited to a predetermined finite set. This capability provides an improved level of scalability.
FIG. 2 illustrates a partial plan for an airline reservation system represented as a graph. See Cohen, P., “Models of Dialogue,” Proc. of the Fourth NEC Research Symposium, 1994, SIAM Press. The goal of this dialog manager is to derive an action based on a discourse semantic Sn. The output of the dialog manager is a message the system provides to the user. In FIG. 2, the person desires to know if the flight itinerary F12 is an available flight plan. The relationships among the goals and actions that compose the plan are represented as a directed graph, with goals, preconditions, actions and effects as nodes and relationships among these as arcs. FIG. 2 illustrates the compositional nature of the plan-based approach, which always includes nested subplans that can continue to an almost unlimited depth.
The arcs in FIG. 2 are labeled with the relationship that holds between the two nodes. The “SUB” label shows that the child node is the beginning of a subplan for the parent. At some point appropriate to the domain of the planning application, the SUBs are suspended and represented as a single subsuming node. The “ENABLE” label indicates a precondition on a goal or action, or indicates an enabling relationship between parent and child nodes. The “EFFECT” label indicates the result of an action.
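A directed graph with labeled arcs of this kind may be represented, for illustration, as an adjacency list keyed by parent node. The sketch below is a minimal example under stated assumptions: the class `PlanGraph`, its methods, the helper `build_example` and the node names it uses are hypothetical and are not read from FIG. 2.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

ARC_LABELS = ("SUB", "ENABLE", "EFFECT")


@dataclass
class PlanGraph:
    # parent node -> list of (arc label, child node)
    arcs: Dict[str, List[Tuple[str, str]]] = field(default_factory=dict)

    def add_arc(self, parent: str, label: str, child: str) -> None:
        if label not in ARC_LABELS:
            raise ValueError(f"unknown arc label: {label}")
        self.arcs.setdefault(parent, []).append((label, child))

    def children(self, parent: str, label: str) -> List[str]:
        return [c for lab, c in self.arcs.get(parent, []) if lab == label]

    def subplan_nodes(self, root: str) -> List[str]:
        """Nodes reachable from root through SUB arcs only, depth-first."""
        found: List[str] = []
        stack = list(reversed(self.children(root, "SUB")))
        while stack:
            node = stack.pop()
            if node not in found:
                found.append(node)
                stack.extend(reversed(self.children(node, "SUB")))
        return found


def build_example() -> PlanGraph:
    """Illustrative graph; node names are assumptions, not taken from FIG. 2."""
    g = PlanGraph()
    g.add_arc("Itinerary(F12)", "SUB", "Outbound_Leg")
    g.add_arc("Itinerary(F12)", "SUB", "Inbound_Leg")
    g.add_arc("Inbound_Leg", "SUB", "Confirm_Return_Date")
    g.add_arc("Inbound_Leg", "EFFECT", "Dest(F12,C4)")
    return g
```

Following only the SUB arcs from a parent node recovers its nested subplans, while ENABLE and EFFECT arcs can be queried separately through `children`.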
The plan-based approach operates on a well-defined cycle, as illustrated below in a set of actions describing interaction between an agent and a client: (1) observe client's acts; (2) infer client's plan using the agent's model of the client's beliefs and goals; (3) debug the client's plan, finding obstacles to the success of the plan, based on the agent's beliefs; (4) adopt the negation of the obstacles as the agent's goal; and (5) plan to achieve those goals and execute the plan.
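The five steps of this cycle may be sketched, in a heavily simplified form, as a single function. This is only an illustrative sketch under stated assumptions: plan inference is reduced to a table lookup, beliefs are a flat set of conditions, and the names `PLAN_LIBRARY`, `ACTIONS` and `agent_cycle`, together with their contents, are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical plan library: observed act -> (inferred client goal, plan preconditions).
PLAN_LIBRARY: Dict[str, Tuple[str, List[str]]] = {
    "ask_departure_time": (
        "board_flight",
        ["knows_departure_time", "knows_gate", "has_ticket"],
    ),
}

# Hypothetical agent actions, keyed by the condition each action brings about.
ACTIONS: Dict[str, str] = {
    "knows_departure_time": "tell_departure_time",
    "knows_gate": "tell_gate",
}


def agent_cycle(observed_act: str, believed_true: Set[str]) -> List[str]:
    """One pass through the observe / infer / debug / adopt / plan cycle."""
    # (1)-(2) Observe the client's act and infer the client's plan.
    _goal, preconditions = PLAN_LIBRARY[observed_act]
    # (3) Debug the plan: obstacles are preconditions the agent believes do not hold.
    obstacles = [p for p in preconditions if p not in believed_true]
    # (4) Adopt the negation of each obstacle as a goal, i.e. make the condition hold.
    goals = obstacles
    # (5) Plan: select an action for every goal the agent knows how to achieve.
    return [ACTIONS[g] for g in goals if g in ACTIONS]
```

In this sketch, a client who asks only for the departure time is also told the gate, because the agent finds that the inferred plan would otherwise fail on that precondition.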
Returning to FIG. 2, a flight itinerary contains at least an Outbound_Leg subgoal and possibly an Inbound_Leg subgoal; an itinerary with both is a round trip. Assuming that F12 is a round trip itinerary, at the Inbound_Leg node, the system attempts to infer the underlying goal (Time (F12, T2), Origin (F12,C3) and Dest (F12,C4)) from the information received in the dialog or from other known conditions. For example, the destination of the Inbound_Leg may be inferred from the origin of the Outbound_Leg. Inferences are shown in the EFFECT arcs in FIG. 2.
The technologies required to accomplish these inferences are complex models of the beliefs, desires, and intentions of an agent, and they use generic logical systems that operate over the propositions corresponding to the nodes of a plan structure as shown in FIG. 2.
These plan-based approaches permit a more flexible mode of interaction than do dialog grammars, but they are nevertheless complex to construct and operate in practice. Because authoring the logical rules and axioms requires significant human expert time, the cost of this approach prevents many enterprises from being able to afford and incorporate spoken dialog systems into their business.