This invention relates to a method and an apparatus for automatically performing desired actions in response to spoken requests. It is particularly applicable to a natural-dialogue speech application system using a discourse management unit, as may be used to partially or fully automate reservation applications, telephone directory assistance functions, allocation applications, voice activated dialing (VAD), credit card number identification, appointment schedulers, automated rental car bookers and other speech recognition enabled services.
Speech recognition enabled services are increasingly popular. The services may include stock quotes, directory assistance, reservations and many others. In most of these applications, when the information requested can be expressed as an alphanumeric character or a sequence of alphanumeric characters, the user is required to enter the request via a touch-tone telephone. This is often aggravating for the user, who is usually obliged to make repetitive entries in order to perform a single function. The situation becomes even more difficult when the input information is a word or phrase. In these situations, the involvement of a human operator may be required to complete the desired task. To overcome these drawbacks, the industry has devoted considerable effort to the development of systems that reduce the labor costs associated with providing information services. These efforts include the development of sophisticated speech processing and recognition systems, including natural language understanding and discourse management units.
In typical speech applications requiring a complex interaction with the user, the discourse manager (DM) unit keeps track of the context of the conversation. Keeping track of the context of the conversation generally entails remembering the relevant information said by the user to the system, and responding appropriately given that information. The process of keeping track of the context of a conversation is also known as discourse management.
A common method for discourse management is the use of an electronic form approach. Such a method is described in “A Form-based Dialogue Manager for Spoken Language Applications”, Goddeau et al., Proc. ICSLP 1996, p. 701-704, whose content is hereby incorporated by reference. In such systems, user interaction with the system is centered around filling the slots in the electronic form, while the system performs appropriate database query operations and summarizes the results it obtained.
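By way of illustration only, the form-filling idea can be sketched as follows. This is a minimal sketch and not the method of the cited reference: the class name `ReservationForm`, the slot names and the prompt wording are all hypothetical, and the database query step is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ReservationForm:
    """Hypothetical electronic form; the slot names are assumptions."""
    slots: Dict[str, Optional[str]] = field(
        default_factory=lambda: {"origin": None, "destination": None, "date": None}
    )

    def fill(self, values: Dict[str, str]) -> None:
        # Copy every recognized value into its slot; ignore unknown slot names.
        for name, value in values.items():
            if name in self.slots:
                self.slots[name] = value

    def missing(self) -> List[str]:
        # Slots the user has not yet supplied, in form order.
        return [name for name, value in self.slots.items() if value is None]

    def next_prompt(self) -> Optional[str]:
        # Ask for the first empty slot; return None once the form is complete.
        missing = self.missing()
        if missing:
            return f"Please provide the {missing[0]}."
        return None
```

In this sketch the dialogue is driven entirely by which slots remain empty, which is what makes the interaction center on the form rather than on the user's own initiative.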
In another commonly used discourse management process, the system keeps track of the context of the conversation by using an exhaustive dialogue state machine. The state machine explicitly models all the interactions that the user can have with the system, where each possible reply to any system prompt leads to a new state in the finite state machine (FSM). The discourse manager interprets the input of the user on the basis of the state that it is currently in. Exhaustive dialog systems are well-known in the art to which this invention pertains. The reader is invited to refer to “Managing Dialog in a Continuous Speech Understanding System”, E. Gerbino et al., Proc. Eurospeech 1993, p. 1661-1664, whose content is hereby incorporated by reference.
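An exhaustive dialogue state machine of this kind may be pictured, purely as an illustrative sketch, as a table that lists every (state, reply) pair explicitly. The state names, reply labels and the functions `step` and `run` below are hypothetical and are not taken from the cited reference.

```python
from typing import Dict, Iterable, Tuple

# Every reply the system accepts must appear explicitly, once per state.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("ask_city", "city_given"): "ask_date",
    ("ask_city", "help"): "ask_city",
    ("ask_date", "date_given"): "confirm",
    ("ask_date", "help"): "ask_date",
    ("confirm", "yes"): "done",
    ("confirm", "no"): "ask_city",
}


def step(state: str, reply: str) -> str:
    # The reply is interpreted solely on the basis of the current state.
    try:
        return TRANSITIONS[(state, reply)]
    except KeyError:
        raise ValueError(f"reply {reply!r} is not modelled in state {state!r}")


def run(replies: Iterable[str], start: str = "ask_city") -> str:
    # Feed a sequence of replies through the machine and return the final state.
    state = start
    for reply in replies:
        state = step(state, reply)
    return state
```

Even in this small sketch, a reply such as "help" has to be repeated for each state in which it is allowed, and any reply that was not anticipated in a given state is rejected.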
Applications using discourse managers of the type described above generally consist of system-directed dialogues where the system initiates any conversation segment. Mixed-initiative systems that rely on this technology permit only extremely limited “initiative” on the part of the user. In addition, discourse managers using a finite state machine of the type described above are generally not suitable for complex applications. As the application becomes more complex, such methods often lead to a very large increase in the number of states, which renders the finite state machine unmanageable.
Thus, there exists a need in the industry to refine the process of discourse management so as to obtain an improved natural-dialogue speech application system.
The present invention is directed to a method and an apparatus for performing discourse management. In particular the invention provides a discourse management apparatus for assisting a user to achieve a certain task. The discourse management apparatus receives information data elements from the user, such as spoken utterances or typed text, and processes them by implementing a finite state machine. The finite state machine evolves according to the context of the information provided by the user in order to reach a certain state where a signal can be output having a practical utility in achieving the task desired by the user. The context based approach allows the discourse management apparatus to keep track of the conversation state without the undue complexity of prior art discourse management systems.
In one possible application of the present invention the discourse management apparatus is part of a natural-dialogue speech application system. In such a system there is generally more than one possible interpretation of any single user input. The discourse management (DM) unit of the dialog management apparatus performs the understanding of the input request in the context of a certain conversation. For each input utterance by a user, a set of operations is performed by the discourse management unit to derive, from the logical form input received from a natural language understanding (NLU) unit, the response to be output back to the user. The discourse management unit makes use of an expectation handling unit and a conversation analyzer to provide the context-dependent interpretation capability. The expectation handling unit maps the input data into data that is context-dependent on the basis of dynamically generated remapping rules. The conversation analyzer receives the context-dependent data from the expectation handling unit and incorporates it into the state of the conversation. More precisely, the conversation analyzer keeps track of how the new context-dependent data should affect the system and the knowledge the system has of the user’s goals.
One possible way to achieve the context-dependent interpretation of the input is to use two sets of rules, namely context-dependent remapping rules and context-dependent state-transition rules. Context-dependent remapping rules operate by matching specific patterns believed to be likely in the user logical form, and transforming them into a new form that makes explicit all the meaning implicitly present in the user’s response. The exact transformation to be applied is determined by the prompts that were previously said by the system to the user. Context-dependent remapping rules are used by the expectation handling unit in the discourse management unit to map a user response into its meaning on the basis of the context of the conversation. Context-dependent state-transition rules define new transitions that are temporarily added to the finite state machine for the purpose of interpreting a specific user response. In one possible form of implementation, context-dependent remapping rules and context-dependent state-transition rules are allowed to be dynamically created, updated, and negated as the conversation progresses. They are not reused in subsequent turns; rather, they are replaced by new rules that are created as the conversation progresses. Advantageously, negating the context-dependent rules makes the finite state machine more manageable and provides increased flexibility in the dialog and in the natural-dialogue speech application system as a whole.
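As a non-limiting illustration of context-dependent remapping, consider a system prompt such as "Do you want to fly to Boston?" followed by the bare reply "yes". The sketch below assumes that logical forms are plain dictionaries and that rules live for exactly one turn; the names `ExpectationHandler`, `expect`, `remap` and `confirm_rule`, and the `not_` prefix used for a negative reply, are hypothetical and do not describe the actual implementation.

```python
from typing import Callable, Dict, List, Optional

LogicalForm = Dict[str, str]
RemapRule = Callable[[LogicalForm], Optional[LogicalForm]]


class ExpectationHandler:
    """Holds remapping rules generated from the most recent system prompt."""

    def __init__(self) -> None:
        self._rules: List[RemapRule] = []

    def expect(self, rule: RemapRule) -> None:
        # Register a rule created dynamically when the system issues a prompt.
        self._rules.append(rule)

    def remap(self, form: LogicalForm) -> LogicalForm:
        # Rules apply to one user turn only: take them and clear the list.
        rules, self._rules = self._rules, []
        for rule in rules:
            result = rule(form)
            if result is not None:
                return result
        return form  # no rule matched: pass the logical form through unchanged


def confirm_rule(slot: str, value: str) -> RemapRule:
    # Built when the system asks the user to confirm `value` for `slot`.
    def rule(form: LogicalForm) -> Optional[LogicalForm]:
        if form == {"answer": "yes"}:
            return {slot: value}           # make the implicit meaning explicit
        if form == {"answer": "no"}:
            return {"not_" + slot: value}  # hypothetical encoding of a denial
        return None
    return rule
```

In this sketch the same reply "yes" maps to a different explicit meaning depending on the prompt that preceded it, and is left uninterpreted when no prompt has set up an expectation.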
The invention also provides a novel method for performing discourse management.
For the purpose of this specification, the expression “in focus” is used to designate the active state of a finite state machine.
For the purpose of this specification, the expression “permanent transition” is used to designate a transition in a finite state machine that is independent of the context of a conversation.
For the purpose of this specification, the expression “temporary transition” is used to designate a transition in a finite state machine that is dependent on the context of a conversation. In a specific example, temporary transitions may be created dynamically in a finite state machine and may be destroyed when they are no longer required.
For the purpose of this specification, the expression “wildcard transition” is used to designate a transition in a finite state machine from a set of states to another state.
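The four expressions defined above may be illustrated together by the following sketch. It is offered under stated assumptions only: the class `DialogueFSM`, its method names, the order in which the three kinds of transition are consulted, and the choice to destroy temporary transitions after every turn are all hypothetical and are not prescribed by this specification.

```python
from typing import Dict, FrozenSet, Iterable, Tuple


class DialogueFSM:
    def __init__(self, start: str) -> None:
        self.focus = start  # the state "in focus"
        self.permanent: Dict[Tuple[str, str], str] = {}
        self.temporary: Dict[Tuple[str, str], str] = {}
        self.wildcard: Dict[str, Tuple[FrozenSet[str], str]] = {}

    def add_permanent(self, src: str, event: str, dst: str) -> None:
        # Context-independent transition; it persists for the whole dialogue.
        self.permanent[(src, event)] = dst

    def add_temporary(self, src: str, event: str, dst: str) -> None:
        # Context-dependent transition; created dynamically for the next turn.
        self.temporary[(src, event)] = dst

    def add_wildcard(self, sources: Iterable[str], event: str, dst: str) -> None:
        # One transition from a whole set of states to a single target state.
        self.wildcard[event] = (frozenset(sources), dst)

    def step(self, event: str) -> str:
        key = (self.focus, event)
        if key in self.temporary:        # assumed precedence: temporary first
            target = self.temporary[key]
        elif key in self.permanent:
            target = self.permanent[key]
        elif event in self.wildcard and self.focus in self.wildcard[event][0]:
            target = self.wildcard[event][1]
        else:
            target = self.focus          # unrecognized event: focus unchanged
        self.temporary.clear()           # destroy temporary transitions
        self.focus = target
        return target
```

For example, a single wildcard transition on a "cancel" event can serve every prompting state, instead of one explicit transition per state as in an exhaustive state machine.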
Other aspects and features of the present invention will become apparent to those ordinarily skilled in the art upon review of the following description of specific embodiments of the invention in conjunction with the accompanying figures.