This invention relates to an interaction assistance system, and in particular to an automated assistant for a user interacting with a system using speech.
Previous automated dialog systems have been based on hand-constructed slot-filling applications. These are normally hand-tuned and accept only a subset of the English language as input, which tends to make them difficult to use and very hard to learn. Some such systems support mixed initiative, a mode in which machines collect additional information about the conversation from the user. More recently, Partially-Observable Markov Decision Process (POMDP) approaches have used partially hidden Markov processes to keep track of the state of the system: the system tracks multiple candidate states at each time and acts on its best guess at that time. In such prior systems, the semantics of the processes have been hand coded, or encoded as a simple probabilistic process if the dialog is simple enough. Semantics are tied to meanings or actions of words and/or context.
In the area of telephone-based assistants, previous telephone assistants were generally not dialog agents, but were instead single-utterance command/response systems. In a number of systems, the user can request either a piece of information or an action, and the system responds appropriately if the speech recognizer is accurate and if the user utters a request from within the vocabulary of the system. However, in general, the systems were brittle, did not understand paraphrase, did not carry context across sessions, and mostly did not carry context even within an interaction session.
In one aspect, in general, an interaction assistant conducts multiple-turn interaction dialogs with a user in which context is maintained between turns, and the system manages the dialog to achieve an inferred goal for the user. The system includes an integration section that includes a first integration component for providing a linguistic interface to a user. The system also includes an event processing section including a parser for processing linguistic events from the first integration component. A dialog manager of the system is configured to receive alternative outputs from the event processing section, to select an action, and to cause the action to be performed based on the received alternative outputs. The system further includes a storage for a dialog state for an interaction with the user, wherein the alternative outputs from the event processing section represent alternative transitions from a current dialog state to a next dialog state. The system further includes a storage for a plurality of templates, wherein each dialog state is defined in terms of an interrelationship of one or more instances of the templates.
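As an informal illustration of the arrangement described in this aspect, the following Python sketch represents a dialog state as a collection of template instances, represents the alternative outputs of the event processing section as scored transitions to candidate next states, and has a dialog manager select one transition and an action. All names (Template, Instance, DialogState, Transition, TEMPLATES, alternatives, select_action), the two example templates, and the highest-score selection rule are assumptions made for illustration only; the description above does not specify them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Template:
    name: str
    slots: tuple          # slot names the template can hold

@dataclass(frozen=True)
class Instance:
    template: Template
    values: tuple         # (slot, value) pairs filled so far

@dataclass(frozen=True)
class DialogState:
    instances: tuple = ()  # the state is defined by its template instances

@dataclass(frozen=True)
class Transition:
    next_state: DialogState
    score: float

# Hypothetical template store.
TEMPLATES = {
    "call": Template("call", ("contact",)),
    "message": Template("message", ("contact", "text")),
}

def alternatives(state, hypotheses):
    """Turn parser hypotheses into alternative transitions from `state`."""
    out = []
    for name, values, score in hypotheses:
        inst = Instance(TEMPLATES[name], tuple(sorted(values.items())))
        out.append(Transition(DialogState(state.instances + (inst,)), score))
    return out

def select_action(transitions):
    """Assumed policy: follow the best-scoring transition, then either ask
    for the first unfilled slot or execute the completed template."""
    best = max(transitions, key=lambda t: t.score)
    inst = best.next_state.instances[-1]
    filled = dict(inst.values)
    missing = [s for s in inst.template.slots if s not in filled]
    action = ("ask", missing[0]) if missing else ("execute", inst.template.name)
    return best.next_state, action

# Two competing readings of one utterance; the "message" reading scores higher.
state, action = select_action(alternatives(DialogState(), [
    ("call", {"contact": "Ann"}, 0.4),
    ("message", {"contact": "Ann"}, 0.6),
]))
```

In this sketch no dialog state is enumerated in advance; each state arises from combining instances of the stored templates, and the selected action here is a request for the missing "text" slot.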
In another aspect, in general, a method is used for determining parameter values for a plurality of components of an interaction system. The system is configured to process sequences of events, the events including linguistic events and application related events, the processing of events including parsing of linguistic events, determining a sequence of dialog states, and determining a sequence of output actions from the sequence of events corresponding to the sequence of dialog states. The method includes collecting a plurality of sequences of events and corresponding sequences of output actions. An iteration is repeated. Each iteration includes processing a sequence of events and a corresponding sequence of output actions by processing the sequence of events using current parameter values of the system, the processing including determining a sequence of dialog states from the sequence of events. A sequence of output actions is determined from the sequence of dialog states. A comparison of the determined sequence of output actions and the collected sequence of output actions is used to update parameter values of the plurality of components of the system. The repeating of the iterations is completed upon reaching an ending condition. The parameter values for the plurality of components of the system are set using a result of the iterations.
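The iteration described in this aspect can be sketched informally in Python as follows. The sketch is a toy under stated assumptions: the action set, the feature function, the rule that the next dialog state is simply the last event, the perceptron-style update, and the ending condition (an error-free pass or a maximum iteration count) are all hypothetical choices made for illustration, since the description above does not fix a particular update rule or ending condition.

```python
from collections import defaultdict

ACTIONS = ["ask_destination", "confirm", "book"]  # hypothetical action set

def features(state, event, action):
    # Hypothetical indicator features over state, event and action.
    return [(state, event, action), (event, action)]

def run(events, weights):
    """Process events with the current parameter values; return the
    sequence of dialog states and the sequence of output actions."""
    state, states, actions = "start", [], []
    for event in events:
        states.append(state)
        scores = {a: sum(weights[f] for f in features(state, event, a))
                  for a in ACTIONS}
        actions.append(max(ACTIONS, key=scores.get))
        state = event  # toy transition: the state remembers the last event
    return states, actions

def train(corpus, max_iterations=20):
    weights = defaultdict(float)
    for _ in range(max_iterations):
        errors = 0
        for events, collected in corpus:
            states, determined = run(events, weights)
            # Compare determined actions with collected actions and update.
            for s, e, d, c in zip(states, events, determined, collected):
                if d != c:
                    errors += 1
                    for f in features(s, e, c):
                        weights[f] += 1.0  # reward the collected action
                    for f in features(s, e, d):
                        weights[f] -= 1.0  # penalize the determined action
        if errors == 0:  # assumed ending condition
            break
    return weights

# Collected event sequences paired with collected output action sequences.
corpus = [
    (["greet", "give_city", "say_yes"], ["ask_destination", "confirm", "book"]),
    (["give_city", "say_yes"], ["confirm", "book"]),
]
weights = train(corpus)
```

After training on this small corpus, running the system with the resulting parameter values reproduces the collected action sequences; a practical system would update the parameters of several components jointly rather than a single weight table.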
An advantage of one or more embodiments is that use of templates from which the dialog states are defined permits use of a large set of possible dialog states without requiring explicit specification of those states. Furthermore, the structure of the system enables efficient and effective determination of parameter values (“training”) of machine learning and neural network components.
Other features and advantages of the invention are apparent from the following description, and from the claims.