1. Technical Field
The present disclosure relates to automatic dialog systems and more specifically to a dialog manager that learns automatically.
2. Introduction
Spoken dialog systems help people accomplish a task by interacting with them using spoken language. At the core of a spoken dialog system is a dialog manager, which controls the flow of the conversation and decides what to say or do given the current dialog state. In industry, the dialog manager is typically crafted by hand. This is a time-consuming, tedious task in which a human designer must try to anticipate all the courses a conversation might take. The task is difficult both because people often behave unexpectedly and because speech recognition errors may occur at any time. A manual design process inevitably leads to sub-optimal dialog managers because a human designer cannot feasibly consider all or even most of the conversational paths. Although designers have built numerous deployed dialog systems this way, a hand-crafted design can ignore potentially useful distinctions between dialog states. The consequence for users is longer interactions and more failed dialogs.
A further complication is that once a dialog manager has been deployed, its design is fixed. It does not learn from experience. Only careful monitoring of the system in deployment can catch flaws in the original design. Fixing these flaws requires a long, labor-intensive feedback cycle of re-design, re-testing, re-deployment, and more monitoring.
One approach to resolving these problems is to apply reinforcement learning (RL) techniques to automatically assign actions to dialog states. If certain technical assumptions hold, an appropriate RL algorithm can even efficiently and accurately learn an optimal dialog manager. In practice, limits of computational complexity and the size of available dialog corpora typically bound the number of independent states that RL can consider. One solution to this limitation is a feature-based representation of the dialog state. The reasoning behind a feature-based approach is that features enable the dialog manager to generalize across states that share feature values, even when the number of dialog states is massive.
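The generalization that a feature-based representation provides can be illustrated with a minimal sketch. It is not the method of the present disclosure. It assumes Q-learning with linear function approximation as one example of an RL technique, and the action names, the three features, and the helper names (features, q_value, q_update) are all hypothetical, chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical dialog state: slot name -> speech recognition confidence (0.0 = unfilled).
DialogState = Dict[str, float]

# Hypothetical action set for a slot-filling dialog manager.
ACTIONS = ["ask", "confirm", "submit"]


def features(state: DialogState) -> List[float]:
    """Map a dialog state to a short, fixed-length feature vector (assumed features)."""
    confidences = list(state.values())
    filled = sum(1 for c in confidences if c > 0.0)
    lowest = min(confidences) if confidences else 0.0
    return [1.0, float(filled), lowest]  # bias, slots filled, lowest confidence


def q_value(weights: Dict[str, List[float]], state: DialogState, action: str) -> float:
    """Estimated value of an action: dot product of its weights with the features."""
    return sum(w * f for w, f in zip(weights[action], features(state)))


def q_update(weights, state, action, reward, next_state, alpha=0.1, gamma=0.95):
    """One Q-learning step with linear function approximation; returns the TD error."""
    best_next = max(q_value(weights, next_state, a) for a in ACTIONS)
    td_error = reward + gamma * best_next - q_value(weights, state, action)
    phi = features(state)
    weights[action] = [w + alpha * td_error * f for w, f in zip(weights[action], phi)]
    return td_error


# One weight per (action, feature) pair, regardless of how many dialog states exist.
weights = {a: [0.0, 0.0, 0.0] for a in ACTIONS}

seen = {"city": 0.9, "date": 0.0}
after = {"city": 0.9, "date": 0.8}
q_update(weights, seen, "ask", reward=1.0, next_state=after)

# A state never observed during learning, but with the same feature values as `seen`.
unseen = {"name": 0.5, "time": 0.0}
estimate = q_value(weights, unseen, "ask")
```

In this sketch the number of learned parameters is the number of actions times the number of features, so it does not grow with the number of dialog states. After a single update from one state, the estimate for a different state with identical feature values changes as well, which is the sense in which features let the learner generalize.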
With a feature-based representation, learning tractability depends not on the number of possible dialog states, but on choosing a compact set of useful features about these states. Whereas it is often easy for a designer to suggest a large set of potentially useful features, it is difficult for a designer to ascertain which subsets are actually useful for an RL algorithm. Using too few features ignores useful information that could improve the dialog manager. On the other hand, using too many features complicates the learning task and makes learning within the limits of available data and computation time challenging.
In practice, a designer can usually suggest many more features than are actually useful. Although some features are useful for learning a dialog manager and others are simply noise, complex interdependencies among the features in learned policies make it difficult for a person to predict in advance which are useful. Including the noise features can slow or hinder the learning process.
In addition, a learning algorithm should be able to work in both an off-line and on-line setting. In an off-line setting, the learner is given a fixed corpus of interactions, for example logs from an iPhone application. In an on-line setting, the learner is already controlling the dialog manager and its task is to make further improvements on the fly. The needs of off-line and on-line learning are very different, and to date no algorithms have been applied to dialog management which accomplish both of these tasks well.