1. Field of the Invention
The present invention relates to simulations of user interaction with spoken dialog systems, and more specifically to using a divergence metric to evaluate user simulations.
2. Introduction
Traditionally, spoken dialog systems have been hand-built by researchers, which is problematic because a human designer needs to consider innumerable dialog situations, many of which can be difficult to foresee. To address this, researchers have begun incorporating machine learning techniques into spoken dialog systems. The idea is for a (human) designer to provide the high-level objectives, and for the machine learning algorithm to determine what to do in each dialog situation.
Machine learning algorithms for dialogs usually operate by exploring different dialog strategies and making incremental improvements. This process, called training, often requires thousands or millions of dialogs to complete, far more than can feasibly be collected from real users. As a result, machine learning algorithms are usually trained with a user simulation, which is a computer program or model that is intended to be a realistic substitute for a population of real users.
Ultimately, the success of a machine learning approach depends on the quality of the user simulation used to train it. Yet there is no accepted method for evaluating user simulations. This is especially problematic because machine learning-based dialog systems are often both trained and evaluated on user simulations alone, not on real users. Without some quantification of user simulation reliability, it is hard to judge claims about machine learning approaches that have not been evaluated on real users. Accordingly, what is needed is an improved method of using user simulation in dialog systems.