1. Field of the Invention
The invention relates to processing devices and, in particular, to processing devices having policies that govern interactions with users.
2. Introduction
Interaction policies may be implemented for processing devices such that the implemented policies govern the devices' interactive behavior. Ideally, each processing device would include an interaction policy tailored to each individual user. However, implementing such per-user interaction policies would be cost-prohibitive.
Two widely used methods for implementing interaction policies are Bayesian networks (either static or dynamic) and Markov decision processes (either fully observable or partially observable). Bayesian networks permit customization of an interaction policy by defining variables that represent user preferences. A Bayesian network selects an action to perform by inference, that is, by drawing a conclusion from what is already known. Inference is computationally costly and must be performed at each dialog step.

Reinforcement learning is an approach for learning customized interaction policies for Markov decision processes. Reinforcement learning algorithms attempt to learn a policy that maximizes a reward over the course of a problem. One problem with using reinforcement learning to manage interactions is that reward measures are subjective and are rarely explicitly provided by a user. That is, a user will rarely rate an interaction after it is completed, or provide feedback to a device regarding how well the device satisfied the user's initial goal. Another problem is the computational expense of computing optimal, or even sub-optimal, interaction policies.
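The reward-maximization approach described above can be illustrated with a minimal Q-learning sketch. This is a hypothetical toy example, not the method of any particular prior-art system: states stand in for dialog steps, action 1 advances toward a completed interaction yielding a reward of 1.0, and the environment, state count, and learning parameters are all illustrative assumptions.

```python
import random

# Toy example (illustrative assumptions only): three dialog states,
# where state 2 represents a completed interaction worth reward 1.0.
N_STATES, GOAL = 3, 2
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

def step(state, action):
    """Deterministic toy environment: action 1 advances, action 0 stays."""
    next_state = min(state + 1, GOAL) if action == 1 else state
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

def train(episodes=200, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-value table: q[state][action]
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            # Epsilon-greedy action selection: mostly exploit, sometimes explore.
            if rng.random() < EPSILON:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward = step(state, action)
            # Standard Q-learning update toward the bootstrapped target.
            target = reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q

q = train()
policy = [row.index(max(row)) for row in q]  # greedy action per state
```

After training, the greedy policy selects the advancing action in each non-terminal state, showing how a reward signal alone can shape an interaction policy; the difficulty noted above is that a real user seldom supplies such a reward.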