Influence diagrams can be employed to facilitate decision making. These models can be constructed by an expert (typically with the aid of a decision analyst) and/or learned from data. Generally, supervised learning techniques for “ordinary” Bayesian networks can be applied, with little modification, to learn the structure and parameters of an influence diagram. If an influence diagram is going to be used repeatedly to make decisions, it is desirable to use the data observed as a result of those decisions to improve the model over time.
Reinforcement learning deals with learning how to act in an environment. One of the central problems in reinforcement learning is deciding when to explore and when to exploit. In particular, given the current state of an environment and a model of the expected (short-term) reward for performing each action in that state, the system can “exploit” by performing the action that has the highest expected reward. On the other hand, because the model may be uncertain about the environment, the system can instead “explore” by performing an action that is sub-optimal in the short term in order to gain information and improve the model for the long term.
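The explore/exploit choice described above can be illustrated with a short sketch. The example uses an epsilon-greedy rule, a common textbook strategy chosen here only for illustration and not a method taken from this document. The names `choose_action`, `update_estimate`, and `epsilon` are hypothetical, and the model is reduced to a table of per-action reward estimates for a single state.

```python
import random


def choose_action(expected_reward, epsilon, rng):
    """Pick an action from per-action expected short-term rewards.

    expected_reward: dict mapping action -> estimated reward in the current state
    epsilon: probability of exploring (hypothetical tuning parameter)
    rng: a random.Random instance, passed in so runs are reproducible
    """
    if rng.random() < epsilon:
        # Explore: pick uniformly at random, which may be sub-optimal
        # in the short term but yields information about other actions.
        return rng.choice(sorted(expected_reward))
    # Exploit: pick the action with the highest expected reward.
    return max(expected_reward, key=expected_reward.get)


def update_estimate(expected_reward, counts, action, observed_reward):
    """Improve the model by folding an observed reward into a running mean."""
    counts[action] += 1
    expected_reward[action] += (
        observed_reward - expected_reward[action]
    ) / counts[action]
```

With `epsilon` set to 0 the system always exploits, and with `epsilon` set to 1 it always explores. Intermediate values trade short-term reward against improving the estimates over repeated decisions.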