Field of the Invention
A “sequential decision problem” is a problem that involves multiple periods of time, where the subject of the problem has more than one possible course of action, receives information sequentially over time, and must make a decision at each point in time.
In general, sequential decision problems contemplate a subject capable of making a sequence of decisions (each causing some “action” to take place) as the subject faces a series of possible conditions (or “states”). The actions of the subject, together with any random (stochastic or uncertain) events, result in the subject of the problem receiving a reward (or penalty) in the current time period, and then transitioning to another state. That subsequent state is the setting for the same or a similar problem in the following time period.
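The structure described above can be sketched in a few lines of code. The following is a minimal illustration only, not a method taken from this document: the two states, two actions, probabilities, and rewards are hypothetical values, and the names `step` and `simulate` are assumptions introduced for the example.

```python
import random

# Hypothetical two-state, two-action problem; every name and number is illustrative.
STATES = ["low", "high"]
ACTIONS = ["wait", "invest"]

# TRANSITIONS[state][action] maps each possible next state to its probability.
TRANSITIONS = {
    "low": {"wait": {"low": 0.9, "high": 0.1}, "invest": {"low": 0.4, "high": 0.6}},
    "high": {"wait": {"low": 0.3, "high": 0.7}, "invest": {"low": 0.1, "high": 0.9}},
}

# REWARDS[state][action] is the reward (or penalty, if negative) in the current period.
REWARDS = {
    "low": {"wait": 0.0, "invest": -1.0},
    "high": {"wait": 2.0, "invest": 1.0},
}


def step(state, action, rng):
    """Return (reward, next_state) for one time period."""
    reward = REWARDS[state][action]
    next_states = list(TRANSITIONS[state][action])
    weights = [TRANSITIONS[state][action][s] for s in next_states]
    # The random draw stands in for the stochastic events affecting the subject.
    next_state = rng.choices(next_states, weights=weights, k=1)[0]
    return reward, next_state


def simulate(policy, start, periods, seed=0):
    """Apply a decision rule (policy) for a number of periods from a start state."""
    rng = random.Random(seed)
    state, total, history = start, 0.0, []
    for _ in range(periods):
        action = policy(state)  # the decision made at this point in time
        reward, next_state = step(state, action, rng)
        history.append((state, action, reward))
        total += reward
        state = next_state  # the setting for the next period's problem
    return total, history
```

For instance, `simulate(lambda s: "invest", "low", 5)` would apply the fixed rule "always invest" for five periods starting from the hypothetical state "low" and return the accumulated reward together with the sequence of states visited.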
Sequential decision problems represent a large number of decision problems faced by individuals, families, investors, business managers, governments, and other organizations. In many such problems, it is difficult both to define and calibrate the inputs to the decision problem and to solve it.
Description of Related Art
In analyzing these problems, the term “calibrate” means to use statistical methods to estimate parameters, and “calibration” is the use of such methods. Furthermore, with regard to a sequential decision problem, the term “define” means to use statistical techniques to determine the state space, action space, or time index of the problem, and “definition” is the use of such techniques.
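As one illustration of calibration in this sense, transition probabilities may be estimated from observed behavior by relative frequency. The sketch below is an assumption made for illustration, not a method specified in this document: the function name `estimate_transitions`, the record layout (state, action, next state), and the sample observations are all hypothetical.

```python
from collections import Counter, defaultdict


def estimate_transitions(observations):
    """Estimate P(next_state | state, action) by relative frequency.

    `observations` is an iterable of (state, action, next_state) records.
    """
    counts = defaultdict(Counter)
    for state, action, next_state in observations:
        counts[(state, action)][next_state] += 1

    estimates = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        # Each count divided by the total for that state-action pair.
        estimates[key] = {s: n / total for s, n in counter.items()}
    return estimates


# Hypothetical observed data: five recorded transitions.
observations = [
    ("low", "invest", "high"),
    ("low", "invest", "high"),
    ("low", "invest", "low"),
    ("low", "invest", "high"),
    ("high", "wait", "high"),
]
probabilities = estimate_transitions(observations)
```

With these sample records, three of the four transitions observed from ("low", "invest") lead to "high", so the estimated probability for that transition is 0.75. State-action pairs that never appear in the data receive no estimate at all, which is one reason such estimates become harder to obtain as the number of states or actions grows.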
Current systems and methods of calibrating or defining the inputs to the decision problem rely heavily on the user's ability to apply the user's own experience to define the inputs in a “heuristic” fashion. The user must define the possible actions, states (with rewards corresponding to each state-action pair), and the transition probabilities between states when the actions are taken. However, this approach may fail in many situations, such as:
Cases where the number of states or actions becomes large.
Cases where the state space or action space is not immediately apparent, or where the user wishes to use a rigorous method to choose among possible state or action spaces.
Cases where the user wishes to inform the choice of state space, action space, transition matrix, or reward matrix with data describing the behavior of the system.
Cases where the user wishes to avoid bias when determining the inputs.
When any of the above cases occurs, constructing the state and action spaces and/or the reward and transition matrices becomes a tedious and difficult task.