This specification relates to policy evaluation for a reinforcement learning agent.
Reinforcement learning agents interact with an environment by receiving an observation that characterizes the current state of the environment and, in response, performing an action from a predetermined set of actions. Such agents generally receive rewards in response to performing the actions. An agent selects the action to be performed in response to a given observation in accordance with a policy that includes rules for selecting actions in response to received observations.
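For illustration only, the interaction described above can be sketched as a simple loop. The environment `LineWorld`, the action set `ACTIONS`, the policy `always_right_policy`, and the helper `run_episode` are hypothetical names introduced for this sketch and do not appear elsewhere in this specification; the sketch assumes a deterministic policy and a toy environment with five positions.

```python
from typing import Callable, List, Tuple

# Hypothetical predetermined set of actions: step left (-1) or step right (+1).
ACTIONS: List[int] = [-1, +1]


class LineWorld:
    """Hypothetical toy environment: positions 0..size-1 on a line.

    The observation is the current position. A reward of 1.0 is given
    when the agent reaches the last position, which ends the episode.
    """

    def __init__(self, size: int = 5) -> None:
        self.size = size
        self.position = 0

    def reset(self) -> int:
        """Return the initial observation."""
        self.position = 0
        return self.position

    def step(self, action: int) -> Tuple[int, float, bool]:
        """Apply an action; return (observation, reward, done)."""
        if action not in ACTIONS:
            raise ValueError(f"action must be one of {ACTIONS}")
        # Clamp the new position so the agent stays on the line.
        self.position = min(max(self.position + action, 0), self.size - 1)
        done = self.position == self.size - 1
        reward = 1.0 if done else 0.0
        return self.position, reward, done


def always_right_policy(observation: int) -> int:
    """Hypothetical policy: a rule mapping every observation to +1."""
    return +1


def run_episode(
    env: LineWorld,
    policy: Callable[[int], int],
    max_steps: int = 100,
) -> float:
    """Run one episode and return the total reward received."""
    observation = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(observation)                  # select action per the policy
        observation, reward, done = env.step(action)  # environment responds
        total_reward += reward
        if done:
            break
    return total_reward
```

In this sketch the policy is a plain function from observation to action; a stochastic policy could instead return a sample from a distribution over `ACTIONS`.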