Reinforcement learning is goal-directed machine learning: an agent learns from direct interaction with its environment, without relying on explicit supervision or a complete model of the environment. It is a formal framework that models the interaction between a learning agent and its environment in terms of states, actions, and rewards. At each time step, the agent observes a state, selects an action according to a policy, receives a scalar reward, and transitions to the next state. The agent's goal is to maximize the expected cumulative reward, i.e. the sum of the scalar rewards received over time. Rewards may be positive for desirable actions and negative for undesirable ones, so the agent 'learns' by adjusting its behavior to maximize the expected cumulative reward.
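The state-action-reward loop described above can be sketched in code. The environment, its reward values, and the policies below are hypothetical illustrations, not from the source: a toy line-world where the agent is rewarded for reaching a goal position and pays a small cost per step.

```python
import random

class LineEnvironment:
    """Hypothetical toy environment: the agent moves along a line toward a goal."""
    def __init__(self, size=5):
        self.size = size
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); the agent receives +1.0 on
        # reaching the goal and -0.1 otherwise (illustrative reward values)
        self.state = max(0, min(self.size, self.state + action))
        done = self.state == self.size
        reward = 1.0 if done else -0.1
        return self.state, reward, done

def run_episode(env, policy, max_steps=50):
    """One episode of the agent-environment loop:
    observe a state, select an action via the policy, receive a reward."""
    state = env.reset()
    cumulative_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done = env.step(action)
        cumulative_reward += reward  # sum of scalar rewards so far
        if done:
            break
    return cumulative_reward

random.seed(0)
env = LineEnvironment()
always_right = lambda s: 1                      # a goal-directed policy
random_walk = lambda s: random.choice([-1, 1])  # an uninformed policy

print(run_episode(env, always_right))
print(run_episode(env, random_walk))
```

In this sketch the goal-directed policy reaches the goal in the fewest steps, so it collects the highest cumulative reward the environment allows; an agent that 'learns' would adjust its policy toward such behavior.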