This specification relates to control policies for robots.
When controlled by a robot control system, a robot interacts with an environment by performing actions that the robot control system selects in response to receiving state representations that characterize the current state of the environment.
Some robot control systems use an output of a neural network to select the action that the robot performs in response to receiving a given state representation.
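By way of illustration only, the interaction described above can be sketched as a simple loop: receive a state representation, select an action from a policy output, and perform the action. Everything named in the sketch (`ToyEnvironment`, `policy_output`, `select_action`, `run_episode`, and the one-dimensional task itself) is a hypothetical stand-in chosen for the example. In particular, `policy_output` is a hand-written scoring function used in place of a neural network output.

```python
class ToyEnvironment:
    """Hypothetical 1-D environment: the robot tries to reach position 0."""

    def __init__(self, start):
        self.position = start

    def state_representation(self):
        # A state representation characterizes the current state of the environment.
        return [float(self.position)]

    def apply(self, action):
        # Candidate actions: -1 (step left), 0 (stay), +1 (step right).
        self.position += action


def policy_output(state):
    """Stand-in for a neural network output: one score per candidate action.

    Assumption for illustration: an action scores higher when it leaves
    the robot closer to position 0.
    """
    position = state[0]
    return {action: -abs(position + action) for action in (-1, 0, 1)}


def select_action(state):
    # Select the action with the highest score in the policy output.
    scores = policy_output(state)
    return max(scores, key=scores.get)


def run_episode(env, steps):
    """Run the receive-state / select-action / perform-action loop."""
    trajectory = []
    for _ in range(steps):
        state = env.state_representation()
        action = select_action(state)
        env.apply(action)
        trajectory.append((state, action))
    return trajectory


env = ToyEnvironment(start=3)
trajectory = run_episode(env, steps=5)
```

In this sketch the control system steps the robot toward position 0 and then holds it there. A learned policy would replace the fixed scoring rule while leaving the loop unchanged.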
Neural networks are machine learning models that employ one or more layers of nonlinear units to predict an output for a received input. Some neural networks are deep neural networks that include one or more hidden layers in addition to an output layer. The output of each hidden layer is used as input to the next layer in the network, i.e., the next hidden layer or the output layer. Each layer of the network generates an output from a received input in accordance with current values of a respective set of parameters.
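As a minimal sketch of the layer-by-layer computation described above, the following example passes an input through one hidden layer and one output layer. The function names, the layer sizes, the parameter values, and the choice of `tanh` as the nonlinearity are all assumptions made for illustration and are not taken from this specification.

```python
import math


def layer_forward(inputs, weights, biases, nonlinearity):
    """Generate one layer's output from its input and current parameter values."""
    outputs = []
    for row, bias in zip(weights, biases):
        # Each unit applies a nonlinearity to a weighted sum of the layer input.
        pre_activation = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(nonlinearity(pre_activation))
    return outputs


def network_forward(inputs, layers):
    """Feed the output of each layer to the next layer in the network."""
    activation = inputs
    for weights, biases, nonlinearity in layers:
        activation = layer_forward(activation, weights, biases, nonlinearity)
    return activation


# Hypothetical parameter values: (weights, biases, nonlinearity) per layer.
layers = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, 0.0], math.tanh),  # hidden layer, 2 units
    ([[1.0, -1.0]], [0.0], math.tanh),                    # output layer, 1 unit
]

output = network_forward([1.0, 1.0], layers)
```

Here the hidden layer's output becomes the output layer's input. Changing the parameter values changes the prediction without changing the structure of the computation.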