This specification relates to processing computational graphs representing neural networks using an accelerator device, e.g., a graphics processing unit (GPU).
Neural networks are machine learning models that employ one or more layers of models to generate an output, e.g., one or more classifications, for a received input. Some neural networks include one or more hidden layers in addition to an output layer. The output of each hidden layer is used as input to the next layer in the network, i.e., the next hidden layer or the output layer of the network. Each layer of the network generates an output from a received input in accordance with current values of a respective set of parameters for the layer.
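For illustration only, the layer-by-layer flow described above can be sketched in Python. This is a minimal sketch, not an implementation from this specification: the function names (`dense_layer`, `forward`), the fully connected layer form, the ReLU activation, and all parameter values are assumptions chosen for the example.

```python
def dense_layer(inputs, weights, biases, activation):
    """Generate one layer's output from its input and current parameter values."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def relu(v):
    return max(0.0, v)


def identity(v):
    return v


def forward(network_input, layers):
    """Use each layer's output as the input to the next layer.

    The last entry in `layers` plays the role of the output layer.
    """
    activations = network_input
    for weights, biases, activation in layers:
        activations = dense_layer(activations, weights, biases, activation)
    return activations


# Hypothetical parameters: one hidden layer (2 units) and an output layer (1 unit).
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu),
    ([[1.0, 2.0]], [0.5], identity),
]
output = forward([2.0, 1.0], layers)
print(output)  # [4.5]
```

Here the hidden layer produces `[1.0, 1.5]` from the input `[2.0, 1.0]`, and the output layer maps that to `[4.5]`.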
In existing systems, the operations of a computational graph can be processed by an individual device, e.g., a GPU. The device can have a processor that performs operations, e.g., generating outputs at a layer from inputs, and stores outputs from the operations in memory. Because of the large number and size of operations generally required to generate the outputs in the computational graph, a single device can take a significant amount of time to process the operations of the graph.
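As a hedged illustration of single-device processing, the sketch below runs the operations of a small computational graph one at a time and stores each output in memory. The graph contents, the node names, and the function `run_on_single_device` are hypothetical, and the example uses the Python standard library's `graphlib.TopologicalSorter` (Python 3.9 or later) as one possible way to order the operations.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical graph: each node maps to (operation, names of its input nodes).
graph = {
    "x": (lambda: 3.0, []),
    "w": (lambda: 2.0, []),
    "mul": (lambda a, b: a * b, ["x", "w"]),
    "add": (lambda a, b: a + b, ["mul", "x"]),
}


def run_on_single_device(graph):
    """Execute every operation sequentially, storing each output in memory."""
    sorter = TopologicalSorter({name: deps for name, (_, deps) in graph.items()})
    memory = {}
    # static_order() yields each node only after all of its inputs.
    for name in sorter.static_order():
        op, deps = graph[name]
        memory[name] = op(*(memory[d] for d in deps))
    return memory


memory = run_on_single_device(graph)
print(memory["add"])  # 9.0
```

Because every operation waits for the previous one to finish, total processing time in this sketch grows with the number and cost of the operations in the graph.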