This specification relates to computing neural network inferences in hardware.
Neural networks are machine learning models that employ one or more layers of neurons to generate an output, e.g., a classification, for a received input. Some neural networks include one or more hidden layers in addition to an output layer. The output of each hidden layer is used as input to the next layer in the network, i.e., the next hidden layer or the output layer of the network. Each layer of the network generates an output from a received input in accordance with current values of a respective set of parameters.
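The layer-by-layer flow described above can be illustrated with a small sketch. It assumes fully connected layers with a ReLU activation on the hidden layer and no activation on the output layer. The names `DenseLayer`, `relu`, `identity`, and `forward` are hypothetical and chosen for illustration only. Each layer's weights and bias stand in for the "respective set of parameters".

```python
from typing import Callable, List, Sequence

Vector = List[float]


def relu(x: Vector) -> Vector:
    # Assumed hidden-layer activation: max(0, v) applied element-wise.
    return [max(0.0, v) for v in x]


def identity(x: Vector) -> Vector:
    # Assumed output-layer activation: pass values through unchanged.
    return list(x)


class DenseLayer:
    """Hypothetical fully connected layer; weights and bias are its parameters."""

    def __init__(self, weights: Sequence[Sequence[float]], bias: Sequence[float],
                 activation: Callable[[Vector], Vector]):
        self.weights = [list(row) for row in weights]
        self.bias = list(bias)
        self.activation = activation

    def __call__(self, x: Vector) -> Vector:
        # Weighted sum of the inputs plus bias for each neuron, then the activation.
        pre = [sum(w * xi for w, xi in zip(row, x)) + b
               for row, b in zip(self.weights, self.bias)]
        return self.activation(pre)


def forward(layers: Sequence[DenseLayer], x: Vector) -> Vector:
    # Each layer's output becomes the input to the next layer.
    for layer in layers:
        x = layer(x)
    return x
```

For example, a network with one hidden layer and one output layer:

```python
hidden = DenseLayer([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu)
output = DenseLayer([[1.0, 2.0]], [0.0], identity)
print(forward([hidden, output], [3.0, 1.0]))  # [6.0]
```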
Traditionally, some neural network systems compute inferences serially. That is, when computing inferences for multiple inputs, the neural network system can process each input through each of the layers of the neural network to generate the output for the input before processing the next input.
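The serial processing order can be made concrete with a minimal sketch. It assumes each layer is any callable that maps a vector to a vector. The function `serial_inference` and its trace of `(input_index, layer_index)` pairs are hypothetical and exist only to show the order of computation: every layer runs on one input before the next input is started.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]  # assumed: a layer is any vector -> vector callable


def serial_inference(layers: Sequence[Layer], inputs: Sequence[Vector]
                     ) -> Tuple[List[Vector], List[Tuple[int, int]]]:
    """Hypothetical serial scheme: finish all layers for one input before the next."""
    outputs: List[Vector] = []
    trace: List[Tuple[int, int]] = []  # records (input_index, layer_index) in execution order
    for i, x in enumerate(inputs):
        for j, layer in enumerate(layers):
            x = layer(x)
            trace.append((i, j))
        outputs.append(x)
    return outputs, trace
```

With two toy layers and two inputs, the trace shows input 0 passing through both layers before input 1 begins:

```python
layers = [lambda v: [2 * a for a in v], lambda v: [a + 1 for a in v]]
outs, trace = serial_inference(layers, [[1.0], [3.0]])
print(outs)   # [[3.0], [7.0]]
print(trace)  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```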