Data in neural networks are viewed as tensors and stored as multidimensional arrays. For instance, vectors are rank-1 tensors and matrices are rank-2 tensors. A 2D image with three color channels (R, G, and B) is a rank-3 tensor. 3D medical images collected over time can be organized as a rank-4 tensor.
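The relationship between tensor rank and array storage can be illustrated with a minimal sketch. The shapes, the dimension ordering, and the row-major layout below are assumptions chosen for illustration only; the text above does not prescribe any of them.

```python
from math import prod

# Hypothetical shapes, chosen only to illustrate the ranks named in the text.
shapes = {
    "vector": (128,),                           # rank 1
    "matrix": (64, 64),                         # rank 2
    "rgb_image": (3, 224, 224),                 # rank 3: channel, height, width
    "volume_over_time": (10, 32, 256, 256),     # rank 4: time, depth, height, width
}


def rank(shape):
    """The rank of a tensor is the number of dimensions in its shape."""
    return len(shape)


def num_elements(shape):
    """Total number of scalar elements the backing array must hold."""
    return prod(shape)


def flat_index(shape, index):
    """Offset of a multi-index in a flat array (row-major order is an assumption)."""
    if len(index) != len(shape):
        raise ValueError("index rank does not match tensor rank")
    offset = 0
    for dim, i in zip(shape, index):
        if not 0 <= i < dim:
            raise IndexError("index out of range")
        offset = offset * dim + i
    return offset
```

For example, `flat_index(shapes["rgb_image"], (1, 0, 0))` gives the offset of the first pixel of the second channel under the assumed layout.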
A neural network can be represented as a computation graph in which each node in the graph is a computation layer. A data tensor memory can be disposed between two layers so that one layer produces a data tensor for the next layer to consume.
AlexNet and VGGNet are examples of neural networks implemented as a series of layers. The output of one layer depends solely on the output of the preceding layer, with the exception of the input layer, which does not receive input from another layer. Recent convolutional neural networks with higher accuracy have a more general neural-net topology. Rather than being a series of layers, the layers in these networks are the nodes of a two-terminal series-parallel digraph, which may also be referred to as a “series-parallel graph” or “sp-graph.” GoogLeNet and ResNet are examples of neural networks that exhibit a series-parallel graph topology.
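The distinction between a plain series of layers and a series-parallel graph can be sketched as follows. This is only an illustration: the function names and the toy layer names are hypothetical, the graphs are assumed to be acyclic, and the test uses the textbook series and parallel reductions on edges rather than any method described in this document.

```python
from collections import Counter


def is_chain(edges):
    """True if no layer has more than one predecessor or more than one successor."""
    out_deg = Counter(u for u, _ in edges)
    in_deg = Counter(v for _, v in edges)
    return all(d <= 1 for d in out_deg.values()) and all(d <= 1 for d in in_deg.values())


def is_series_parallel(edges, source, sink):
    """Reduce the graph with series and parallel reductions.

    The graph (assumed acyclic) is two-terminal series-parallel when it
    collapses to the single edge source -> sink.
    """
    multi = Counter(edges)  # multiset of directed edges
    changed = True
    while changed:
        changed = False
        # Parallel reduction: merge duplicate edges between the same two nodes.
        for e in list(multi):
            if multi[e] > 1:
                multi[e] = 1
                changed = True
        # Series reduction: bypass an inner node with one input and one output.
        nodes = {n for e in multi for n in e}
        for v in nodes:
            if v in (source, sink):
                continue
            ins = [e for e in multi if e[1] == v]
            outs = [e for e in multi if e[0] == v]
            if len(ins) == 1 and len(outs) == 1:
                u, w = ins[0][0], outs[0][1]
                del multi[ins[0]]
                del multi[outs[0]]
                multi[(u, w)] += 1
                changed = True
                break
    return list(multi.elements()) == [(source, sink)]


# Toy topologies with hypothetical layer names.
series_net = [("in", "conv1"), ("conv1", "conv2"), ("conv2", "out")]
residual_block = [("in", "conv1"), ("conv1", "conv2"), ("conv2", "add"), ("in", "add")]
```

Under these assumptions, `series_net` is both a chain and an sp-graph, whereas `residual_block`, with its skip connection from `in` to `add`, is an sp-graph but not a chain.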
A spectrum of hardware architectures can process these layers. At one end of the spectrum, a separate layer module, or simply a “module,” is implemented to compute the output of each layer. At the other end of the spectrum, a single one-size-fits-all module processes all of the layers iteratively. In between these two extremes, the layers can be partitioned across a network of modules such that each module computes the output of one or more layers, but no module computes all of the layers. Through a data tensor memory, one module sends data to the next. A module that processes multiple layers also feeds output data from one layer back to itself for iterative layer processing. The design of this memory is the subject of this invention.
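The spectrum of architectures can be sketched as a partition of layers across modules. The even, contiguous split used here and the names `partition_layers` and `feeds_back` are assumptions made for illustration; the text does not specify how layers are assigned to modules.

```python
def partition_layers(layers, num_modules):
    """Split a list of layers into contiguous groups, one group per module.

    num_modules == len(layers) gives one module per layer;
    num_modules == 1 gives a single one-size-fits-all module.
    """
    if not 1 <= num_modules <= len(layers):
        raise ValueError("num_modules must be between 1 and the number of layers")
    base, extra = divmod(len(layers), num_modules)
    groups, start = [], 0
    for m in range(num_modules):
        size = base + (1 if m < extra else 0)  # spread any remainder over the first modules
        groups.append(layers[start:start + size])
        start += size
    return groups


def feeds_back(group):
    """A module that processes more than one layer feeds its output back to itself."""
    return len(group) > 1


# Hypothetical layer names.
layers = ["conv1", "conv2", "conv3", "fc1", "fc2"]
per_layer = partition_layers(layers, len(layers))   # one extreme
single = partition_layers(layers, 1)                # other extreme
mixed = partition_layers(layers, 2)                 # in between
```

In this sketch, every boundary between two groups corresponds to a data tensor memory through which one module passes data to the next, and every group for which `feeds_back` is true needs a path from its output back to its own input.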
Because of the recent success of convolutional neural networks applied to image classification, many implementations of the data tensor memory are image-centric. The two-dimensional (2D) image from each channel is spatially distributed to a 2D array of arithmetic units for parallel processing. A drawback to this approach is that when the image dimensions change, the arithmetic array needs to change to keep the efficiency high, and the data tensor memory must be re-designed accordingly. Furthermore, if the arithmetic array cannot be re-dimensioned, efficiency drops.
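The efficiency drop can be made concrete with a simple utilization estimate. The tiling model below is an assumption, not a description of any particular implementation: the image is covered by tiles the size of the arithmetic array, and arithmetic units that fall outside the image in edge tiles are counted as idle.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def array_utilization(image_h, image_w, array_h, array_w):
    """Fraction of arithmetic-unit slots doing useful work under the assumed tiling model."""
    tiles = ceil_div(image_h, array_h) * ceil_div(image_w, array_w)
    return (image_h * image_w) / (tiles * array_h * array_w)


# Hypothetical sizes: a 16 x 16 array processing images of different dimensions.
matched = array_utilization(32, 32, 16, 16)      # image is a multiple of the array size
mismatched = array_utilization(28, 28, 16, 16)   # edge tiles are only partly filled
```

With these assumed sizes, the 32 x 32 image keeps every unit busy, while the 28 x 28 image leaves roughly a quarter of the unit slots idle unless the array is re-dimensioned.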