A neural network can be modeled as a collection of neurons connected in an acyclic graph. The network receives an input (a single vector) and transforms it through a series of hidden layers. Each hidden layer is made up of a set of neurons, where each neuron is fully connected to all neurons in the previous layer, and where neurons within a single layer function independently and do not share any connections. The last fully-connected layer is called the “output layer,” and in classification settings it represents the class scores. A convolutional neural network (CNN) is similar to a standard neural network: each neuron receives some inputs, performs a dot product, and optionally follows it with a non-linearity. A CNN, however, is explicitly tailored to handle image data as input. From raw image data, the network outputs classification scores for that input.
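The neuron and layer behavior described above can be illustrated with a minimal sketch in plain Python. The function names (`neuron`, `dense_layer`, `forward`), the choice of ReLU as the non-linearity, and all weight values are assumptions made for illustration only; they are not taken from any particular network.

```python
def relu(x):
    """One common non-linearity (an assumed choice, not mandated by the text)."""
    return max(0.0, x)


def neuron(inputs, weights, bias, activation=None):
    """Dot product of inputs and weights plus a bias, optionally followed by a non-linearity."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z) if activation else z


def dense_layer(inputs, weight_rows, biases, activation=None):
    """Fully connected layer: every neuron sees every input from the previous layer.

    Neurons in the same layer are evaluated independently of one another.
    """
    return [neuron(inputs, w, b, activation) for w, b in zip(weight_rows, biases)]


def forward(x, layers):
    """Pass an input vector through the layers in order; the last layer is the output layer."""
    for weight_rows, biases, activation in layers:
        x = dense_layer(x, weight_rows, biases, activation)
    return x


# Hypothetical two-layer network: one hidden layer (ReLU) and one output layer (raw class scores).
hidden = ([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0], relu)
output = ([[1.0, 2.0], [-1.0, 1.0]], [0.0, 0.5], None)

scores = forward([2.0, 1.0], [hidden, output])
print(scores)  # [2.0, 0.0]
```

In this sketch the output layer applies no non-linearity, so its values can be read directly as class scores, with the largest score indicating the predicted class.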
CNN topologies handle a large amount of data. Ideally, this data is processed within on-chip memory. Weight matrix kernel data can generally consume on the order of a few hundred megabytes of memory. Additionally, each layer of the CNN can produce a large amount of data in the form of output feature maps (OFMs). During operation of a computing system implementing a CNN, kernel data can be read from system main memory. The OFMs for a CNN layer can also be stored in main memory, from which they are read as input feature maps (IFMs) for the next layer. Because of the large amount of data that is processed by computing systems executing a CNN, a large amount of power can be expended reading and writing the CNN data.
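To give a rough sense of the data volumes involved, the following sketch estimates the kernel and feature-map sizes for a single convolutional layer, along with the main-memory traffic when the OFM is written out and then read back as the IFM of the next layer. The layer dimensions, the 4-byte value size, and the function names are hypothetical and chosen only for illustration; the traffic model assumes nothing is cached on-chip between layers.

```python
def kernel_bytes(k_h, k_w, c_in, c_out, bytes_per_weight):
    """Size of one convolutional layer's weights: one k_h x k_w x c_in kernel per output channel."""
    return k_h * k_w * c_in * c_out * bytes_per_weight


def feature_map_bytes(height, width, channels, bytes_per_value):
    """Size of a feature map (an OFM, or the same data read back as an IFM)."""
    return height * width * channels * bytes_per_value


def layer_traffic_bytes(kernel_size, ofm_size):
    """Main-memory traffic under the assumed model: kernels are read once,
    the OFM is written once, and it is read once more as the next layer's IFM."""
    return kernel_size + 2 * ofm_size


# Hypothetical layer: 3x3 kernels, 256 input channels, 512 output channels,
# a 56x56 output feature map, and 4-byte (32-bit) values.
weights = kernel_bytes(3, 3, 256, 512, 4)
ofm = feature_map_bytes(56, 56, 512, 4)
traffic = layer_traffic_bytes(weights, ofm)

print(weights)  # 4718592 bytes (4.5 MiB)
print(ofm)      # 6422528 bytes (6.125 MiB)
print(traffic)  # 17563648 bytes (16.75 MiB)
```

Even for this single assumed layer, the round trip of the feature map through main memory accounts for more traffic than the kernel read, which is why keeping such data on-chip matters for power.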