Artificial neural networks are mathematical models inspired by biological neural networks. They are used to approximate functions that can depend on a large number of unknown inputs. Neural networks are generally presented as systems of interconnected “neurons” that exchange messages with each other. The connections may have numeric weights that can be tuned using various optimization methods, for example stochastic gradient descent.
A deep neural network is made up of many layers. For example, a layer may have n inputs (x1, x2, ..., xn) and m outputs (y1, y2, ..., ym). The number of inputs may be different from the number of outputs, and may also be different for different layers. Each layer maps the inputs to the outputs, in a way that is specific to the type of layer. The outputs from one layer may be the inputs to the next layer.
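The chaining of layers described above can be illustrated with a minimal Python sketch. The three layer functions (pairwise_sum, relu, total) and the helper run_network are hypothetical names chosen only for illustration; they are not taken from any particular network or library. The point is only that each layer may change the number of values, and that each layer's outputs become the next layer's inputs.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Sequence[float]], Vector]


def pairwise_sum(x: Sequence[float]) -> Vector:
    """Hypothetical layer: n inputs -> n/2 outputs (sums adjacent pairs)."""
    return [x[i] + x[i + 1] for i in range(0, len(x), 2)]


def relu(x: Sequence[float]) -> Vector:
    """Hypothetical layer: n inputs -> n outputs (clamps negatives to zero)."""
    return [max(0.0, v) for v in x]


def total(x: Sequence[float]) -> Vector:
    """Hypothetical layer: n inputs -> 1 output (sums all inputs)."""
    return [sum(x)]


def run_network(layers: Sequence[Layer], x: Sequence[float]) -> Vector:
    """Feed the outputs of each layer into the next layer, in order."""
    out = list(x)
    for layer in layers:
        out = layer(out)
    return out


# 4 inputs -> 2 values -> 2 values -> 1 output
network = [pairwise_sum, relu, total]
result = run_network(network, [1.0, -3.0, 2.0, 0.5])
print(result)
```

Here the input and output sizes differ from layer to layer (4, 2, 2, 1), as the description allows.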
One type of layer found in neural networks is a fully connected layer. It connects every input to every output, such that yi = wi,1*x1 + wi,2*x2 + ... + wi,n*xn. This may also be represented using matrices as y = W·x, where W is an m × n matrix. When implementing the neural network on a computer, n × m parameters are loaded from memory and n × m computations are performed. Some of the larger layers of neural networks have up to n = 9216 and m = 4096. With 32-bit weights, this requires loading approximately 150 MB of weights from memory for each iteration. Memory bandwidth is expensive in embedded device implementations.
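As a rough illustration of the fully connected layer and of the memory figure quoted above, the following Python sketch computes y = W·x directly from the definition and then counts the bytes needed to store the weights. The function names fully_connected and weight_bytes are hypothetical, and the small 2 × 3 matrix is made-up example data; only the sizes n = 9216, m = 4096 and the 32-bit weight width come from the text.

```python
from typing import List, Sequence


def fully_connected(W: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Compute y = W.x, where W has m rows of n weights and x has n inputs.

    Each output is yi = wi,1*x1 + wi,2*x2 + ... + wi,n*xn, so the layer reads
    n * m weights and performs n * m multiply-accumulate operations.
    """
    n = len(x)
    for row in W:
        if len(row) != n:
            raise ValueError("every row of W must have one weight per input")
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def weight_bytes(n: int, m: int, bits_per_weight: int = 32) -> int:
    """Bytes of weight data read per evaluation of an n-input, m-output layer."""
    return n * m * bits_per_weight // 8


# Made-up example: n = 3 inputs, m = 2 outputs.
W = [[1.0, 2.0, 3.0],
     [0.0, -1.0, 0.5]]
x = [2.0, 1.0, 4.0]
y = fully_connected(W, x)
print(y)

# Sizes from the text: n = 9216, m = 4096, 32-bit weights.
size = weight_bytes(9216, 4096)
print(size, "bytes, i.e. about", round(size / 1e6), "MB")
```

With these sizes the layer holds 37,748,736 weights, or 150,994,944 bytes at 32 bits each, which is the approximately 150 MB stated above.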
Therefore, there is a need for improved implementations of fully connected layers that require less memory bandwidth.