Neural networks are used to approximate functions that can depend on a large number of unknown inputs. They are generally represented as systems of interconnected neurons (also referred to as nodes), which exchange messages with each other. The connections between the nodes of a neural network are assigned numeric weights, each of which characterizes how an input to a given node is related to the output of that node. Each weight multiplies (and accordingly modifies) an input to a given node to generate an output. The weights can be tuned using various optimization methods, such as stochastic gradient descent, in order to change the response of the neural network to a particular input.
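The weight tuning described above can be illustrated with a minimal sketch of stochastic gradient descent for a single node. The squared-error loss, the learning rate, the step count, and the sample data are assumptions chosen for illustration and are not taken from the text.

```python
import random


def node_output(weights, inputs):
    """Weighted sum of the inputs to a single node."""
    return sum(w * x for w, x in zip(weights, inputs))


def sgd_step(weights, inputs, target, learning_rate):
    """One update for the assumed loss 0.5 * (output - target) ** 2."""
    error = node_output(weights, inputs) - target
    # The gradient of the loss with respect to each weight is error * input.
    return [w - learning_rate * error * x for w, x in zip(weights, inputs)]


def train(samples, n_inputs, learning_rate=0.1, steps=2000, seed=0):
    """Tune the weights by visiting one randomly chosen sample per step."""
    rng = random.Random(seed)
    weights = [0.0] * n_inputs
    for _ in range(steps):
        inputs, target = rng.choice(samples)
        weights = sgd_step(weights, inputs, target, learning_rate)
    return weights


# Hypothetical training data generated by y = 2*x1 - 1*x2.
SAMPLES = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
trained = train(SAMPLES, n_inputs=2)
```

With this data the tuned weights approach 2 and -1, so the node's response to each sample input moves toward its target value.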
As neural networks become more complex, they can be arranged to have multiple layers of connected nodes. These multi-layered neural networks are often referred to as deep neural networks. Deep neural networks can learn complex relationships between their inputs (also referred to as input nodes) and their outputs (also referred to as output nodes). A layer may, for example, have n input nodes (x1, x2, . . . , xn) and m output nodes (y1, y2, . . . , ym). The number of input nodes may be different from the number of output nodes (i.e., n does not always equal m), and the number of input nodes of a given layer may also be different from the number of input nodes of another layer. Each layer maps the input nodes to the output nodes in a way that is specific to the type of layer. The outputs from one layer are the inputs to the next layer.
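The layered arrangement described above can be sketched as a chain of functions, each mapping a list of input values to a list of output values. The three layer types below (sum_adjacent_pairs, scale_by, total) are hypothetical and serve only to show that n and m may differ from layer to layer.

```python
def sum_adjacent_pairs(inputs):
    """Hypothetical layer type: n inputs -> n/2 outputs."""
    if len(inputs) % 2 != 0:
        raise ValueError("this layer expects an even number of inputs")
    return [inputs[i] + inputs[i + 1] for i in range(0, len(inputs), 2)]


def scale_by(factor):
    """Hypothetical layer type: n inputs -> n outputs, each multiplied by factor."""
    def layer(inputs):
        return [factor * x for x in inputs]
    return layer


def total(inputs):
    """Hypothetical layer type: n inputs -> 1 output."""
    return [sum(inputs)]


def run_network(layers, inputs):
    """Feed the outputs of each layer to the next layer as its inputs."""
    values = inputs
    for layer in layers:
        values = layer(values)
    return values


network = [sum_adjacent_pairs, scale_by(0.5), total]  # 4 -> 2 -> 2 -> 1
result = run_network(network, [1.0, 2.0, 3.0, 4.0])
```

Here the four inputs become [3.0, 7.0], then [1.5, 3.5], and finally the single output [5.0].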
One type of layer found in neural networks is a fully connected layer, in which every input node is connected to every output node, such that the output of a given output node i can be represented as yi=wi,1*x1+wi,2*x2+ . . . +wi,n*xn, where wi,z represents the weight applied to input xz when computing output yi. This may also be represented using matrices as y=W·x, where x is an n-dimensional input vector, y is an m-dimensional output vector, W is an m×n matrix of connection parameters (also referred to as weights), and · represents a matrix-vector product. When implementing the neural network on a computer, n×m connection parameters are loaded from memory and n×m computations (one multiplication per weight) are performed. Some of the larger layers of publicly tested and demonstrated neural networks have up to n=9216 and m=4096, with 32-bit values for each weight. A layer of this size has about 37.7 million weights, or roughly 150 MB of data (9216×4096 weights at 4 bytes each), to be processed in each iteration. This can become problematic in memory-constrained or low-power devices.
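The fully connected mapping and its storage cost can be sketched as follows. The function names fully_connected and parameter_bytes are hypothetical, and the nested-list representation of W is an assumption made to keep the example free of external libraries.

```python
def fully_connected(weights, inputs):
    """Compute y = W x, where weights is an m-by-n list of rows."""
    n = len(inputs)
    if any(len(row) != n for row in weights):
        raise ValueError("each weight row must have one entry per input")
    # Output i is the sum over z of weights[i][z] * inputs[z].
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def parameter_bytes(n, m, bits_per_weight=32):
    """Storage needed for the n*m connection parameters of one layer."""
    return n * m * bits_per_weight // 8


W = [[1.0, 2.0, 3.0],
     [0.0, -1.0, 0.5]]  # m=2 output nodes, n=3 input nodes
x = [1.0, 0.0, 2.0]
y = fully_connected(W, x)  # [7.0, 1.0]

layer_bytes = parameter_bytes(9216, 4096)  # 150994944 bytes
```

For the n=9216, m=4096 layer mentioned above, parameter_bytes gives 150,994,944 bytes, which matches the figure of roughly 150 MB.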
A number of solutions have been proposed to reduce the number of connection parameters in neural networks. However, existing solutions either are manual or require significant additional training time, typically measured in tens or hundreds of clock hours. There is therefore a need for an improved system and method for training a neural network.