The present invention relates to neural networks and methods for training neural networks. More particularly, embodiments of the present invention relate to training neural networks using differences in input values and differences in output values in addition to the values themselves. Further embodiments of the present invention relate to accelerated methods for training neural networks, including networks trained using differences in input values.
A neural network is a distributed information processing structure made up of computational "nodes" interconnected by unidirectional "connections". The interconnections within a neural network define the network's "topology." In a neural network, each node has one or more inputs and only one output. (Sometimes a single node is said to have more than one output, but such a single node is conceptually equivalent to multiple nodes each with one output and each with inputs from among the single node's inputs.) An output of one node may form the input of another node. As a whole, a neural network has one or more "network inputs," also called "primary inputs," and one or more "network outputs." Nodes that accept network inputs are called the "input layer" of nodes. Similarly, nodes that produce network outputs are called the "output layer" of nodes. Any node(s) between the input layer and the output layer are collectively called the "hidden layer(s)." In the remainder of this specification, neural networks will be discussed as if they each had only one network output, for simplicity. It is to be understood that all discussions and results apply also to neural networks with more than one network output, unless context demands otherwise.
A node's output is a function of its inputs. In general, a node may be designed to implement any type of mathematical function, so long as the node's computations are "local"--i.e., so long as a node's function has only the node's input values as variable inputs. Typically, interesting neural networks use nodes that implement non-linear functions.
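By way of illustration only, the following minimal sketch shows one possible way of representing a topology and locally-acting node functions. It is not taken from this specification: the sigmoid node function, the node names ("h1", "h2", "out"), the weights and the biases are all assumptions chosen for the example.

```python
import math

def sigmoid_node(inputs, weights, bias):
    """A 'local' node: its output depends only on its own inputs and parameters."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical topology: node name -> (sources feeding it, weights, bias).
# Entries are listed in evaluation order, so each node's sources come first.
TOPOLOGY = {
    "h1": (["x1", "x2"], [1.0, -1.0], 0.0),    # hidden layer
    "h2": (["x1", "x2"], [0.5, 0.5], -0.5),    # hidden layer
    "out": (["h1", "h2"], [2.0, -2.0], 0.0),   # output layer (one network output)
}

def evaluate(network_inputs):
    """Map the network inputs to the single network output."""
    values = dict(network_inputs)
    for name, (sources, weights, bias) in TOPOLOGY.items():
        # The output of one node forms an input of a later node.
        values[name] = sigmoid_node([values[s] for s in sources], weights, bias)
    return values["out"]
```

In this sketch, a call such as evaluate({"x1": 0.0, "x2": 0.0}) returns the network output for one set of network inputs. The sigmoid is only one example of a non-linear node function.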
A neural network implements a mapping from the network inputs into a network output. This mapping from inputs to output is determined by the network's topology and locally-acting node functions. A useful attribute of a neural network is that it may be "trained" using training data to learn a desired mapping. Indeed, a typical application for neural networks is the learning and subsequent implementing of a desired mapping from network inputs into a network output. For such an application, a neural network is typically trained under supervision, which means that during training, example network input data is presented to a neural network trainer along with the corresponding desired network output for that data. The neural network trainer is adapted to establish values for the function parameters within a neural network's nodes, based on the presented training data. The neural network trainer is adapted to use a training method for establishing function parameter values that cause the neural network to realize the desired mapping to the extent possible. The training method used must be appropriate for the neural network's node function type(s) and topology.
A standard approach to training a neural network's function parameters is to: start with a full set of function parameters, perhaps chosen at random; feed training input data into the neural network to compute an actual (or "observed") output; compute an error, which is the difference between the actual and desired network outputs; and propagate the error backward through the network according to a propagation method to adjust each node's function parameters by a function of the error such that the network, with adjusted function parameters, would produce a smaller error if given the same training inputs again.
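The steps above can be sketched, by way of example only, for a network of two sigmoid nodes in series. The sketch assumes a squared-error measure, a gradient-descent propagation method, fixed rather than random starting parameters, and an arbitrary update rate of 0.5. None of these choices is specified above, and the names used are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(params, x):
    """Two nodes in series: the hidden node feeds the output node."""
    h = sigmoid(params["w1"] * x + params["b1"])
    y = sigmoid(params["w2"] * h + params["b2"])
    return h, y

def train_step(params, x, desired, rate=0.5):
    """Apply one update; return the error (actual minus desired) seen before it."""
    h, y = forward(params, x)
    error = y - desired
    # Propagate the error backward: output node first, then hidden node.
    delta_out = error * y * (1.0 - y)
    delta_hidden = delta_out * params["w2"] * h * (1.0 - h)
    # Adjust each node's parameters by a function of the error.
    params["w2"] -= rate * delta_out * h
    params["b2"] -= rate * delta_out
    params["w1"] -= rate * delta_hidden * x
    params["b1"] -= rate * delta_hidden
    return error

# Assumed starting parameters and a single assumed training example.
params = {"w1": 0.5, "b1": 0.0, "w2": -0.3, "b2": 0.1}
errors = [train_step(params, x=1.0, desired=0.9) for _ in range(200)]
```

In this sketch, each entry of errors is the error observed before the corresponding update, so comparing the first and last entries shows the error shrinking when the same training input is presented repeatedly.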
In propagating an error backward through the layers of certain neural networks, a problem is encountered in that the effect of the error on a node's function parameter becomes smaller and smaller as the error is propagated through more and more node layers. For deep neural networks (i.e., those with many layers, e.g., greater than about 5 layers), the diminution of the error's effect may become so severe that many nodes within the neural network become effectively untrainable.
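The diminution described above can be illustrated numerically under simplifying assumptions that are not drawn from this specification: a chain of single-input sigmoid nodes, each with the same assumed weight and no bias. The hypothetical function below computes the derivative of the chain's output with respect to its input, which is the factor by which an output error is scaled when propagated back through all of the nodes.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def input_sensitivity(depth, weight=1.0, x=0.5):
    """|d(output)/d(input)| for `depth` single-input sigmoid nodes in series."""
    a = x
    sensitivity = 1.0
    for _ in range(depth):
        a = sigmoid(weight * a)
        # Chain rule: each node contributes |weight| * sigmoid'(z),
        # where sigmoid'(z) = a * (1 - a) is never larger than 0.25.
        sensitivity *= abs(weight) * a * (1.0 - a)
    return sensitivity

# Sensitivity for chains of increasing depth (assumed depths, for comparison).
by_depth = {depth: input_sensitivity(depth) for depth in (1, 2, 5, 10)}
```

With a weight of 1.0, each node multiplies the sensitivity by at most 0.25, so in this sketch the value for a chain of ten nodes cannot exceed 0.25 raised to the tenth power, which is less than one millionth.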
What is needed is a neural network and an associated training method that allow nodes within even deep neural networks to be trained effectively.
Conventionally, back-propagation training as described above involves adjusting network parameters for one set of training input values at a time. Adjusting parameters to improve the network's output for one set of input values in general changes the output value for other input values, too. A ramification of this effect is that after a first step of updating brings about a desirable output value for a first set of input values, a later step of updating using another set of input values will change the output value for the first set of input values once again, away from its desirable value. This phenomenon causes the training process to "converge" rather slowly, requiring much training data or multiple passes through the same training data.
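This effect can be seen even in a very small example. The sketch below assumes a single linear node with one weight and one bias, a squared-error gradient update with an arbitrary rate of 0.1, and two made-up training examples. All of these are illustrative assumptions rather than details of any particular training method.

```python
def predict(params, x):
    """Single linear node: output = w * x + b."""
    return params["w"] * x + params["b"]

def update(params, x, desired, rate=0.1):
    """One gradient step using a single set of training input values."""
    error = predict(params, x) - desired
    params["w"] -= rate * error * x
    params["b"] -= rate * error

params = {"w": 0.0, "b": 0.0}
first = (1.0, 1.0)    # (input, desired output), assumed values
second = (2.0, 0.0)   # a second example that pulls the parameters elsewhere

update(params, *first)
error_after_first = abs(predict(params, first[0]) - first[1])

update(params, *second)
# The shared parameters moved, so the output for the first example moved too.
error_after_second = abs(predict(params, first[0]) - first[1])
```

In this sketch, the error on the first example rises from roughly 0.80 after the first update to roughly 0.89 after the second update, although the second update never used the first example.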
What is also needed is a method for training a neural network which achieves faster convergence to a final set of network parameters.