The present invention relates to a circuit employing logical gates for calculating activation function derivatives on stochastically encoded signals.
One of the most popular and successful neural network learning algorithms is backpropagation¹, which has a wide following and a great number of successful applications. It is therefore natural that VLSI neural chip designers should address the design of a backpropagation chip. Whereas the theories underlying many other network algorithms are sometimes vague or ill-defined, so that their operation rests on heuristics, backpropagation has a firm computational foundation.
A typical backpropagation network consists of an input layer, a hidden layer, and an output layer. FIG. 1 shows a standard three-layer backpropagation network, having three input units, two hidden units, and three output units.
Each neuron in the network takes as its input the sum of the outputs of its presynaptic neurons, each weighted by the connection linking the two neurons, to form a net activation, called net. The neuron emits an output that is a monotonically increasing, nonlinear, differentiable function f(net), often chosen to be the sigmoid $f(\mathrm{net}) = 1/(1 + e^{-\beta\,\mathrm{net}})$. Thus, for any unit j, we have

$$\mathrm{net}_j = \sum_i w_{ji}\, o_i \qquad [1]$$

where the summation is over the outputs o_i of all presynaptic neurons i.
Each neuron's output is a nonlinear function of its net activation:

$$o_j = f(\mathrm{net}_j) \qquad [2]$$
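As an illustration only, the forward computation above (weighted sum, then sigmoid) can be sketched in Python for a 3-2-3 network like that of FIG. 1. The weight values, the input pattern, the omission of bias terms, and the helper names `sigmoid` and `layer_forward` are all hypothetical choices made for this sketch.

```python
import math

def sigmoid(net: float, beta: float = 1.0) -> float:
    """Logistic activation f(net) = 1 / (1 + exp(-beta * net))."""
    return 1.0 / (1.0 + math.exp(-beta * net))

def layer_forward(weights: list[list[float]], inputs: list[float],
                  beta: float = 1.0) -> list[float]:
    """Compute o_j = f(net_j), with net_j = sum_i w_ji * o_i, for one layer.

    weights[j][i] holds w_ji, the weight from presynaptic unit i to unit j.
    """
    outputs = []
    for row in weights:
        net = sum(w * o for w, o in zip(row, inputs))  # net activation of unit j
        outputs.append(sigmoid(net, beta))
    return outputs

# Hypothetical weights for a 3-2-3 network (no bias terms).
w_hidden = [[0.5, -0.2, 0.1],
            [0.3, 0.8, -0.5]]   # 2 hidden units x 3 inputs
w_output = [[1.0, -1.0],
            [0.4, 0.6],
            [-0.7, 0.2]]        # 3 output units x 2 hidden units

x = [1.0, 0.0, 1.0]
hidden = layer_forward(w_hidden, x)
output = layer_forward(w_output, hidden)
```

Each layer's outputs simply become the presynaptic outputs o_i of the next layer.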
The goal is to have the network produce the desired output when an appropriate input is presented. The network learns from a large set of pairs of input and desired output patterns, where the desired output patterns are provided by a teacher (supervised learning).
One way to express the goal of the network is to form a global error function,

$$E = \frac{1}{2} \sum_k (t_k - o_k)^2 \qquad [3]$$

with this term itself summed over all input/desired-output pattern pairs. Training the network consists of setting all the weights so as to reduce the global error. During training, each weight w_ji is changed by an amount given by the general form:

$$\Delta w_{ji} = \eta\, \delta_j\, o_i \qquad [4]$$
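A minimal sketch of the global error and the general weight-update form follows. It assumes the common sum-of-squares error with a factor of 1/2 (the form consistent with the output-unit local error that follows), and the function names `pattern_error`, `global_error`, and `update_weights` are hypothetical.

```python
def pattern_error(targets, outputs):
    """E_p = 1/2 * sum_k (t_k - o_k)^2 for one input/desired-output pair."""
    return 0.5 * sum((t - o) ** 2 for t, o in zip(targets, outputs))

def global_error(pairs):
    """Sum E_p over all patterns; each entry is (targets, actual outputs)."""
    return sum(pattern_error(t, o) for t, o in pairs)

def update_weights(weights, deltas, presyn_outputs, eta):
    """Apply Delta w_ji = eta * delta_j * o_i in place; weights[j][i] is w_ji."""
    for j, delta_j in enumerate(deltas):
        for i, o_i in enumerate(presyn_outputs):
            weights[j][i] += eta * delta_j * o_i

# Two hypothetical patterns: the first contributes 0.25, the second 0.
pairs = [([1.0, 0.0], [0.5, 0.5]), ([0.0, 1.0], [0.0, 1.0])]
E = global_error(pairs)

w = [[0.0, 0.0]]
update_weights(w, deltas=[0.5], presyn_outputs=[1.0, 2.0], eta=0.1)
```

The update is an outer product of the local errors δ_j with the presynaptic outputs o_i, scaled by η.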
where η is a small learning-rate constant, o_i is the presynaptic output activity, and δ_j is a local error computed at each neuron. Given the definition

$$\delta_j = -\frac{\partial E}{\partial\, \mathrm{net}_j} \qquad [5]$$

it can be shown that for an output unit (labelled k in FIG. 1), this local error is:

$$\delta_k = (t_k - o_k)\, f'(\mathrm{net}_k) \qquad [6]$$
where t_k is the signal provided by the teacher to output unit k. For a hidden unit (labelled j in FIG. 1), which has no direct teaching signal, the local error is:

$$\delta_j = f'(\mathrm{net}_j) \sum_k w_{kj}\, \delta_k \qquad [7]$$

where δ_k is the local error at each of the output neurons.
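The local-error expressions for output and hidden units can be checked numerically. The sketch below, which assumes a logistic f with gain `BETA = 1.0`, the sum-of-squares error, and hypothetical weights and patterns, computes δ_k and δ_j and compares the implied gradients ∂E/∂w = −δ·o against central finite differences of E.

```python
import math

BETA = 1.0  # sigmoid gain beta; an assumed value for illustration

def f(net):
    """Logistic activation f(net) = 1 / (1 + exp(-BETA * net))."""
    return 1.0 / (1.0 + math.exp(-BETA * net))

def f_prime(net):
    """Derivative of the logistic: f'(net) = BETA * f(net) * (1 - f(net))."""
    y = f(net)
    return BETA * y * (1.0 - y)

def forward(w_h, w_o, x):
    """Forward pass; w_h[j][i] = w_ji (input->hidden), w_o[k][j] = w_kj (hidden->output)."""
    net_h = [sum(w * xi for w, xi in zip(row, x)) for row in w_h]
    o_h = [f(n) for n in net_h]
    net_o = [sum(w * oj for w, oj in zip(row, o_h)) for row in w_o]
    o_o = [f(n) for n in net_o]
    return net_h, o_h, net_o, o_o

def error(w_h, w_o, x, t):
    """Per-pattern error E = 1/2 * sum_k (t_k - o_k)^2 (assumed form)."""
    o_o = forward(w_h, w_o, x)[3]
    return 0.5 * sum((tk - ok) ** 2 for tk, ok in zip(t, o_o))

def local_errors(w_h, w_o, x, t):
    """Return (delta_j for hidden units, delta_k for output units)."""
    net_h, _, net_o, o_o = forward(w_h, w_o, x)
    # Output units: delta_k = (t_k - o_k) * f'(net_k)
    delta_k = [(tk - ok) * f_prime(nk) for tk, ok, nk in zip(t, o_o, net_o)]
    # Hidden units: delta_j = f'(net_j) * sum_k w_kj * delta_k
    delta_j = [f_prime(net_h[j]) * sum(w_o[k][j] * delta_k[k] for k in range(len(w_o)))
               for j in range(len(w_h))]
    return delta_j, delta_k

def numeric_grad(w, j, i, w_h, w_o, x, t, h=1e-6):
    """Central-difference dE/dw[j][i]; w must be w_h or w_o itself (restored afterwards)."""
    original = w[j][i]
    w[j][i] = original + h
    e_plus = error(w_h, w_o, x, t)
    w[j][i] = original - h
    e_minus = error(w_h, w_o, x, t)
    w[j][i] = original
    return (e_plus - e_minus) / (2.0 * h)

# Hypothetical 3-2-3 weights, input pattern, and teaching signal.
w_h = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
w_o = [[1.0, -1.0], [0.4, 0.6], [-0.7, 0.2]]
x = [1.0, 0.0, 1.0]
t = [1.0, 0.0, 0.0]

delta_j, delta_k = local_errors(w_h, w_o, x, t)
o_h = forward(w_h, w_o, x)[1]
# Since delta = -dE/dnet, dE/dw_ji = -delta_j * o_i; each pair below should agree.
print(numeric_grad(w_h, 0, 0, w_h, w_o, x, t), -delta_j[0] * x[0])
print(numeric_grad(w_o, 1, 0, w_h, w_o, x, t), -delta_k[1] * o_h[0])
```

Note that f'(net) is evaluated once per output unit and once per hidden unit.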
With the local errors calculated in this way, and provided a few mild conditions are met (chiefly a sufficiently small learning rate η), the network reduces its global error upon each learning trial. The learning can thus be described as gradient descent on the error in weight space.
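The gradient-descent behaviour can be seen in a small experiment. The sketch below is an assumption-laden illustration rather than a prescribed procedure: it uses batch updates (all weight changes accumulated over the pattern set before being applied), a logistic f with gain 1, no bias terms, random initial weights, a hypothetical 3-2-3 "encoder" task with one-hot patterns, and η = 0.5. It records the global error after each epoch.

```python
import math
import random

BETA = 1.0  # assumed sigmoid gain

def f(net):
    """Logistic activation with gain BETA."""
    return 1.0 / (1.0 + math.exp(-BETA * net))

def forward(w_h, w_o, x):
    """Return hidden outputs and network outputs for input x."""
    o_h = [f(sum(w * xi for w, xi in zip(row, x))) for row in w_h]
    o_o = [f(sum(w * h for w, h in zip(row, o_h))) for row in w_o]
    return o_h, o_o

def global_error(w_h, w_o, patterns):
    """E = sum over patterns of 1/2 * sum_k (t_k - o_k)^2."""
    total = 0.0
    for x, t in patterns:
        _, o_o = forward(w_h, w_o, x)
        total += 0.5 * sum((tk - ok) ** 2 for tk, ok in zip(t, o_o))
    return total

def batch_step(w_h, w_o, patterns, eta):
    """Accumulate Delta w = eta * delta * o over all patterns, then apply."""
    dw_h = [[0.0] * len(row) for row in w_h]
    dw_o = [[0.0] * len(row) for row in w_o]
    for x, t in patterns:
        o_h, o_o = forward(w_h, w_o, x)
        # Output local errors, using f'(net_k) = BETA * o_k * (1 - o_k)
        d_k = [(tk - ok) * BETA * ok * (1.0 - ok) for tk, ok in zip(t, o_o)]
        # Hidden local errors, back-propagated through the (not yet updated) w_kj
        d_j = [BETA * oj * (1.0 - oj) * sum(w_o[k][j] * d_k[k] for k in range(len(w_o)))
               for j, oj in enumerate(o_h)]
        for k, dk in enumerate(d_k):
            for j, oj in enumerate(o_h):
                dw_o[k][j] += eta * dk * oj
        for j, dj in enumerate(d_j):
            for i, xi in enumerate(x):
                dw_h[j][i] += eta * dj * xi
    for w, dw in ((w_h, dw_h), (w_o, dw_o)):
        for row, drow in zip(w, dw):
            for i, d in enumerate(drow):
                row[i] += d

random.seed(0)
w_h = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]
w_o = [[random.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(3)]
patterns = [([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]),
            ([0.0, 1.0, 0.0], [0.0, 1.0, 0.0]),
            ([0.0, 0.0, 1.0], [0.0, 0.0, 1.0])]

errors = [global_error(w_h, w_o, patterns)]
for epoch in range(200):
    batch_step(w_h, w_o, patterns, eta=0.5)
    errors.append(global_error(w_h, w_o, patterns))
```

Comparing `errors[0]` with `errors[-1]` shows the global error falling over training; batch updates were chosen here because they follow the true gradient of the summed error.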
Note carefully, however, that f'(net) appears in equations [6] and [7]. Thus f'(net) must be calculated at every output and hidden neuron for the backpropagation algorithm to proceed. Although there are some application problems in which f'(net) need not be calculated, these are rare and very simple, and do not require the power of backpropagation.
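For the sigmoid f(net) = 1/(1 + e^(−β·net)), the derivative can be written in terms of the neuron's own output, f'(net) = β·o·(1 − o), so it needs only o. The sketch below checks this identity against a finite difference. As a separate, clearly hypothetical illustration of stochastically encoded signals, it also estimates o(1 − o) with a generic stochastic-computing trick: o is encoded as a bit stream whose bits are 1 with probability o, and one stream is ANDed with the inverse of a second, independent stream of the same value. This trick, the stream length, and the helper names are assumptions for illustration and are not presented as the circuit of the invention.

```python
import math
import random

def sigmoid(net, beta=1.0):
    """Logistic activation f(net) = 1 / (1 + exp(-beta * net))."""
    return 1.0 / (1.0 + math.exp(-beta * net))

def sigmoid_derivative_from_output(o, beta=1.0):
    """f'(net) = beta * o * (1 - o), where o = f(net)."""
    return beta * o * (1.0 - o)

def bitstream(p, n, rng):
    """Hypothetical stochastic encoding: each of n bits is 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def stochastic_o_times_one_minus_o(o, n, rng):
    """Estimate o * (1 - o) as the mean of (a AND NOT b) over independent streams a, b."""
    a = bitstream(o, n, rng)
    b = bitstream(o, n, rng)
    return sum(x & (1 - y) for x, y in zip(a, b)) / n

rng = random.Random(1)
net, beta, h = 0.7, 2.0, 1e-6
o = sigmoid(net, beta)
numeric = (sigmoid(net + h, beta) - sigmoid(net - h, beta)) / (2 * h)
analytic = sigmoid_derivative_from_output(o, beta)
estimate = beta * stochastic_o_times_one_minus_o(o, 100_000, rng)
```

The stochastic estimate is only approximate; its spread shrinks roughly as one over the square root of the stream length, and it relies on the two streams being uncorrelated.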