1. Field of the Invention
The present invention pertains to the field of pattern recognition. More particularly, the invention pertains to adaptive pattern recognizers capable of supervised or unsupervised learning.
2. Background of the Invention
Artificial neural networks having parallel, distributed processing have been widely proposed and have been simulated by serial or partially parallel digital systems. It has long been recognized that fully parallel distributed processing would offer advantages such as extremely fast overall processing with relatively slow elements, effective functioning with deficient or even inoperable elements, recognition of patterns by learning from prototypes without programming, and little functional degradation from noisy input.
Some well-known neural network models will now be described as background for their implementation by the present invention. Typically, neural networks are modeled in a layered architecture. A first group of neurons or neural units serves as an input layer and propagates signals to another group in an intermediate layer. Signals from the intermediate layer then influence one or more succeeding layers.
Ultimately, signals are provided by an output layer which may be the second or, commonly, the third layer. In the general case every signal from one layer is connected to an input or synapse of the next layer, and "learning" is considered to occur by varying at each synapse the "weight" given to the signal received thereat. Each synapse may have an excitatory or inhibitory effect. All of the weighted signals received by the synapses of a neuron determine its activation, or output signal strength, in accordance with some function. The function is, typically, assumed to be sigmoid, or S-shaped, with a lower excitatory threshold below which no significant activation occurs with reduction in excitatory input and with an upper excitatory threshold above which there is no significant increase in activation with increase in excitatory input.
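By way of illustration only, the saturating behavior of such a function may be sketched as follows. The logistic function used here is an assumption, being one common sigmoid and not necessarily the function of any particular network; the names `logistic`, `low_change`, `mid_change`, and `high_change` are hypothetical.

```python
import math


def logistic(x):
    """Logistic function, assumed here as one common S-shaped activation."""
    return 1.0 / (1.0 + math.exp(-x))


# Change in activation over an input interval of width 2 in three regions.
low_change = logistic(-8.0) - logistic(-10.0)   # far below the lower threshold
mid_change = logistic(1.0) - logistic(-1.0)     # between the two thresholds
high_change = logistic(10.0) - logistic(8.0)    # far above the upper threshold

print(low_change, mid_change, high_change)
```

In this sketch the activation changes appreciably only in the middle region, while equal changes of input at either extreme leave it nearly unchanged.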
It is apparent that such a network may be defined mathematically with signals and neurons of connected layers identified by subscripts corresponding to the layers. It is also apparent that each layer may be represented as a rectangular array with each synapse identified by Cartesian coordinates specifying the neuron and the particular synapse of a neuron. Using this approach, the activation of a neuron j is represented by a real number $O_j$. The effectiveness or weight of the synaptic connection from neuron j to neuron i is also represented by a real number $w_{ij}$ which may be varied as the network is "taught" or "learns". The input to neuron i ($I_i$) is then $\sum_j w_{ij} O_j$. The output ($O_i$) of the neuron is a function of its inputs as represented by
$$O_i = F(I_i) = F\Big(\sum_j w_{ij} O_j\Big),$$
where F is the above-mentioned sigmoidal function.
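The forward computation just described may be sketched, for illustration only, as follows. The logistic form of F, the nested-list layout in which `weights[i][j]` holds the weight from neuron j to neuron i, and the names `sigmoid` and `layer_output` are assumptions made for this example.

```python
import math


def sigmoid(x):
    """Assumed sigmoidal function F (logistic form)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_output(weights, prev_outputs):
    """Return the outputs O_i of a layer given the outputs O_j feeding it.

    weights[i][j] is the weight w_ij of the synapse from neuron j to neuron i.
    """
    outputs = []
    for row in weights:
        net_input = sum(w * o for w, o in zip(row, prev_outputs))  # I_i
        outputs.append(sigmoid(net_input))                         # O_i = F(I_i)
    return outputs


# Two neurons, each with two synapses; the negative weight is inhibitory.
example_weights = [[0.5, -0.5], [2.0, 1.0]]
example_inputs = [1.0, 1.0]
example_outputs = layer_output(example_weights, example_inputs)
print(example_outputs)
```

In this example the excitatory and inhibitory inputs to the first neuron cancel, giving a net input of zero, while the second neuron receives a net input of 3 and is driven toward saturation.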
It is well-known that a neural network provided with predetermined input signals may be taught to provide a corresponding and desired output signal by addressing each weight and varying it by an appropriate amount $\Delta w_{ij}$.
It is also known that a neural network may itself "learn" if the $\Delta w_{ij}$ of each synapse is appropriately determined by the input $O_j$ to the synapse and by either the output $O_i$ of the corresponding neuron or an "error" signal $\delta_i$ back propagated from the layer to which $O_i$ is propagated.
Thus in "Hebbian" learning, if $O_j$ and $O_i$ are both active, $w_{ij}$ is increased. Mathematically, this may be represented by
$$\Delta w_{ij} = G\, O_i O_j,$$
where G is a gain term used to control the rate of learning.
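The Hebbian rule above may be sketched, for illustration only, as follows. The nested-list weight layout and the names `hebbian_update`, `post_outputs`, and `pre_outputs` are assumptions made for this example.

```python
def hebbian_update(weights, post_outputs, pre_outputs, gain):
    """Return new weights after adding G * O_i * O_j to each w_ij.

    weights[i][j] is w_ij, post_outputs[i] is O_i, pre_outputs[j] is O_j.
    """
    return [
        [w + gain * o_i * o_j for w, o_j in zip(row, pre_outputs)]
        for row, o_i in zip(weights, post_outputs)
    ]


# Neuron 0 is active and neuron 1 is inactive, so only row 0 changes.
updated = hebbian_update(
    [[0.0, 0.0], [0.0, 0.0]],
    post_outputs=[1.0, 0.0],
    pre_outputs=[1.0, 0.5],
    gain=0.1,
)
print(updated)
```

Only synapses whose input and whose neuron are both active are strengthened in this sketch; the weights of the inactive neuron are left unchanged.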
However, in "delta rule" learning the weight at each synapse is modified as represented by
$$\Delta w_{ij} = G\, \delta_i O_j,$$
where G is as above, $\delta_i$ is the back propagated error signal to the ith level neuron with the synapse, and $O_j$ is the jth level output thereto. Error signals are then recursively back propagated as represented by
$$\delta_j = F'(I_j) \sum_i w_{ij} \delta_i,$$
where $\delta_j$ is the error signal to the neuron providing the $O_j$; $F'(I_j)$ represents the derivative of the above-described function F of the jth neuron applied to the sum of the weighted inputs thereto; and $\sum_i w_{ij} \delta_i$ is the sum of the weights of each ith synapse applied to the corresponding error signals propagated thereto from the layer to which the ith layer propagates. At each neuron, the argument of F for forward propagation and F' for backward error propagation is the same.
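The delta rule and the recursive back propagation of error signals may be sketched, for illustration only, as follows. The logistic form of F, the nested-list weight layout, and the names `backpropagate_errors` and `delta_rule_update` are assumptions made for this example. The error signals of the ith layer are taken as given, since their origin at the output layer is not specified above.

```python
import math


def sigmoid(x):
    """Assumed sigmoidal function F (logistic form)."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_prime(x):
    """Derivative F' of the assumed logistic function."""
    s = sigmoid(x)
    return s * (1.0 - s)


def backpropagate_errors(weights, upper_deltas, lower_inputs):
    """Compute delta_j = F'(I_j) * sum_i w_ij * delta_i for each neuron j.

    weights[i][j] is w_ij, upper_deltas[i] is delta_i, lower_inputs[j] is I_j.
    """
    lower_deltas = []
    for j, net_j in enumerate(lower_inputs):
        weighted = sum(weights[i][j] * d_i for i, d_i in enumerate(upper_deltas))
        lower_deltas.append(sigmoid_prime(net_j) * weighted)
    return lower_deltas


def delta_rule_update(weights, upper_deltas, lower_outputs, gain):
    """Return new weights after adding G * delta_i * O_j to each w_ij."""
    return [
        [w + gain * d_i * o_j for w, o_j in zip(row, lower_outputs)]
        for row, d_i in zip(weights, upper_deltas)
    ]


# One neuron i fed by two neurons j, each with net input I_j = 0.
weights = [[1.0, -2.0]]
upper_deltas = [0.5]                      # delta_i, assumed given
lower_inputs = [0.0, 0.0]                 # I_j
lower_outputs = [sigmoid(x) for x in lower_inputs]  # O_j = F(I_j)

lower_deltas = backpropagate_errors(weights, upper_deltas, lower_inputs)
new_weights = delta_rule_update(weights, upper_deltas, lower_outputs, gain=0.2)
print(lower_deltas, new_weights)
```

In this sketch the same net input I_j serves as the argument of F in the forward direction and of F' in the backward direction, and the error signals are back propagated through the weights as they stood before the update.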