1. Field of the Invention
The present invention pertains to the field of pattern recognition. More particularly, the invention pertains to the field of adaptive pattern recognizers capable of supervised or unsupervised memory modification or learning.
2. Description of the Related Art
Artificial neural networks having parallel, distributed processing are recognized as a way to provide extremely fast overall processing with relatively slow elements, pattern recognition by learning from prototypes with subsequent minimal functional degradation due to noisy inputs, and effective functioning despite deficient or inoperable elements, this last feature resulting in high yields in integrated circuits implementing such networks. Accordingly, extensive theoretical research and digital simulations have been performed in connection with artificial neural networks. However, the practical application of these networks depends on the development of memory modification methods that are effective with practical and mass producible integrated circuits having vast numbers of elements functioning in parallel.
An artificial neural network contemplated by the prior art is depicted conceptually and conventionally in FIG. 1, and has three "neurons" 10, each providing an output 11 of the network, and four inputs 12, each connected to each neuron by a "synapse" 13. This network and others subsequently described herein are, to make exposition possible, much simpler than any practically useful artificial neural network integrated circuit having, for example, sixty-four neurons each with one hundred and twenty-eight inputs. The complexity of the memory modification arrangements described herein for an artificial neural network should, therefore, be considered as the network might be implemented with thousands of synapses rather than the twelve synapses in FIG. 1.
Each synapse 13 is depicted as having a variable resistor 15 to indicate that a signal, typically a voltage signal, on each input 12 provides a current to a summing circuit 16 of each neuron 10. This current is proportional to the strength of the input signal and a "weight" which is the conductance of the synapse as determined by resistor 15. The output of circuit 16 is provided to a function generator 18 which drives the corresponding output 11 in accordance with a predetermined relation or function between the sum of the input signals and an output signal, typically a voltage signal. It is well-known that by suitable adjustments to the weights, that is, modification of the "memory" of the network occurring when the memory "learns", the output signals may be made to assume desired values corresponding to predetermined values of the input signals. The value of an artificial neural network is thus dependent on the practicality of its arrangements for memory modification as embodied in an actual circuit.
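By way of illustration only, the forward signal path of FIG. 1 may be sketched in software as a weighted sum per neuron followed by a function generator. The function names, the example weight values, and the choice of a sigmoid as the "predetermined function" are assumptions made for this sketch and are not taken from the figure.

```python
import math

def neuron_output(inputs, weights, activation):
    """Summing circuit 16 followed by function generator 18 for one neuron."""
    # Each synapse contributes a current proportional to input * weight.
    total = sum(w * x for w, x in zip(weights, inputs))
    return activation(total)

def network_outputs(inputs, weight_rows, activation):
    """One output 11 per neuron; weight_rows holds one row of weights per neuron."""
    return [neuron_output(inputs, row, activation) for row in weight_rows]

def sigmoid(s):
    # Assumed example of a function generator; the text leaves the function open.
    return 1.0 / (1.0 + math.exp(-s))

# Four inputs 12 and three neurons 10, i.e. twelve synapses as in FIG. 1.
inputs = [1.0, 0.0, 0.5, -1.0]
weight_rows = [
    [0.2, 0.1, 0.0, 0.3],
    [0.0, 0.5, 0.4, 0.0],
    [-0.1, 0.0, 0.2, 0.1],
]
outputs = network_outputs(inputs, weight_rows, sigmoid)
```

In this sketch the "memory" of the network is simply the contents of weight_rows, so that learning amounts to changing those values.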
It is contemplated that an artificial neural network, such as that of FIG. 1 but having many more elements as mentioned above, is utilized as only one "layer" of a more complex network in which the outputs, such as 11, of one layer are the inputs, such as 12, of a similar layer. The practicality of embodying an artificial neural network memory modification method in an integrated circuit thus depends on the connecting elements necessary between layers as well as on the connecting elements to the synapses, such as 13, of each layer. The complexity of the memory modification arrangements described herein for a one layer artificial neural network should, therefore, be considered with the network connected as one layer of several layers each with thousands of synapses. Typically, signals generated in accordance with such a method to modify the weights of an artificial neural network do not correspond to new values to be substituted for the existing values of the weights, but to linear changes to be added algebraically to such existing values. There are many well-known, as well as contemplated, circuits for so modifying such a weight and the particular such circuit employed in an artificial neural network embodying the present memory modification method or other such methods depends on many considerations including the manner of implementing the weights, which may be an actual variable resistance of any kind, a digitally switched circuit, a change in the gate charge of a field effect transistor, or some combination of such elements.
Accordingly, FIG. 1 depicts such weight modification circuits of each synapse 13 schematically and for representative purposes. Each such circuit has a multiplier 20 having one input 21 from the network input 12 corresponding to the synapse since the weight modification is, typically, proportional to the degree of activation of the synapse input. Multiplier 20 has another input 22 for other factors, subsequently described in detail, used to modify the synapse weight and provided from outside the network by a teaching system. It should be noted that inputs 12 may be provided to a teaching system, directly by bypassing the network or by multiplexing inputs 12, with the function of multiplier 20 performed by the teaching system and the results provided to the synapses corresponding to synapses 13 through multiplexing. The product output of multiplier 20 is provided to a setting circuit 24 of any suitable construction for adjusting the corresponding resistor 15 by an amount determined by the multiplier output.
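The weight modification just described, in which a change is added algebraically to the existing weight and is proportional to the synapse input, may be sketched as follows. The function name update_weights and the single scalar "factor" standing in for input 22 are assumptions for illustration.

```python
def update_weights(weights, inputs, factor):
    """Sketch of multiplier 20 and setting circuit 24 for the synapses of one neuron.

    Each weight is changed by (synapse input * factor); the change is added
    to the existing weight rather than replacing it.
    """
    return [w + x * factor for w, x in zip(weights, inputs)]

# A synapse whose input is zero is left unchanged.
new_weights = update_weights([0.5, 0.5], [1.0, 0.0], 0.25)
```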
The network of FIG. 1 is depicted, also schematically and for representative purposes, as having an error calculator 30 which receives the outputs 11 and also receives individually corresponding desired outputs 32 having values from which the network is to "learn". Each pair of corresponding outputs 11 and 32 is provided to a difference circuit 34 whose output represents the signed difference between the values of this pair and may be provided externally of the error calculator at a terminal 35.
Each difference is provided to a squaring circuit 36 which outputs to a summing circuit 37 which sums the squared differences and provides a corresponding signal to a square root circuit 38 whose output signal 39, typically termed simply "error", represents the overall error of the network. As determined in FIG. 1, signal 39 corresponds to the root-mean-square (RMS) of the individual neuron errors with the associated and well-known statistical and theoretical properties of the RMS error. However, an error calculator corresponding to calculator 30 may provide some other function corresponding to the absolute values of differences between each neuron output and its desired output.
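The error calculator 30 may be sketched, for illustration only, as the chain of difference, squaring, summing, and square root operations described above. The sign convention (desired minus actual) and the function name are assumptions; the text specifies only a "signed difference". As described, the sum is not divided by the number of neurons, so the result differs from a strict mean-based value by a constant factor.

```python
import math

def overall_error(outputs, desired):
    """Sketch of error calculator 30: per-neuron differences and overall error."""
    # Difference circuits 34 (sign convention assumed: desired - actual).
    differences = [d - o for o, d in zip(outputs, desired)]
    # Squaring circuits 36, summing circuit 37, square root circuit 38.
    error = math.sqrt(sum(e * e for e in differences))
    return differences, error

differences, error = overall_error([0.0, 0.0], [3.0, 4.0])
```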
Two well-known memory modification methods proposed for an artificial neural network are, first, the "delta rule" and, second, a method previously described in the prior art in connection with a multiple adaptive linear neuron or "Madeline" and, therefore, referred to herein as the "Madeline method". Both of these methods are the subject of extensive theoretical study and of digital simulation. In memory modification by these methods, the general approach is to provide the inputs of a network, which is adapted to use the method, with successive patterns and, at each presentation, to determine the errors between the values of such network output signals and their desired values. The errors are then used in accordance with the method to modify the weights.
The delta rule, which employs the well-known "steepest descent" technique, may be carried out by an artificial neural network constructed as shown in FIG. 2, this network being similar to that of FIG. 1 but having additional elements for implementing the delta rule. In regard to the similarities, the FIG. 2 network has three neurons 50 providing individual outputs 51 and receiving four inputs 52 to the network at four synapses 53 of each neuron with the synapse outputs being summed by circuits 55. The neuron outputs are provided to an error calculator 56 which receives three desired outputs 57 and provides three corresponding difference signals 58 and an overall error signal 59 available to a teaching system, not shown.
With the delta rule, the above mentioned function is necessarily differentiable and monotonically nondecreasing, and, typically, is a sigmoid or "S" shaped function as indicated by numeral 65 at three function generators 66 which correspond to function generators 18 and individually receive the outputs of circuits 55. In accordance with the delta rule, the FIG. 2 network is adapted for backpropagation through it of error signals derived from the errors between the desired network output signals and the learning output signals.
More specifically, the error signal corresponding to each neuron 50 is derived by a second function generator 70 thereof which receives the output of the summing circuit 55 and itself provides an output 71 corresponding to the derivative, indicated by numeral 72, of the sigmoidal function 65. The output 71 is, functionally, provided to a multiplier circuit 73 along with a backpropagated error signal 74, which may be from another artificial neural network layer, not shown, to which outputs 51 are propagated or, for a final such layer as shown in FIG. 2, may be the error signal 58 corresponding to the error between the neuron output 51 and a desired final output 57 of the network. The output of circuit 73, which thus combines the effect of the backpropagated error with the effect of a change in the contribution of the neuron 50, is typically multiplied, as represented by another multiplier circuit 75, by a teaching system provided gain signal 77, and the resulting product is transmitted as a signal on a conductor 78 to each synapse 53 of the neuron.
In each neuron 50, each synapse 53 has, similarly to FIG. 1, a variable weight or resistor 80 connecting an input conductor 81, which extends through all synapses corresponding to an input 52, to a summing conductor 82 which runs through all the synapses of the neuron to its summing circuit 55. Each synapse 53 also has a variable weight or resistor 83 connecting the corresponding conductor 78 to an error summing conductor 84, which extends through all the synapses corresponding to an input 52, to an error summing circuit 85, similar to a circuit 55, which corresponds to the input and provides an output 86 which is connected, in accordance with the delta rule, to a preceding artificial neural network layer as one of the backpropagated error signals 74.
At each synapse 53 the backpropagated signal on the corresponding conductor 78 is also provided to a multiplier 90 which corresponds to multiplier 20 in receiving the input signal corresponding to the synapse and in outputting to a setting circuit 91. Circuit 91 corresponds to circuit 24 and, in accordance with the delta rule, modifies both weights 80 and 83 at the same time proportionally to the product from multiplier 90.
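One delta rule step for a single layer arranged as in FIG. 2 may be sketched as below. This is an illustrative software analogue rather than a description of the circuit: the function names are hypothetical, weights 80 and 83 are represented by one shared value per synapse, and the backpropagated outputs are computed from the weights as they stood before the update, which is an assumption.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def sigmoid_derivative(s):
    # Function generator 70: derivative of the sigmoid at the summed input.
    y = sigmoid(s)
    return y * (1.0 - y)

def delta_rule_step(inputs, weights, errors, gain):
    """Return (new_weights, back_errors) for one layer.

    weights: one row per neuron; errors: backpropagated error 74 per neuron.
    """
    sums = [sum(w * x for w, x in zip(row, inputs)) for row in weights]
    # Multipliers 73 and 75: signal carried on conductor 78 of each neuron.
    signals = [gain * sigmoid_derivative(s) * e for s, e in zip(sums, errors)]
    # Error summing circuits 85: one output 86 per layer input.
    back_errors = [
        sum(weights[j][i] * signals[j] for j in range(len(weights)))
        for i in range(len(inputs))
    ]
    # Multiplier 90 and setting circuit 91: change = input * signal.
    new_weights = [
        [w + x * sig for w, x in zip(row, inputs)]
        for row, sig in zip(weights, signals)
    ]
    return new_weights, back_errors

new_weights, back_errors = delta_rule_step([1.0, 0.0], [[0.0, 0.0]], [1.0], 1.0)
```

With all weights at zero the summed input is zero, the sigmoid derivative is 0.25, and only the synapse with a nonzero input is changed.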
It is apparent that a circuit implementing the delta rule is disadvantageous in its requirements of a substantial number of backpropagation connections and of the comparatively bulky and power consuming additional elements required for the additional function generator 70 of each neuron 50 and additional variable weight 83 of each synapse 53. A further serious disadvantage of the delta rule is that, although an artificial neural network can, in general, "learn around" a deficient or inoperative synapse or function generator, simulations have shown that the backpropagated delta rule error signals, such as those on conductors 78 and 81 and from output 86, must substantially attain a value representing zero error for convergence during memory modification by the delta rule. However, typical errors on the order of one percent in such signals due to presently attainable tolerances in the construction of the very large scale integrated circuit (VLSIC) necessary for practical application of artificial neural networks, are, as shown by such simulations, sufficient to prevent the degree of convergence required for many applications of artificial neural networks. The backpropagation required by the delta rule is also highly disadvantageous due to possible crosstalk between forwardly propagated signals and the backpropagated error signals which are of low magnitude as convergence is approached. It is apparent from FIG. 2 that pairs of conductors for these signals, such as the pair 78 and 82 and the pair 81 and 84, are adjacent in an actual integrated circuit as well as conceptually and cannot be spaced apart without limiting the number of neurons that can be constructed on one VLSIC. Crosstalk between back and forward propagated signals may also occur between neural network layers due to the proximity of signals to inputs 52 and from error signal outputs 86.
The proposed prior art Madeline memory modification method may be implemented by an artificial neural network and associated teaching elements constructed as shown in FIG. 3, the depicted arrangement of the network and teaching elements being conceptual since it is believed that no such network or teaching arrangement suitable for integrated circuit implementation has been constructed. The FIG. 3 network is similar to that of FIG. 1, but has additional elements for implementing Madeline memory modification. In regard to the similarities, the FIG. 3 network has three neurons 100 providing individual outputs 101 and receiving four inputs 102 to the network at four synapses 103 of each neuron with output signals of the synapses being summed by a circuit 104 and provided to a function generator 105. The neuron outputs are provided to an error calculator 106 which receives three desired outputs 107 and provides an overall error signal 109.
The network of FIG. 3 has, for each synapse 103 and similarly to the FIG. 1 network, a variable weight 110 adjusted by a setting circuit 111 in accordance with the product from a multiplier 112 of the corresponding input 102 and a variation signal, this latter signal being provided to the synapses of each neuron 100 by a conductor 115 extending through these synapses. Each neuron also has a summing conductor 117 extending through the synapses thereof and connecting the weights 110 of these synapses to the summing circuit 104 of the neuron.
The network of FIG. 3 has a multiplexer 120 receiving an address signal 121 and providing select signals 122 individually to the neurons 100. The network has a weight modification signal input 125 and a perturbation signal input 127 individually provided to each neuron through a pair of switches 128, which close on the signal 122 for the neuron to connect the perturbation signal input 127 to the summing conductor 117 of the neuron and to connect the variation conductor 115 to the modification signal input 125.
In the network of FIG. 3, the function provided by each generator 105 is, in accordance with the Madeline method, a non-differentiable "signum" or step function, which is indicated by numeral 130 and implies that the output of each neuron 100 can only be in either one of two states, typically, the binary states of completely on or completely off. The neuron output thus undergoes a sudden transition or "flipping" between the two states at some level of output from the summing circuit 104 of the neuron.
In each cycle of learning by this method, it is necessary to determine the one of the neurons 100 in the network having the input to the function generator 105 of the neuron from the corresponding summing circuit 104 such that this generator is the one in the network nearest to flipping. The weights of this neuron are then adjusted and the cycle is repeated. It is known to implement the Madeline method by, first, perturbing the current to the summing circuits 104 one at a time to determine this one neuron. Second, the overall error of the network is determined before and after such a perturbation in current to a selected summing circuit. Third, if this overall error is decreased by the perturbation, the weights of the perturbed neuron are adjusted by an amount proportional to the reduction in error. The Madeline method may also involve finding several neurons closest to flipping and then perturbing those neurons.
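One learning cycle of the kind outlined above may be sketched as follows, for illustration only. Several details are assumptions not fixed by the text: the two output states are taken as +1.0 and -1.0, the transition level defaults to zero, the overall error is the root of the summed squared differences, and the adjustment is applied in the direction of the perturbation. The function names are hypothetical.

```python
import math

def signum(s, threshold=0.0):
    # Step function 130; output states of +1.0 and -1.0 are assumed.
    return 1.0 if s >= threshold else -1.0

def network_error(sums, desired, threshold=0.0):
    return math.sqrt(sum((d - signum(s, threshold)) ** 2
                         for s, d in zip(sums, desired)))

def madeline_cycle(inputs, weights, desired, perturbation, gain, threshold=0.0):
    """Return (index of neuron nearest to flipping, possibly adjusted weights)."""
    sums = [sum(w * x for w, x in zip(row, inputs)) for row in weights]
    # First: find the neuron whose summed input is nearest the transition level.
    nearest = min(range(len(sums)), key=lambda j: abs(sums[j] - threshold))
    # Second: overall error before and after perturbing that neuron's sum.
    error_before = network_error(sums, desired, threshold)
    perturbed = list(sums)
    perturbed[nearest] += perturbation
    error_after = network_error(perturbed, desired, threshold)
    reduction = error_before - error_after
    new_weights = [list(row) for row in weights]
    # Third: adjust only if the perturbation reduced the overall error.
    if reduction > 0.0:
        step = gain * reduction * math.copysign(1.0, perturbation)
        new_weights[nearest] = [w + step * x
                                for w, x in zip(weights[nearest], inputs)]
    return nearest, new_weights

nearest, new_weights = madeline_cycle(
    [1.0, 1.0], [[-0.25, 0.0], [1.0, 1.0]], [1.0, 1.0], 0.5, 0.5)
```

In the usage shown, the first neuron has a summed input of -0.25 and is nearest to flipping; the perturbation flips it to the desired state, so its weights are adjusted while the second neuron is left unchanged.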
The Madeline method appears deficient, as shown by simulations attempting recognition of relatively complex patterns with multiple layer artificial neural networks, in that convergence to only a relatively large error is possible. To some extent, this is believed due to the large output shift occurring with the signum function 130. However, with complex patterns and multiple layers, simulations directly substituting the sigmoid function 65 of FIG. 2 for signum 130 and using the Madeline approach of finding a neuron or neurons closest to changing and then perturbing these neurons also do not converge to as small an error as is possible with memory modification using the above-mentioned steepest descent technique implemented by delta rule simulation.
It is apparent that an integrated artificial neural network embodying the Madeline method would, as shown in FIG. 3, be complicated by the need to bring out from the network signals corresponding to the currents from each of the summing circuits 104 for measurement to determine which neurons are nearest to changing. The necessary scanning and comparison of the neurons, such as neurons 100, to detect the one or a few thereof nearest to changing would also be relatively time consuming and require relatively complex arrangements suggested in FIG. 3 where, at each neuron, the corresponding select signal 122 is provided to a switch 135 which is closed by this signal so as to connect the output of the summing circuit 104 to a conductor 136 providing this output externally of the FIG. 3 network.
The arrangements for Madeline memory modification suggested in FIG. 3 also include a controller 140 represented as a labeled rectangle and providing a search period signal 141, a perturb period signal 142, an adjust period signal 143, and a save signal 144 which are exerted sequentially during a memory modification cycle as depicted on the rectangle. Error signal 109 is provided to a buffer 145 which is loaded by signal 144 at the end of each cycle and thus retains the error for the next cycle. A difference circuit 147 receives this retained error and signal 109 and provides a signal 148 representing the change in error or "delta error". Error signal 109 may be provided externally and may be compared with an externally provided desired error signal 151 to generate a signal 153 which causes controller 140 to run when the error is greater than the desired error. To control memory modification, a signal 155 controlling the amount of gain and a signal 156 controlling the amount of perturbation are externally provided. The product of gain signal 155 and delta error signal 148 is divided by perturb signal 156 to provide a weight variation signal 158. Perturb amount signal 156 is provided to the FIG. 3 network as signal 127 by a switch 161, which is closed by perturb period signal 142, and by a current generator 162 which converts this signal to a current level appropriate for the summing circuits 104. Signal 158 is provided to the network as signal 125 by a switch 163 closed by adjust period signal 143.
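The arithmetic producing weight variation signal 158 may be illustrated by the short sketch below. The function and parameter names are hypothetical, and the sign of the delta error (retained error minus current error, so that a reduction is positive) is an assumption, since the text states only that signal 148 represents the change in error.

```python
def weight_variation(gain, retained_error, current_error, perturbation):
    """Sketch of signal 158: (gain 155 * delta error 148) / perturb amount 156."""
    # Difference circuit 147 (sign convention assumed).
    delta_error = retained_error - current_error
    return gain * delta_error / perturbation

variation = weight_variation(0.5, 2.0, 1.0, 0.25)
```

Dividing by the perturbation amount makes the variation scale with the error change per unit of perturbation, so that a small perturbation producing a given error change yields a larger adjustment than a large perturbation producing the same change.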
Elements similar to those described in the previous paragraph may be useful for other memory modification methods than the Madeline method; however, the search for a neuron nearest to changing is unique to the Madeline method and if implemented in hardware requires elements similar to those now to be described with reference to FIG. 3. During the search period defined by signal 141, a counter 165, whose output is directed to the network as signal 121 by a multiplexer 166, successively addresses neurons 100. The outputs of summing circuits 104 are thus delivered successively by switches 135 and through conductor 136 to an absolute difference circuit 167 which receives an external signal 168 corresponding to the input level of function generator 105 at which transition of the output thereof occurs. (In practice the tolerances in VLSIC construction and operation are such that the actual transition input level may not be known, requiring iterative variation in signal 168 with examination of the output of function generator 105 at each iteration to determine whether transition has occurred.) The difference between signal 168 and the output of circuit 104 of an addressed neuron 100 is provided to a buffer 170 which is loaded by a signal 172 from comparator 173 whenever such a difference is less than a previous such difference stored in buffer 170. Signal 172 is also provided to load an address buffer 175 with the address, from counter 165, of the neuron having such lesser difference. When the search period is over, buffer 175 by way of multiplexer 166 addresses the neuron nearest to transition during the following perturbation and weight adjustment periods.
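The sequential search performed during the search period may be sketched in software as a single scan that retains the smallest difference seen so far together with its address. The function name is hypothetical, and the empty initial state of the two buffers is an assumption, since the text does not state how buffer 170 is initialized.

```python
def search_nearest_to_transition(sums, transition_level):
    """Sketch of the scan by counter 165, circuit 167, comparator 173, buffers 170 and 175."""
    best_difference = None  # contents of buffer 170 (assumed empty at start)
    best_address = None     # contents of address buffer 175
    for address, s in enumerate(sums):           # counter 165 addresses each neuron
        difference = abs(s - transition_level)   # absolute difference circuit 167
        # Comparator 173 loads both buffers only on a strictly smaller difference.
        if best_difference is None or difference < best_difference:
            best_difference = difference
            best_address = address
    return best_address, best_difference

address, difference = search_nearest_to_transition([0.5, -0.125, 2.0], 0.0)
```

The scan visits every neuron once per cycle, which illustrates why the search time grows with the number of neurons in the network.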
It is evident from comparison of FIG. 2 with FIG. 3 that the Madeline method, FIG. 3, avoids the delta rule, FIG. 2, backpropagation of error signals at the expense of arrangements, as suggested by the description in the previous paragraph, for determining, in an artificial network having signum function generators, the neuron or neurons nearest to transition. However, because of the before stated difficulties in convergence, the Madeline method is not an effective alternative for many proposed applications of artificial neural networks. It is, therefore, apparent that an artificial neural network memory modification method which provides convergence to minimal error, which is effective with multiple layer neural networks, and which can be effectively implemented with VLSI techniques without excessive bulk and expense would be highly advantageous.