Computers of the von Neumann type architecture have limited computational speed owing to the communication limitations of the single processor. These limitations can be overcome if a plurality of processors are utilized in the calculation and are operated at least partly in parallel. This alternative architecture, however, generally leads to difficulties associated with programming complexity. Therefore, it is often not a good solution. Recently, an entirely different alternative that does not require programming has shown promise. The networking ability of the neurons in the brain has served as a model for the formation of a highly interconnected set of analog processors, called a "neural network" or "neural net" that can provide computational and reasoning functions without the need of formal programming. The neural nets can learn the correct procedure by experience rather than being preprogrammed for performing the correct procedure. The reader is referred to R. P. Lippmann's article "An Introduction to Computing With Neural Nets" appearing on pages 4-21 of the April 1987 IEEE ASSP MAGAZINE (0740-7467/87/0400-0004/$10.00 April 1987 IEEE), incorporated herein by reference, for background concerning neural nets.
Neural nets are composed of a plurality of neuron models, processors each exhibiting "axon" output signal response to a plurality of "synapse" input signals. In a type of neural net called a "perceptron", each of these processors calculates the weighted sum of its "synapse" input signals, which are weighted by respective weighting values that may be positive- or negative-valued, and responds non-linearly to the weighted sum to generate the "axon" output response. This relationship may be described in mathematical symbols as follows.
$$z_j = f\left(\sum_{i=1}^{M} W_{i,j}\, x_i\right), \qquad j = 1, 2, \ldots, N \qquad (1)$$
Here, i indexes the input signals $x_i$ of the perceptron, of which there are an integral number M, and j indexes its output signals $z_j$, of which there are an integral number N. $W_{i,j}$ is the weighting of the $i$-th input signal as makes up the $j$-th output signal at such low input signal levels that the function $f\left(\sum_{i=1}^{M} W_{i,j}\, x_i\right)$ is approximately linear. At higher absolute values of its argument, the function $f$ no longer exhibits linearity but rather exhibits a reduced response to $\sum_{i=1}^{M} W_{i,j}\, x_i$. This type of non-linear response is termed "sigmoidal". The weighted summation of a large number of sampled-data terms can be viewed as the process of correlating the sampled-data function described by those sampled-data terms with the sampled-data function described by the pattern of weights; and the analog processor used as a neuron model in a "perceptron" can be viewed as a correlator with non-linear circuitry exhibiting a sigmoidal response connected thereafter.
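The perceptron response described above can be illustrated with a minimal sketch. The function names `perceptron_layer` and `nonlinearity`, the choice of the hyperbolic tangent as the sigmoidal function, and the numerical values are assumptions made for illustration only; the text does not prescribe any particular sigmoidal function.

```python
import math

def perceptron_layer(x, W, nonlinearity=math.tanh):
    """Return the N axon responses of a perceptron layer.

    x is a list of M synapse input signals; W is an M x N nested list in
    which W[i][j] weights input i into output j. The tanh default is an
    assumed sigmoid: near-linear for small arguments, saturating for large.
    """
    M = len(x)
    N = len(W[0])
    outputs = []
    for j in range(N):
        # Weighted summation (correlation of the inputs with weight column j).
        s = sum(W[i][j] * x[i] for i in range(M))
        # Non-linear (sigmoidal) response to the weighted sum.
        outputs.append(nonlinearity(s))
    return outputs

# Hypothetical example: M = 3 inputs, N = 2 outputs.
x = [0.5, -1.0, 2.0]
W = [[0.2, -0.4],
     [0.1, 0.3],
     [-0.5, 0.6]]
z = perceptron_layer(x, W)

# At a low signal level the response is approximately the weighted sum itself.
small = perceptron_layer([1e-3], [[1.0]])[0]
```

With these values the two weighted sums are -1.0 and 0.7, so the large-magnitude sum is compressed by the non-linearity while the small-signal case passes through nearly unchanged.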
A more complex artificial neural network arranges a plurality of perceptrons in hierarchic layers, the output signals of each earlier layer providing input signals for the next succeeding layer. Those layers preceding the output layer providing the ultimate output signal(s) are called "hidden" layers.
In the present-day development of the integrated electronic circuitry art, the weighted summation of a large number of terms, each of which has resolution that would require plural-bit digital sampling, can be done appreciably faster and at less cost in integrated circuit die area by processing in the analog regime rather than in the digital regime. Using capacitors to perform weighted summation in accordance with Coulomb's Law provides neural nets of given size operating at given speed that consume less power than neural nets whose analog processors use resistors to implement weighted summation in accordance with Ohm's Law. Y. P. Tsividis and D. Anastassiou in a letter "Switched-Capacitor Neural Networks" appearing in ELECTRONICS LETTERS, Aug. 27th 1987, Vol. 23, No. 18, pages 958-959 (IEE) describe one method of implementing weighted summation in accordance with Coulomb's Law. Their method, a switched capacitor method, is useful in analog sampled-data neural net systems. Methods of implementing weighted summation in accordance with Coulomb's Law that do not rely on capacitances being switched and avoid the complexity of the capacitor switching elements and associated control lines are also known.
U.S. patent application Ser. No. 366,838 entitled "NEURAL NET USING CAPACITIVE STRUCTURES CONNECTING INPUT LINES AND DIFFERENTIALLY SENSED OUTPUT LINE PAIRS" describes a type of neural net in which each analog synapse input signal voltage drives a respective input line from a low source impedance. Each input line connects via a respective weighting capacitor to each of a plurality of output lines. The output lines are paired, with the capacitances of each pair of respective weighting capacitors connecting a pair of output lines to one of the input lines summing to a prescribed value. A respective pair of output lines is associated with each axonal output response to be supplied from the neural net, and the differential charge condition on each pair of output lines is sensed to generate a voltage that describes a weighted summation of the synapse input signals supplied to the neural net. A respective operational amplifier connected as a Miller integrator can be used for sensing the differential charge condition on each pair of output lines. Each weighted summation of the synapse input signals is then non-linearly processed in a circuit with sigmoidal transfer function to generate a respective axonal output response. This type of neural net is particularly well-suited for use where all input synapse signals are always of one polarity, since the single-polarity synapse input signals may range over the entire operating supply.
U.S. patent application Ser. No. 366,839 entitled "NEURAL NET USING CAPACITIVE STRUCTURES CONNECTING OUTPUT LINES AND DIFFERENTIALLY DRIVEN INPUT LINE PAIRS" describes a type of neural net in which each analog synapse input signal voltage is applied in push-pull from low source impedances to a respective pair of input lines. Each pair of input lines connects via respective ones of a respective pair of weighting capacitors to each of a plurality of output lines. The capacitances of each pair of respective weighting capacitors connecting a pair of input lines to one of the output lines sum to a prescribed value. Each output line is associated with a respective axonal output response to be supplied from the neural net, and the charge condition on each output line is sensed to generate a voltage that describes a weighted summation of the synapse input signals supplied to the neural net. A respective operational amplifier connected as a Miller integrator can be used for sensing the charge condition on each output line. Each weighted summation of the synapse input signals is then non-linearly processed in a circuit with sigmoidal transfer function to generate a respective axonal output response. This type of neural net is better suited for use where input synapse signals are sometimes positive in polarity and sometimes negative in polarity.
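The push-pull capacitive weighting described above can be sketched numerically under idealized assumptions: the output line is taken to be held at a fixed potential by the sensing amplifier, parasitic capacitances are ignored, and charge follows Q = CV. The function names `capacitor_pair` and `output_line_charge`, the normalized weight range of -1 to +1, and the numerical values are hypothetical and serve only to illustrate how a pair of capacitors summing to a prescribed value can represent a signed weight.

```python
def capacitor_pair(weight, c_total):
    """Split c_total into (c_plus, c_minus) so that
    c_plus - c_minus = weight * c_total, for a weight in [-1, 1].
    The pair always sums to the prescribed value c_total."""
    if not -1.0 <= weight <= 1.0:
        raise ValueError("normalized weight must lie in [-1, 1]")
    c_plus = 0.5 * c_total * (1.0 + weight)
    c_minus = c_total - c_plus
    return c_plus, c_minus

def output_line_charge(voltages, pairs):
    """Idealized charge coupled onto one output line when each input
    voltage v drives one line of its pair at +v and the other at -v."""
    return sum(v * c_p + (-v) * c_m for v, (c_p, c_m) in zip(voltages, pairs))

# Hypothetical example: two inputs, capacitances in arbitrary units.
weights = [0.5, -0.25]
c_total = 1.0
pairs = [capacitor_pair(w, c_total) for w in weights]
voltages = [2.0, 1.0]
q = output_line_charge(voltages, pairs)
```

In this idealization the sensed charge equals c_total times the weighted sum of the input voltages (here 2.0 x 0.5 + 1.0 x (-0.25) = 0.75), so the differential capacitance of each pair sets both the magnitude and the sign of its weight.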
U.S. Pat. No. 5,039,871, issued Aug. 13, 1991 to W. E. Engeler, entitled "CAPACITIVE STRUCTURES FOR WEIGHTED SUMMATION, AS USED IN NEURAL NETS" and assigned to General Electric Company describes preferred constructions of pairs of weighting capacitors for neural net layers, wherein each pair of weighting capacitors has a prescribed differential capacitance value and is formed by selecting each of a set of component capacitive elements to one or the other of the pair of weighting capacitors. U.S. Pat. No. 5,039,870, issued Aug. 13, 1991 to W. E. Engeler, entitled "WEIGHTED SUMMATION CIRCUITS HAVING DIFFERENT-WEIGHT RANKS OF CAPACITIVE STRUCTURES" and assigned to General Electric Company describes how weighting capacitors can be constructed on a bit-sliced or binary-digit-sliced basis. These weighting capacitor construction techniques are applicable to neural nets that utilize digital input signals, as will be presently described, as well as being applicable to neural nets that utilize analog input signals.
The neural nets as thus far described normally utilize analog input signals that may be sampled-data in nature. A paper by J. J. Bloomer, P. A. Frank and W. E. Engeler entitled "A Preprogrammed Artificial Neural Network Architecture in Signal Processing" published in December 1989 by the GE Research & Development Center describes the application of push-pull ternary samples as synapse input signals to neural network layers, which push-pull ternary samples can be generated responsive to single-bit digital samples.
U.S. patent application Ser. No. 546,970 filed Jul. 2, 1990 by W. E. Engeler, entitled "NEURAL NETS SUPPLIED SYNAPSE SIGNALS OBTAINED BY DIGITAL-TO-ANALOG CONVERSION OF PLURAL-BIT SAMPLES" and assigned to General Electric Company describes how to process plural-bit digital samples on a digit-slice basis through a neural net layer. Partial weighted summation results, obtained by processing each bit slice through a neural net layer, are combined in final weighted summation processes to generate final weighted summation results. The final weighted summation results are non-linearly amplified to generate respective axonal output responses. After the weighted summation and non-linear amplification procedures have been carried out in the analog regime, the axonal output responses are digitized, if digital signals are desired in subsequent circuitry.
U.S. patent application Ser. No. 561,404 filed Aug. 1, 1990 by W. E. Engeler, entitled "NEURAL NETS SUPPLIED DIGITAL SYNAPSE SIGNALS ON A BIT-SLICE BASIS" and assigned to General Electric Company describes how to process plural-bit digital samples on a bit-slice basis through a neural net layer. Partial weighted summation results, obtained by processing each successive bit slice through the same capacitive weighting network, are combined in final weighted summation processes to generate final weighted summation results. The final weighted summation results are non-linearly amplified to generate respective axonal output responses. After the weighted summation and non-linear amplification procedures have been carried out in the analog regime, the axonal output responses are digitized, if digital signals are desired in subsequent circuitry. Processing the bit slices of the plural-bit digital samples through the same capacitive weighting network provides good assurance that the partial weighted summation results are scaled in exact powers of two with respect to one another.
The invention herein described differs from that described in U.S. patent application Ser. No. 561,404 in that the final weighted summation processes used to combine partial weighted summation results, obtained by processing each bit slice through a neural net layer, are carried out in the digital, rather than the analog, regime to generate final weighted summation results. That is, the sampled-data function described by the digital input signals is correlated with the sampled-data function described by each pattern of weights established with the weighting capacitors to generate a respective digital correlation signal. Performing the final weighted summation processes in the digital regime avoids the need for a further array of capacitive structures for performing the final weighted summation; a digital accumulator circuit is used instead, which tends to be more economical of area on a monolithic integrated-circuit die. Also, performing the final weighted summation process in the digital regime has high accuracy since it avoids undesirable non-monotonic non-linearities that can be introduced by inaccuracies in the scaling of the capacitances of weighting capacitors when performing the final weighted summation in the analog regime as described in U.S. patent application Ser. No. 561,404. Performing the final weighted summation processes in the digital regime is particularly advantageous in large neural net layers formed using a plurality of monolithic integrated circuits for processing a set of synapse input signals, since corruption of desired signals by stray pick-up of electrical signals in the interconnections between the monolithic integrated circuits can be remedied if the desired signals are digital in nature.
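The arithmetic underlying bit-slice processing with digital accumulation can be sketched as follows. This is an illustration under stated assumptions, not the circuitry of the invention: the samples are assumed to be unsigned integers, the function `partial_weighted_sum` is a purely numerical stand-in for the capacitive weighting network followed by digitization, and all names and values are hypothetical.

```python
def bit_slices(samples, num_bits):
    """Return slices[b][i], bit b (least significant first) of sample i.
    Samples are assumed to be unsigned integers below 2**num_bits."""
    return [[(s >> b) & 1 for s in samples] for b in range(num_bits)]

def partial_weighted_sum(bits, weights):
    """Numerical stand-in for one pass of a single-bit slice through the
    weighting network, presumed digitized before accumulation."""
    return sum(w * bit for w, bit in zip(weights, bits))

def digital_correlate(samples, weights, num_bits):
    """Combine the partial results in a digital accumulator, scaling the
    partial result for bit slice b by 2**b (shift-and-add)."""
    accumulator = 0
    for b, bits in enumerate(bit_slices(samples, num_bits)):
        accumulator += partial_weighted_sum(bits, weights) * (1 << b)
    return accumulator

# Hypothetical example: three 4-bit samples and one pattern of weights.
samples = [5, 3, 12]
weights = [2, -1, 3]
result = digital_correlate(samples, weights, num_bits=4)
direct = sum(w * s for w, s in zip(weights, samples))
```

Both `result` and `direct` evaluate to 43 here. Because each partial result is scaled by an exact power of two in the accumulator, the combination does not depend on the matching of a further set of weighting capacitors.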
In neural networks using digital correlators of this new type, the non-linear circuitry exhibiting a sigmoidal response used after each digital correlator has to be digital in nature, rather than being analog in nature as in previous neural networks. Neural nets employing capacitors in accordance with the invention lend themselves to being used in performing parts of the computations needed to implement a back-propagation training algorithm. The determination of the slope of the non-linear transfer function, which determination is necessary when training a neural net layer using a back-propagation training algorithm, can be simply accomplished in certain digital non-linear circuitry as will be described further on in this specification. This contrasts with the greater difficulty of determining the slope of the non-linear transfer function in analog non-linear circuitry.
The back-propagation training algorithm is an iterative gradient algorithm designed to minimize the mean square error between the actual output of a multi-layer feed-forward neural net and the desired output. It requires continuous, differentiable non-linearities in the transfer function of the non-linear circuitry used after the weighted summation circuits in the neural net layer. A recursive algorithm starting at the output nodes and working back to the first hidden layer is used iteratively to adjust weights in accordance with the following formula.
$$W_{i,j}(t+1) = W_{i,j}(t) - \eta\,\delta_j\,x_i \qquad (2)$$
In this equation $W_{i,j}(t)$ is the weight from hidden node i (or, in the case of the first hidden layer, from an input node) to node j at time t; $x_i$ is the output of node i (or, in the case of the first hidden layer, an input signal); $\eta$ is a gain term introduced to maintain stability in the feedback procedure used to minimize the mean square errors between the actual output(s) of the perceptron and its desired output(s); and $\delta_j$ is a derivative of error. The general definition of $\delta_j$ is the change in error energy from output node j of a neural net layer with a change in the weighted summation of the input signals used to supply that output node j.
Lippmann presumes that a particular sigmoid logistic non-linearity is used. Presuming the non-linearity of processor response is not to be defined as restrictively as Lippmann does, $\delta_j$ can be more particularly defined as in equation (3), following, if node j is an output node, or as in equation (4), following, if node j is an internal hidden node.
$$\delta_j = z_j'\,(d_j - z_j) \qquad (3)$$
$$\delta_j = z_j' \sum_k \delta_k\, W_{j,k} \qquad (4)$$
In equation (3) $d_j$ and $z_j$ are the desired and actual values of output response from the output layer and $z_j'$ is the differential response of $z_j$ to the non-linearity in the output layer, i.e., the slope of the transfer function of that non-linearity. In equation (4) the summation over k is over all nodes in the neural net layer succeeding the hidden node j under consideration and $W_{j,k}$ is the weight between node j and each such node k. The term $z_j'$ is defined in the same way as in equation (3).
The general definition of the $z_j'$ term appearing in equations (3) and (4), rather than that general term being replaced by the specific value of $z_j'$ associated with a sigmoid logistic non-linearity, is the primary difference between the training algorithm as described here and as described by Lippmann. Also, Lippmann defines $\delta_j$ in opposite polarity from equations (2), (3) and (4) above.
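The error terms of equations (3) and (4) can be sketched with the slope supplied as a separate quantity, which is the generalization discussed above. The sketch assumes that equation (4) has the customary back-propagation form, in which the error term of a hidden node is its slope multiplied by the sum over succeeding nodes k of each node's error term times the connecting weight. The function names and numerical values are hypothetical, and `logistic_slope` is included only to show how the specific logistic case is recovered.

```python
def delta_output(z, d, slope):
    """Equation (3): error term of an output node with actual response z,
    desired response d, and transfer-function slope z' given by slope."""
    return slope * (d - z)

def delta_hidden(slope, next_deltas, next_weights):
    """Assumed form of equation (4): slope of hidden node j times the sum,
    over succeeding nodes k, of delta_k multiplied by the weight W[j][k]."""
    return slope * sum(dk * w for dk, w in zip(next_deltas, next_weights))

def logistic_slope(z):
    """Slope of a logistic non-linearity expressed through its output z.
    This is the specific case; any differentiable non-linearity may be
    used by supplying its own slope instead."""
    return z * (1.0 - z)

# Hypothetical example: one output node and one hidden node feeding two nodes.
z_out, d_out = 0.8, 1.0
delta_out = delta_output(z_out, d_out, logistic_slope(z_out))

z_hid = 0.5
delta_hid = delta_hidden(logistic_slope(z_hid), [delta_out, -0.01], [0.5, 2.0])
```

Supplying the slope as an argument keeps the two error-term computations independent of the particular non-linearity, so a slope determined in digital non-linear circuitry could be substituted without changing either function.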
During training of the neural net, prescribed patterns of input signals, for which the corresponding prescribed patterns of output signals are known, are applied sequentially and repetitively. The pattern of output signals generated by the neural net, responsive to each prescribed pattern of input signals, is compared to the prescribed pattern of output signals to develop error signals, which are used to adjust the weights per equation (2) as the pattern of input signals is repeated several times, or until the error signals are detected as being negligibly valued. Then training is done with the next set of patterns in the sequence. During extensive training the sequence of patterns may be recycled.