Computers of the von Neumann type architecture have limited computational speed owing to the communication limitations of the single processor. These limitations may be overcome if a plurality of processors are utilized in the calculation and are operated at least partly in parallel. This alternative architecture, however, generally leads to difficulties associated with programming complexity. Therefore, it is often not a good solution. Recently, an entirely different alternative that does not require programming has shown promise. The networking ability of the neurons in the brain has served as a model for the formation of a highly interconnected set of processors, called a "neural network" or "neural net", that can provide computational and reasoning functions without the need of formal programming. The neural nets can learn the correct procedure by experience rather than being preprogrammed for performing the correct procedure. The reader is referred to R.P. Lippmann's article "An Introduction to Computing With Neural Nets" appearing on pages 4-21 of the April 1987 IEEE ASSP MAGAZINE (0740-7467/87/0400-0004/$10.00 © 1987 IEEE), incorporated herein by reference, for background about the state of the art in regard to neural nets.
Neural nets are composed of a plurality of neuron models, processors each exhibiting an "axon" output signal response to a plurality of "synapse" input signals. In a type of neural net called a "perceptron", each of these processors calculates the weighted sum of its "synapse" input signals, which are weighted by respective weighting values that may be positive- or negative-valued, and responds non-linearly to the weighted sum to generate the "axon" output response. This relationship may be described in mathematical symbols as follows.
$$y_j = f\left(\sum_{i=1}^{M} W_{i,j}\, x_i\right), \qquad j = 1, 2, \ldots, N \tag{1}$$
Here, $i$ indexes the input signals $x_i$ of the perceptron, of which there are an integral number $M$, and $j$ indexes its output signals $y_j$, of which there are an integral number $N$. $W_{i,j}$ is the weighting of the $i$-th input signal as makes up the $j$-th output signal at such low input signal levels that the function $f\left(\sum_{i=1}^{M} W_{i,j}\, x_i\right)$ is approximately linear. At higher absolute values of its argument, the function $f$ no longer exhibits linearity but rather exhibits a reduced response to $\sum_{i=1}^{M} W_{i,j}\, x_i$.
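The perceptron response described above can be illustrated with a minimal sketch. The hyperbolic tangent is assumed here purely as an example of a non-linearity that is approximately linear for small arguments and gives a reduced response for large ones; the text does not prescribe a particular function, and the names `perceptron_layer`, `W`, `small` and `large` are hypothetical.

```python
import math

def perceptron_layer(x, W, f=math.tanh):
    """Return y[j] = f(sum_i W[i][j] * x[i]) for each of the N outputs.

    x is a list of M input signals; W is an M-by-N list of weights.
    f is an assumed non-linearity (tanh), not one named in the text.
    """
    M = len(x)
    N = len(W[0])
    return [f(sum(W[i][j] * x[i] for i in range(M))) for j in range(N)]

# M = 3 inputs, N = 2 outputs; weights may be positive or negative.
W = [[0.5, -1.0],
     [0.25, 0.5],
     [-0.75, 2.0]]

# Small inputs: the response nearly equals the weighted sum (linear region).
small = perceptron_layer([0.01, 0.02, 0.01], W)

# Inputs 100 times larger: the response is well below the weighted sum.
large = perceptron_layer([1.0, 2.0, 1.0], W)
```

For the second output, the weighted sum is 0.02 in the first case and 2.0 in the second; the response follows the sum closely in the first case and falls well short of it in the second.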
A more complex artificial neural network arranges a plurality of perceptrons in hierarchic layers, the output signals of each earlier layer providing input signals for the next succeeding layer. Those layers preceding the output layer providing the ultimate output signal(s) are called "hidden" layers.
The processing just described normally involves sampled-data analog signals, and prior-art neural nets have employed operational amplifiers with resistive interconnecting elements for the weighting and summing procedures. The resistive elements implement weighted summation in accordance with Ohm's Law. The speed of such a processor is limited by capacitances in various portions of the processor, and computations have been slow if the power consumption of a reasonably large neural net is to be held within reasonable bounds. That is, speed is increased by reducing resistance values to reduce $RC$ time constants in the processors, but the reduced resistance values increase the $V^2/R$ power consumption ($R$, $C$ and $V$ being resistance, capacitance and voltage, respectively).
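The speed-versus-power trade-off can be made concrete with a small arithmetic sketch. The component values (1 pF, 1 V, 100 kΩ and 50 kΩ) are hypothetical and chosen only for illustration; the function names are likewise assumptions.

```python
def rc_time_constant(resistance_ohm, capacitance_farad):
    """Time constant tau = R * C of a single resistive-capacitive node."""
    return resistance_ohm * capacitance_farad

def resistive_power(voltage_volt, resistance_ohm):
    """Power V**2 / R dissipated in a resistive weighting element."""
    return voltage_volt ** 2 / resistance_ohm

C_NODE = 1e-12    # hypothetical node capacitance, 1 pF
V_SIGNAL = 1.0    # hypothetical signal voltage, 1 V

# (time constant, power) for a 100 kOhm element and for a 50 kOhm element.
slow = (rc_time_constant(100e3, C_NODE), resistive_power(V_SIGNAL, 100e3))
fast = (rc_time_constant(50e3, C_NODE), resistive_power(V_SIGNAL, 50e3))
```

Halving the resistance halves the time constant but doubles the dissipated power, so under these assumptions the product of the two stays fixed at $C V^2$.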
Using capacitors to perform weighted summation in accordance with Coulomb's Law can provide neural nets of given size, operating at given speed, that consume less power than neural nets whose processors use conductive elements such as resistors to implement weighted summation in accordance with Ohm's Law.
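One way to picture capacitive weighted summation is an idealized floating output node coupled to each input through its own capacitor. The sketch below assumes such a node with zero initial charge and an optional capacitance to ground; this is a textbook charge-conservation model offered for illustration, not a circuit taken from the text, and all names and values are hypothetical.

```python
def capacitive_sum(voltages, capacitances, c_ground=0.0):
    """Voltage on an idealized floating node with zero initial charge.

    Charge conservation, sum_i C_i * (V_out - V_i) + C_ground * V_out = 0,
    gives V_out = sum_i C_i * V_i / (C_ground + sum_i C_i).
    """
    total_c = c_ground + sum(capacitances)
    return sum(c * v for c, v in zip(capacitances, voltages)) / total_c

# Three inputs weighted by 1 pF, 2 pF and 1 pF capacitors.
v_in = [1.0, 0.5, -1.0]
caps = [1e-12, 2e-12, 1e-12]

v_ideal = capacitive_sum(v_in, caps)                    # no stray capacitance
v_loaded = capacitive_sum(v_in, caps, c_ground=4e-12)   # 4 pF stray to ground
```

In this model the weights are the capacitance ratios, and capacitance from the node to ground scales the summed voltage down (here from 0.25 V to 0.125 V).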
A metal-oxide-metal construction of capacitors is described in detail by the inventor in his U.S. Pat. No. 3,691,627 issued Sep. 19, 1972, entitled "METHOD OF FABRICATING BURIED METALLIC FILM DEVICES", assigned to General Electric Company and incorporated by reference herein. In the inventor's U.S. Pat. No. 4,156,284 issued May 22, 1979, entitled "SIGNAL PROCESSING APPARATUS" and assigned to General Electric Company, the use of a metal-oxide-metal construction of capacitors in arrays of weighting capacitors in an MOS integrated circuit is described in connection with apparatus for performing matrix multiplication.
Y.P. Tsividis and D. Anastassiou in a letter "Switched-Capacitor Neural Networks" appearing in ELECTRONICS LETTERS, Aug. 27, 1987, Vol. 23, No. 18, pages 958, 959 (IEE) describe one method of implementing weighted summation in accordance with Coulomb's Law. Their method, a switched capacitor method, is useful in analog sampled-data neural net systems. However, a method of implementing weighted summation in accordance with Coulomb's Law that does not rely on capacitances being switched is highly desirable. Such a method avoids the complexity of the capacitor switching elements and associated control lines. Furthermore, operation of the neural net with continuous analog signals over sustained periods of time, as well as with sampled-data analog signals, is thus made possible.
A problem that is encountered when one attempts to use capacitors to perform weighted summation in a neural net layer is associated with the stray capacitance between input and output lines, which tends to be of appreciable size in neural net layers constructed using a metal-oxide-semiconductor (MOS) integrated circuit technology. The input and output lines are usually laid out as overlapping column and row busses using plural-layer metallization. The column busses are situated in one layer of metallization and the row busses are situated in another layer of metallization, separated from the first by an intervening insulating oxide layer. This oxide layer is thin, so there is appreciable capacitance at each crossing of one bus over another. Placing the row and column busses in different planes thus tends to increase the stray capacitances between them. The stray capacitance problem is also noted where both row and column busses are situated in the same metallization layer, with one set of busses being periodically interrupted in their self-connections to allow passage of the other set of busses and being provided with cross-over connections to complete their self-connections. The problem of stray capacitance is compounded by the fact that the capacitive elements used to provide weights in a capacitive voltage summation network have stray capacitances to the substrate of the monolithic integrated circuit in which they are incorporated; a perfect two-terminal capacitance is not actually available in the monolithic integrated circuit. Where capacitive elements having programmable capacitances are used, the capacitance is usually not programmable to zero value, either.
The problems of stray capacitance are solved in the invention by using output line pairs and sensing the charge conditions on the output lines of each pair differentially, so that the effects of stray capacitances tend to cancel each other out. These output line pairs allow both excitatory and inhibitory weights (that is, both positive- and negative-polarity $W_{i,j}$) to be achieved, in effect, without having to resort to capacitor switching to achieve negative capacitance.
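The cancellation can be sketched numerically. The model below is an assumption made for illustration only: each output line is taken to be held at a fixed potential by its charge sensor, so that the charge coupled onto it is a linear sum over the inputs, and each input is taken to have the same stray capacitance to both lines of a pair. Capacitances are in picofarads, so charges are in picocoulombs; all names and values are hypothetical.

```python
def line_charge(voltages, weight_caps, stray_caps):
    """Charge coupled onto one output line: sum_i (C_i + Cstray_i) * V_i."""
    return sum((c + s) * v
               for v, c, s in zip(voltages, weight_caps, stray_caps))

def differential_charge(voltages, c_plus, c_minus, stray_caps):
    """Difference of the charges on the two lines of an output line pair."""
    q_plus = line_charge(voltages, c_plus, stray_caps)
    q_minus = line_charge(voltages, c_minus, stray_caps)
    return q_plus - q_minus

v_in = [1.0, 0.5, -0.5]
c_plus = [2.0, 0.5, 1.0]     # pF, to the "+" line of the pair
c_minus = [1.0, 1.5, 1.0]    # pF, to the "-" line of the pair
stray = [0.3, 0.3, 0.3]      # pF, assumed equal on both lines
no_stray = [0.0, 0.0, 0.0]

# Effective weights are the capacitance differences: +1.0, -1.0 and 0.0.
weights = [p - m for p, m in zip(c_plus, c_minus)]

q_diff_stray = differential_charge(v_in, c_plus, c_minus, stray)
q_diff_ideal = differential_charge(v_in, c_plus, c_minus, no_stray)
q_single_stray = line_charge(v_in, c_plus, stray)
q_single_ideal = line_charge(v_in, c_plus, no_stray)
```

Under these assumptions the differential charge is the same with or without stray capacitance, whereas the charge on a single line is not. A negative or zero effective weight results from the relative sizes of two positive capacitances.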
Neural nets employing capacitors in accordance with the invention lend themselves to being used in performing parts of the computations needed to implement a back-propagation training algorithm. The back-propagation training algorithm is an iterative gradient algorithm designed to minimize the mean square error between the actual output of a multi-layer feed-forward neural net and the desired output. It requires continuous, differentiable non-linearities. A recursive algorithm starting at the output nodes and working back to the first hidden layer is used iteratively to adjust weights in accordance with the following formula.
$$W_{i,j}(t+1) = W_{i,j}(t) - \eta\, \delta_j\, x_i \tag{2}$$
In this equation $W_{i,j}(t)$ is the weight from hidden node $i$ (or, in the case of the first hidden layer, from an input node) to node $j$ at time $t$; $x_i$ is the output of node $i$ (or, in the case of the first hidden layer, an input signal); $\eta$ is a gain term introduced to maintain stability in the feedback procedure used to minimize the mean square errors between the actual output(s) of the perceptron and its desired output(s); and $\delta_j$ is a derivative of error. The general definition of $\delta_j$ is the change in error energy from output node $j$ of a neural net layer with a change in the weighted summation of the input signals used to supply that output node $j$.
Lippmann presumes that a particular sigmoid logistic non-linearity is used. Presuming the non-linearity of processor response is to be defined not as restrictively as Lippmann does, then $\delta_j$ can be more particularly defined as in equation (3), following, if node $j$ is an output node, or as in equation (4), following, if node $j$ is an internal hidden node.
$$\delta_j = (y_j - d_j)\, y_j' \tag{3}$$
$$\delta_j = y_j' \sum_k \delta_k\, W_{j,k} \tag{4}$$
In equation (3) $d_j$ and $y_j$ are the desired and actual values of output response from the output layer and $y_j'$ is the differential response of $y_j$ to the non-linearity in the output layer, i.e., the slope of the transfer function of that non-linearity. In equation (4) the summation index $k$ ranges over all nodes in the neural net layer succeeding the hidden node $j$ under consideration and $W_{j,k}$ is the weight between node $j$ and each such node $k$. The term $y_j'$ is defined in the same way as in equation (3).
The general definition of the $y_j'$ term appearing in equations (3) and (4), rather than that general term being replaced by the specific value of $y_j'$ associated with a sigmoid logistic non-linearity, is the primary difference between the training algorithm as described here and as described by Lippmann. Also, Lippmann defines $\delta_j$ in opposite polarity from equations (2), (3) and (4) above.
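The weight adjustment can be sketched for a minimal two-node chain (one input, one hidden node, one output node). The sketch assumes the output-node form $\delta_j = (y_j - d_j)\, y_j'$ and the hidden-node form $\delta_j = y_j' \sum_k \delta_k W_{j,k}$, which are the forms consistent with the subtractive update of equation (2) and with an error energy of $\frac{1}{2}(y - d)^2$. The tanh non-linearity, the error-energy definition and all function names are assumptions made for illustration.

```python
import math

def f(s):
    """Assumed non-linearity (tanh); the text leaves the function general."""
    return math.tanh(s)

def f_prime(s):
    """Slope of the assumed non-linearity at weighted sum s."""
    return 1.0 - math.tanh(s) ** 2

def output_delta(d_j, y_j, y_prime_j):
    """Delta for an output node: (y_j - d_j) * y_j'."""
    return (y_j - d_j) * y_prime_j

def hidden_delta(y_prime_j, deltas_next, weights_to_next):
    """Delta for a hidden node: y_j' * sum_k delta_k * W[j][k]."""
    return y_prime_j * sum(dk * w for dk, w in zip(deltas_next, weights_to_next))

def update_weight(w, eta, delta_j, x_i):
    """Weight update: W(t+1) = W(t) - eta * delta_j * x_i."""
    return w - eta * delta_j * x_i

def error(w1, w2, x, d):
    """Assumed error energy 0.5 * (y - d)**2 of the two-node chain."""
    return 0.5 * (f(w2 * f(w1 * x)) - d) ** 2

x, d, eta = 0.5, 0.3, 0.5     # input, desired output, gain (hypothetical)
w1, w2 = 0.8, -0.6            # input-to-hidden and hidden-to-output weights

s1 = w1 * x                   # forward pass
h = f(s1)
s2 = w2 * h
y = f(s2)

delta_out = output_delta(d, y, f_prime(s2))               # output node
delta_hid = hidden_delta(f_prime(s1), [delta_out], [w2])  # hidden node

new_w2 = update_weight(w2, eta, delta_out, h)
new_w1 = update_weight(w1, eta, delta_hid, x)
```

With these definitions `delta_hid * x` is the derivative of the assumed error energy with respect to `w1`, which a finite-difference comparison against `error` can confirm, and one update step lowers the error for this example.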
During training of the neural net, prescribed patterns of input signals are applied sequentially and repetitively, for which patterns of input signals there are corresponding prescribed patterns of output signals known. The pattern of output signals generated by the neural net, responsive to each prescribed pattern of input signals, is compared to the prescribed pattern of output signals to develop error signals, which are used to adjust the weights per equation (2) as the pattern of input signals is repeated several times, or until the error signals are detected as being negligibly valued. Then training is done with the next set of patterns in the sequence. During extensive training the sequence of patterns may be recycled.
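The training procedure can be sketched for a single perceptron layer. Each pattern is repeated until its error is negligible or a repeat limit is reached, then the next pattern is applied, and the whole sequence is recycled. The tanh non-linearity, the constant third input serving as a bias, the output-node delta $(y_j - d_j)\, y_j'$, and every parameter value and name below are assumptions made for illustration.

```python
import math

def forward(x, W):
    """Single-layer response y[j] = tanh(sum_i W[i][j] * x[i])."""
    n_out = len(W[0])
    return [math.tanh(sum(W[i][j] * x[i] for i in range(len(x))))
            for j in range(n_out)]

def train(patterns, W, eta=0.2, tol=1e-3, max_repeats=200, cycles=50):
    """Adjust W in place with W[i][j] -= eta * delta_j * x[i]."""
    for _ in range(cycles):                   # recycle the pattern sequence
        for x, d in patterns:                 # next pattern in the sequence
            for _ in range(max_repeats):      # repeat the same pattern
                y = forward(x, W)
                err = [yj - dj for yj, dj in zip(y, d)]
                if max(abs(e) for e in err) < tol:
                    break                     # error negligibly valued
                for j, e in enumerate(err):
                    delta_j = e * (1.0 - y[j] ** 2)   # (y - d) * y'
                    for i in range(len(x)):
                        W[i][j] -= eta * delta_j * x[i]
    return W

# Two prescribed input patterns (third input is a constant bias of 1.0)
# with their prescribed output patterns.
patterns = [([1.0, 0.0, 1.0], [0.5]),
            ([0.0, 1.0, 1.0], [-0.5])]

W = train(patterns, [[0.0], [0.0], [0.0]])
final_errors = [abs(forward(x, W)[0] - d[0]) for x, d in patterns]
```

Because each pattern is trained in isolation, adjusting the weights for one pattern disturbs the response to the others, which is why the sequence is recycled in this sketch until the errors for all patterns are small.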