1. Field of the Invention
The present invention generally relates to semiconductor neural networks, and more particularly, to a simple structure of coupling matrix which can give substantially multi-valued expression to coupling strength of each coupling element in the coupling matrix.
2. Description of Background Art
In recent years, a variety of circuits modeled on the human neuron have been contrived. Among such neuron models is one called Hopfield's model, which is briefly described below.
In FIG. 1, there is shown a schematic structure of a unit modeled on a neuron. A unit i comprises an input portion A for receiving signals from other units k, j, l and the like, a converting portion B for converting applied inputs according to a certain rule, and an output portion C for outputting the conversion results.
The input portion A has a weight (synapse load) W for each input unit, which indicates the coupling strength between the units. For example, a signal Sk from the unit k is loaded with a weight Wik before being transmitted to the converting portion B. This weight W can take a positive value, a negative value, or 0.
The converting portion B applies a predetermined function f to the total sum "net" of the inputs S that have been loaded with the weights W, and outputs the result. Therefore, the output Si from the unit i at the time t is given as: ##EQU1## As the function f, a threshold function shown in FIG. 2A or a sigmoid function shown in FIG. 2B is often used.
The threshold function shown in FIG. 2A is a unit step function: when the total sum "net (i)" of inputs becomes larger than a threshold value θ, logical "1" is outputted, and when it does not reach the threshold value, logical "0" is outputted.
The sigmoid function shown in FIG. 2B is a non-linear, monotonically increasing function given by the following expression: f = 1/[1 + exp(-net(i))].
The range of values of the sigmoid function is from 0 to 1. Therefore, as the total sum "net (i)" of inputs becomes smaller, the output approaches "0", and as the total sum "net (i)" of inputs becomes larger, the output approaches "1". When the total sum "net (i)" of inputs is "0", this sigmoid function outputs "0.5".
Another function, obtained by adding a threshold value θ to the above-mentioned sigmoid function as given by the following expression, may be employed: f = 1/[1 + exp(-net(i) + θ)].
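The unit of FIG. 1 and the functions of FIGS. 2A and 2B can be illustrated by the following minimal sketch. The function names and the sample values are hypothetical and serve only as an illustration; the expressions themselves follow the text above.

```python
import math


def net_input(weights, signals):
    """Total sum "net" of the input signals S, each loaded with its weight W."""
    return sum(w * s for w, s in zip(weights, signals))


def threshold_fn(net, theta=0.0):
    """Unit step function: 1 when net exceeds the threshold theta, otherwise 0."""
    return 1 if net > theta else 0


def sigmoid_fn(net, theta=0.0):
    """f = 1/[1 + exp(-net + theta)]; theta = 0 gives the plain sigmoid."""
    return 1.0 / (1.0 + math.exp(-net + theta))


def unit_output(weights, signals, f=sigmoid_fn, theta=0.0):
    """Output of one unit: the function f applied to the weighted input sum."""
    return f(net_input(weights, signals), theta)


# Hypothetical example: one excitatory and one inhibitory input cancel out,
# so net = 0 and the sigmoid unit outputs 0.5.
balanced = unit_output([1.0, -1.0], [1, 1])
```

With a steeper function f (for example, the unit step), the same unit produces outputs of only "0" or "1".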
The neuron unit shown in FIG. 1 is modeled on a living nerve cell, which receives stimuli from other neurons and fires when the total sum of the stimuli exceeds a certain value. The Hopfield's model provides an operational model for a network configured of a plurality of such neuron units.
In the expressions above, when one neuron is initialized, the states of all the following neuron units are determined in principle by applying the above-mentioned two dynamic equations to each neuron unit and solving them simultaneously.
When the number of units increases, however, it is almost impossible to investigate and grasp the state of one unit after another, and to program weights and bias values such that an optimal solution can be provided for a target problem. Therefore, Hopfield has introduced, in place of the state of each unit, an energy function E as a quantity representing the overall characteristics of a neural net, which is defined as follows. ##EQU2## In the expression above, Ii is a self-bias value specific to the unit i.
Hopfield has demonstrated that when the weight (synapse load) Wij has a symmetry shown as Wij=Wji, each unit changes its own state such that the above-mentioned energy function E always takes minimum values (more correctly, local minima), and proposed this model be applied to programming of the weight Wij. A model according to the energy function E as described above is called a Hopfield's model.
The expressions above are often restated for a discrete model as: ##EQU3## In the expression above, n is a discrete time. Hopfield himself has demonstrated that the Hopfield's model above can work with good accuracy especially when the function f indicating input/output characteristics has a steep gradient (which is approximate to a unit step function in which most of the outputs take values close to either "0" or "1").
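The behavior of the energy function under a symmetric weight can be illustrated by the following sketch. The expressions of this description are not reproduced here, so the sketch assumes the commonly cited form E = -1/2 Σ Wij Si Sj - Σ Ii Si, a unit step function with a threshold of 0, no self-coupling (Wii = 0), and one-unit-at-a-time updating. The weight values are hypothetical.

```python
def energy(W, I, S):
    """Assumed form: E = -1/2 * sum_ij Wij*Si*Sj - sum_i Ii*Si."""
    n = len(S)
    pair = sum(W[i][j] * S[i] * S[j] for i in range(n) for j in range(n))
    bias = sum(I[i] * S[i] for i in range(n))
    return -0.5 * pair - bias


def settle(W, I, S, max_sweeps=100):
    """Update one unit at a time with a unit step function until no unit changes."""
    S = list(S)
    trace = [energy(W, I, S)]  # energy recorded after every single-unit update
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(S)):
            net = sum(W[i][j] * S[j] for j in range(len(S))) + I[i]
            new_state = 1 if net > 0 else 0
            if new_state != S[i]:
                S[i] = new_state
                changed = True
            trace.append(energy(W, I, S))
        if not changed:
            break
    return S, trace


# Hypothetical symmetric weights (Wij = Wji) with a zero diagonal.
W = [[0, 1, -1], [1, 0, 1], [-1, 1, 0]]
I = [0, 0, 0]
final_state, trace = settle(W, I, [1, 0, 1])
```

In this example the recorded energy never increases from one update to the next, and the network comes to rest in a state that no further update changes.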
Neural networks have been configured according to this Hopfield's model in VLSI (Very Large Scale Integration) technology. One example of such a neural network is disclosed in "Computer" March, 1988, pp. 41 to 49, published by IEEE (Institute of Electrical and Electronics Engineers).
In FIG. 3, there is shown the entire schematic structure of a conventional integrated neural network circuit, which is disclosed by H. P. Graf in the article titled "A CMOS Associative Memory Chip Based on Neural Network", ISSCC 87, Digest of Technical Papers, February 1987, pp. 304 and 305. Referring to FIG. 3, the conventional integrated neural network circuit comprises a resistive matrix 100 having resistive coupling elements with predetermined weights arranged in a matrix, and an amplifying circuit 101 for amplifying potentials on data input lines (not shown) included in the resistive matrix 100 and feeding back those amplified signals to input portions of the resistive coupling elements. The resistive matrix 100 comprises the data input lines and data output lines arranged in a direction orthogonally intersecting the data input lines, as will be described in detail later. Interconnections between the data input lines and the data output lines made through the resistive coupling elements are programmable.
To program the state of each resistive coupling element (or the interconnection state between a data input line and a data output line) contained in the resistive matrix 100, there are provided a row decoder 102 and a bit decoder 103. The row decoder 102 selects one row of coupling elements in the resistive matrix 100. The bit decoder 103 selects one column of coupling elements in the resistive matrix 100.
For data input/output, there are provided an input/output data register 104 for temporarily latching input/output data, a multiplexer 105 for connecting the input/output data register 104, according to the write/read mode of the data, to either the data input lines or the data output lines in the resistive matrix 100, and an interface (I/O) 106 for connecting the input/output data register 104 to the outside of the device. This neural network is integrated on a semiconductor chip 200. In FIG. 4, there is shown a structure of the resistive matrix 100 in FIG. 3, which is disclosed in the above-mentioned ISSCC article by H. P. Graf.
Referring to FIG. 4, the resistive matrix 100 comprises data input lines A1 to A4 and complementary pairs of data output lines B1 and B̅1, B2 and B̅2, B3 and B̅3, and B4 and B̅4. At the connections between the data input lines A1 to A4 and the data output lines B1 and B̅1 to B4 and B̅4, there are provided resistive coupling elements 1, each for coupling a data input line to a corresponding data output line. Each coupling element 1 can take three states: an open or don't care state, an excitatory state and an inhibitory state. The state of each resistive coupling element 1 can be externally programmed according to an applied problem. Though in FIG. 3, those resistive coupling elements 1 that are in the open state are not shown, all the connections between the data input lines and the data output lines are provided with the resistive coupling elements 1. Each resistive coupling element 1 transmits, according to its own programmed state, the potential level on the corresponding data output line onto the corresponding data input line.
For the input lines A1 to A4, there are provided inverting amplifiers 2-1 to 2-8 for amplifying data signals on the corresponding data input lines and transmitting the amplified signals to the corresponding data output lines. Two series-connected inverting amplifiers serve as a single amplifier unit Ci (i=1 to 4) for a single data input line Ai (i=1 to 4).
The inverting amplifier 2-1 inverts the potential on the input line A1 and transmits the inverted potential onto the output line B̅1. The inverting amplifier 2-2 amplifies the potential on the input line A1 and transmits the amplified potential onto the output line B1. The inverting amplifier 2-3 inverts the signal potential on the input line A2 and transmits the inverted potential onto the output line B̅2, and the inverting amplifier 2-4 transmits the signal potential on the data input line A2 onto the output line B2. The inverting amplifiers 2-5 and 2-6 transmit the signal potential on the data input line A3 onto the data output lines B̅3 and B3 in the inverted and non-inverted states, respectively. The inverting amplifiers 2-7 and 2-8 transmit the signal potential on the data input line A4 onto the data output lines B̅4 and B4 in the inverted and non-inverted states, respectively.
Each coupling element couples a data output line to a data input line with a specific coupling strength. In other words, this means that output of one amplifier is connected to input of another amplifier. An example of structure of the coupling element 1 is shown in FIG. 5, which is also disclosed in the above-mentioned ISSCC article by H. P. Graf.
Referring to FIG. 5, the resistive coupling element 1 comprises resistive elements R+ and R-, switching elements S1, S2, S3 and S4, and random access memory cells 150 and 151. The resistive element R+ has one terminal connected to a supply potential V_DD. The resistive element R- has one terminal connected to another supply potential V_SS. The switching element S1 is turned on/off by the output of an inverting amplifier 2b. The switching element S2 is turned on/off according to information stored in the random access memory cell 150. The switching element S3 is turned on/off according to information stored in the random access memory cell 151. The switching element S4 is turned on/off by the output of another inverting amplifier 2a. The output states (storage information) of the random access memory cells 150 and 151 can be externally programmed in advance and, therefore, the on/off states of the switching elements S2 and S3 can also be programmed in advance.
In the structure shown in FIG. 5, the outputs of an amplifying circuit Cj (a circuit constituted of the inverting amplifiers 2a and 2b) only switch the switching elements S1 and S4 on and off and do not directly supply current to a corresponding data input line Ai, thereby reducing the output load capacitance of the amplifying circuit. The resistive elements R+ and R- are current limiting resistors.
The coupling element 1 can take three states according to the programmed states (or storage information) of the random access memory cells 150 and 151: an excitatory coupling state where the switching element S2 is in the on state (active state), an inhibitory coupling state where the switching element S3 is in the on state (active state), and an open coupling state where both switching elements S2 and S3 are in the off state (non-active state). When the potential levels on the output lines Bj and B̅j of the amplifying circuit Cj coincide with the programmed coupling state of a certain resistive coupling element 1, current flows through a corresponding data input line Ai either from the supply potential V_DD or from the other supply potential (for example, ground potential) V_SS. When the programmed coupling state of the resistive coupling element 1 is open, no current flows through the input line Ai irrespective of the output state of the amplifying circuit Cj.
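The three coupling states can be illustrated by the following simplified logical sketch. It treats each path as an ideal switch chain and returns only the direction of the current on the data input line Ai. The function name is hypothetical, and the sketch assumes that both the V_DD path and the V_SS path conduct only while the driving amplifying circuit outputs logical "1"; the actual polarity of the switching elements S1 and S4 is not specified in this description.

```python
def coupling_current(cell_150, cell_151, amp_out):
    """Direction of the current a coupling element contributes to input line Ai.

    Returns +1 (current from V_DD through R+), -1 (current path to V_SS
    through R-), or 0 (no current).
    """
    s2_on = bool(cell_150)  # S2 follows the random access memory cell 150
    s3_on = bool(cell_151)  # S3 follows the random access memory cell 151
    if s2_on and s3_on:
        raise ValueError("cells 150 and 151 must not both select a path")
    # ASSUMPTION: S1 and S4 are closed only while the amplifier output is "1".
    driven = bool(amp_out)
    if s2_on and driven:
        return 1   # excitatory coupling state
    if s3_on and driven:
        return -1  # inhibitory coupling state
    return 0       # open coupling state, or the amplifier output does not match
```

Under this assumption an element programmed to the open state contributes no current whatever the amplifier output, which matches the description above.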
When the above-mentioned circuit model is compared with a neuron model, the amplifying circuit corresponds to a neuron body (the converting portion in FIG. 1). The input lines A1 to A4 and the output lines B1 to B4 and B̅1 to B̅4 correspond to the data input/output line structure (dendrite and axon) shown in FIG. 1. The resistive coupling element 1 corresponds to a synapse loading portion which provides weighting between neurons. Subsequently, operation of the resistive matrix will be briefly described.
The model shown in FIG. 4 is often called a connectionists' model. In this model, each neuron unit (amplifying circuit) simply performs thresholding of an input signal (or outputs a signal corresponding to the magnitude of the input signal with respect to a predetermined threshold value). Each resistive coupling element 1 couples the output of one amplifying circuit to the input of another amplifying circuit. Therefore, the output state of each amplifying circuit Cj is determined by the output states of all the remaining amplifying circuits Ci (i ≠ j). When a certain amplifying circuit Cj detects current on the corresponding input line Aj, the output of the amplifying circuit Cj at that time is given as: ##EQU4## In the expression above, Vin (i) and Vout (i) represent the input and output voltages, respectively, of the amplifying circuit Ci connected to a data input line Ai, Ii represents the current flowing through a single resistive coupling element 1, and Wij represents the conductance of the resistive coupling element which couples the amplifying circuit Ci connected to the data input line Ai to the amplifying circuit Cj connected to the data input line Aj.
The output voltage Vout of each amplifying circuit C is determined by transfer characteristics of the amplifying circuit C itself. The amplifying circuit C per se does not supply current to the data input line A but simply controls the switching elements S1 and S4 for their on/off operation. Accordingly, the output load of the amplifying circuit C is reduced to the capacitance of data output lines, ensuring fast operability. A voltage on an input line Ai corresponding to a certain amplifying circuit Ci is given by a total sum of currents flowing into the input line Ai. This voltage is adjusted such that the total current flowing in this network becomes 0. In such state, the total energy of the neural network reaches local minima.
Each of the amplifying circuits C is constituted of, for example, a CMOS inverter which has a high input impedance and input/output characteristics given by a non-linear, monotonically increasing threshold function as described above. In this case, the following relational expression can be obtained from the above-described condition that the total current becomes 0. ##EQU5## In the expression above, Iij represents the current flowing through the resistors of a resistive coupling element controlled by the output of the amplifying circuit Ci connected to the input line Ai. ΔVij is the potential difference across the resistive coupling element and is given by: ##EQU6## Rij represents the resistance of the resistive coupling element and is given by R+ or R-. Therefore, the voltage Vin (j) is a total sum of all the contributions of the amplifying circuits connected to the data input line Aj.
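The condition that the total current on an input line becomes 0 can be illustrated numerically as follows. The expressions of this description are not reproduced here, so the sketch assumes that each conducting coupling element ties the input line to V_DD or V_SS through its resistance, that the current through it is the potential difference divided by that resistance, and that the amplifier input draws no current. The function name and the component values are hypothetical.

```python
def input_line_voltage(paths):
    """Voltage at which the currents into one input line sum to zero.

    paths: list of (source_potential, resistance) pairs, one per conducting
    coupling element. Solves sum((V_k - Vin) / R_k) = 0 for Vin.
    Returns None when no element conducts (the line is left floating).
    """
    total_conductance = sum(1.0 / r for _, r in paths)
    if total_conductance == 0:
        return None
    return sum(v / r for v, r in paths) / total_conductance


V_DD, V_SS = 5.0, 0.0  # hypothetical supply potentials
R = 1.0e3              # hypothetical value used for both R+ and R-

# One excitatory and one inhibitory element conducting: midpoint of the supplies.
balanced = input_line_voltage([(V_DD, R), (V_SS, R)])
# Two excitatory elements against one inhibitory element: pulled toward V_DD.
excited = input_line_voltage([(V_DD, R), (V_DD, R), (V_SS, R)])
```

With equal resistances the line voltage is simply the average of the connected supply potentials, so a majority of excitatory elements lifts it above the midpoint.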
The amplifying circuits C serve as threshold elements with high gains. The threshold value of an amplifying circuit C is often set to about one half of the sum of the supply potentials V_SS and V_DD.
The above-mentioned operation is analog computation. This analog computation is performed simultaneously and in parallel in the resistive matrix 100. However, both the input data signals and the output data signals are digital data. Subsequently, a practical computing operation will be described with reference to FIG. 4.
Input data is applied to the respective input lines A1 to A4 through a register 10. The respective input lines A1 to A4 are charged to voltage levels corresponding to the input data and thus the neural network is initialized. The output potentials of the amplifying circuits C1 to C4 change according to the charging potentials applied to the data input lines A1 to A4. These potential changes on the data output lines are fed back to the input lines A1 to A4 through the corresponding resistive coupling elements. The potential levels fed back to the data input lines A1 to A4 are defined by the programmed states of the respective resistive coupling elements 1. More specifically, when a resistive coupling element 1 has been programmed to be in the excitatory state, current flows from the supply potential V_DD to a data input line Ai. On the other hand, when the resistive coupling element 1 has been programmed to be in the inhibitory state, current flows from the supply potential V_SS to the data input line Ai. Such operations proceed in parallel except for those resistive coupling elements that have been set in the open state. Thus, the currents flowing into the data input line Ai are added together in analog fashion, causing a potential change on the data input line Ai. When the potential change on the data input line Ai goes beyond the threshold voltage of the corresponding amplifying circuit Ci, the output potential of this amplifying circuit Ci changes.
By repeating such operation, output potential of each amplifying circuit C changes to meet the above-mentioned condition that the total sum of currents becomes 0, until the network settles in a state satisfying the above-described expression of the stable state. When this network has been stabilized, output voltages of the amplifying circuits C1 to C4 are stored in an output register and then read out.
Whether the network has been stabilized is determined in either of two ways. In one, the network is regarded as stabilized when a predetermined time has passed since the data input. In the other, output data stored in the output register at different times are directly compared, and the network is regarded as stabilized when the difference between them is smaller than a predetermined value.
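The computing operation and the determination of stability can be illustrated by the following discrete-time sketch. It is a logical abstraction rather than a circuit simulation: each coupling is represented by +1 (excitatory), -1 (inhibitory) or 0 (open), a coupling is assumed to contribute current only while the driving amplifying circuit outputs "1", and an input line receiving no net current is assumed to keep its previous level. The function names and the stored pattern are hypothetical.

```python
def step(T, outputs):
    """One parallel update of all amplifier outputs.

    T[i][j] is the programmed state of the coupling from amplifier j onto
    input line i: +1 excitatory, -1 inhibitory, 0 open.
    """
    new_outputs = []
    for i, row in enumerate(T):
        current = sum(t * o for t, o in zip(row, outputs))  # summed on line Ai
        if current > 0:
            new_outputs.append(1)
        elif current < 0:
            new_outputs.append(0)
        else:
            new_outputs.append(outputs[i])  # no net current: level is kept
    return new_outputs


def run_until_stable(T, initial, max_steps=50, min_change=1):
    """Stop when successive outputs differ in fewer than min_change positions,
    or when max_steps (standing in for the predetermined time) has elapsed."""
    previous = list(initial)
    for n in range(1, max_steps + 1):
        current = step(T, previous)
        difference = sum(a != b for a, b in zip(previous, current))
        if difference < min_change:
            return current, n, True
        previous = current
    return previous, max_steps, False


# Hypothetical couplings programmed to store the pattern 1 1 0 0.
T = [[0, 1, -1, -1],
     [1, 0, -1, -1],
     [-1, -1, 0, 1],
     [-1, -1, 1, 0]]
result = run_until_stable(T, [1, 0, 0, 0])
```

Starting from the partial input 1 0 0 0, the sketch recalls the stored pattern and reports stability on the second step, which is the associative behavior described in this section.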
As will be apparent from the description above, this neural network outputs data for which the energy of the neural network settles at a minimum value (or a local minimum). Thus, according to the programmed states of the resistive coupling elements 1, the resistive matrix 100 stores some patterns or data and can determine match/mismatch between input data and the stored patterns or data. Therefore, such a neural network can also serve as an associative memory circuit or a pattern discriminator.
A structure obtained by removing the feedback paths between the data output lines and the data input lines in the resistive matrix 100 shown in FIG. 4 has been known as a perceptron circuit of a single layer. This perceptron circuit can operate in a simplified learning algorithm, and when multi-layered, it can configure a flexible system.
Further, it has been known that if the energy function in the Hopfield's model is regarded as a probability variable and the Hopfield's algorithm is expanded to a probability system, a Boltzmann's model (Boltzmann's machine) can be obtained. In FIG. 6, there is shown a structure of the major portion of a semiconductor neural network according to the Boltzmann's model. The structure shown in FIG. 6 is disclosed, for example, in "A Neuromorphic VLSI Learning System" pp. 213 to 237 in a Journal "Advanced Research in VLSI, 1987" published by MIT Press.
In FIG. 6, neuron units are constituted of differential amplifiers Z1 to Zj, each having two complementary outputs S and S̅. When a neuron is in the "on" state, the output S represents "1" (5 V), and when the neuron is in the "off" state, the output S represents "0" (0 V). The outputs of a neuron unit (differential amplifier) are fed back to differential inputs IN and I̅N̅ through resistive elements R. The resistive elements R have modifiable conductances which define a weight Wij.
To apply a self-bias value -θ to the respective input lines IN and I̅N̅, there is provided a self-bias portion 400. This self-bias portion 400 constantly receives complementary data of "1" and "0" through a differential amplifier Zt. When compared with a biological neuron, each of the differential amplifiers Z1 to Zj arranged on the diagonal corresponds to a cell body and performs threshold processing. The input lines IN and I̅N̅ correspond to dendrites for receiving signals from other neurons. Each of the data input lines IN and I̅N̅ can transmit both excitatory and inhibitory signals. The output lines S and S̅ correspond to an axon through which a signal from one neuron is transmitted to another. The resistive elements R correspond to synapses, and their resistance values represent the coupling strength (synapse load) between neurons.
Resistive elements R arranged at the connections of the data input lines IN and I̅N̅ and the data output lines S and S̅, that is, at the location of row i and column j, (i, j), can couple the outputs of a neuron (differential amplifier) Zj to the inputs of another neuron (differential amplifier) Zi and thus provide a positive weight Wij. In the case of this positive weight Wij, the output line Sj is connected to the input line INi and the complementary output line S̅j is connected to the complementary data input line I̅N̅i. In the case of a negative weight Wij, the complementary data output line S̅j is connected to the data input line INi and the data output line Sj is connected to the complementary data input line I̅N̅i.
Initialization of this neural network is performed by setting the resistance values of the resistive elements R. A problem of the Boltzmann's model is to find a weight Wkl (the conductance of a resistive coupling element located at row k and column l) which allows the neural network to realize by itself a probability distribution of input/output data as correctly as possible without the same being externally applied. To set the weight Wkl of each resistive element, there is provided a weight processor (not shown) for each weight Wkl. This weight processor has functions of latching weight data, shifting the latched data to an adjacent latch, and, after each operation loop (plus phase, minus phase and the like), incrementing or decrementing the latched data according to a predetermined relational equation.
The algorithm of the Boltzmann's model includes operation 1 (plus phase), operation 2 (minus phase), operation 3 (change of the weight Wkl) and operation 0 (learning of output layer).
The operation 1 includes the steps of (1) annealing, (2) collecting data, and (3) determining P⁺. The annealing step externally applies an analog noise signal, whose amplitude decreases as the operation proceeds, to the differential inputs of each differential amplifier. That is, by starting this annealing step at a high temperature and then gradually reducing the temperature, the neural network system is put in thermal equilibrium, or has its global energy settled in a local minimum. This state appears at each differential amplifier Z, which evaluates its own state and sets it to "on" or "off". The data collecting step determines the number of states in which both of two coupled neurons (differential amplifiers) take "1". The mean value of the collected data in each data collecting step is represented by P⁺.
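The annealing step can be illustrated by the following software sketch, which is not a model of the analog circuit. It assumes uniformly distributed noise, a geometrically decaying amplitude, one-unit-at-a-time updating, and a final noise-free phase that runs until no unit changes. The function names, the decay schedule and the weight values are hypothetical.

```python
import random


def sweep(W, bias, state, amplitude, rng):
    """Update every unit once with noise added to its input; report any change."""
    changed = False
    for i in range(len(state)):
        net = sum(W[i][j] * state[j] for j in range(len(state)) if j != i)
        noisy_net = net + bias[i] + rng.uniform(-amplitude, amplitude)
        new_state = 1 if noisy_net > 0 else 0
        if new_state != state[i]:
            state[i] = new_state
            changed = True
    return changed


def anneal(W, bias, state, start_amplitude=4.0, decay=0.8, floor=0.01,
           seed=0, max_quiet_sweeps=100):
    """Apply noise of decreasing amplitude, then let the network come to rest."""
    rng = random.Random(seed)
    state = list(state)
    amplitude = start_amplitude
    while amplitude > floor:        # "high temperature" first, then cooling
        sweep(W, bias, state, amplitude, rng)
        amplitude *= decay
    for _ in range(max_quiet_sweeps):  # noise fully removed
        if not sweep(W, bias, state, 0.0, rng):
            break
    return state


# Hypothetical two-neuron network with mutual excitation and a self-bias of -1.
W = [[0, 2], [2, 0]]
bias = [-1, -1]
final_state = anneal(W, bias, [1, 0])
```

For this example the only states that survive the noise-free phase are both neurons "off" or both neurons "on"; which of the two is reached depends on the noise sequence.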
In the operation 2 (minus phase), the above-described three steps of the operation 1 are executed with only the states of those neurons (differential amplifiers) receiving input data being fixed at "1". In this operation 2, the value obtained in the step of finding a mean value is denoted P⁻.
The operation 3 is to change the weight Wkl according to the mean values P⁺ and P⁻ obtained in the operations 1 and 2.
After the operations 1 and 2, the respective weights Wkl have been adjusted in parallel operation. The weight processors provided for the respective weights evaluate their states to increment or decrement the corresponding weights. As previously described, since the data input/output lines are arranged to form pairs, the weights are adjusted using the above-mentioned parallel algorithm.
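The data collecting step and the adjustment of a weight can be illustrated by the following sketch. The "predetermined relational equation" is not given in this description, so the sketch assumes the simplest rule consistent with incrementing or decrementing: the weight moves by one step in the direction of the sign of P⁺ - P⁻. The function names, the step size and the weight limits are hypothetical.

```python
def cooccurrence(samples, k, l):
    """Mean fraction of collected states in which neurons k and l are both "1"."""
    both_on = sum(1 for s in samples if s[k] == 1 and s[l] == 1)
    return both_on / len(samples)


def adjust_weight(w, p_plus, p_minus, step=1, w_min=-512, w_max=511):
    """ASSUMED rule: increment when P+ > P-, decrement when P+ < P-."""
    if p_plus > p_minus:
        w += step
    elif p_plus < p_minus:
        w -= step
    return max(w_min, min(w_max, w))  # keep the latched value within its range


# Hypothetical states collected in the plus phase and in the minus phase.
plus_samples = [[1, 1], [1, 0], [1, 1], [0, 1]]
minus_samples = [[1, 0], [0, 0], [1, 1], [0, 1]]
p_plus = cooccurrence(plus_samples, 0, 1)
p_minus = cooccurrence(minus_samples, 0, 1)
new_weight = adjust_weight(3, p_plus, p_minus)
```

Because each weight depends only on the statistics of its own pair of neurons, every weight processor can apply such a rule at the same time.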
In FIG. 7, there is shown an example of the specific structure of a resistive element providing the weight Wkl. In FIG. 7, a weight portion comprises four transistor groups TR1, TR2, TR3 and TR4 for providing a positive or negative coupling. The transistor groups TR1 to TR4 are configured in the same manner and each comprises n MOS transistors T0 to Tn-1 and a pass-transistor TG.
The resistance ratios (width/length ratios) of the MOS transistors T0 to Tn-1 are set to 1:2: ... :2^(n-1). The pass-transistors TG are responsive to either of the sign bits T_SGN and T̅_SGN, indicative of positive and negative couplings respectively, for connecting data input lines to corresponding data output lines. In this case, since the transistor groups provided on a diagonal simultaneously connect the data input and output lines, the pass-transistors TG1 and TG4 receive the positive sign bit T_SGN at their gates and the pass-transistors TG2 and TG3 receive the negative sign bit T̅_SGN. The weight Wij provided by the resistive elements R can be set as desired by putting an appropriate combination of the transistors T0 to Tn-1 in each transistor group in the on state.
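The way a combination of binary-ratioed transistors expresses a weight can be illustrated by the following sketch. It assumes that the ratio 1:2: ... :2^(n-1) applies to the relative conductance of the transistors T0 to Tn-1, that the conductances of the transistors turned on simply add, and that the sign bit selects a positive or negative coupling. The function names and the unit conductance are hypothetical.

```python
def group_conductance(bits, unit=1.0):
    """Conductance of one transistor group.

    bits[k] is 1 when transistor Tk is on; Tk is assumed to contribute
    unit * 2**k, and the contributions are assumed to add.
    """
    return unit * sum(2 ** k for k, b in enumerate(bits) if b)


def weight(bits, sign_positive, unit=1.0):
    """Signed weight: the sign bit selects positive or negative coupling."""
    g = group_conductance(bits, unit)
    return g if sign_positive else -g


# With n = 3 transistors, turning on T0 and T2 gives 1 + 4 = 5 units.
w_pos = weight([1, 0, 1], sign_positive=True)
# With n = 4 transistors all on, a negative coupling gives -(1 + 2 + 4 + 8).
w_neg = weight([1, 1, 1, 1], sign_positive=False)
```

Under these assumptions n transistors per group provide 2^n conductance levels for each sign.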
Such semiconductor neural networks according to the Hopfield's model and the Boltzmann's model, which have employed various types of structure to express the weight corresponding to synapse load, have the following problems.
When coupling elements, each configured of a basic cell having simple structure as shown in FIGS. 4 and 5, are provided at connections between data input lines and data output lines, each of the coupling elements can provide only three non-weighted states simply represented by "1", "0" and "-1", or correspondingly "excitatory state", "don't care state" and "inhibitory state". Therefore, the synapse coupling model is oversimplified so that in a practical circuit operation, convergence of the neural network to the energy of local minima is deteriorated.
To improve the convergence of the neural network, it is required to give multi-level expression to the coupling state (weight) of a coupling element. It has turned out through circuit simulations that in order to obtain a convergence generally fit for practical use, at least 10-bit (1024 steps) indication of the coupling state is required.
The multi-level expression of the coupling state can be implemented, for example, by the coupling element structure shown in FIG. 7. In the coupling element structure shown in FIG. 7, however, transistors of different conductances are required to constitute a single basic coupling element. Those different conductances can be obtained by adjusting size (ratio of gate width and length of a transistor, or the like) of those transistors. Therefore, it is required to provide a number of transistors of different sizes in a coupling element region. If the coupling element region is limited in area, however, the size of the transistors is inevitably reduced and thus size differences between the transistors are also reduced. In this case, the size error or size error tolerance introduced inevitably in manufacturing the circuit has larger influences on the size differences between the transistors, so that a desired conductance ratio can not be given among the transistors. As a result, multi-valued weighting can not be precisely applied to each synapse coupling strength.
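The sensitivity of such a structure to size errors can be illustrated by the following worst-case arithmetic. It is a simplified model, not taken from the cited structure: it assumes binary-ratioed conductances 1:2: ... :2^(n-1) and assumes that every transistor may deviate from its nominal conductance by the same relative tolerance. The function names are hypothetical.

```python
def worst_case_error_in_steps(n_bits, relative_tolerance):
    """Largest possible conductance error, in units of the smallest transistor,
    when all n_bits transistors are on and each deviates by the tolerance."""
    return relative_tolerance * (2 ** n_bits - 1)


def tolerance_for_half_step(n_bits):
    """Relative tolerance that keeps the worst-case error below half a step."""
    return 0.5 / (2 ** n_bits - 1)


# For a 10-bit weight, a 1 % size error can amount to about 10 steps.
error_10bit = worst_case_error_in_steps(10, 0.01)
# Holding the error under half a step requires matching better than 0.05 %.
required_10bit = tolerance_for_half_step(10)
```

Under these assumptions the required matching accuracy roughly doubles with every added bit, which is why shrinking the transistors makes precise multi-valued weighting difficult.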
Similarly, when a number of neuron units are formed on a single semiconductor chip, the number of coupling elements inevitably increases, so that the area occupied by a single coupling element formed on the limited semiconductor chip is also reduced, bringing about the same problems as described above.
To obtain such coupling elements as can realize sufficient convergence, even if the number of transistors has been reduced by expressing weights using combinations of the transistors, a large number of transistors with well-controlled size accuracy are required. This has been an obstacle in reducing the occupied area of a coupling element and prevented formation of a high-density integrated neural network circuit on a limited semiconductor chip.
Further, instead of expressing a single weight using a plurality of transistors, a method of expressing a multi-valued (more correctly, analog) weight by using the charge amount accumulated at the floating gate of one non-volatile transistor has been proposed. When this floating gate-type transistor is used, however, since the charge retention characteristics of the floating gate and the correspondence between the accumulated charge amount and the weighting factors still contain uncertainty, the weights (synapse loads) may possibly change in circuit operation, and the synapse coupling strengths may not obtain the desired weights.
In this case, if the correspondence between the weighting factors and the accumulated charge amount, which is determined in the learning of the neural network, remains uncertain, it will bring about poor convergence in the learning, resulting in a longer learning time of the neural network.