The present invention relates to a neural network and a method for training the neural network. An information processor called a hierarchical neural network, which attempts to mimic neurobiological systems, includes a plurality of parallel-distributed-processing elements called neuron units, connected by synapses and arranged in layers: an input layer A1, an output layer A3, and one or more "hidden" layers A2 (A2-1 and A2-2) in between, as shown in FIGS. 1 and 2. Each unit's output goes either to units in subsequent layers or outside the network, an output signal Y.sub.j being defined by the unit's weights and threshold as follows with reference to FIG. 3: ##EQU1## where W.sub.ji is a weight (incidentally, in the above equation (1) and hereinafter, a threshold is treated as another weight), X.sub.i is an input signal, and f(x) is a mapping function (the so-called sigmoid logistic function) shown in FIG. 4 and defined as follows: EQU f(x)=1/(1+exp(-x)) (2)
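By way of illustration only, the behavior of a single neuron unit may be sketched as follows. The sketch assumes that equation (1), which is not reproduced in this text, has the conventional form of the mapping function f applied to the weighted sum of the inputs; the function names `sigmoid` and `unit_output`, and the convention of appending a constant input of 1.0 to carry the threshold weight, are hypothetical and chosen for the example.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic mapping function f(x) = 1 / (1 + exp(-x)), equation (2)."""
    return 1.0 / (1.0 + math.exp(-x))


def unit_output(weights, inputs):
    """Output Y_j of one neuron unit (assumed form of equation (1)).

    The threshold is treated as one more weight, so the caller appends a
    constant 1.0 to `inputs` for it (an illustrative convention only).
    """
    net = sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(net)


# Two ordinary inputs plus the constant 1.0 that carries the threshold weight.
y = unit_output([0.5, -0.25, 0.1], [1.0, 2.0, 1.0])
print(y)
```

The output always lies strictly between 0 and 1, which is the property the pulse-density representation discussed later relies on.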
When the weights have been set to correct levels, a complex stimulus pattern (input signal) at the input layer successively propagates via the hidden layers to result in a simpler output pattern (output signal). The network can be taught by feeding it an input signal and a corresponding expected output signal, referred to as a teacher signal; the network learns by measuring the difference at each output unit between the expected output signal and the output signal that is actually produced. All or appropriate weights are gradually modified by iterating a learning algorithm to provide an output signal which more closely approximates the expected output signal. A backpropagation algorithm, disclosed in Learning Internal Representations by Error Propagation, Chapter 8 of Parallel Distributed Processing, by Rumelhart et al., MIT Press, Cambridge, Mass., pp. 318-362, 1986, is well known as such a learning algorithm; it uses a gradient search technique to minimize an error function E equal to the mean square difference between the expected output signal d.sub.j and the actual output signal Y.sub.j. The error function E is defined in a neural network with "M" outputs and "K" pairs of input and expected output signals as follows: ##EQU2##
Each weight W.sub.ji is modified by adding thereto .DELTA.W.sub.ji, which is proportional to -.differential.E/.differential.W.sub.ji, so as to minimize the error function E; here, ##EQU3## where .eta. is called a learning rate (generally less than 0.5).
In addition, ##EQU4##
In the neuron unit j at an output layer: EQU .delta..sub.j =(d.sub.j -Y.sub.j).f'(net.sub.j) (9) (.BECAUSE.(3))
In the neuron unit j at a hidden layer; ##EQU5##
Thus, .delta..sub.j of the neuron unit j can be expressed in terms of .delta..sub.k of each neuron unit k which receives a signal from the neuron unit j.
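For reference, the relations described above may be written in the conventional backpropagation form shown below. This is a reconstruction under stated assumptions, not a reproduction of the equations of this specification: the weighted-sum form of net.sub.j, the factor 1/2 in E (chosen so that the output-layer expression follows without an extra constant), the pattern index p, and the definition of .delta..sub.j as the negative derivative of E with respect to net.sub.j are all assumed, and the equations are left unnumbered on purpose.

```latex
\begin{align*}
Y_j &= f(\mathrm{net}_j), \qquad \mathrm{net}_j = \sum_i W_{ji} X_i \\
E &= \frac{1}{2} \sum_{p=1}^{K} \sum_{j=1}^{M} \left( d_j - Y_j \right)^2 \\
\Delta W_{ji} &= -\eta \, \frac{\partial E}{\partial W_{ji}} = \eta \, \delta_j X_i,
  \qquad \delta_j = -\frac{\partial E}{\partial \mathrm{net}_j} \\
\delta_j &= (d_j - Y_j) \, f'(\mathrm{net}_j) && \text{(output layer)} \\
\delta_j &= f'(\mathrm{net}_j) \sum_k \delta_k W_{kj} && \text{(hidden layer)}
\end{align*}
```

For the logistic function of equation (2), f'(x) = f(x)(1 - f(x)), so each .delta..sub.j can be computed from the unit's own output without evaluating the exponential again.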
Thus, a weight may be modified by the following procedures:
1. CALCULATE ERROR OF NEURON UNIT AT OUTPUT LAYER (.BECAUSE.(9))
2. CALCULATE ERROR OF NEURON UNIT AT LAYER JUST PREVIOUS TO THE OUTPUT LAYER (.BECAUSE.(10) and the above result)
3. CALCULATE SEQUENTIALLY, LAYER BY LAYER, ERROR OF NEURON UNITS LOCATED BETWEEN THE OUTPUT AND INPUT LAYERS (.BECAUSE.(10) and the above result)
4. CALCULATE .DELTA.W.sub.ji (.BECAUSE.(7))
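The four procedures above may be sketched in software as follows, for a network with a single hidden layer and a single pair of input and teacher signals. This is a minimal sketch under assumptions: the weight change is taken to be .eta. times .delta..sub.j times the input of the unit, thresholds are omitted for brevity, and the names `train_step`, `w_hidden`, and `w_out` are hypothetical.

```python
import math


def f(x):
    """Logistic mapping function of equation (2)."""
    return 1.0 / (1.0 + math.exp(-x))


def f_prime_from_output(y):
    """Derivative of the logistic function, written via its output y = f(net)."""
    return y * (1.0 - y)


def train_step(x, d, w_hidden, w_out, eta=0.3):
    """One backpropagation update for one (input, teacher) pair.

    w_hidden[j][i] is the weight from input i to hidden unit j;
    w_out[k][j] is the weight from hidden unit j to output unit k.
    Returns the updated weights and the error measured before the update.
    """
    # Forward pass through the hidden layer and the output layer.
    h = [f(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    y = [f(sum(w * hj for w, hj in zip(row, h))) for row in w_out]

    # Procedure 1: error of each neuron unit at the output layer.
    delta_out = [(dk - yk) * f_prime_from_output(yk) for dk, yk in zip(d, y)]

    # Procedures 2 and 3: error of each hidden unit, obtained from the errors
    # of the units that receive its signal (only one hidden layer here).
    delta_hidden = [
        f_prime_from_output(h[j])
        * sum(delta_out[k] * w_out[k][j] for k in range(len(w_out)))
        for j in range(len(h))
    ]

    # Procedure 4: weight changes, added to the current weights.
    new_w_out = [
        [w_out[k][j] + eta * delta_out[k] * h[j] for j in range(len(h))]
        for k in range(len(w_out))
    ]
    new_w_hidden = [
        [w_hidden[j][i] + eta * delta_hidden[j] * x[i] for i in range(len(x))]
        for j in range(len(w_hidden))
    ]

    error = 0.5 * sum((dk - yk) ** 2 for dk, yk in zip(d, y))
    return new_w_hidden, new_w_out, error
```

Iterating `train_step` on the same pair of signals should make the returned error shrink gradually, which corresponds to the gradual modification of the weights described above. With more hidden layers, procedure 3 repeats the hidden-layer computation once per layer, moving from the output side toward the input side.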
Various methodologies for constructing the neural network have been proposed, the easiest being to implement it in software. Software can implement various types of neural networks, but results in a low operational speed and requires a bulky computer, so that this method is useful only for research on neural networks. Accordingly, as another methodology, a neurocomputer architecture built on a chip, as an analog circuit or a digital circuit, has been proposed. The analog neural network expresses each weight as a variable resistor and each unit as an operational amplifier; however, the analog network has disadvantages in that its characteristics are unstable depending on temperature, vary from chip to chip, and offer a low degree of noise tolerance. On the other hand, the digital neural network can overcome the above disadvantages but has the disadvantage of a complicated circuitry construction. Accordingly, the applicant has disclosed a new digital network having a relatively simple circuitry construction, referred to as a pulse-density neural network hereinafter, in Japanese Laid-Open Patent Applications No. 1-179629 and No. 1-343891, and 1990 SPRING NATIONAL CONVENTION RECORD, THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS D-56, and THE TRANSACTIONS OF THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS REPORT IDC90-129. In addition, the applicant has also proposed a new neural network implemented on an LSI chip in 1990 SPRING NATIONAL CONVENTION RECORD, THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS D-41. Various other types of pulse-density neural networks are disclosed in Japanese Laid-Open Patent Application No. 1-244567, U.S. Pat. No. 4,893,255, and THE TRANSACTIONS OF THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS REPORT MBE87-157, pp. 415-422, March, 1988.
A brief description will now be given with reference to FIGS. 5 to 11 of the operation of the pulse-density neural network disclosed by this applicant. As shown in FIG. 5, an input signal 4601, an output signal 4602 and a weight 4603 are each expressed as a synchronous bit string comprising OFF-bits "0" and ON-bits "1", and the bit density (pulse-density) of each is expressed as the ratio of the ON-bits to all the bits. As shown in FIGS. 5 to 7, if an input signal (101101) shown in FIG. 6, whose pulse-density is thus 4/6, is input to an AND circuit 4604 with a weight (101010) shown in FIG. 7, whose pulse-density is thus 3/6, the AND circuit 4604 outputs a bit string (101000) to an OR circuit 4605 with multiple input terminals, which functions as the above function f(x). This pulse-density neural network can manage an excitatory synapse whose weight is non-negative, and an inhibitory synapse whose weight is negative, in accordance with one of the following three methodologies:
1. Classifying each synapse into either the excitatory synapse group 4909 or the inhibitory synapse group 4910, as shown in FIG. 8, logical operations being performed individually for each group. Instead of the above, each input signal may be classified into the excitatory input group or the inhibitory input group. An output signal of an OR circuit 4906 in the inhibitory synapse group 4910 is inverted by an inverter 4907, and an AND circuit 4908 performs the logical multiplication for the output of an OR circuit 4905 and that of the inverter 4907 to produce an output signal 4902.
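The bit-string arithmetic of the above example may be checked with the short sketch below. It models each synchronous bit string as a Python string of "0" and "1" characters; the helper names `density`, `bitwise_and`, and `bitwise_or`, and the second bit string fed to the OR operation, are hypothetical and serve only to illustrate the AND circuit 4604 and the OR circuit 4605.

```python
def density(bits: str) -> float:
    """Pulse-density: ratio of ON-bits "1" to all the bits."""
    return bits.count("1") / len(bits)


def bitwise_and(a: str, b: str) -> str:
    """Bit-by-bit AND of two synchronous bit strings (AND circuit)."""
    return "".join("1" if x == "1" and y == "1" else "0" for x, y in zip(a, b))


def bitwise_or(*strings: str) -> str:
    """Bit-by-bit OR over any number of bit strings (multi-input OR circuit)."""
    return "".join("1" if "1" in column else "0" for column in zip(*strings))


x = "101101"                  # input signal, pulse-density 4/6
w = "101010"                  # weight, pulse-density 3/6
weighted = bitwise_and(x, w)  # bit string passed on to the OR circuit
print(weighted, density(weighted))

# A second, made-up weighted signal merged by the OR circuit.
merged = bitwise_or(weighted, "010000")
print(merged, density(merged))
```

In this example the density of the AND output, 2/6, equals the product of the two input densities (4/6 times 3/6). That equality holds only on average, and only when the two bit strings are statistically independent. The OR operation saturates as more ON-bits arrive, which is why it can stand in for the mapping function f(x).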
2. Providing memories 5011 for storing information for classifying each synapse into an excitatory synapse or an inhibitory synapse, and set circuits 5012 connected to the corresponding memory 5011, for classifying each weight in accordance with the information stored therein, as shown in FIG. 9; i.e., "0" stored in the memory 5011 indicates the excitatory synapse whereas "1" stored therein indicates the inhibitory synapse. OR circuits 5005 and 5006, an inverter 5007, and an AND circuit 5008 respectively correspond to the OR circuits 4905 and 4906, the inverter 4907, and the AND circuit 4908, and a duplicate description thereof will be omitted.
3. Expressing each weight as a subtraction of a negative weight (component) W.sub.ji- from a positive weight (component) W.sub.ji+ (W.sub.ji =W.sub.ji+ -W.sub.ji-, W.sub.ji+ .gtoreq.0 and W.sub.ji- .gtoreq.0), and storing the positive weight into a memory 5103 and the negative weight into a memory 5104, as shown in FIG. 10. An AND circuit 5105 performs the logical multiplication for an input signal 5101 and the positive weight whereas an AND circuit 5106 performs the logical multiplication for the input signal 5101 and the negative weight. OR circuits 5107 and 5108, an inverter 5109, and an AND circuit 5110 respectively correspond to the OR circuits 4905 and 4906, the inverter 4907, and the AND circuit 4908, and a duplicate description thereof will be omitted.
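The logic shared by the three methodologies, namely the OR of the excitatory group combined by an AND with the inverted OR of the inhibitory group, may be sketched as follows for the third methodology. The function names and the example bit strings are hypothetical; only the arrangement of the AND circuits, OR circuits, and inverter is taken from the description of FIG. 10.

```python
def and_bits(a: str, b: str) -> str:
    """Bit-by-bit AND of two bit strings."""
    return "".join("1" if x == "1" and y == "1" else "0" for x, y in zip(a, b))


def or_bits(strings) -> str:
    """Bit-by-bit OR over a list of bit strings."""
    return "".join("1" if "1" in column else "0" for column in zip(*strings))


def not_bits(a: str) -> str:
    """Bit-by-bit inversion (inverter)."""
    return "".join("0" if x == "1" else "1" for x in a)


def pulse_unit_output(inputs, w_plus, w_minus) -> str:
    """Output bit string of one unit with positive and negative weights."""
    # AND each input with its positive weight, then OR the results.
    excitatory = or_bits([and_bits(x, wp) for x, wp in zip(inputs, w_plus)])
    # AND each input with its negative weight, then OR the results.
    inhibitory = or_bits([and_bits(x, wm) for x, wm in zip(inputs, w_minus)])
    # The output is ON only where the excitatory side is ON and the
    # inhibitory side is OFF.
    return and_bits(excitatory, not_bits(inhibitory))


inputs = ["101101", "110011"]
w_plus = ["101010", "000000"]   # first synapse acts as excitatory
w_minus = ["000000", "100001"]  # second synapse acts as inhibitory
out = pulse_unit_output(inputs, w_plus, w_minus)
print(out)
```

Because the final AND lets an inhibitory ON-bit cancel an excitatory ON-bit but never the reverse, the inhibitory side dominates whenever both are ON in the same bit position, and an all-zero input always yields an all-zero output.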
The above three methodologies are characterized in that, when an input signal with a pulse-density of "0" is supplied, the pulse-density of the output signal accordingly becomes "0". In addition, the input signal from the inhibitory synapse group has a stronger influence than that from the excitatory synapse group. However, the above three methodologies have a disadvantage in that their mapping functions are approximately expressed as the function shown in FIG. 11, which is similar to the sigmoid function shown in FIG. 4 but is biased in the negative direction along the longitudinal axis. Accordingly, the applicant has improved the above three methodologies, as disclosed in Japanese Laid-Open Patent Application No. 2-316505, to approximate the sigmoid function shown in FIG. 4. Briefly speaking, as shown in FIGS. 12 to 14 respectively corresponding to FIGS. 8 to 10, the improved pulse-density neural network uses a logic operational circuit (5214, 5314, or 5414) and a predetermined input signal (5213, 5313, or 5413) generated by an external pulse generator. Since the bit density of the predetermined input signal can be freely adjusted, the output signal will closely approximate the sigmoid function if the bit density thereof is set to 0.5.
However, each of the above pulse-density neural networks has the following disadvantage. The backpropagation algorithm using the equation (1) can be applied to a digital neural network which is not a pulse-density neural network and which manages both excitatory and inhibitory synapses, since "net.sub.j" can freely take a positive or negative value and thus express two variables. In contrast, the backpropagation algorithm using the equation (1) cannot be applied to the above pulse-density neural networks which manage both excitatory and inhibitory synapses, since the pulse-density can take only a non-negative value and thus express only one variable. Accordingly, it is necessary for each pulse-density neural network compatible with both the excitatory and inhibitory synapses to individually process two variables so as to generate an output signal defined by the following equation (11). EQU Y.sub.j =f{X.sub.i+, X.sub.i- } (11)
Incidentally, U. Hirai has disclosed a new pulse-density neural network in NIKKEI MICRODEVICES, JULY, 1988, pp. 72-75, and Japanese Laid-Open Patent Application No. 1-244567, in which an up-down counter is used to calculate "net.sub.j" for two variables. However, Hirai's neural network has a disadvantage in that the up-down counter complicates the circuitry construction of the neural network.