The present invention relates to an information processing apparatus including a neural network which can learn at high speed various kinds of processing such as speech recognition, image processing, control, etc. More particularly, the present invention relates to an information processing apparatus which is conveniently constructed on a wafer-scale integrated circuit (WSI) and can provide a compact, low-cost, large-scale neural network with a high-speed learning function.
Generally, a neural network is applied to perform information processing such as recognition and knowledge processing. When an input and a desired output are given, the neural network can organize itself through learning. Therefore, it does not require any program, and so several kinds of applications are possible.
One of the previously known learning algorithms is the "back propagation" technique disclosed in Parallel Distributed Processing, Chapter 8, pages 318-362, by Rumelhart et al. The "back propagation" technique will be explained below.
FIG. 2 shows a model of a neuron j. A plurality of such neurons constitute a neural network. The neuron shown in FIG. 2 receives inputs $x_i$ from other neurons. Each input is weighted with a synapse weight $w_{ji}$, and the internal energy (inner product) $u_j$, which is the sum of the weighted inputs, is computed. The value $u_j$ is then converted by e.g. a sigmoid function $f(u) = 1/(1+e^{-u})$ to provide the output $x_j$. The synapse weight $w_{ji}$ is the weight applied, in the neuron j, to the input from the neuron i:

$$x_j = f(u_j) \qquad (1)$$

$$u_j = \sum_i w_{ji} x_i \qquad (2)$$
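The forward computation of Equations (1) and (2) can be sketched in software as follows; the function name `neuron_output` is illustrative only and does not appear in the disclosure.

```python
import math

def neuron_output(x, w):
    """Compute one neuron's output per Equations (1) and (2):
    the inner product u_j = sum_i w_ji * x_i, converted by
    the sigmoid f(u) = 1 / (1 + e^(-u))."""
    u = sum(w_i * x_i for w_i, x_i in zip(w, x))  # internal energy u_j (Eq. 2)
    return 1.0 / (1.0 + math.exp(-u))             # sigmoid output x_j (Eq. 1)

# Example: two inputs weighted equally and oppositely cancel,
# so u_j = 0 and the sigmoid returns exactly 0.5.
print(neuron_output([1.0, 1.0], [0.5, -0.5]))  # → 0.5
```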
Coupling a plurality of neurons provides a neural network, and changing the synapse weights $w_{ji}$ permits several kinds of information processing to be executed.
FIG. 3 shows a hierarchical neural network. Although the neural network shown is composed of three layers, i.e. an input layer, a hidden layer and an output layer, the hidden layer may itself be composed of plural layers. The neural network shown in FIG. 3 has two operation modes. One is a forward mode for information-processing an input to provide an output. The other is a backward mode in which an expected output value for a given input is supplied externally to modify the synapse weights $w_{ji}$ from the output layer toward the input layer; the modification value $\Delta w_{ji}$ for the synapse weight of the neuron j is then given as

$$\Delta w_{ji}^{n+1} = \eta \delta_j x_i + \alpha \Delta w_{ji}^{n} \qquad (3)$$
where n is the number of learning iterations, and $\alpha$ and $\eta$ are constants. The quantity $\delta$ is determined as follows.
Assuming that the expected value for the neuron k in the output layer is $t_k$,

$$\delta_k = (t_k - x_k) f'(u_k) \qquad (4)$$
and for the neuron j in the hidden layer,

$$\delta_j = f'(u_j) \sum_k \delta_k w_{kj} \qquad (5)$$

where the function $f'$ is the derivative of the function $f$.
In this way, the modification value $\Delta w_{ji}$ is determined and used to modify the synapse weight $w_{ji}$. This learning is repeated until the output from the output layer reaches the desired value.
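The learning procedure of Equations (3) to (5) for a network with a single hidden layer can be sketched as follows. The function name `backprop_step` and the constants $\eta = 0.5$ and $\alpha = 0.9$ are illustrative assumptions; $f'(u) = f(u)(1 - f(u))$ is the derivative of the sigmoid of Equation (1).

```python
import math

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def backprop_step(x, t, w_hid, w_out, dw_hid, dw_out, eta=0.5, alpha=0.9):
    """One learning iteration: forward mode (Eqs. 1-2), then backward
    mode (Eqs. 3-5). dw_hid/dw_out hold the previous modification
    values (the momentum term alpha * dw^n) and are updated in place."""
    # Forward mode: Equations (1) and (2) for hidden and output layers.
    u_hid = [sum(wj[i] * x[i] for i in range(len(x))) for wj in w_hid]
    h = [sigmoid(u) for u in u_hid]
    u_out = [sum(wk[j] * h[j] for j in range(len(h))) for wk in w_out]
    y = [sigmoid(u) for u in u_out]

    # Backward mode: delta for output neurons, Equation (4),
    # using f'(u) = f(u)(1 - f(u)) for the sigmoid.
    d_out = [(t[k] - y[k]) * y[k] * (1 - y[k]) for k in range(len(y))]
    # Delta for hidden neurons, Equation (5).
    d_hid = [h[j] * (1 - h[j]) *
             sum(d_out[k] * w_out[k][j] for k in range(len(d_out)))
             for j in range(len(h))]

    # Weight modification, Equation (3), with momentum term alpha * dw^n.
    for k in range(len(w_out)):
        for j in range(len(h)):
            dw_out[k][j] = eta * d_out[k] * h[j] + alpha * dw_out[k][j]
            w_out[k][j] += dw_out[k][j]
    for j in range(len(w_hid)):
        for i in range(len(x)):
            dw_hid[j][i] = eta * d_hid[j] * x[i] + alpha * dw_hid[j][i]
            w_hid[j][i] += dw_hid[j][i]
    return y
```

Repeated calls with the same input/expected-value pair drive the output toward the expected value, which is the repetition described above.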
Generally, investigation of the characteristics of a neural network and research into its applications have been carried out through software simulation on a sequential computer. This technique requires very long computation time to execute the learning of a neural network composed of a huge number (e.g. several thousands or several tens of thousands) of neurons. In many cases, the learning requires a very large number of repetitions. Thus, the capability of a large-scale neural network has not yet been sufficiently explored.
On the other hand, several attempts have been made to compute the neural network at high speed using dedicated hardware. One of them is the system disclosed in "A Wafer Scale Integration Neural Network Utilizing Completely Digital Circuits", IJCNN '89 Proceedings, Vol. II, pp. 213-217, shown in FIG. 4.
In FIG. 4, reference numerals 201 denote neuron computation units and 210 denotes a bus; the neuron computation units are connected with each other through the bus 210. In operation, one of the neuron computation units 201 is selected and its output value is placed on the bus 210. Each of the neuron computation units 201, which holds the synapse weights for its output neuron in its memory, weights the values sequentially arriving on the bus with the synapse weights read from the memory, and cumulatively adds the products thus formed. The neuron computation unit 201 then converts the accumulated value with the sigmoid function of Equation (1) and outputs the result to the bus 210. The weighting circuit is operated in a time-divisional manner. When every neuron computation unit 201 has completed output of its value, all of the neuron computation units 201 have computed Equation (2). Hereinafter, such a computation technique is referred to as a time-divisional bus connection system.
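A minimal software sketch of the time-divisional bus connection system follows, simulating how each unit weights and accumulates the single value occupying the shared bus in each time slot; the class and function names are illustrative, not taken from the cited system.

```python
import math

class NeuronUnit:
    """Sketch of one neuron computation unit (201 in FIG. 4): it holds
    the synapse weights for its own neuron and a running accumulator."""
    def __init__(self, weights):
        self.weights = list(weights)  # w_ji, indexed by source neuron i
        self.acc = 0.0                # cumulative sum of w_ji * x_i (Eq. 2)

    def listen(self, i, x_i):
        # Weight the value broadcast on the bus and accumulate the product.
        self.acc += self.weights[i] * x_i

    def fire(self):
        # Convert the accumulated inner product with the sigmoid (Eq. 1).
        return 1.0 / (1.0 + math.exp(-self.acc))

def time_divisional_layer(inputs, weight_rows):
    units = [NeuronUnit(row) for row in weight_rows]
    # Time-divisional bus: one source value occupies the bus per time
    # slot, and every unit weights and accumulates it concurrently.
    for i, x_i in enumerate(inputs):
        for unit in units:
            unit.listen(i, x_i)
    return [unit.fire() for unit in units]
```

Because each unit needs only one weighting circuit shared across all time slots, the per-unit hardware stays small, which is the property the WSI implementation exploits.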
Owing to this mode of operation, the time-divisional bus connection system permits the neuron operation expressed in Equations (1) and (2) to be computed at high speed. Another feature of the time-divisional bus connection system is that the system shown in FIG. 4 can be realized on a wafer-scale integrated circuit (WSI). By operating the weighting circuit in a time-divisional manner, the time-divisional bus connection system can greatly reduce the area occupied by one neuron computation unit. Keeping the area occupied by one computation unit as small as possible enhances the production yield per computation unit; the percentage of the computation units which cannot operate owing to a defect is decreased, so that the system shown in FIG. 4 can be implemented. The system implemented in WSI permits a large-scale neural network to be executed in a compact size and at high speed as compared to the prior art.