The present invention relates to a system for rapidly performing the learning of a multi-layer neural network that can be applied to image pattern recognition such as character recognition, to signal processing in sonar, and to stock or financial business, and to a parallel calculation method or learning method for such a multi-layer neural network.
If the learning of a multi-layer neural network is performed on a general-purpose computer composed of a single processor, a long calculation time is required, since a great number of calculations are generally involved. A neural network is inherently easy to implement on a parallel computer, and with a proper learning method parallel processing can be expected to improve the speed.
Accordingly, approaches for performing high-speed calculation by parallel processing have been made, and one of them is disclosed in Technical & Research Report of the Institute of Electronics, Information and Communication Engineers of Japan, ME and Biocybernetics 88-134 (1989). In the disclosed approach, a plurality of processors are arranged in a ring form. Each processor includes a digital signal processor (DSP) and a local memory. Each processor is allotted one or more nodes of a layer adjacent to that processor. The local memory of a processor allotted a certain node stores the factors of multiplication (or weights) for a preceding layer to which that node belongs. When the calculation of sum of products is performed forward from an input or initial layer toward an output or final layer, each of the processors belonging to a certain layer operates independently to place on the ring a product of the value of a node allotted to that processor and a weight for a preceding layer, and the product is rotated on the ring until a desired processor is reached. In this case, a layer nearer to the input layer is a successive layer and a layer nearer to the output layer is a preceding layer. On the other hand, when the calculation of sum of products is performed backward from the output layer toward the input layer, it is not required to place any weight on the ring since no weight necessary for performing the calculation of sum of products is stored in each processor. However, it is necessary to transmit error data from the output layer to a successive layer. Therefore, the error data is rotated on the ring until a desired processor is reached. That is, in order to transfer data to a node at a distance of N nodes, the data must be rotated N times on the ring. Accordingly, as the number of processors involved increases, the time required for data transfer increases.
In order to eliminate the overhead associated with the data transfer, it is effective to change the connection configuration of the processors from the ring form to a bus form. In the bus type connection, data can be transferred with the same time lag to a node at any distance. The bus type connection is described in detail in JP-A-2-181257 (corresponding to U.S. Ser. No. 07/461,080, filed Jan. 4, 1990, and assigned to the assignee of the present application).
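The difference in transfer time between the ring form and the bus form can be illustrated by the following minimal sketch. It is a simplified cost model rather than a description of either cited system: it assumes a unidirectional ring in which data advances one processor per step, and a bus on which one transfer reaches any processor in a single cycle. The function names `ring_transfer_steps` and `bus_transfer_steps` are hypothetical.

```python
def ring_transfer_steps(src, dst, num_processors):
    """Steps needed to pass data from processor src to processor dst
    on a unidirectional ring (assumed: one hop per step)."""
    return (dst - src) % num_processors


def bus_transfer_steps(src, dst, num_processors):
    """Steps needed on a shared bus (assumed: one cycle reaches any
    other processor, regardless of distance)."""
    return 0 if src == dst else 1


if __name__ == "__main__":
    num_processors = 8
    for dst in range(1, num_processors):
        ring = ring_transfer_steps(0, dst, num_processors)
        bus = bus_transfer_steps(0, dst, num_processors)
        print(f"distance {dst}: ring {ring} steps, bus {bus} step")
```

Under these assumptions the ring cost grows with the distance N between nodes, whereas the bus cost stays constant.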
In the bus type connection, data placed on a bus by a certain processor can be received by a plurality of processors simultaneously, and hence the calculation of sum of products can be performed in parallel. Now assume a structure in which each processor holds weights for a successive layer. In this case, when the calculation of sum of products is performed in a forward direction, the processors of a successive layer place their outputs on the bus one after another, and the processors of a preceding layer can calculate the sum of products from the weights corresponding to the respective connections. On the other hand, when the calculation of sum of products is executed in a backward direction, the weights necessary for performing the calculation must be transferred from a preceding layer to a successive layer, since such weights are not stored in the processors which perform the calculation of sum of products. However, since the preceding layer and the successive layer are connected by the bus, it is not possible to transfer data in parallel. Therefore, there is a demerit that an improvement in execution speed proportional to the number of processors cannot be expected.
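The asymmetry between the forward and backward calculations on a bus can be illustrated by the following minimal sketch, which counts bus transfers for one pair of layers. It is an illustrative model only. In the terminology used above, the "input side" in the code corresponds to the successive layer and the "output side" to the preceding layer. The names `forward_on_bus` and `backward_on_bus`, the layout `weights[j][i]` (held by output-side processor `j`), and the way transfers are counted (one per broadcast value, one per weight sent) are all assumptions made for illustration.

```python
def forward_on_bus(outputs, weights):
    """Forward sum of products.

    outputs[i]   : output of input-side node i
    weights[j][i]: weight held locally by output-side processor j
    Returns (sums, number of bus transfers).
    """
    sums = [0.0] * len(weights)
    transfers = 0
    for i, x in enumerate(outputs):
        transfers += 1  # node i broadcasts its output once
        # Every output-side processor receives the broadcast at the same
        # time and accumulates with its own local weight (the loop below
        # stands in for that simultaneous work).
        for j, row in enumerate(weights):
            sums[j] += row[i] * x
    return sums, transfers


def backward_on_bus(errors, weights):
    """Backward sum of products.

    errors[j]: error data of output-side node j
    Input-side processor i needs sum over j of weights[j][i] * errors[j]
    but holds no weights, so each weight is assumed to cross the bus
    one at a time.
    """
    n_in = len(weights[0])
    sums = [0.0] * n_in
    transfers = 0
    for j, row in enumerate(weights):
        transfers += 1  # broadcast errors[j] (assumed)
        for i in range(n_in):
            transfers += 1  # weights[j][i] sent to processor i
            sums[i] += row[i] * errors[j]
    return sums, transfers


if __name__ == "__main__":
    w = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # 2 output-side, 3 input-side nodes
    print(forward_on_bus([1.0, 0.5, 2.0], w))
    print(backward_on_bus([1.0, -1.0], w))
```

In this model the forward pass needs one transfer per input-side node, while the backward pass needs a number of transfers that grows with the product of the two layer sizes, and those transfers are serialized on the single bus.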
In the above-mentioned prior art, reducing the amount of data to be processed when data is transferred between processors is not taken into consideration, and hence there is a problem that a high speed cannot be achieved even if a plurality of processors are connected.