1. Field of the Invention
The present invention relates to a neural network arithmetic apparatus and a neural network operation method, and more particularly to a neural network arithmetic apparatus and neural network operation method that perform neuron operations in parallel by plural arithmetic units.
2. Description of the Prior Art
A neural network built by imitating information processing in a brain-based nervous system finds application in information processing such as recognition and knowledge processing. Such a neural network is generally configured by connecting a large number of neurons, which transmit their output signals to each other.
An individual neuron j first calculates the sum of the neuron output values Yi from other neurons i, each weighted by a synapse connection weight Wji. A neuron output value Yj is then generated by converting the sum with a sigmoid function f. The operation is represented by equation (1) below, where i and j are arbitrary integers.
$$Y_j = f\left(\sum_i W_{ji} \cdot Y_i\right) \qquad (1)$$
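As a minimal sketch of equation (1), the weighted sum and sigmoid conversion for a single neuron can be written as follows. The logistic form of the sigmoid, the function names, and the sample values are assumptions chosen for illustration; the text only states that f is a sigmoid function.

```python
import math


def sigmoid(u):
    """Logistic sigmoid, assumed here as one common choice for f."""
    return 1.0 / (1.0 + math.exp(-u))


def neuron_output(weights_j, prev_outputs, f=sigmoid):
    """Return Y_j = f(sum_i W_ji * Y_i) for one neuron j."""
    u_j = sum(w * y for w, y in zip(weights_j, prev_outputs))
    return f(u_j)


# Hypothetical neuron with three synapses.
W_j = [0.5, -0.25, 1.0]
Y_prev = [1.0, 2.0, 0.0]
Y_j = neuron_output(W_j, Y_prev)
```

Here the weighted sum is 0.5 - 0.5 + 0.0 = 0, so the sigmoid returns 0.5.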
This operation is called a neuron operation. In the commonly used learning process of back propagation, an expected output value dj (that is, a teacher signal) for a given input is supplied from the outside, and the synapse connection weights Wji are updated so that the error δj (= dj − Yj) from the actual output value becomes small. The update amount is calculated by equation (2) below.
$$\Delta W_{ji} = \eta \cdot \delta_j \cdot Y_i \qquad (2)$$
η is a learning coefficient and δj is a learning error. In an output layer, operations are performed using equation (3) below.
$$\delta_j = (d_j - Y_j) \cdot f'(u_j) \qquad (3)$$
In a hidden layer, operations are performed using equation (4) below.
$$\delta_j = \left(\sum_k W_{kj}\,\delta_k\right) \cdot f'(u_j) \qquad (4)$$
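Equations (2) to (4) can be sketched in code as below. This is an illustration under stated assumptions: f is taken to be the logistic sigmoid, u_j is taken to be the weighted input sum of neuron j from equation (1), and all function names are hypothetical.

```python
import math


def sigmoid(u):
    """Logistic sigmoid, assumed as the function f."""
    return 1.0 / (1.0 + math.exp(-u))


def sigmoid_prime(u):
    """Derivative f'(u) of the logistic sigmoid."""
    s = sigmoid(u)
    return s * (1.0 - s)


def output_delta(d_j, y_j, u_j):
    """Equation (3): error of an output-layer neuron j."""
    return (d_j - y_j) * sigmoid_prime(u_j)


def hidden_delta(w_kj, delta_k, u_j):
    """Equation (4): w_kj[k] is the weight from neuron j to next-stage neuron k."""
    return sum(w * d for w, d in zip(w_kj, delta_k)) * sigmoid_prime(u_j)


def weight_update(eta, delta_j, y_i):
    """Equation (2): update amount for the weight W_ji."""
    return eta * delta_j * y_i
```

For example, with u_j = 0 the derivative is 0.25, so `output_delta(1.0, 0.5, 0.0)` evaluates to 0.125.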
Performing these operations in a large-scale neural network having thousands to tens of thousands of neurons requires an enormous number of operations and therefore dedicated hardware.
As prior art, the following information processing system is proposed in Japanese Published Unexamined Patent Application No. Hei 5-197707. In this system, as shown in FIG. 29, plural arithmetic units 601 to 60x having synapse connection weights 621 to 62x (x is an integer), respectively, are coupled in parallel by a time-shared bus 64 connected to a controller 66.
In the information processing system, the arithmetic units 601 to 60x are responsible for processing specific neurons and one arithmetic unit (a second arithmetic unit 602 in FIG. 29) selected by the controller 66 outputs a neuron output value to the time-shared bus 64.
The arithmetic units 601 to 60x, each of which holds in its memory the synapse connection weight between the outputting arithmetic unit (the second arithmetic unit 602 in FIG. 29) and itself, accumulate the value inputted from the time-shared bus 64, weighted by the corresponding synapse connection weight.
An arithmetic unit (the second arithmetic unit 602 in FIG. 29) selected by the controller 66 converts a value resulting from the accumulative additions by, e.g., a sigmoid function f (the above equation (1)) and outputs the result to the time-shared bus 64. Output from all the arithmetic units 601 to 60x to the time-shared bus 64 means that all the arithmetic units 601 to 60x have performed the equation (1).
The invention disclosed in Japanese Published Unexamined Patent Application No. Hei 5-197707 constitutes a large-scale neural network by a parallel operation algorithm formed as described above.
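The bus-based scheme described above can be sketched roughly as follows. This is a simplified software model, not the circuit of the cited application: one arithmetic unit per neuron, a fully connected network, the logistic sigmoid, and the function and variable names are all assumptions made for illustration.

```python
import math


def sigmoid(u):
    """Logistic sigmoid, assumed as the function f."""
    return 1.0 / (1.0 + math.exp(-u))


def bus_layer_update(weights, outputs, f=sigmoid):
    """Model one pass over a time-shared bus.

    weights[receiver][sender] is the synapse connection weight held in
    the memory of the receiving unit. Each unit in turn places its
    output on the bus, and every unit accumulates the weighted value.
    """
    n = len(outputs)
    accumulators = [0.0] * n
    bus_cycles = 0
    for sender in range(n):
        value_on_bus = outputs[sender]  # one bus transfer per neuron output
        bus_cycles += 1
        for receiver in range(n):
            accumulators[receiver] += weights[receiver][sender] * value_on_bus
    return [f(u) for u in accumulators], bus_cycles


# Hypothetical two-unit example.
new_outputs, cycles = bus_layer_update([[0.0, 1.0], [1.0, 0.0]], [2.0, -2.0])
```

In this model the number of bus transfers grows with the number of neuron outputs, and every transfer is seen by every unit, which is the serialization the following paragraphs identify as the bottleneck.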
However, since the prior art system has a large number of arithmetic units connected to the time-shared bus, the clock frequency of the time-shared bus cannot be raised, which means neuron output values cannot be rapidly supplied to the arithmetic units. That is, the inability to speed up the bus transfer clock creates a bottleneck in the speed of transmitting neuron output values, posing the problem that a remarkable increase in processing speed is not achieved.
Since data is simultaneously supplied to all the arithmetic units, unnecessary data is also received. These facts cause the arithmetic units to be limited in data supply rate, posing the problem that operations cannot be performed rapidly.
To solve the above problems, it is conceivable to provide all necessary neuron output values as well as synapse connection weights in the memory of the arithmetic units. However, the limited capacity of the memory makes it impossible to store all neuron output values when the scale of the neural network becomes larger. Another approach is to hold all the neuron output values distributively in plural arithmetic units. In this case as well, the transmission speed of neuron output values causes a bottleneck, because an arithmetic unit needs neuron output values stored in the memories of other arithmetic units to perform neuron operations.
The present invention has been made in view of the above circumstances and provides a neural network arithmetic apparatus and a neural network operation method that, when a neural network is computed in parallel using a large number of arithmetic units, enable the arithmetic units to operate independently and rapidly, and do not suffer reduced processing speed when the number of arithmetic units is increased to match the scale of the network.
To solve the above problems, a neural network arithmetic apparatus according to an aspect of the present invention performs neuron operations in parallel by plural arithmetic elements, connected over at least one transmission line, to each of which a predetermined number of neurons of plural neurons making up a neural network are assigned. In the apparatus, each of the plural arithmetic elements includes: a synapse connection weight storage memory that stores synapse connection weights of at least part of all synapses of one neuron for a predetermined number of assigned neurons; and an accumulating part that, during a neuron operation, successively selects the predetermined number of neurons and successively selects synapses of the selected neuron, multiplies the synapse connection weight of the selected synapse by the neuron output value of a neuron of a preceding stage connected with the synapse, accumulates the result for an identical neuron, and outputs an obtained value as a partial sum of neuron operation value. Each of the plural arithmetic elements further includes a neuron output value generating part that generates a neuron output value by accumulating partial sums of neuron operation values outputted by the plural arithmetic elements until the values of all synapses of one neuron are added.
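The division of a neuron operation into partial sums can be sketched as follows. This is an illustrative model only: the class and function names are hypothetical, synapses are assumed to be split among arithmetic elements by contiguous index ranges, and the logistic sigmoid is assumed for the final conversion.

```python
import math


def sigmoid(u):
    """Logistic sigmoid, assumed as the function f."""
    return 1.0 / (1.0 + math.exp(-u))


class ArithmeticElement:
    """Holds the weights for one slice of the synapses of each assigned neuron."""

    def __init__(self, weights, synapse_slice):
        self.weights = weights              # weights[n][s]: neuron n, local synapse s
        self.synapse_slice = synapse_slice  # preceding-stage neurons covered here

    def partial_sums(self, prev_outputs):
        """Accumulating part: one partial sum per assigned neuron."""
        local_inputs = prev_outputs[self.synapse_slice]
        return [sum(w * y for w, y in zip(row, local_inputs))
                for row in self.weights]


def generate_outputs(elements, prev_outputs, f=sigmoid):
    """Neuron output value generating part: add partial sums, then apply f."""
    totals = [0.0] * len(elements[0].weights)
    for element in elements:
        for n, partial in enumerate(element.partial_sums(prev_outputs)):
            totals[n] += partial
    return [f(u) for u in totals]


# Two neurons with four synapses each, split across two hypothetical elements.
prev = [1.0, 2.0, 0.5, 3.0]
element_a = ArithmeticElement([[0.5, -0.25], [1.0, 1.0]], slice(0, 2))
element_b = ArithmeticElement([[1.0, 0.0], [-1.0, -1.0]], slice(2, 4))
outputs = generate_outputs([element_a, element_b], prev)
```

Each element touches only its own weights and its own slice of inputs, so only one partial sum per neuron has to leave the element.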
That is, since each of plural arithmetic elements, connected over at least one transmission line, to each of which a predetermined number of neurons of plural neurons making up a neural network are assigned, has a synapse connection weight storage memory that stores synapse connection weights of at least part of all synapses of one neuron, and an accumulating part, neuron operations on a predetermined number of assigned neurons can be performed independently in units of arithmetic elements.
Each arithmetic element can be utilized to calculate not only a partial sum of neuron operation value but also a partial sum of error signal operations.
Therefore, unlike a conventional approach, arithmetic elements for neuron operations and arithmetic elements for error signal operations need not be provided separately, and operations of a neural network can be performed using fewer arithmetic elements than have been conventionally required. Consequently, a neural network arithmetic apparatus is obtained which can perform operations of a large-scale neural network without decreasing operation speed by using almost the same number of, or fewer, arithmetic elements than have conventionally been used.
Since operations are performed using the synapse connection weights and neuron output values held by each of the plural arithmetic elements, each arithmetic element outputs only a partial sum to the bus, so the data rate on the bus is lower than in the conventional approach. Consequently, operations of a large-scale neural network can be performed without reduction in operation speed due to insufficient transmission line bandwidth.
The neural network arithmetic apparatus according to another aspect of the present invention further includes: an intermediate partial sum accumulating part that accumulates, for an identical neuron, at least one of the partial sum of neuron operation value and a partial sum obtained by accumulating the partial sum of neuron operation value for an identical neuron, and outputs the result as an intermediate partial sum of neuron operation value. The neuron output value generating part accumulates at least one of the partial sum of neuron operation value and the intermediate partial sum until the values of all synapses of one neuron are added.
That is, partial sums of neuron operation values are accumulated in a multilayer structure. An intermediate partial sum of neuron operation value may be generated in any of the following ways: by accumulating plural partial sums of neuron operation values; by accumulating partial sums of neuron operation values to obtain a partial sum (that is, an intermediate partial sum) and further accumulating such results; or by adding at least one of the partial sums of neuron operation values and at least one of the intermediate partial sums of neuron operation values. The present invention is thereby applicable to a large-scale neural network made up of an enormous number of neurons without causing a shortage of transmission line bandwidth.
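The multilayer accumulation can be sketched as a repeated grouping of partial sums. The function names and the fixed group size are assumptions for illustration; the text does not prescribe how many values each intermediate accumulating part combines.

```python
def intermediate_partial_sums(partial_sums, group_size):
    """Accumulate each group of values into one intermediate partial sum."""
    return [sum(partial_sums[i:i + group_size])
            for i in range(0, len(partial_sums), group_size)]


def neuron_operation_value(partial_sums, group_size):
    """Reduce the partial sums of one neuron level by level to a single value."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    level = list(partial_sums)
    while len(level) > 1:
        level = intermediate_partial_sums(level, group_size)
    return level[0]


# Five hypothetical partial sums for one neuron, combined two at a time.
first_level = intermediate_partial_sums([1.0, 2.0, 3.0, 4.0, 5.0], 2)
total = neuron_operation_value([1.0, 2.0, 3.0, 4.0, 5.0], 2)
```

In this example the first level yields 3.0, 7.0 and 5.0, and the final value is 15.0. Each level forwards fewer values than it receives, which is why the higher-level transmission lines carry less traffic.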
Preferably, according to another aspect of the present invention, the plural arithmetic elements are split into plural groups each containing a predetermined number of arithmetic elements and the neuron output value generating part is provided in each of the groups. The amount of information transferred to and from the outside of the arithmetic elements is decreased and operations of a large-scale neural network can be performed without decreasing operation speed.
As such a neural network arithmetic apparatus, according to another aspect of the present invention, a configuration is possible in which a predetermined number of arithmetic elements making up one of the plural groups are split and formed on plural semiconductor elements, the plural semiconductor elements on which the predetermined number of arithmetic elements are formed are mounted on an identical circuit substrate, and the intermediate partial sum accumulating part is provided for each of the semiconductor elements on which the predetermined number of arithmetic elements are split and formed. According to another aspect of the present invention, a configuration is also possible in which a predetermined number of arithmetic elements making up one of the plural groups are split and formed on plural semiconductor elements on plural circuit substrates, the plural circuit substrates are mounted on an identical mounting substrate, and the intermediate partial sum accumulating part is provided at least for each of the semiconductor elements or on the circuit substrates.
Another aspect of the present invention is the neural network arithmetic apparatus, in which the arithmetic elements further include a data storage memory in which neuron output values at least related with the arithmetic elements are stored. This further reduces the amount of data exchanged, thereby contributing to reduction in the amount of use of transmission lines and enabling application to operations of a larger-scale neural network.
Furthermore, according to another aspect of the present invention, the data storage memory has at least two memories so that one memory stores data to be used for operations and another stores results obtained by the operations. The necessary data is therefore already held in the arithmetic elements when operations on the next arithmetic layer are started, eliminating the need to re-supply data to each arithmetic element and enabling the next operation processing to start more quickly.
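The two-memory arrangement resembles a double buffer, sketched below. The class name, the equal bank sizes, and the explicit `swap` step are assumptions for illustration and are not taken from the text.

```python
class DoubleBufferedMemory:
    """Two banks: one read as operation input, the other written with results."""

    def __init__(self, size):
        self._banks = [[0.0] * size, [0.0] * size]
        self._read = 0  # index of the bank currently used as input

    @property
    def inputs(self):
        return self._banks[self._read]

    @property
    def results(self):
        return self._banks[1 - self._read]

    def swap(self):
        """Make the stored results the inputs of the next arithmetic layer."""
        self._read = 1 - self._read


memory = DoubleBufferedMemory(2)
memory.inputs[:] = [1.0, 2.0]
memory.results[0] = sum(memory.inputs)  # stand-in for a neuron operation
memory.swap()
```

After the swap, the results of the previous layer are read as inputs without any transfer from outside the arithmetic element.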
Another aspect of the present invention is the neural network arithmetic apparatus, further including a connection weight updating part that updates each of synapse connection weights of a selected neuron stored in the storage memory.
Thereby, since the arithmetic elements can perform neuron operations and synapse connection weight updating independently on their respectively assigned neurons, unlike a conventional approach, arithmetic elements for neuron operations and arithmetic elements for updating synapse connection weights need not be provided separately, and operations of a neural network can be performed using fewer arithmetic elements than have been conventionally required. Consequently, a neural network arithmetic apparatus is obtained which can perform operations of a large-scale neural network without decreasing operation speed by using almost the same number of, or fewer, arithmetic elements than have conventionally been used.
During operations on error signals by back propagation, synapse connection weights are updated using error signals propagated backward. In the present invention, however, since plural neurons making up one operation layer are split and assigned equally to a group of a predetermined number of arithmetic elements, a value outputted from the predetermined number of arithmetic elements is a partial sum of error signal.
Therefore, another aspect of the present invention is the neural network arithmetic apparatus, in which the accumulating part further includes an error signal generating part that, during operations on error signals, successively selects the predetermined number of synapses receiving output signals from a specific neuron, multiplies the connection weight of a selected synapse by the error signal of a neuron having the selected synapse, accumulates the result for the predetermined number of synapses, outputs an obtained value as a partial sum of error signal, accumulates the obtained partial sum of error signal for all synapses connected with the specific neuron, and outputs an obtained value, as the error signal of the specific neuron, to an arithmetic element to which the specific neuron is assigned.
In this case, the connection weight updating part updates synapse connection weights stored in the storage memory, using an error signal generated by the error signal generating part.
Thereby, synapse connection weight updating on neurons assigned to each arithmetic element can be performed using error signals obtained in the arithmetic element.
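The error signal path can be sketched in the same style. The function names are hypothetical, the factor f'(u_j) is applied as in equation (4), and the weight update follows equation (2); how the partial sums are physically routed is not modeled.

```python
def error_partial_sum(weights_to_next, deltas_next):
    """Partial sum of W_kj * delta_k over the synapses held by one element."""
    return sum(w * d for w, d in zip(weights_to_next, deltas_next))


def hidden_error_signal(partials, f_prime_u_j):
    """Accumulate partial sums from all elements and scale by f'(u_j)."""
    return sum(partials) * f_prime_u_j


def update_weights(weights, eta, delta_j, prev_outputs):
    """Apply equation (2) to every stored weight W_ji of neuron j."""
    return [w + eta * delta_j * y for w, y in zip(weights, prev_outputs)]


# Two hypothetical elements hold the synapses leaving a specific neuron j.
partial_a = error_partial_sum([2.0, 1.0], [0.125, 0.25])
partial_b = error_partial_sum([4.0], [0.5])
delta_j = hidden_error_signal([partial_a, partial_b], 0.25)
new_weights = update_weights([1.0, 0.0], 0.5, delta_j, [2.0, 0.0])
```

With these values the partial sums are 0.5 and 2.0, the error signal is 0.625, and the first weight moves from 1.0 to 1.625 while the second, whose input is 0, is unchanged.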
A neural network arithmetic apparatus according to another aspect of the present invention further includes: plural first transmission lines that connect a predetermined number of arithmetic elements making up one group; and at least one second transmission line that is smaller in bandwidth than the first transmission lines and connects plural groups. In the apparatus, the number of groups provided is smaller than the predetermined number of arithmetic elements making up a group.
Another aspect according to the present invention is a neural network operation method that is suitable for the neural network arithmetic apparatus and performs neuron operations in parallel for a predetermined number of neurons, of plural neurons making up the neural network. The method includes the steps of: storing the synapse connection weights of at least part of all synapses of one neuron for a predetermined number of assigned neurons; successively selecting the predetermined number of neurons during neuron operations; successively selecting synapses of the selected neuron; multiplying the synapse connection weight of the selected synapse by the neuron output value of a neuron of a preceding stage connected with the synapse; accumulating the result for an identical neuron to generate a partial sum of neuron operation value; and accumulating the partial sum of neuron operation value until the values of all synapses of one neuron are added, to generate a neuron output value.
Another aspect of the present invention is the neural network operation method including the steps of: when accumulating the partial sum of neuron operation value to generate a neuron output value, accumulating, for an identical neuron, at least one of the partial sum of neuron operation value and a partial sum obtained by accumulating the partial sum of neuron operation value for an identical neuron to generate an intermediate partial sum of neuron operation value; and accumulating at least one of the partial sum of neuron operation value and the intermediate partial sum of neuron operation value until the values of all synapses of one neuron are added, to generate a neuron output value.
Another aspect of the present invention is the neural network operation method including the steps of: splitting the synapses to operate on in the neuron operations of an identical neuron into plural groups; and generating the intermediate partial sum of neuron operation value for each of the groups.
According to another aspect of the present invention, when the synapses are split into plural groups, the synapses to operate on are split equally, whereby a time lag caused by differences in the time required for individual operations is prevented and efficiency is improved.
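An equal split of synapses among groups can be sketched as below. The function name is hypothetical, and spreading any remainder over the first groups is an assumption for the case where the synapse count is not an exact multiple of the group count.

```python
def split_equally(n_synapses, n_groups):
    """Return index ranges whose sizes differ by at most one synapse."""
    base, extra = divmod(n_synapses, n_groups)
    ranges = []
    start = 0
    for group in range(n_groups):
        size = base + (1 if group < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


# Ten hypothetical synapses divided among four groups.
groups = split_equally(10, 4)
sizes = [len(r) for r in groups]
```

Here the group sizes are 3, 3, 2 and 2, so no group has to wait long for another to finish its accumulation.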
As described above, during learning by back propagation, a value outputted from a predetermined number of arithmetic elements is a partial sum of error signal. Therefore, another aspect of the present invention is the neural network operation method including the steps of: during error signal operations, successively selecting synapses receiving output signals from a specific neuron; multiplying the connection weight of a selected synapse by the error signal of a neuron having the selected synapse; accumulating the result for the predetermined number of neurons; outputting an obtained value as a partial sum of error signal; accumulating the partial sum of error signal for all synapses receiving output signals from the specific neuron; and outputting an obtained value as an error signal of the specific neuron to an arithmetic element to which the specific neuron is assigned. In this case, preferably, synapse connection weights are updated using an error signal obtained by accumulating the partial sum of error signal.