1. Field of the Invention
The present invention relates to artificial neural networks and, more specifically, to an optical processing unit that can be used to emulate such networks.
2. Description of the Related Art
An artificial neural network (ANN) is an information processing paradigm that is inspired by the way biological nervous systems, such as the brain, process information. An information processing system based on this paradigm can be composed of a large number of interconnected processing elements (artificial neurons) working together to solve specific problems. Similar to people, ANNs can learn by example. As such, an ANN can be configured for a specific application, e.g., pattern recognition or data classification, through a learning process, which involves appropriate adjustments of synaptic connections between different neurons and/or decision making in the neuron core.
FIG. 1 schematically shows a simple artificial neuron (SAN) 100 of the prior art, two or more instances of which can be arranged to form an ANN having one or more layers, each layer having one or more neurons. SAN 100 has a neuron core 102, which receives multiple inputs labeled MODE, REF, and X1-Xn and generates one output. SAN 100 has two modes of operation, the training mode and the processing mode, controlled by a signal received via input MODE. In the training mode, for each pattern in a plurality of training patterns received at inputs X1-Xn, neuron core 102 is trained to fire (or not), i.e., to generate a binary “1” (or “0”) at the output, by memorizing the corresponding binary values received at input REF. In the processing mode, when a taught input pattern is detected by neuron core 102 at inputs X1-Xn, the corresponding binary value memorized by the neuron core during the training mode appears at the output. However, if the input pattern received by neuron core 102 at inputs X1-Xn is not found among the training patterns, the neuron core uses a “firing rule” to determine whether to fire or not.
One firing rule that can be used in neuron core 102 is based on Hamming distance. More specifically, first, neuron core 102 sorts the training patterns into two sets: the first set contains the training patterns for which the neuron core has been instructed to fire, and the second set contains the training patterns for which the neuron core has been instructed not to fire. Next, for an input pattern not found in either set, neuron core 102 finds, in each of the two sets, the training pattern with which the input pattern has the most input values (X1-Xn) in common. In mathematical terms, neuron core 102 finds the shortest Hamming distance from the input pattern to each set. Then, neuron core 102 compares these two Hamming distances to determine which one of the two is shorter. Finally, neuron core 102 outputs the binary value associated with the set having the shorter of the two Hamming distances or, alternatively, remains undecided (generates an error) if the two Hamming distances are equal. Thus, by applying the firing rule, neuron core 102 is able to generalize from the training patterns, which enables SAN 100 to respond “sensibly” to all input patterns received in the processing mode, rather than responding only to the input patterns previously seen in the training mode.
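The Hamming-distance firing rule described above can be illustrated with a short Python sketch. The function names and the representation of the two training sets as lists of binary tuples are illustrative choices, not part of the prior-art apparatus:

```python
def hamming(a, b):
    """Number of positions at which two equal-length binary patterns differ."""
    return sum(x != y for x, y in zip(a, b))


def fire(pattern, fire_set, no_fire_set):
    """Apply the Hamming-distance firing rule.

    fire_set / no_fire_set hold the training patterns for which the
    neuron was taught to output 1 / 0, respectively.  Returns 1 or 0
    for a taught or generalized pattern, or None (undecided) when the
    pattern is equally distant from both sets.
    """
    if pattern in fire_set:
        return 1
    if pattern in no_fire_set:
        return 0
    # Shortest Hamming distance from the input pattern to each set.
    d_fire = min(hamming(pattern, p) for p in fire_set)
    d_no_fire = min(hamming(pattern, p) for p in no_fire_set)
    if d_fire < d_no_fire:
        return 1
    if d_no_fire < d_fire:
        return 0
    return None  # equal distances: neuron remains undecided
```

For example, with fire_set = [(1, 1, 1)] and no_fire_set = [(0, 0, 0)], the untaught pattern (1, 1, 0) is closer to the fire set (distance 1 versus 2) and so produces a 1.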
FIG. 2 schematically shows a McCulloch-Pitts artificial neuron (MCPAN) 200 of the prior art. MCPAN 200 is generally similar to SAN 100. However, one difference between SAN 100 and MCPAN 200 is that, in the latter, the input signals received at inputs X1-Xn are weighted in weighting blocks 204-1 . . . 204-n before they are applied to a neuron core 202 of MCPAN 200. As a result, the effect of each input X1-Xn on the decision-making process in neuron core 202 depends on the weight assigned to that input in the respective weighting block 204. Another difference between SAN 100 and MCPAN 200 is that the latter uses a different firing rule than that used in the former. More specifically, neuron core 202 is configured to fire only if the sum (σ) of weighted inputs reaches or exceeds a predetermined threshold value, as expressed by Eq. (1):
        σ ≡ Σ(i=1 to n) Wi·xi ≥ T0        (1)
where Wi is the weight applied in weighting block 204-i, xi is the value received at input Xi, and T0 is the threshold value. Note that, because neuron core 202 operates on real values, signal values received at inputs X1-Xn of MCPAN 200 are no longer limited to binary values. As such, the inputs of MCPAN 200 can be coupled to the outputs of different artificial neurons generating different (e.g., analog as opposed to digital binary) signal amplitudes at their respective outputs when they fire.
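The thresholding behavior of Eq. (1) can be sketched in a few lines of Python; the function name and argument order are illustrative only:

```python
def mcp_fire(x, w, t0):
    """McCulloch-Pitts firing rule per Eq. (1): fire (return 1) if the
    weighted sum of the inputs reaches or exceeds the threshold t0;
    otherwise do not fire (return 0)."""
    sigma = sum(wi * xi for wi, xi in zip(w, x))  # σ = Σ Wi·xi
    return 1 if sigma >= t0 else 0
```

For example, with weights (0.5, 0.5, 1.0) and threshold 1.0, the input (1, 1, 0) fires (σ = 1.0 reaches the threshold) while the input (1, 0, 0) does not (σ = 0.5).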
The use of input weights in MCPAN 200 gives that artificial neuron the ability to adapt to a particular situation, e.g., by changing the weights and/or threshold. This adaptation can be carried out during a training session, which can employ, e.g., one or more known-in-the-art neuron-adaptation algorithms. For example, the two most widely used algorithms are the Delta rule and the back-error propagation method. The former is often used in feed-forward ANNs, while the latter is preferred in feedback ANNs. After the adaptation is completed, the weights are normally fixed for further use in the processing mode.
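As an illustration of such weight adaptation, one step of the Delta rule (the Widrow-Hoff update, wi ← wi + η·(t − y)·xi, where t is the target output, y is the neuron's linear output, and η is a learning rate) can be sketched as follows. The function name, the default learning rate, and the use of the pre-threshold linear output are standard choices assumed here for illustration:

```python
def delta_rule_step(w, x, target, eta=0.1):
    """One Delta-rule (Widrow-Hoff) weight update: returns the new
    weight list w_i + eta*(target - y)*x_i, where y is the neuron's
    linear (pre-threshold) output for input x."""
    y = sum(wi * xi for wi, xi in zip(w, x))  # linear output
    err = target - y                          # Delta-rule error term
    return [wi + eta * err * xi for wi, xi in zip(w, x)]
```

For example, starting from zero weights with input (1, 1), target 1, and η = 0.5, a single step yields the weights (0.5, 0.5).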
An ANN having SANs 100 or MCPANs 200 is typically realized using a central processing unit (CPU) of a conventional computer. More specifically, the CPU emulates the ANN, i.e., uses a mathematical model of the ANN, as opposed to having a corresponding (massively parallel) physical structure. As a result, instead of the parallel signal processing inherent to neural networks, the computer emulating the ANN serially computes the response of one artificial neuron at a time. Disadvantageously, this serial processing causes the emulated ANN to have a relatively slow processing speed.