This application is related to co-pending commonly assigned application Ser. No. 07/859,698, filed Jun. 11, 1992 naming Rejman-Greene et al as inventors.
I. Field of the Invention
This invention relates to a method of training a neural network having an input for inputting input vectors, an output for outputting output vectors and an adjustable response determining means for determining which output vector is output by the network in response to the inputting of a given input vector.
II. Related Art and Other Considerations
A neural network can, in general terms, be regarded as a series of nodes, each node providing an output which is some function of the outputs of other nodes to which the node is coupled. For example, a particular node might output a signal at a first level if the weighted sum of the outputs of those other nodes exceeds some set threshold, and a signal at a second level if it does not. Different nodes may receive the outputs from different sets of other nodes, with weightings and threshold values particular to that node. An input vector is coupled to a set of the nodes via the input of the network, and an output vector is generated at the output of the network. The response of the neural network to the input vector is determined in this case by the values of the weightings and thresholds, which collectively form the response determining means for this type of network.
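The behaviour of a single node of the weighting-and-threshold type can be sketched as follows. This is an illustrative Python model only; the function name `threshold_node` and the choice of 1.0 and 0.0 as the two output levels are assumptions made for the example, not features of any particular implementation.

```python
from typing import Sequence


def threshold_node(inputs: Sequence[float], weights: Sequence[float],
                   threshold: float, high: float = 1.0, low: float = 0.0) -> float:
    """Hypothetical model of one node: output `high` if the weighted sum of the
    outputs of the coupled nodes exceeds `threshold`, otherwise output `low`."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return high if weighted_sum > threshold else low


# Weighted sum 0.5 + 0.4 = 0.9 exceeds the threshold of 0.8, so the node fires.
print(threshold_node([1, 0, 1], [0.5, 0.2, 0.4], threshold=0.8))
```

Changing either the weights or the threshold changes which output the node gives for the same input, which is why these values together make up the response determining means.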
Other implementations of neural networks may employ techniques different to the weighting and threshold technique described above but will nevertheless have some response determining means which determines the output vectors provided by the particular network in response to input vectors.
An example of an optical implementation of a neural network is described in an article by N. M. Barnes, P. Healey, P. McKee, A. W. O'Neill, M. A. Z. Rejman-Greene, E. G. Scott, R. P. Webb and D. Wood entitled "High Speed Opto-Electronic Neural Network", Electronics Letters 19th July 1990, Vol 26, No. 15, pp 1110-1112. This exploits the possibilities of parallel processing and optical interconnections to provide a high throughput. That is, the neural network has a rapid response to an applied input vector, which in the reported configuration can be clocked at rates in excess of 10 Mbits/s. The network is shown at FIG. 1 of this application. It is a two-layer perceptron network which can recognise exclusive-or (EXOR) combinations in a pair of input streams A and B.
Its basic operation is to perform a matrix-vector multiplication in which each row of a 4×4 optical detector array 80 provides an electrical signal corresponding to the sum of the light intensities impinging on that row.
A computer-generated hologram 100 and lenses 102 and 104 generate, directly from a single laser source 106, intensity-coded beams (not shown) which impinge on the individual modulators of a modulator array 82. The relative intensities of the beams are shown by the numerals on the modulators. These beams of different intensity provide the different weightings to be applied to the outputs of the modulators in their connection to the nodes formed by the rows of detectors.
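A minimal Python model of the matrix-vector multiplication performed by this arrangement is given below. It assumes, for illustration only, idealised modulators that fully transmit their beam when the corresponding data bit is 1 and fully block it when the bit is 0, and that data bit j drives every modulator in column j. The weight values in `EXAMPLE_WEIGHTS` are hypothetical and are not the numerals shown in FIG. 1.

```python
from typing import List, Sequence

# Hypothetical relative beam intensities impinging on a 4x4 modulator array.
EXAMPLE_WEIGHTS = [
    [1.0, 1.0, 0.0, 0.0],
    [2.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 1.0],
    [1.0, 2.0, 1.0, 0.0],
]


def optical_matvec(beam_intensity: Sequence[Sequence[float]],
                   data_bits: Sequence[int]) -> List[float]:
    """Idealised model: modulator (i, j) passes beam_intensity[i][j] when
    data_bits[j] is 1 and blocks it when 0; detector row i sums what it receives."""
    row_signals = []
    for row in beam_intensity:
        if len(row) != len(data_bits):
            raise ValueError("each row needs one intensity per input bit")
        # Sum of light intensities impinging on this detector row.
        row_signals.append(sum(w * b for w, b in zip(row, data_bits)))
    return row_signals


print(optical_matvec(EXAMPLE_WEIGHTS, [1, 0, 1, 1]))
```

Each detector row therefore delivers one element of the product of the intensity (weight) matrix and the input vector, all rows being formed in parallel.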
A signal is passed to the next layer via pre-amplifiers 84, or not, depending on whether this weighted sum is above or below a set threshold, which may be either a fixed bias from a bias supply 86 or a level relative to one of the other weighted sums.
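The two kinds of threshold can be illustrated with a small two-layer network computing the exclusive-or of inputs A and B. In this hypothetical sketch the hidden nodes use relative thresholds (each fires only if its weighted sum exceeds the other's) and the output node uses a fixed bias. The unit weights and the bias of 0.5 are assumptions for the example, not the values used in the reported network.

```python
def exor_network(a: int, b: int, bias: float = 0.5) -> int:
    """Hypothetical two-layer threshold network recognising A EXOR B."""
    # First layer: two weighted sums (unit weights assumed for illustration).
    sum_a = 1.0 * a
    sum_b = 1.0 * b
    # Relative thresholds: each hidden node fires only if its sum exceeds the other.
    h1 = 1 if sum_a > sum_b else 0   # A AND NOT B
    h2 = 1 if sum_b > sum_a else 0   # B AND NOT A
    # Second layer: fixed-bias threshold on the sum of the hidden outputs.
    return 1 if (h1 + h2) > bias else 0


for a in (0, 1):
    for b in (0, 1):
        print(a, b, exor_network(a, b))
```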
There are several ways in which the required intensity-coded beams could be generated including, for example, using a fixed mask to filter the intensities of individual beams from an array of sources.
A novel method of determining the weights for an optical neural computer, devised by the applicant (and not published at the time of filing this application), is based on the fact that the modulation depth that can be achieved at a given wavelength by a multiple quantum well (MQW) modulator is determined not only by the voltage swing of the digital drive but also by the bias applied across the modulator. See FIG. 3 of N. M. Barnes, P. Healey, M. A. Z. Rejman-Greene, E. G. Scott and R. P. Webb, "16 Channel Parallel Optical Interconnect Demonstration with an InGaAs/InP MQW Modulator Array", Electronics Letters, Vol 26, pp 1126-1127 (1990). This behaviour means that the modulation depth can be adjusted by changing the bias voltage, even if the voltage swing of the digital drive, which determines whether the modulator is in the "on" or "off" state, is unaltered. By adding a low-pass filter to the circuit of FIG. 1, a slowly varying (10-100 kHz) analogue bias voltage can be superimposed onto the fast (about 50 MHz) digital data. Hence the modulation depth, which corresponds to the signal weight, can be changed at a rate of 10-100 kHz. This slowly varying voltage level can be set by standard low-bandwidth analogue drives under computer control. The hologram that produces the multiple beams therefore no longer has to produce beams with a pre-programmed weight matrix. A uniform array of beams can be used instead, the weights being determined by the independently adjustable bias voltages on the modulators. Further, the weights can now be adjusted during a training phase of the neural network.
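The effect of bias-controlled weighting, and the ratio between the fast data rate and the slow weight-update rate, can be sketched as follows. The linear relationship between bias voltage and modulation depth in `modulation_depth`, and its voltage and depth limits, are hypothetical placeholders; the real dependence is the one measured for the MQW device and is not in general linear. The names `modulation_depth`, `detected_signal` and `bits_per_weight_update` are likewise assumptions for this example.

```python
from typing import Sequence


def modulation_depth(bias_v: float, v_min: float = 0.0, v_max: float = 5.0,
                     depth_min: float = 0.1, depth_max: float = 0.6) -> float:
    """HYPOTHETICAL linear model of modulation depth versus bias voltage,
    clamped to the range [depth_min, depth_max]."""
    frac = (bias_v - v_min) / (v_max - v_min)
    frac = min(max(frac, 0.0), 1.0)
    return depth_min + frac * (depth_max - depth_min)


def detected_signal(bits: Sequence[int], biases: Sequence[float],
                    beam_intensity: float = 1.0) -> float:
    """Uniform beam array: each modulator's contribution (its weight) is set by
    its own bias voltage, while the fast data bit switches it on or off."""
    return sum(beam_intensity * modulation_depth(v) * b
               for v, b in zip(biases, bits))


def bits_per_weight_update(data_rate_hz: float, bias_rate_hz: float) -> float:
    """Number of data bits clocked through per period of the slow bias drive."""
    return data_rate_hz / bias_rate_hz


print(bits_per_weight_update(50e6, 100e3))  # fast data vs fastest bias update
print(bits_per_weight_update(50e6, 10e3))   # fast data vs slowest bias update
```

With data at about 50 MHz and the bias changing at 10-100 kHz, some 500 to 5000 data bits pass through the network for each change of weight, which is what allows high-speed data and computer-controlled weights to coexist.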
This arrangement of optical neural network provides a means of sending high speed digital data through a neural network while still being able to change the weights at more modest speeds under computer control.
There are other ways in which this high data throughput could be achieved; in particular, the data signals could be produced externally and used to illuminate a sandwich of modulator/detector arrays. The modulators could then be addressed independently to adjust the weights.
The bias voltages which determine the threshold levels can also be adjusted at the same slower timescale in these implementations.
In those cases for which the set of desired output vectors ("target set") of the network corresponding to a set of known input vectors ("training set") is known, an appropriate configuration of the response determining means can be achieved by supervised training of the network. That is, the network is presented with the training set of input vectors and the response determining means is adjusted according to some algorithm to reduce the error (i.e. increase the similarity) between the actual output vectors from the network and the target set of output vectors.
A known method of training a neural network comprises applying each input vector from the training set to the network in turn. A measure of the similarity of the output vector from the network and the target output vector for that input vector is obtained and the response determining means adjusted in a manner dependent to at least some extent on the measure of similarity. This is repeated for each input vector of the training set in turn.
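The per-vector training loop just described can be sketched in Python as follows. The perceptron learning rule is used here purely as a simple stand-in for the adjustment algorithm; it is not necessarily the scheme of the article cited below, and the names `train_online` and `predict` are hypothetical. The point the sketch illustrates is that the weights and threshold are updated after each input vector, before the next one is applied.

```python
from typing import List, Sequence, Tuple


def predict(weights: Sequence[float], threshold: float, x: Sequence[int]) -> int:
    """Threshold unit: 1 if the weighted sum exceeds the threshold, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) > threshold else 0


def train_online(training_set: Sequence[Tuple[Sequence[int], int]], n_inputs: int,
                 rate: float = 1.0, max_epochs: int = 100) -> Tuple[List[float], float]:
    """Apply each training vector in turn and adjust immediately (perceptron rule)."""
    weights = [0.0] * n_inputs
    threshold = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, target in training_set:                       # one input vector at a time
            error = target - predict(weights, threshold, x)  # compare with the target
            if error != 0:
                mistakes += 1
                # The update must be finished before the next vector is applied.
                weights = [w + rate * error * xi for w, xi in zip(weights, x)]
                threshold -= rate * error
        if mistakes == 0:                                    # all outputs match targets
            break
    return weights, threshold


AND_SET = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, t = train_online(AND_SET, n_inputs=2)
print([predict(w, t, x) for x, _ in AND_SET])
```

A single threshold unit is used with a linearly separable target (logical AND) so that the rule is guaranteed to converge; the same loop structure applies to multi-node networks with other update rules.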
An example of such a training scheme is described in an article by P. D. Wasserman and T. Schwartz titled "Neural Networks, Part 2", IEEE Expert, Vol 3, No. 1 (Spring 1988), pp 10-15.
This method of training neural networks is applicable to those networks which employ such variably bistable modulators, and, indeed, to other networks in which the optical intensity can be varied to determine the weighting. However, a characteristic of such training methods as applied to such an optical neural network is that the computation of the weight and threshold changes that have to be made for a given input vector must be completed before the next input vector is applied to the network. That is, the high data throughput of the network is of no benefit during the training phase because the rate at which the input vectors can be applied to the network is restricted to the relatively slow rate at which the changes can be calculated.
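The size of this penalty can be estimated with simple arithmetic. The sketch below assumes, hypothetically, that each input vector must wait for its weight update to finish, and that an update can take no less than one period of the slow bias drive; the 50 MHz and 100 kHz figures are those quoted above, and the function name `online_training_rate` is an assumption.

```python
def online_training_rate(data_rate_hz: float, compute_time_s: float,
                         bias_rate_hz: float) -> float:
    """Input vectors per second when every vector waits for its own weight update.

    The update time is taken as the longer of the (hypothetical) computation time
    and one period of the bias drive that actually sets the new weights.
    """
    update_time_s = max(compute_time_s, 1.0 / bias_rate_hz)
    return 1.0 / (1.0 / data_rate_hz + update_time_s)


# Even with zero computation time, training runs far below the 50 MHz data rate.
print(online_training_rate(50e6, 0.0, 100e3))
# A 1 ms computation per vector limits training to about 1000 vectors per second.
print(online_training_rate(50e6, 1e-3, 100e3))
```

Under these assumptions, even an instantaneous computation limits training to just under 100,000 vectors per second, roughly 500 times slower than the rate at which the network can accept data.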
D. P. Casasent and E. Barnard in an article titled "Adaptive Clustering Optical Neural Net" describe pattern recognition techniques for clustering and linear discriminant function selection combined with neural net methods in which a net is trained on an input set of training vectors to classify the inputs into one of c classes. The training algorithm involves inputting each of the input training vectors in turn to the network and for each determining the most active hidden neuron in the proper class and the most active hidden neuron in any other class. Once this has been done, the vector inner product of the weight vector and input vector for each of these two hidden neurons is calculated, a perceptron error function calculated, and an error penalty calculated and added to the error function. The next training vector of the set is then input and the calculations repeated for this next input vector.
After all the training vectors have been input, all the errors and error gradients are accumulated and the weights of the net are adapted using a conjugate-gradient algorithm. The training vector set is input repeatedly until satisfactory performance on the test set of input vectors is obtained.
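A minimal sketch of the batch structure of this kind of training is given below. The hinge-style error with a margin and the quadratic weight penalty are hypothetical choices standing in for the article's perceptron error function and error penalty, whose exact forms are not reproduced here, and a single plain gradient-descent step replaces the conjugate-gradient algorithm for brevity. What the sketch does follow is the described order of operations: class maxima and errors for each vector, accumulation over the whole set, then one weight update. All function names are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Vector = List[float]
Weights = Dict[int, List[Vector]]  # class label -> weight vectors of its hidden neurons


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Vector inner product (the step that can be performed optically)."""
    return sum(a * b for a, b in zip(u, v))


def _neurons(weights: Weights, classes: Iterable[int]) -> List[Tuple[int, int]]:
    return [(c, j) for c in classes for j in range(len(weights[c]))]


def classify(weights: Weights, x: Vector) -> int:
    """Assign x to the class of the most active hidden neuron."""
    c, _ = max(_neurons(weights, weights), key=lambda cj: dot(weights[cj[0]][cj[1]], x))
    return c


def batch_epoch(weights: Weights, training_set: Sequence[Tuple[Vector, int]],
                margin: float = 0.1, penalty: float = 0.01, step: float = 0.1) -> float:
    """One pass over the training set; weights change only after the whole pass."""
    grads = {c: [[0.0] * len(w) for w in ws] for c, ws in weights.items()}
    total_error = 0.0
    for x, cls in training_set:
        # Most active hidden neuron in the proper class ...
        _, j_in = max(_neurons(weights, [cls]), key=lambda cj: dot(weights[cls][cj[1]], x))
        # ... and the most active hidden neuron in any other class.
        others = [c for c in weights if c != cls]
        c_out, j_out = max(_neurons(weights, others),
                           key=lambda cj: dot(weights[cj[0]][cj[1]], x))
        # Hypothetical perceptron-style (hinge) error for this vector.
        err = dot(weights[c_out][j_out], x) - dot(weights[cls][j_in], x) + margin
        if err > 0:
            total_error += err
            for k, xk in enumerate(x):  # accumulate d(err)/d(weights)
                grads[cls][j_in][k] -= xk
                grads[c_out][j_out][k] += xk
    # Hypothetical error penalty: a small quadratic penalty on every weight.
    for c, ws in weights.items():
        for j, w in enumerate(ws):
            total_error += penalty * dot(w, w)
            for k, wk in enumerate(w):
                grads[c][j][k] += 2.0 * penalty * wk
    # Single update after all vectors (plain gradient descent standing in for CG).
    for c, ws in weights.items():
        for j, w in enumerate(ws):
            for k in range(len(w)):
                w[k] -= step * grads[c][j][k]
    return total_error
```

Because every per-vector step (finding the two class maxima and evaluating the error) runs electronically, the loop over `training_set` proceeds at the speed of that computation rather than at the optical data rate, even though `dot` itself could be performed optically.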
As discussed in this article, the vector inner products of the weight vector of a hidden node and the input vector can be implemented optically.
Because the training method of Casasent et al requires the two class maxima to be determined, and the error function and error penalty to be calculated, for one input vector before the next can be input to the net, the training scheme does not take advantage of the high data throughput of the neural network, even though the vector inner product is calculated rapidly in the optical rather than the electronic domain.