Multi-layer neural networks may be used to classify patterns. These networks typically consist of layers of nonlinear processing elements (or "neurons") arranged in a highly interconnected hierarchy. Each neuron within the first layer of the network hierarchy accepts as input a weighted sum over all of the resolution elements of the pattern to be classified. Each first-layer neuron nonlinearly processes its sum and passes the result to the second layer of the network, in which each neuron accepts as input a weighted sum over all neural outputs of the first layer. This process continues until the output, or classification, layer of the network is reached, and the outputs of this layer are interpreted as the desired classification results. Typically, no more than two or three layers are required to achieve pattern classification, and the number of neurons in each layer typically decreases as the classification layer is approached. The network is trained to classify patterns by pre-selecting the weights that interconnect the various layers. A good theoretical description of multi-layer neural networks may be found in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations, by D. E. Rumelhart and J. L. McClelland (MIT Press, 1986).
Mathematically, the functioning of a single layer of a multi-layer neural network may be described as follows:

g[R^(i) σ^(i)] = g[f^(i)] = σ^(i+1);  i = 1, 2, . . . , N,  (1)
where the pattern vector σ^(i) is the input to layer i; the matrix R^(i) represents the neuron input weights; N is the number of network layers; and g[·] is a nonlinear vector function which operates identically on each element of f^(i). Typically, g[·] operates on each element k of f^(i) as indicated in FIG. 1. The particular nonlinear transfer function illustrated in FIG. 1 has what is commonly referred to as a sigmoidal shape, with adjustable threshold ("a") and saturation ("b") points.
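The elementwise nonlinearity of Eq. (1) can be sketched in Python as follows. The exact curve of FIG. 1 is not specified in the text, so the parameterization below — a logistic function with threshold "a" (the input at which the output crosses half-saturation) and saturation level "b" — is an assumption, chosen only to illustrate the two adjustable points:

```python
import math

def g(x, a=0.0, b=1.0):
    """Sigmoidal transfer function applied to one element of f^(i).

    a -- assumed threshold: input at which the output reaches b/2
    b -- assumed saturation: asymptotic output for large inputs
    One plausible parameterization; FIG. 1's exact curve may differ.
    """
    return b / (1.0 + math.exp(-(x - a)))

def g_vec(f, a=0.0, b=1.0):
    """Apply g identically to each element k of the vector f^(i)."""
    return [g(fk, a, b) for fk in f]
```

With this form, g(a) = b/2 and g(x) approaches b as x grows, matching the threshold and saturation behavior described for FIG. 1.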
FIG. 2 shows an illustrative example of a three-layer neural network that maps a three-resolution-element input pattern onto two output classes. Each layer consists of a fully interconnected set of weights connecting the layer inputs to the summers. The output from each summer is fed through a nonlinearity, which completes the processing for that layer. The output from one layer serves as input to the next layer.
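The structure of FIG. 2 — weighted sums followed by a nonlinearity, with each layer's output feeding the next — can be sketched as follows. The weight values and layer widths below are illustrative placeholders (the figure specifies the topology, not the numbers), and a standard logistic nonlinearity is assumed:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(sigma, R):
    """One network layer per Eq. (1): f = R sigma (the summers),
    then the nonlinearity applied to each element."""
    f = [sum(Rkj * sj for Rkj, sj in zip(row, sigma)) for row in R]
    return [sigmoid(fk) for fk in f]

def forward(sigma, weights):
    """Feed the output of each layer into the next."""
    for R in weights:
        sigma = layer(sigma, R)
    return sigma

# Placeholder weights: 3-element input -> 3 neurons -> 3 neurons
# -> 2 output classes, fully interconnected as in FIG. 2.
weights = [
    [[0.2, -0.1, 0.4], [0.5, 0.3, -0.2], [-0.3, 0.8, 0.1]],
    [[0.1, 0.6, -0.4], [0.7, -0.2, 0.3], [0.2, 0.2, 0.2]],
    [[0.9, -0.5, 0.3], [-0.6, 0.4, 0.8]],
]
classes = forward([1.0, 0.0, 1.0], weights)  # two class scores in (0, 1)
```

In a trained network the weight matrices would be pre-selected as described above; here they serve only to make the data flow concrete.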
Pattern classification problems in which the input patterns are two-dimensional images typically require two-layer neural networks, which may contain as many as 10^2 classification-layer neurons and 10^3 input-layer neurons. For a 10^4-pixel image and fully interconnected layers, R^(1) becomes a 10^4 × 10^3-element matrix and R^(2) a 10^3 × 10^2-element matrix. Real-time (~10^-3 second) classification of unknown images therefore requires on the order of twenty billion operations per second [= 2 × (10^7 + 10^5) × 10^3]. Existing, all-digital electronic computers capable of such throughput occupy many cubic feet of volume and consume thousands of watts of power.
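The operation count quoted above can be checked directly: a fully interconnected layer with M inputs and K neurons performs M × K multiplications and roughly as many additions per classification, and a ~10^-3-second frame time converts that per-image count into a sustained rate:

```python
# Layer sizes from the text: 10^4-pixel image, 10^3 input-layer
# neurons, 10^2 classification-layer neurons.
pixels, hidden, classes = 10**4, 10**3, 10**2

ops_layer1 = 2 * pixels * hidden     # 2 x 10^7 (multiplies + adds)
ops_layer2 = 2 * hidden * classes    # 2 x 10^5
ops_total = ops_layer1 + ops_layer2  # 2 x (10^7 + 10^5) per image

frame_time = 1e-3                    # real-time: ~10^-3 s per image
throughput = ops_total / frame_time  # ~2 x 10^10 operations per second
```

This reproduces the bracketed figure of approximately twenty billion operations per second.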
Optical devices in which the matrices R^(i) may be stored in the form of two-dimensional Fourier-space holograms include those described by: D. Gabor in "Character Recognition by Holography", Nature, 208, p. 422 (1965); J. T. LaMacchia and D. L. White in "Coded Multiple Exposure Holograms", Applied Optics, 7, p. 91 (1968); J. R. Leger and S. H. Lee in "Hybrid Optical Processor for Pattern Recognition and Classification Using a Generalized Set of Pattern Functions", Applied Optics, 21, p. 274 (1982); and D. A. Gregory and H. K. Liu in "Large-Memory Real-Time Multi-channel Multiplexed Pattern Recognition", Applied Optics, 23, p. 4560 (1984). Additionally, in a paper by T. Jannson, H. M. Stoll, and C. Karaguleff ("The interconnectability of neuro-optic processors", Proceedings of the International Society for Optical Engineering, Vol. 698, p. 157 (1986)), there is described, on page 162, an optical volume-holographic architecture for computing matrix-vector products. This disclosure is, however, in the context of providing interconnects for an all-optical, recurrent (feedback)-type neural network.
It is one object of this invention to provide a method and apparatus that employs a three-dimensional volume holographic medium in which multi-layer, opto-electronic neural network interconnects are stored and used to multiply pattern vectors.
It is another object of this invention to provide nonlinear processing means by which the intermediate and output pattern vectors computed within a multi-layer, opto-electronic neural network may be acted upon.
It is a further object of this invention to provide a compact (potentially less than 200 cubic inches), low-power (potentially less than 10 watts of prime electrical power), multi-layer, opto-electronic neural network capable of executing at least 2 × 10^10 (twenty billion) arithmetic operations per second.