1. Field
This application relates to neural networks, and in particular, to hardware-implemented analog-digital neural networks realized in both two and three dimensions.
2. Background
Neural networks (NNs) are widely used in pattern recognition and classification, with many potential applications to fingerprint, iris, and face recognition, target acquisition, etc. The parameters (e.g., ‘synaptic weights’) of the NN are adaptively trained on a set of patterns during a learning process, following which the NN is able to recognize or classify patterns of the same kind.
A key component of a NN is the ‘synapse,’ at which weight information is stored, typically as a continuous-valued variable. For applications that would benefit from compact, high-performance, low-power, portable NN computation, it is desirable to be able to construct high-density hardware NNs having a large number (10^9 to 10^10 or more) of synapses. Currently, a NN is typically realized as a software algorithm implemented on a general-purpose computer, which is bulkier and operates at higher power than the hardware NN disclosed herein.
Neural networks may be used for three broad types of learning. In “supervised learning” a set of (input, desired output) pairs is provided to the network, one at a time, and the learning algorithm finds values of the “weights” (the adjustable parameters of the network) that minimize a measure of the difference between the actual and the desired outputs over the training set. If the network has been well trained, it will then process a novel (previously unseen) input to yield an output that is similar to the desired output for that novel input. That is, the network will have learned certain patterns that relate input to desired output, and generalized this learning to novel inputs.
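By way of illustration only, the supervised-learning procedure described above can be sketched in software for a single linear neuron. This is a minimal sketch, not the disclosed hardware: the function names, the learning rate, the number of passes, and the training rule y = 2*x0 - x1 + 0.5 are all assumptions chosen for the example.

```python
def train_linear_neuron(pairs, lr=0.1, epochs=200):
    """Adjust weights w and bias b to reduce the squared difference
    between actual and desired outputs over the training set."""
    n_inputs = len(pairs[0][0])
    w = [0.0] * n_inputs
    b = 0.0
    for _ in range(epochs):
        for x, desired in pairs:  # one (input, desired output) pair at a time
            actual = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = actual - desired
            # Gradient step on 0.5 * err**2 with respect to w and b
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


# Hypothetical training set generated from y = 2*x0 - x1 + 0.5
training_pairs = [
    ((0.0, 0.0), 0.5),
    ((1.0, 0.0), 2.5),
    ((0.0, 1.0), -0.5),
    ((1.0, 1.0), 1.5),
]
w, b = train_linear_neuron(training_pairs)

# A novel input that was not in the training set
novel_output = predict(w, b, (0.5, 0.5))
```

In this sketch the trained neuron yields an output close to 1.0 for the novel input (0.5, 0.5), which is the value the assumed rule would assign to it.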
In “unsupervised learning,” a set of inputs (without “desired outputs”) is provided to the network, along with a criterion that the network is to optimize. An example of such a criterion is that the network be able to compress the input into a smaller amount of information (a “code”) in such a way that the code can be used to reconstruct the input with minimum average error. The resulting “auto-encoder” network consists of, in sequence, an input layer, one or more “hidden” layers, a “code” layer (having relatively few neurons), one or more hidden layers, and an output layer having the same number of neurons as the input layer. The entire network is trained as if this were a supervised-learning problem, where the “desired output” is defined to be identical to the input itself.
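The auto-encoder idea can be illustrated with a deliberately reduced sketch: two inputs, a single "code" neuron, and two outputs, with no intermediate hidden layers and purely linear neurons. These simplifications, the starting weights, the learning rate, and the sample data are assumptions made for the example; the point shown is only that the "desired output" is the input itself.

```python
def reconstruct(enc, dec, x):
    code = sum(e * xi for e, xi in zip(enc, x))  # single-neuron "code" layer
    return [d * code for d in dec]


def total_error(enc, dec, data):
    """Summed squared reconstruction error over the data set."""
    return sum(
        (r - xi) ** 2
        for x in data
        for r, xi in zip(reconstruct(enc, dec, x), x)
    )


def train_autoencoder(data, lr=0.01, epochs=500):
    enc, dec = [0.5, 0.5], [0.5, 0.5]  # arbitrary nonzero starting weights
    for _ in range(epochs):
        for x in data:
            code = sum(e * xi for e, xi in zip(enc, x))
            # Output minus "desired output", where the desired output is x itself
            residual = [d * code - xi for d, xi in zip(dec, x)]
            # Error propagated back from the outputs to the code neuron
            back = sum(d * r for d, r in zip(dec, residual))
            dec = [d - lr * 2 * code * r for d, r in zip(dec, residual)]
            enc = [e - lr * 2 * back * xi for e, xi in zip(enc, x)]
    return enc, dec


# Hypothetical 2-D inputs that lie on a line, so one code value suffices
data = [(t, 2 * t) for t in (-1.0, -0.5, 0.5, 1.0)]
initial_error = total_error([0.5, 0.5], [0.5, 0.5], data)
enc, dec = train_autoencoder(data)
final_error = total_error(enc, dec, data)
```

Because the sample inputs vary along one direction only, a one-number code is enough to reconstruct them, and the reconstruction error falls well below its starting value.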
In a third type of learning, “reinforcement learning,” a “reward/penalty” value is provided (by an external “teacher”). The “reward/penalty” value depends upon the input and the network's output. This value is used to adjust the weights (and therefore the network's outputs) so as to increase the average “reward.”
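One simple way to use a reward value to adjust weights is random weight perturbation: try a small change and keep it only if the average reward increases. The sketch below assumes this particular scheme, which is one of many possible reinforcement-learning methods; the `teacher_reward` function, the inputs, and all parameter values are hypothetical.

```python
import random


def network_output(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def teacher_reward(x, y):
    """Hypothetical external teacher: the reward depends on the input and the
    network's output, and is highest when y equals x[0] - x[1]."""
    return -(y - (x[0] - x[1])) ** 2


def average_reward(w, inputs):
    return sum(teacher_reward(x, network_output(w, x)) for x in inputs) / len(inputs)


def train_by_reward(inputs, steps=500, scale=0.1, seed=0):
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    w = [0.0, 0.0]
    best = average_reward(w, inputs)
    for _ in range(steps):
        trial = [wi + rng.gauss(0.0, scale) for wi in w]  # perturb the weights
        reward = average_reward(trial, inputs)
        if reward > best:  # keep only changes that raise the average reward
            w, best = trial, reward
    return w, best


inputs = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
initial_reward = average_reward([0.0, 0.0], inputs)
w, final_reward = train_by_reward(inputs)
```

Note that the network is never told the desired output, only how good its output was; the average reward can therefore only stay the same or rise as training proceeds.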
NN applications may include pattern recognition, classification, and identification of fingerprints, faces, voiceprints, similar portions of text, similar strings of genetic code, etc.; data compression; prediction of the behavior of a system; feedback control; estimation of missing data; “cleaning” of noisy data; and function approximation or “curve fitting” in high-dimensional spaces.
In a classification or recognition problem, one wants to extract certain types of features that characterize the input (the input can be visual, auditory, text-based, or of another type), and that are similar for inputs that should be classified in the same way (e.g., two different handwritten digit “2”s, or two images of the same person's face). A properly designed neural network can discover such features (using either supervised or unsupervised learning) even if the particular features of interest have not been specified by the user; the NN can represent those features by the network's weight values; and the NN can then use these features to compute an output classification or identification for a previously unseen input.
For example, consider a face recognition application. A neural network would be used to learn a relatively small set of characteristic features, and then to compute a “feature vector,” which is a set of numbers for each image. The learning method should have the property that the resulting feature vectors for two images that have the same classification (e.g., that correspond to the same person's face in different poses) are similar to each other. After training has been done, a novel image is processed by the network to yield its feature vector. This feature vector is compared with an already-stored list of feature vectors, and the stored vectors to which the novel vector is most similar yield a list of “most likely matches” to the novel image. The final comparison can be done using non-NN postprocessing. Alternatively, the NN can have an output layer (following the “feature” layer) comprising one “neuron” for each output class. The latter alternative would preferably be used when the number of classes is small (e.g., the ten digits in a handwritten digit recognition task).
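The non-NN postprocessing step, comparing a novel feature vector with a stored list, can be sketched as follows. Cosine similarity is used here as the similarity measure purely as an assumption, since no particular measure is specified above; the stored names and vector values are likewise hypothetical.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_likely_matches(novel, stored, k=2):
    """Return the k stored labels whose feature vectors are most similar
    to the novel feature vector, best match first."""
    ranked = sorted(
        stored.items(),
        key=lambda item: cosine_similarity(novel, item[1]),
        reverse=True,
    )
    return [label for label, _ in ranked[:k]]


# Hypothetical already-stored feature vectors, one per known face
stored = {
    "person_a": [0.9, 0.1, 0.0],
    "person_b": [0.1, 0.9, 0.2],
    "person_c": [0.8, 0.3, 0.1],
}

# Feature vector computed by the network for a novel image
matches = most_likely_matches([1.0, 0.2, 0.0], stored)
# matches == ["person_a", "person_c"]
```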
Thus a NN can be used as part of a search process, especially one in which the set of characteristic features is not known in advance. For another example, there are methods for document search in which a document is preprocessed to extract the most distinctive words contained therein (e.g., those that are common in the document, but uncommon in the total corpus). Using a vector of values corresponding to this set of most-distinctive words as input to a NN, the NN can be trained to produce similar (or the same) classification outputs for documents whose inputs overlap significantly. The output may take the form of clusters of points, one for each document, where the documents in each cluster are about the same topic, and different clusters correspond to different topics. Thus a search that uses the NN's output can reveal other documents on the same topic in the corpus.
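The preprocessing step that extracts a document's most distinctive words can be sketched with a term-frequency times inverse-document-frequency score. This scoring rule is an assumption standing in for whichever method is actually used; the function name and the tiny corpus are hypothetical, and the document being scored is assumed to be a member of the corpus.

```python
import math
from collections import Counter


def distinctive_words(doc_tokens, corpus, k=2):
    """Rank words that are common in the document but uncommon in the corpus."""
    term_freq = Counter(doc_tokens)
    n_docs = len(corpus)

    def score(word):
        # Number of corpus documents containing the word (at least 1,
        # because the scored document is assumed to be in the corpus)
        doc_freq = sum(1 for doc in corpus if word in doc)
        return term_freq[word] * math.log(n_docs / doc_freq)

    # Highest score first; ties broken alphabetically for repeatability
    return sorted(term_freq, key=lambda w: (-score(w), w))[:k]


corpus = [
    "the neural network trains neural weights".split(),
    "the chip stores the weights".split(),
    "the synapse chip stores charge".split(),
]
top = distinctive_words(corpus[0], corpus)
# top == ["neural", "network"]
```

A word such as "the", which appears in every document, scores zero and is never selected. A vector of values built from the selected words could then serve as the NN input described above.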
More generally, NNs can be used as embedded components of larger systems that include (non-NN) preprocessing and postprocessing steps.
Another NN protocol addresses an incoming picture P at location Q that must be recognized among a large, centrally stored database of M similar-format pictures. The picture is sent from Q to the database. The database is linked to a large number N of the analog-digital feedforward neural network (ADFFNN) chips disclosed herein. All of these chips are trained simultaneously on P, as described herein, so that they recognize P. Then the whole database content is run through the ADFFNN chips in parallel read mode, with each chip accepting M/N pictures to read. Any output from recognition events by the chips is returned to Q. If the number of chips, N, is large enough, then the process can be done in an acceptable time. The chips are kept busy by a time-sequence of inputs from various locations Q_n.
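The division of the M database pictures among N chips can be sketched in software as below. The `recognizes` callable is a hypothetical stand-in for a chip that has been trained on P, integers stand in for pictures, and the shares are processed one after another here, whereas the chips would read their shares concurrently.

```python
def partition(database, n_chips):
    """Split the M database entries into n_chips near-equal shares
    (about M/N entries each)."""
    return [database[i::n_chips] for i in range(n_chips)]


def chip_read(share, recognizes):
    """Stand-in for one chip in read mode: report entries it recognizes."""
    return [picture for picture in share if recognizes(picture)]


def search_database(database, n_chips, recognizes):
    hits = []
    for share in partition(database, n_chips):
        hits.extend(chip_read(share, recognizes))  # results returned to Q
    return hits


# Hypothetical database of M = 10 entries, with integers standing in for pictures
database = list(range(10))
shares = partition(database, 3)
hits = search_database(database, 3, lambda picture: picture == 7)
# hits == [7]
```

With M = 10 and N = 3 the shares hold 4, 3, and 3 entries, so the time for one pass scales with roughly M/N rather than M.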
Disclosed herein are designs for NNs on a chip or integrated device that contain analog networks combined with digital communication, processing, and storage functions, which may overcome the inefficiencies of conventional neural networks implemented in software-based systems.