Neural network systems promise a totally new approach to information processing that is inherently fault-tolerant, uniquely suited to fuzzy and ill-posed problems (e.g., image or speech recognition), and extremely fast. To realize this potential for high-speed processing, however, a neural network must be implemented in fully parallel hardware. Furthermore, most applications require that a neural network be capable of learning from examples and generalizing what it has learned to new input values. This capability is necessary where the relationship between input and output is difficult to determine explicitly or changes with time. It is also a major point of difference between implementing a problem on a digital computer and implementing it in a neural network. The digital computer is merely an automaton that performs a sequence of precisely defined steps (i.e., a computer program) on data; any data that does not conform to the definition of the problem contained in the program produces an error state. In a neural network, by contrast, the problem is only loosely defined, by the synapses and their weighting factors. These weighting factors are adjusted in a repetitive process: sets of input values are applied to the network, the network outputs are compared with the known results for those inputs, and the weighting factors are adjusted until the correct answers are produced for all the inputs. In this way, the network "learns" how to solve the problem, including for input conditions that were not part of the training set.
Interesting descriptions of prior art techniques and apparatus of the neural network variety can be found in the very recently issued patents of Faggin et al. (U.S. Pat. No. 4,773,024, September 1988, and U.S. Pat. No. 4,802,103, January 1989). The teachings of the now-expired patent of Yoshino (U.S. Pat. No. 3,601,811) are also relevant to the problem as applied to forerunners of neural networks, i.e., discrete-component analog computer "learning machines".
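As a minimal sketch of the repetitive adjust-until-correct process described above, the following Python code applies the classic perceptron error-correction rule to a single threshold unit. The perceptron rule is a swapped-in illustration; the text does not name a specific rule. The function names `predict` and `train_perceptron`, the learning rate `eta`, and the AND training set are hypothetical choices made only for illustration.

```python
def predict(w, b, x):
    """Threshold unit: output 1 if the weighted sum exceeds zero, else 0."""
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if s > 0 else 0


def train_perceptron(samples, eta=1.0, max_epochs=100):
    """Repeatedly present the training set, compare each output with its known
    result, and adjust the weights until every input gives the correct answer.

    Illustrative sketch (perceptron rule, an assumed stand-in).
    Returns (weights, bias, epochs_used). Raises RuntimeError if the set is not
    learned within max_epochs (e.g. if it is not linearly separable).
    """
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for x, target in samples:
            err = target - predict(w, b, x)  # compare output with known result
            if err != 0:
                # Move each weight in the direction that reduces this item's error.
                w = [wi + eta * err * xi for wi, xi in zip(w, x)]
                b += eta * err
                mistakes += 1
        if mistakes == 0:  # correct answers produced for all inputs
            return w, b, epoch
    raise RuntimeError("training set not learned within max_epochs")


if __name__ == "__main__":
    # Hypothetical training set: the logical AND of two binary inputs.
    and_set = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    w, b, epochs = train_perceptron(and_set)
    print(w, b, epochs)
```

The loop stops only when a full pass over the training set produces no mistakes, which mirrors the "until the correct answers are produced for all the inputs" condition; for a set that no single threshold unit can represent, it gives up after `max_epochs`.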
Standard learning algorithms used for "teaching" neural networks, such as backpropagation, are easy to demonstrate in simulation for simple cases but are extremely difficult to implement in hardware, particularly for anything beyond trivial problems. Analog implementations exhibit non-ideal circuit behaviors that the learning algorithms do not compensate for, and digital implementations require prohibitive amounts of silicon real estate. It should be noted, however, that many applications require only the retrieval phase, not the learning phase, to be time-efficient. Military applications are a good example: the time needed to teach a neural network decision-making computer as part of the manufacturing process is unimportant compared with the real-time operation of the network under field conditions, which will take place well in the future. Another good example is speech recognition, in which a machine employing the neural network is trained from taped recordings of one or more speakers. For such cases (which represent the vast majority), the problem is how to implement a learning neural network system in fully parallel analog hardware; that is, how best to implement the learning process, without regard to the time it takes, so that the neural network itself can be built as fully parallel analog hardware that minimizes run-time computation time.
The standard method of training a learning network is to cycle iteratively through a training set. For each training set item, the prompt (i.e., the inputs) is applied to the network inputs, a data retrieval operation is performed, and the connection weights of the synapses are modified so that the network output approaches the target (desired) vector. The weights are adjusted only slightly for each item because, otherwise, the system tends to "forget" previously stored information. Consequently, the learning phase of the prior art algorithms requires a great deal of computation; in some cases, literally millions of training iterations may be needed to reduce errors to acceptable levels.
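The iterative cycle with small weight increments can be sketched for the simplest case, a single linear unit trained with the least-mean-squares (delta) rule. This rule is an assumed stand-in for the prior art algorithms, not a method taken from the text; `train_lms`, the learning rate `eta`, the epoch count, and the example data are hypothetical. Each update is scaled by a small `eta`, so no single item can pull the weights far from what earlier items taught; the price is that many passes over the training set, and therefore many weight updates, are needed.

```python
def train_lms(samples, eta=0.05, epochs=500):
    """Cycle through the training set `epochs` times; after each retrieval,
    nudge the weights by a small step (eta) toward the target.

    Illustrative sketch of the delta (LMS) rule, an assumed stand-in.
    Returns (weights, bias, number_of_weight_updates).
    """
    w = [0.0] * len(samples[0][0])
    b = 0.0
    updates = 0
    for _ in range(epochs):
        for x, target in samples:
            y = sum(wi * xi for wi, xi in zip(w, x)) + b  # retrieval (linear output)
            err = target - y                                # compare with target
            # Keeping eta small limits how far this one item can move the
            # weights away from what the other items taught.
            w = [wi + eta * err * xi for wi, xi in zip(w, x)]
            b += eta * err
            updates += 1
    return w, b, updates


if __name__ == "__main__":
    # Hypothetical targets generated by y = 2*x1 - 3*x2 + 0.5.
    pts = [(0, 0), (1, 0), (0, 1), (1, 1), (-1, 0.5), (0.5, -1)]
    data = [((x1, x2), 2 * x1 - 3 * x2 + 0.5) for x1, x2 in pts]
    w, b, n = train_lms(data)
    print(w, b, n)
```

Even in this tiny, noise-free case the sketch performs thousands of weight updates; real training sets and multilayer networks scale that figure up considerably.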
In traditional learning algorithms, the difference between the network output and the target for each prompt (i.e., the error) is backpropagated through a complex distributed feedback system that calculates the increment by which each weight in the system is modified. It is this complicated learning circuitry that adds to the complexity of the network, as mentioned earlier. It might be possible to perform the learning on a computer and then download the weights into the neural network hardware, thereby eliminating the learning portion of the hardware; however, such an approach can work only if the learning program accurately simulates the actual hardware behavior, and in all but trivial cases that goal is virtually impossible to realize. The feedback system is difficult to implement reliably in analog hardware for two reasons. First, the design of the algorithm does not intrinsically compensate for non-idealities in the backpropagation circuit components; thus, component errors generate spurious error signals. Second, the weight (i.e., conductance) values are required in the backpropagation portion as well as in the feedforward portion of the algorithm. This latter factor is significant in weighting circuit designs in which the weight circuit must be switched from one circuit configuration to another. In other words, in addition to the backpropagation circuitry that must be implemented as part of the neural network, there must be switching circuitry to switch shared components between the feedforward and backpropagation portions of the circuit. Such a requirement adds to an already overly complex circuit design and increases the probability of errors introduced by components and by their operation.
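Both objections can be made concrete with a pure-Python sketch that computes backpropagation gradients for a hypothetical two-layer network (sigmoid hidden units, linear output units, squared-error loss). The sketch shows that the output-layer weights `W2` are read in the feedforward pass and read again to carry the error back to the hidden layer. The `offset` argument is a hypothetical model of an analog error-amplifier offset: with it, the backward pass reports nonzero weight increments even when the output already matches the target. The network shape, function names, and offset model are all assumptions for illustration, not circuitry described in the text.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(W1, W2, x):
    """Feedforward pass; W1 and W2 are lists of weight rows (one row per unit)."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    y = [sum(w * hj for w, hj in zip(row, h)) for row in W2]  # linear output units
    return h, y


def loss(W1, W2, x, target):
    """Squared error 0.5 * sum((y - t)^2) for one training item."""
    _, y = forward(W1, W2, x)
    return 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y, target))


def backprop(W1, W2, x, target, offset=0.0):
    """Return gradients (g1, g2) of `loss` with respect to W1 and W2.

    `offset` (hypothetical) is a constant added to every output error signal,
    modeling a non-ideal analog component in the feedback path.
    """
    h, y = forward(W1, W2, x)
    d_out = [yk - tk + offset for yk, tk in zip(y, target)]
    # Backward pass: the SAME W2 weights used in the forward pass now carry
    # the output error back to the hidden layer.
    d_hid = [
        sum(W2[k][j] * d_out[k] for k in range(len(W2))) * h[j] * (1.0 - h[j])
        for j in range(len(h))
    ]
    g2 = [[dk * hj for hj in h] for dk in d_out]
    g1 = [[dj * xi for xi in x] for dj in d_hid]
    return g1, g2


if __name__ == "__main__":
    W1 = [[0.1, -0.2, 0.3], [0.4, 0.5, -0.6]]  # last input acts as a bias
    W2 = [[0.7, -0.8]]
    x = [0.5, -1.0, 1.0]
    _, y = forward(W1, W2, x)
    print(backprop(W1, W2, x, y))               # zero error: all-zero increments
    print(backprop(W1, W2, x, y, offset=0.01))  # offset alone produces increments
```

A training step would subtract a small learning rate times each gradient entry from the corresponding weight, so any increment produced by `offset` alone would slowly drift weights that are already correct.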