1. Field of Invention
The invention relates in general to Neural Networks and more specifically to a mechanism which allows Neural Networks to utilize discrete weights.
2. Background Art
As used herein, a Neural Network is a system that produces an output vector that is a function of an input vector. The mapping function between the input and output vectors is learned. Ideally, an output vector should match a desired output vector, i.e., a target vector, after the training process; the difference between the output and target vectors can be used in adjustment mechanisms.
Early theoretical approaches to neural networks were based upon the idea that when two neurons in the brain are active there is a correlation between them. One early rule, developed by D. O. Hebb, is described in his book "The Organization of Behavior", Wiley, 1949. The Hebbian rule states that when two neurons fire simultaneously, the association link between them is strengthened. Accordingly, the next time either of the two neurons fires, the other is more likely to fire as well. However, the Hebbian rule is not a sufficient model to explain the learning process. Under the Hebbian rule, the connection strengths between neurons grow without bound. If maximums are placed on the connection strengths, these maximums are always reached.
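The unbounded growth noted above can be illustrated with a minimal sketch. The update form (learning rate times the product of the two activities), the function names, and the parameter values are assumptions chosen for illustration; they are not taken from Hebb's text.

```python
def hebbian_update(weight, pre, post, rate=0.1, w_max=None):
    """Strengthen a connection when both neurons are active (assumed rule form)."""
    weight += rate * pre * post
    if w_max is not None:
        # Optional ceiling on the connection strength.
        weight = min(weight, w_max)
    return weight


def run_hebbian(steps, rate=0.1, w_max=None):
    """Repeatedly co-activate two neurons and return the final weight."""
    weight = 0.0
    for _ in range(steps):
        weight = hebbian_update(weight, pre=1.0, post=1.0, rate=rate, w_max=w_max)
    return weight
```

Without a ceiling the weight keeps growing with the number of co-activations; with a ceiling it rises to the ceiling and stays there, which is the saturation behavior the text describes.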
Subsequently, the Perceptron Model was developed by Frank Rosenblatt, and is discussed in his book "Principles of Neurodynamics", Spartan, 1962. The Perceptron Model was originally believed powerful enough to enable a machine to learn in a human-like manner.
The Perceptron Model includes input, hidden and output layers, each composed of one or more processing elements. In response to input stimuli, the input layer provides information to the hidden layer. Similarly, the hidden layer provides information to the output layer. Connections between the input and hidden processing elements are fixed; connections between the hidden and output processing elements are adjustable.
In the Perceptron Model, if the inputs are boolean (i.e. either zero or one), then the intended purpose of the hidden layer is to extract some kind of features from the input data. However, if the inputs to the Model are continuous numbers (i.e. having more than two distinct values, rather than just two boolean values), then the hidden layer is not used. Instead, the outputs of the input layer are connected directly to the inputs of the output layer.
In the Perceptron Model, all learning takes place in the output layer. Under the Perceptron Model, many problems have been experimentally and mathematically shown to be representable by connection strengths between layers. Rosenblatt's Perceptron Learning Algorithm enables a neural network to find a solution if a representation of the problem by some set of connection strengths exists. Rosenblatt's Perceptron Convergence Proof is a well-known mathematical proof that a Perceptron System will find a solution if one exists.
In operation, the Perceptron Model modifies the strengths of the weighted connections between the processing elements, to learn an appropriate output response corresponding to a particular input stimulus vector. The modification of the connection weights occurs when an incorrect output response is given. This modification of the weights changes the transfer of information from the input to the output processing elements so that eventually the appropriate output response will be provided. However, through experimentation, it was discovered that the Perceptron Model was unable to learn all possible functions. It was hoped that these unlearnable functions were only pathological cases, analogous to certain problems that humans cannot solve. This is not the case. Perceptron Systems cannot represent and learn some very simple problems that humans are able to learn and represent.
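The weight-modification process described above can be sketched as follows. This is a minimal illustration that assumes a single threshold output element, a learning rate of 1.0, and the boolean AND function as the training task; the function names and data are hypothetical and are not drawn from Rosenblatt's text.

```python
def predict(weights, bias, x):
    """Threshold output element: fires (1) when the weighted sum is positive."""
    total = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if total > 0 else 0


def train_perceptron(samples, epochs=50, rate=1.0):
    """Adjust weights only when the output response is incorrect."""
    n_inputs = len(samples[0][0])
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, target in samples:
            error = target - predict(weights, bias, x)
            if error != 0:
                mistakes += 1
                weights = [w + rate * error * xi for w, xi in zip(weights, x)]
                bias += rate * error
        if mistakes == 0:
            break  # every training vector now gives the desired response
    return weights, bias


# Boolean AND is representable by one layer of weights, so training converges.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
```

Calling `train_perceptron(AND_SAMPLES)` returns a weight vector and bias for which `predict` reproduces every AND target.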
An example of a problem that the Perceptron Model is unable to represent (without 2^N hidden processing elements, where N is the number of input nodes), and therefore cannot learn, is the parity or "exclusive-or" boolean function. To solve such a problem (with fewer than 2^N hidden processing elements) a system would require two layers of modifiable weights. The Perceptron System cannot properly adjust more than one layer of modifiable weights. It was speculated that no learning mechanism for a system with multiple layers of modifiable weights would ever be discovered because none existed (Minsky & Papert, 1969, in "Perceptrons").
(The problem with using 2^N hidden units is three-fold. First, since the hidden units in the Perceptron Model do not adapt, all the units must be present, regardless of the function which needs to be learned, so that all functions can be learned. Second, the number of units required grows exponentially with the number of inputs. For example, 2^34 is approximately 17 billion, more neurons than in a human brain. This means that the largest parity problem the human brain could solve, if wired in this manner, would have at most 32 inputs. Third, the system would not generalize. Given two input/output vector pairs near one another, one trained and the other not, the system should be able to interpolate the answer from the first. With a large number of hidden units, it has been experimentally shown that this is not the case.)
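The exclusive-or limitation discussed above can be illustrated with a small sketch. The exhaustive search over a small integer grid of weights is an assumption made for illustration and is a demonstration rather than a proof; the hand-set two-layer weights and all function names are likewise hypothetical.

```python
from itertools import product

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]


def threshold_unit(weights, bias, x):
    """Fires (1) when the weighted sum of inputs plus bias is positive."""
    return 1 if bias + sum(w * xi for w, xi in zip(weights, x)) > 0 else 0


def find_single_unit(samples, grid=range(-3, 4)):
    """Search a small weight grid for one unit that matches every sample."""
    for w1, w2, bias in product(grid, repeat=3):
        if all(threshold_unit((w1, w2), bias, x) == t for x, t in samples):
            return (w1, w2), bias
    return None  # no setting in the grid represents the function


def two_layer_xor(x):
    """Two layers of weights: output = OR(x) and not AND(x)."""
    h_or = threshold_unit((1, 1), 0, x)    # on if at least one input is on
    h_and = threshold_unit((1, 1), -1, x)  # on only if both inputs are on
    return threshold_unit((1, -1), 0, (h_or, h_and))
```

The search finds a single unit for AND but none for exclusive-or, while the two-layer arrangement with only two hidden units reproduces exclusive-or exactly.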
Almost all adaptive neural systems share several features. Typically, the processing elements of a system have an output which is a function of the sum of the weighted inputs of the processing element. Almost all systems have a single layer of modifiable weights that affect the data transferred from the input to the output of the system.
The evolution of adaptive neural systems took a dramatic step forward with the development of an algorithm called "Back Propagation". This algorithm is fully described in the reference text "Parallel Distributed Processing: Explorations in the Microstructure of Cognition", Rumelhart, Hinton, & Williams, MIT Press, 1986.
A back propagation system typically consists of three or more layers, each layer consisting of one or more processing elements. In one basic example, the system is composed of an input layer, at least one hidden layer and an output layer. The system contains arbitrary, directed connections from the processing elements in the input layer to those in the hidden layer, and from the hidden layer to the output layer. There are no connections between processing elements in the same layer, nor connections from the output layer to the hidden layer, nor from the hidden layer to the input layer; i.e., there are no cycles (loops) in the connection graph. (There are hypothesized mechanisms for networks with cycles, but they are not examined herein.)
The idea of error was introduced in the Perceptron Model. In a back propagation system, the error at each output processing element of the network is readily computed. The error is typically the difference between an expected value and the output value. This error is used to modify the strength of the connection between a processing element and the output processing element. Ideally, this reduces the error between the expected output and the value output by the processing element in response to the input. The Perceptron Model lacks the ability to allocate an error value to the hidden processing elements and therefore cannot adjust the weights of any connections not coupled to an output processing element. In a system utilizing the Back Propagation algorithm, an error is assigned to the processing elements in hidden layers, and the weights of the connections coupled to these hidden processing elements can be adjusted.
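The assignment of error to hidden processing elements can be sketched for a single training step. The sketch assumes sigmoid processing elements, a squared-error measure, one hidden layer and one output element; the weight layout (bias first in each list), the learning rate and the function names are assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """Each weight list is [bias, w1, w2, ...]. Returns (hidden outputs, output)."""
    hidden = [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
              for w in w_hidden]
    output = sigmoid(w_out[0] + sum(v * h for v, h in zip(w_out[1:], hidden)))
    return hidden, output


def backprop_step(x, target, w_hidden, w_out, rate=0.5):
    """One weight update; returns new (w_hidden, w_out)."""
    hidden, output = forward(x, w_hidden, w_out)
    # Output error: difference between expected and actual value,
    # scaled by the slope of the sigmoid.
    delta_out = (target - output) * output * (1.0 - output)
    # Hidden error: the output error passed back through each connection.
    delta_hidden = [delta_out * w_out[j + 1] * h * (1.0 - h)
                    for j, h in enumerate(hidden)]
    new_w_out = [w_out[0] + rate * delta_out] + [
        v + rate * delta_out * h for v, h in zip(w_out[1:], hidden)]
    new_w_hidden = [
        [w[0] + rate * d] + [wi + rate * d * xi for wi, xi in zip(w[1:], x)]
        for w, d in zip(w_hidden, delta_hidden)]
    return new_w_hidden, new_w_out
```

The `delta_hidden` values are what a Perceptron-style system lacks: they let the weights feeding the hidden elements be adjusted in the same manner as the weights feeding the output element.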
Several attempts have been and are being made to reduce Neural Networks to silicon. These attempts have generally encountered difficulties. One of the major bottlenecks is how to store the interconnection values. If the values are pure analog, there are difficulties storing the values. If the values are stored digitally, there are problems with precision because of the need for very small adaptation steps.
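The precision problem with digitally stored weights can be illustrated with a short sketch. It assumes a weight held as a uniformly quantized value in the range -1.0 to 1.0 and re-quantized after every adaptation step; the bit widths, step sizes and function names are hypothetical.

```python
def quantize(value, bits=8, w_min=-1.0, w_max=1.0):
    """Round a weight to the nearest level representable with the given bits."""
    levels = 2 ** bits - 1
    step = (w_max - w_min) / levels
    clipped = max(w_min, min(w_max, value))
    return w_min + round((clipped - w_min) / step) * step


def train_quantized(weight, delta, steps, bits=8):
    """Apply the same small adaptation step repeatedly, storing digitally."""
    for _ in range(steps):
        weight = quantize(weight + delta, bits)
    return weight
```

With 8 bits the spacing between levels is roughly 0.0078, so an adaptation step of 0.001 is rounded away on every update and the stored weight never moves, however many updates are applied. The same step does accumulate when 16 bits are used, at the cost of more storage per weight.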
Most Neural Network algorithms, such as Back Propagation, Hopfield Nets, Perceptrons, and so on, utilize matrices of numbers to perform their work. The values of these numbers are usually not constrained significantly. Representing and storing these numbers in silicon has proven to be one of the more difficult problems when attempting to implement neural network algorithms. While the fixed values utilized in Hopfield Nets can be directly etched into silicon, no solution exists when a weight must be modifiable and must retain its modified value for an extended period of time (i.e. longer than minutes, ranging to years).
The primary difficulty existing today is that the value must be stored either as a value on a floating gate (as is used in ROM technologies), or on a simple capacitor. Both of these mechanisms have drawbacks.
Floating gate technologies that store arbitrary analog voltages are experimental at this time. It is probable that within six months to four years these technologies will become feasible, primarily because of the need for them within neural networks.
The problem with capacitors is that they leak. The voltage values stored on a capacitor dissipate relatively quickly with time. This is generally unacceptable for long term storage. The only way around this is for training to proceed indefinitely, thereby making the dissipation less noticeable. This approach constrains the maximal size of the training set: the system must not forget what it learned at the beginning of the training set before it reaches the end. It also means that the entire training set and training support system must be shipped into the field, which is not always practical.
The invention described herein provides a method whereby these problems are eliminated.
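The effect of leakage on a stored weight can be sketched under a simple assumption: that the capacitor voltage decays exponentially with a single time constant. That model, the time constant and the function names are assumptions for illustration; real leakage behavior depends on the device.

```python
import math


def stored_voltage(v0, t_seconds, tau_seconds):
    """Voltage remaining after t seconds, assuming exponential decay."""
    return v0 * math.exp(-t_seconds / tau_seconds)


def time_until_drift(tolerance_fraction, tau_seconds):
    """Time until the stored value has lost the given fraction of itself."""
    return -tau_seconds * math.log(1.0 - tolerance_fraction)
```

Under this model, `time_until_drift` gives a rough upper limit on how long one pass through the training set may take before the earliest adjustments have decayed beyond a chosen tolerance.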