1. Field of the Invention
This invention relates to content-addressable memory systems useful in a variety of data and signal processing, image recognition and other computational tasks. In particular, the present invention relates to content-addressable memory architecture in combination with learning and retrieval methodologies for a neural network error-correcting content-addressable memory system.
2. Discussion of the Related Art
Content-addressable memory systems are generally distinguishable from random access memories on the basis of the manner in which stored data is retrieved. In random access memory systems, data is retrieved or located based upon a specified numerical, typically binary encoded, address. In content-addressable memory (CAM) systems, data is recalled in response to the presentation of a noisy or partial version of one of a set of defined bit strings or codewords stored in memory.
In the case of "noisy" input problems, the task is to retrieve a noise-free version of the input bit pattern. This type of content-addressable memory is referred to as error-correcting content-addressable memory. The methodology and architecture of the present invention are particularly well-suited for such uses.
Error-correcting CAM optimally returns the pattern stored in memory which most closely resembles the input noisy bit string. The number of bits differing between the input string and the stored bit string is referred to as the Hamming distance. Thus, the task of error-correcting CAM is to return the stored bit string which is closest in Hamming distance to the input bit string.
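The best-match task described above can be illustrated with a minimal Python sketch of a conventional sequential search. It is not the neural network method of the invention; the function names and the stored codewords are hypothetical and chosen only for illustration.

```python
def hamming_distance(a, b):
    """Count the positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("bit strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def nearest_codeword(noisy, codewords):
    """Return the stored codeword closest in Hamming distance to `noisy`.

    Ties are broken by order of appearance in `codewords`.
    """
    return min(codewords, key=lambda c: hamming_distance(noisy, c))


# Hypothetical stored codewords (assumption, not taken from the text).
CODEWORDS = ["0000000", "1110000", "0001111", "1111111"]

# "1010000" differs from "1110000" in exactly one bit position.
corrected = nearest_codeword("1010000", CODEWORDS)
print(corrected)  # prints 1110000
```

Each query in this sketch compares the input against every stored codeword in turn, so its running time grows with both the number and the bit length of the codewords.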
In the case of partial-contents input tasks, the CAM function is less demanding, and essentially involves a comparison or table look-up function to locate and retrieve the complete bit string based upon the assumption that each of the bits in the partial input is correct. As will be appreciated by those of ordinary skill in the art, the architecture and methodology of the present invention are suitable for use in any error-correction or best match computational task.
The error-correcting CAM computational task can be viewed as the decoding of codewords received from a noisy communication channel. In this application, a group of defined bit strings stored in memory constitutes a set of valid data words, bit strings or codewords. The input "noisy" or error-containing word is compared to the defined codewords, and the closest defined codeword is provided as an error-corrected output. In conventional error-correcting methodologies, the computational time and complexity increase in proportion to the number and bit length of defined codewords.
Neural networks are characterized by a high degree of parallelism between numerous interconnected simple processors. The neural network nomenclature is derived from the similarity of such networks to biological neural networks. See for example "Computing with Neural Circuits: A Model", Hopfield and Tank, Science, Vol. 233, pp. 622-633 (1986).
In general, neural networks are formed or modeled using a number of simple processors or neurons arranged in a highly interconnected pattern wherein each of the neurons performs the simple task of periodically updating its output state based upon the value of signals presented to it as inputs. The design of a neural network involves determining the number, arrangement and weight of the interconnections between neurons. The weight of a connection corresponds to the biological synaptic strength and determines the degree to which the output signal of one neuron will affect another neuron to which it is connected. Thus, each neuron or processor receives input signals which are derived from the output or activation states of other connected neurons. These activation states or output signals are linearly operated on via the connective weights and then summed. The summation of inputs can be accomplished in an analog circuit using a capacitor shunting the input, for example, so that the net input voltage is the integrated sum of all input voltages. This input signal or state is then operated on by a non-linear processor function to produce an updated output activation state. Alternatively, the input signals can be summed by operation of the resistive or conductive network in parallel to the input of an operational amplifier having a shunting capacitor/resistor output stage. To provide an analog to inhibitory synaptic effects, a pair of operational amplifier circuits can be used to model the neuron or processor with one receiving input signals on the normal input and the other receiving input signals on the inverted input.
The connections between processors or neurons can be uni-directional (feed forward) or bi-directional. The processor function can be a step function which produces a binary output based upon whether or not the input signal achieves a defined threshold value, or it may be a sigmoid function such as the hyperbolic tangent which produces an analog activation state output. The pattern of connectivity among processors is referred to as the network architecture.
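The weighted summation and the two kinds of processor function described above can be sketched in software as follows. This is a minimal Python illustration, not a model of any particular circuit; the function names, the `gain` parameter and the example values are assumptions introduced here.

```python
import math


def net_input(weights, activations):
    """Weighted sum of the activation states of the connected units."""
    return sum(w * a for w, a in zip(weights, activations))


def step_unit(weights, activations, threshold=0.0):
    """Binary unit: output 1 if the net input reaches the threshold, else 0."""
    return 1 if net_input(weights, activations) >= threshold else 0


def sigmoid_unit(weights, activations, gain=1.0):
    """Analog unit: hyperbolic-tangent output lying strictly between -1 and 1."""
    return math.tanh(gain * net_input(weights, activations))


# Hypothetical connection weights; the negative weight plays the role of
# an inhibitory connection.
weights = [0.5, -1.0, 0.25]
activations = [1, 1, 1]

binary_out = step_unit(weights, activations)               # net input is -0.25
analog_out = sigmoid_unit(weights, activations, gain=2.0)  # negative analog value
```

With these values the net input is below the zero threshold, so the step unit is "off" while the analog unit reports a graded negative activation.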
In a neural network, each processor or unit need only have sufficient memory to represent its current activation state or value. The unit's activation state is placed on its outgoing connections and is readable by those other units to which it is connected. Input signals are provided to the neural network by constraining or fixing the activation states of selected units, designated as input units. The remainder of the units then repeatedly update their respective activation states until the network settles into a steady state. The steady state value of the activation states of designated output units constitutes the network output signal.
In error-correcting CAM networks, the activation state of a unit or single processor represents a single bit in a bit pattern. If the unit's processor function is a step function, each unit is either "on" or "off". If the processor function is sigmoid or analog, each unit's activation state is interpreted as being "on" or "off" based upon whether it is above or below the midpoint of its range of values. Thus, while in both cases the output is binarily coded, in analog networks the activation states of particular units contribute to or affect the input state of connected units whether or not the output is interpreted as "on" or "off". In binary step function networks, the output activation state of each unit is either "on" or "off" based upon an input threshold.
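The midpoint interpretation of analog activation states can be sketched as below. The function name, the default range of -1 to 1 (as for a hyperbolic tangent unit) and the treatment of a value exactly at the midpoint as "off" are assumptions made for illustration.

```python
def decode_bits(activations, low=-1.0, high=1.0):
    """Read analog activations as bits by comparing each to the range midpoint.

    A value exactly at the midpoint is treated as "off" (0); the text above
    does not specify this case, so that choice is an assumption.
    """
    midpoint = (low + high) / 2.0
    return [1 if a > midpoint else 0 for a in activations]


# Hypothetical steady-state activations of four analog units.
bits = decode_bits([0.9, -0.2, 0.1, -0.95])
print(bits)  # prints [1, 0, 1, 0]
```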
Information or data is stored in a neural network by setting the connection weights to prescribed values. It is the correct pattern of connection weights between individual units that enables the output units to adopt the correct stable pattern of activation in response to a given input activation state placed on the input units. In analog circuitry, the connection weights are typically represented by the linear resistance or conductance value of resistors between outputs and inputs of various interconnected processor units. The processor function in analog networks can be modeled by a pair of operational amplifiers having the appropriate sigmoid transfer function.
Given a specific task, e.g., error-correcting CAM, cost minimization or image recognition, a learning algorithm is developed to determine the correct values for the weights, thereby effectively programming and storing data in the network. In error-correcting CAM, the first step is to store a set of bit patterns in the network which correspond to legal or correct bit strings or codewords. Thereafter, if a noisy or error-containing bit pattern is presented as an input to the network, the activation states of the output units stabilize in the pattern of activation that, among patterns stored in the network, differs from the input bit pattern in the least number of bits, i.e., is closest in Hamming distance.
The primary advantage of neural network CAM lies in retrieval time. In conventional error-correcting methodology implemented with programmed computers, the time required to execute increases substantially with the size and number of stored patterns. By contrast, retrieval in a neural network implemented in hardware is a fully parallel process and takes only the time necessary for the circuit to settle into the steady state.
A neural network content-addressable memory model suitable for error-correction was disclosed in Hopfield, "Neural Networks and Physical Systems with Emergent Collective Computational Abilities", Proceedings of the National Academy of Sciences, U.S.A., Vol. 79, pp. 2554-2558 (1982). This method of modeling neural networks utilized a step-function processor function and bi-directional connections. Significantly, the Hopfield model disclosed the importance of symmetric weighting of the connection matrix, i.e., where W.sub.ij =W.sub.ji for all W (conductance) values of the interconnecting branches between unit i and unit j. Hopfield disclosed that a network having an interconnectional weight matrix wherein all W.sub.ij =W.sub.ji, and all diagonal elements W.sub.ii =0, will always converge to a stable state from any chosen initial input state. Hopfield disclosed that such networks were useful as content-addressable memories wherein each keyword was associated with a particular stable state, so that when an input bit string was presented as an initial activation value (representing an error-containing bit string) to the network, the network would evolve over time to a stable activation state which would represent an error-corrected version of the input bit string.
A subsequent Hopfield model also demonstrated that an analog circuit comprised of operational amplifiers having an inverted and a normal output, and having resistor-capacitor input summing elements and a resistive connection matrix, was able to operate as a CAM. Hopfield further demonstrated that while mathematical models required symmetry, real world circuits functioned adequately provided the interconnection impedance was approximately symmetrical. Hopfield further disclosed that neither the gain or transfer function of the operational amplifiers nor the input capacitance need be strictly equivalent to their symmetric counterparts to produce satisfactory results in the analog circuit model, provided the gain curve was steep, i.e., in high gain systems.
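The convergence behavior described above can be illustrated with a minimal Python sketch of a step-function network with symmetric weights and a zero diagonal. Several elements are assumptions made for illustration: the hand-picked weight matrix, the coding of "on" and "off" as +1 and -1, the zero threshold, and the fixed update order. The energy function is the standard one associated with this model and is not stated in the text above; it is included only to show that each sweep of updates never increases it.

```python
def energy(weights, state):
    """Energy E = -1/2 * sum over i, j of W[i][j] * s[i] * s[j]."""
    n = len(state)
    return -0.5 * sum(weights[i][j] * state[i] * state[j]
                      for i in range(n) for j in range(n))


def settle(weights, state, max_sweeps=100):
    """Update units one at a time until a full sweep changes nothing.

    Returns the final state and the energy recorded after every sweep.
    """
    state = list(state)
    n = len(state)
    energies = [energy(weights, state)]
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            net = sum(weights[i][j] * state[j] for j in range(n))
            new_value = 1 if net >= 0 else -1  # step function, threshold 0
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        energies.append(energy(weights, state))
        if not changed:
            break
    return state, energies


# Hypothetical symmetric weight matrix with zero diagonal (W[i][j] == W[j][i]).
W = [
    [0, 1, -1, -1],
    [1, 0, -1, -1],
    [-1, -1, 0, 1],
    [-1, -1, 1, 0],
]

# Initial state: the pattern [1, 1, -1, -1] with its second bit in error.
final_state, energies = settle(W, [1, -1, -1, -1])
print(final_state)  # prints [1, 1, -1, -1]
```

In this small example the erroneous bit is corrected in place during the first sweep, and the second sweep confirms that the state is stable.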
FIG. 1 is a schematic illustration of a Hopfield model network. In the Hopfield model, the neurons or units were not segregated into distinct sets or layers of input and output units. Rather, the input and output units were the same, and the input activation states were simply corrected in place. In other words, an initial input state was placed on the units and then allowed to evolve to a stable state representing the corrected bit string.
In the Hopfield model, all units were interconnected and the interconnectivity weights were determined according to the Hebbian rule, a single-pass, non-iterative calculation. Compared to iterative algorithms, which make repetitive modifications of the weights until a prescribed criterion is satisfied, the Hebbian method is primitive, and can store no more than 0.14N uncorrelated or random patterns, where N is the number of units or processors in the network.
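A single-pass Hebbian calculation of the kind referred to above can be sketched in Python as follows. The coding of bits as +1 and -1, the omission of any scaling factor, the zero threshold and the two example patterns are assumptions made for illustration.

```python
def hebbian_weights(patterns):
    """Single-pass Hebbian rule: W[i][j] = sum over patterns of p[i] * p[j].

    Diagonal elements are left at zero, and the result is symmetric.
    """
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j]
    return weights


def is_stable(weights, pattern):
    """True if every unit already agrees with the sign of its net input."""
    n = len(pattern)
    for i in range(n):
        net = sum(weights[i][j] * pattern[j] for j in range(n))
        if (1 if net >= 0 else -1) != pattern[i]:
            return False
    return True


# Two hypothetical, mutually orthogonal patterns over N = 8 units.
PATTERNS = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
]

W = hebbian_weights(PATTERNS)
print(all(is_stable(W, p) for p in PATTERNS))  # prints True
```

The weights are computed in one pass over the patterns with no iterative correction. Orthogonal patterns such as these are a favorable case; the 0.14N figure quoted above concerns uncorrelated or random patterns.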
A variety of algorithms to improve the storage capacity of the Hopfield model have been disclosed. These include the unlearning method of Crick and Mitchison, "The Function of Dream Sleep", Nature, Vol. 304, pp. 111-114 (1983); Hopfield, Feinstein and Palmer, "Unlearning Has Stabilizing Effects in Collective Memories", Nature, Vol. 304, pp. 158-159 (1983); the bi-directional perceptron algorithm of Diederich and Opper, "Learning of Correlated Patterns in Spin-Glass Networks by Local Learning Rules", Physical Review Letters, Vol. 58, pp. 949-952 (1987); E. Gardner, "Maximum Storage Capacity in Neural Networks", Europhysics Letters, Vol. 4, No. 4, pp. 481-485 (1987); Krauth and Mezard, "Learning Algorithms with Optimal Stability in Neural Networks", Journal of Physics A: Math. Gen., Vol. 20, pp. 1745-1752 (1987); and Wallace, "Memory and Learning in a Class of Neural Network Models", Lattice Gauge Theory--A Challenge to Large Scale Computing, Bunk and Mutter, Editors, New York: Plenum (1986). These methods and others have increased storage capacities in Hopfield models to a larger fraction of N. For Hopfield models, a theoretical capacity limit of 2N random patterns has been derived. Venkatesh, "Epsilon Capacity of Neural Networks", in J. S. Denker (Ed.), Neural Networks for Computing, Snowbird, UT, 1986, American Institute of Physics Conference Proceedings, p. 151 (1986).
For practical utility in noise correction applications, this storage capacity is too low. If transmitted bit patterns are all of length N, it is typically required that there be many more than 2N legal or codeword patterns in memory. Often, the number of patterns or codewords required is an exponential function of the length of the bit pattern.
One significant limitation of the Hopfield model networks is the inability to utilize hidden units or processors. A hidden unit is one which is not designated for input/output functions (units so designated being termed visible units) but which may be utilized to learn alternate encodings of bit patterns being stored, and can thus facilitate storage and retrieval across the output units. Algorithms for bi-directional, feedforward and other types of networks utilizing hidden units have been developed by others. See, for example, Peterson and Hartman, "Explorations of the Mean Field Theory Learning Algorithm", MCC ACA-ST/HI-065-88, Microelectronics and Computer Technology Corporation (1988), the contents of which are incorporated by reference herein to illustrate the background of the invention and the state of the art. None of the previously developed neural network models and learning algorithms has been demonstrated to yield storage capacities greater than 4N.