This invention relates to both associative and content-addressable memories (ACAMs) which utilize the "neural network" concepts of threshold logic, current summations, and resistive connections for decisional operation.
A memory is content addressable if it allows recall of a stored word without reference to the physical location where the word is stored. In other words, addressing is based on semantic content, not the hardware of storage. An associative memory allows recall based on only partial or partially erroneous information. In contrast, standard computer memories require complete knowledge of a specific address for retrieval.
ACAMs have numerous potential practical applications, e.g., for pattern recognition, vector quantization, and associative search. As memories of the human brain are associative and content-addressable, another use for ACAMs is to provide models for psychology or neurophysiology.
Over the past forty years, a variety of "connectionist" or "neural network" models for associative memories have been studied. See Denker, J. (ed) (1986) Neural Networks for Computing, AIP Conference Proceedings 151 (Snowbird, Utah) New York; Hinton, G. E., Anderson, J. A. (eds) (1981) Parallel Models of Associative Memory, Lawrence Erlbaum Associates, Inc., Hillsdale, N. J.; Kohonen, T. (1984) Self-Organization and Associative Memory, Springer-Verlag, Berlin; McClelland, J. L., Rumelhart, D. E., and the PDP Research Group (eds) (1986) Parallel Distributed Processing: Explorations in the Microstructure of Cognition, MIT Press/Bradford Books, Cambridge, MA; Hopfield, John, U.S. Pat. No. 4,660,166, "Electronic Networks for Collective Decision Based on Large Numbers of Connections Between Signals." In these models, the basic unit is a "neuron" which performs a weighted linear summation of its inputs, and fires if a threshold is exceeded. These models are, of course, inspired by biology: a hardware "neuron" is supposed to embody a simplified human neuron.
A recent surge of interest in connectionist models in part reflects the general exploration of methods for achieving massive parallelism, but more specifically stems from new ideas for their hardware implementation. Although the present invention utilizes some basic principles relevant to connectionist ACAMs, a substantial improvement in the design of such memories is attained.
It should be noted that hardware implementation of connectionist models requires high connectivity and, frequently, resistive connections. These features are not available in traditional VLSI circuits, but a number of groups are known to be working toward large-scale implementations of connectionist models with resistive connections. Analog chips with on the order of $10^3$ fully interconnected neurons ($10^6$ connections) appear to be feasible; these will be comparable in hardware complexity to existing 1 megabit digital RAM chips. Since large digital memory circuits with 16 megabytes of storage capacity are becoming common even for desktop computers, it is possible that analog associative networks with on the order of $10^8$ connections may be implemented. Such memory capacity is still dwarfed by the human brain, which has on the order of $10^{14}$ connections.
It is proposed that the simplest ACAM is based on the idea of "one memory, one neuron," so that for each stored word, there is a single neuron whose job it is to recognize that word, and fire when it is presented. Processor neurons of this type are known as grandmother cells, in reference to a hypothetical cell which fires only when a subject sees a specific object, e.g., his grandmother. Sherrington, C. S., Man on His Nature, Cambridge University Press (1941); Barlow, H. B., Perception 1:371 (1972).
For reasons that will become apparent, this sort of ACAM will be referred to hereafter as a unary ACAM. The network consists of three layers: an input layer, an intermediate or processor layer, and an output layer. The input and output layers each have N neurons representing N-bit vectors or words, while the intermediate layer has G neurons. There is no feedback to the input as in the Hopfield model. (See U.S. Pat. No. 4,660,166 cited above.)
Considering quantitatively the simplest version, based on 2-state threshold neurons, let $I_i$, $R_j$, $O_k$ be the output values of the $i$th input, $j$th intermediate, and $k$th output neuron, respectively; and let $A_j^i$, $B_k^j$ be matrices representing the strength of connections between the input and intermediate layers, and the intermediate and output layers, respectively, in an evident notation. Thus
$$R_j = H\Big(\sum_i A_j^i I_i - \theta_j\Big), \qquad O_k = H\Big(\sum_j B_k^j R_j - \theta_k\Big), \tag{1}$$
where $H$ is the Heaviside step function, equal to 1 or 0 according to whether its argument is positive or negative. In general, the neural response function may be any suitable thresholding function, whether smooth, such as a sigmoid, or discontinuous, like the Heaviside step function. One popular sigmoid is the logistic function. The $\theta_j$ and $\theta_k$ are thresholds associated with the $j$th neuron and $k$th neuron, respectively. These thresholds may be fixed, variable, or dynamically determined.
Now to store $M = G$ $N$-bit binary words $\epsilon_i^\mu$ valued $\pm 1$ ($\mu = 1, \ldots, M$; $i = 1, \ldots, N$), simply set
$$A_j^i = \epsilon_i^j, \qquad B_k^j = \epsilon_k^j, \tag{2}$$
or in matrix notation
$$A = \epsilon^T, \qquad B = \epsilon. \tag{3}$$
Note that the present discussion will use the {+1, -1, 0} data representation. The 0 value corresponds to a logical "don't know" or "don't care." Analogous calculations and circuits apply if a {1, 0} data representation or a {+1, -1} data representation is used instead.
In principle, a stored word may be recovered by presenting $b$ bits with values $\pm 1$, and setting the other input bits to 0, so long as these $b$ bits are enough to uniquely specify the stored word. In this case, the processor neuron representing this word will have input $b$, and all other processor neurons will have input no greater than $b-2$ (each mismatched bit lowers the sum by 2), so that retrieval may be made by thresholding at $b-1$.
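The retrieval rule just described can be sketched in a few lines of Python. This is a minimal illustration, not the circuit itself: it assumes the processor and output layers respond as a Heaviside step of the weighted sum minus a threshold, takes H(0) = 0, uses an output threshold of 0, and maps the {1, 0} output back to {+1, -1}. The names `heaviside`, `recall`, `WORDS`, and `CUE` are hypothetical.

```python
def heaviside(x):
    """Step function H: 1 for a positive argument, else 0 (H(0) = 0 is an assumption)."""
    return 1 if x > 0 else 0


def recall(words, cue):
    """Recall a stored word from a cue with entries in {+1, -1, 0} (0 = unknown).

    words[j][i] plays the role of both connection matrices:
    input -> processor uses A[j][i] = words[j][i],
    processor -> output uses B[k][j] = words[j][k].
    """
    b = sum(1 for c in cue if c != 0)      # number of presented bits
    theta = b - 1                          # processor threshold from the text
    # Processor layer: one "grandmother cell" per stored word.
    fired = [
        heaviside(sum(w_i * c_i for w_i, c_i in zip(word, cue)) - theta)
        for word in words
    ]
    # Output layer with threshold 0 (assumed); result is in {1, 0}.
    out = [
        heaviside(sum(words[j][k] * fired[j] for j in range(len(words))))
        for k in range(len(cue))
    ]
    # Map {1, 0} back to {+1, -1}. Meaningful only if exactly one cell fired.
    return fired, [2 * bit - 1 for bit in out]


WORDS = [
    [+1, +1, -1, -1, +1, -1],
    [+1, -1, -1, +1, +1, +1],
    [-1, +1, +1, -1, -1, +1],
]
CUE = [+1, -1, 0, 0, 0, 0]   # two known bits, enough to single out WORDS[1]

fired, recalled = recall(WORDS, CUE)
print(fired, recalled)
```

With this cue the three processor inputs are 0, 2, and -2, so only the second cell exceeds the threshold b - 1 = 1 and the full second word is read out.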
If $b$ known bits corresponding to a particular stored word are presented, requiring that with probability $1-\epsilon$ no other among $M$ random words share these bits gives the relationship
$$b \geq \log_2 M + \log_2 (1/\epsilon). \tag{5}$$
To ensure that no confusion arises for any word (with the same confidence), the relationship becomes
$$b \geq 2\log_2 M - 1 + \log_2 (1/\epsilon). \tag{6}$$
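One way to arrive at relationships (5) and (6) is a union-bound estimate. The sketch below is an assumption about the intended argument, not a quotation of it: it takes the stored words to be independent and uniformly random, and for (6) it assumes the same $b$ bit positions are presented for every word, so that confusion is a symmetric event over unordered pairs of words.

```latex
\begin{align*}
% A given other word agrees with the cue on all b presented bits:
\Pr[\text{one other word matches}] &= 2^{-b} \\
% Union bound over the other M - 1 words, for one particular stored word:
\Pr[\text{some other word matches}] &\le (M-1)\,2^{-b} \le M\,2^{-b} \le \epsilon
  \;\Longrightarrow\; b \ge \log_2 M + \log_2(1/\epsilon) \\
% Union bound over all unordered pairs of words, for every stored word at once:
\Pr[\text{some pair of words matches}] &\le \binom{M}{2}\,2^{-b} \le \frac{M^2}{2}\,2^{-b} \le \epsilon
  \;\Longrightarrow\; b \ge 2\log_2 M - 1 + \log_2(1/\epsilon)
\end{align*}
```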
For a given set of $b$ bits, even with $b < \log_2 M$, it may happen that no confusion can arise. This possibility can be exploited by wiring the processor neurons into a winner-take-all circuit, which can be implemented, for example, by using lateral inhibition of all neurons having an input less than that of the neuron having the highest input. Then only the processor neuron with the highest input will turn on. Assuming accurate values for the synapses, this scheme achieves theoretically optimal retrieval in that a stored word is retrieved as soon as sufficient bits are supplied to uniquely specify it. An alternative approach to the winner-take-all implementation is to supply an external threshold voltage to each processor neuron in parallel, and increase or "ramp" the threshold until only one processor neuron is on. This is a preferred approach for $G > 100$. Alternatively, if ambiguous information is presented, it may be valuable to learn that the input bits supplied are shared by two or more stored words. Such information may be obtained either by thresholding or by ramping.
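The ramped-threshold alternative can be illustrated in software as follows. This is only a sketch: the discrete ramp step of 1, the tie-handling rule (stop when raising the threshold further would turn every neuron off), and the names `processor_inputs` and `ramp_winner` are assumptions made for the example.

```python
def processor_inputs(words, cue):
    """Weighted sums seen by each processor neuron for a {+1, -1, 0} cue."""
    return [sum(w * c for w, c in zip(word, cue)) for word in words]


def ramp_winner(inputs, step=1):
    """Raise a common threshold until at most one processor neuron stays on.

    Returns (indices of neurons still on, final threshold). More than one
    index means the cue is ambiguous: the top input is shared.
    """
    theta = min(inputs) - step                  # start with every neuron on
    on = [j for j, x in enumerate(inputs) if x > theta]
    while len(on) > 1:
        nxt = [j for j, x in enumerate(inputs) if x > theta + step]
        if not nxt:                             # tie at the top: stop ramping
            break
        theta += step
        on = nxt
    return on, theta


WORDS = [
    [+1, +1, -1, -1, +1, -1],
    [+1, -1, -1, +1, +1, +1],
    [-1, +1, +1, -1, -1, +1],
]

# One known bit (fewer than log2(3) bits) already singles out WORDS[1].
unique = ramp_winner(processor_inputs(WORDS, [0, 0, 0, +1, 0, 0]))
# Two known bits that WORDS[0] and WORDS[1] share: the cue is ambiguous.
shared = ramp_winner(processor_inputs(WORDS, [+1, 0, -1, 0, 0, 0]))
print(unique, shared)
```

In the second call the ramp stops with two neurons still on, which is the "shared by two or more words" information mentioned above.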
In practice, synapses will have finite accuracy. A standard deviation of $\sigma$ in synaptic values, for example, will lead to a signal of order $b \pm \sigma\sqrt{b}$. This is not catastrophic (for small $\sigma$), but does lead to a rise in the number of bits required for retrieval. Similarly, one can retrieve a complete stored word from noisy partial information.
To cope with imprecise connections, the present invention modifies the strategy by looking for mismatches instead of matches. This requires a change in the circuits. In the standard design using input "neurons" as switches, each input cue is connected by an inverting amplifier to one conductor and by a noninverting amplifier to another, similar to the input arrangement of the Hopfield U.S. Pat. No. 4,660,166 cited above, but in a feedforward system instead. Depending upon whether the inverting conductor or the noninverting conductor is connected to the input of a processor neuron, a negative or a positive synapse is normally formed. The present invention proposes to alter this and connect each input to two threshold neurons, one corresponding to the "inverted" conductor having output H(-I-1/2), the other corresponding to the "noninverted" conductor having output H(I-1/2), where I is the signal value of the input. Thus if I=+1, the noninverted neuron has value 1 and the inverted neuron value 0, while if I=-1, their values are exchanged, and if I=0, both inverted and noninverted neurons have value 0. The synaptic values are then written as if the negative of the word were to be stored, and normal processor neurons are used which turn on if their input is below a threshold.
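The mismatch-counting arrangement can be sketched as follows. The code assumes ideal unit-strength connections, H(0) = 0, and a processor threshold of 1/2 (so a processor neuron turns on only when no presented bit mismatches); these values and the names `input_pair`, `mismatch_current`, and `THRESHOLD` are illustrative assumptions.

```python
def heaviside(x):
    """Step function H: 1 for a positive argument, else 0."""
    return 1 if x > 0 else 0


def input_pair(i_val):
    """Return (noninverted, inverted) = (H(I - 1/2), H(-I - 1/2))."""
    return heaviside(i_val - 0.5), heaviside(-i_val - 0.5)


def mismatch_current(word, cue):
    """Total input to one processor neuron under the inverted logic.

    Synapses are written as if the negative of the word were stored:
    a stored +1 connects to the inverted line, a stored -1 to the
    noninverted line, so only mismatched presented bits contribute.
    """
    total = 0
    for w, c in zip(word, cue):
        noninv, inv = input_pair(c)
        total += inv if w == +1 else noninv
    return total


WORDS = [
    [+1, +1, -1, -1, +1, -1],
    [+1, -1, -1, +1, +1, +1],
    [-1, +1, +1, -1, -1, +1],
]
CUE = [+1, -1, 0, 0, 0, 0]
THRESHOLD = 0.5   # assumed: fire only when the mismatch count is zero

mismatches = [mismatch_current(w, CUE) for w in WORDS]
fired = [1 if m < THRESHOLD else 0 for m in mismatches]
print(mismatches, fired)
```

Unknown bits (I=0) turn both lines off and so contribute nothing, and matching bits draw no current either; the input to each processor neuron is simply the number of presented bits that disagree with its stored word.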
The great practical advantage of the inverted logic of the present invention is that, through inversion, only mismatches between the input bits and the stored bits draw current, so that inaccuracies in the resistors cannot easily lead to an erroneous result. For instance, a triple mismatch will not be mistaken for a double mismatch, so long as the resistor variation is $\leq 33\%$, essentially independent of the number of matching bits.
The unary ACAM makes efficient use of hardware. For hetero-associative memory, the efficiency of the unary network, in terms of the information it can store relative to the hardware requirements, is optimal. The network stores $M \leq G$ words, each containing $N$ bits, at both the input and output stages using only $GN$ binary synapses per stage, where $G$ is the number of neurons (grandmother cells) in the intermediate level, as noted hereinbefore. At full capacity, the storage efficiency, defined as information bits/hardware bits, becomes 1, the information theoretic bound.