A conventional computer memory system can retrieve stored data only if the retrieval information, such as the read address, is known exactly. The reason for this is that in a conventional computer, a string of binary digits (a word) is stored at a specific location in the computer's memory, and nothing else is stored at that location at the same time.
P. Kanerva, in "Self-propagating Search: A Unified Theory of Memory", Report No. CSLI-84-7, published by Stanford University in 1984, hereinafter Kanerva (1984), has proposed a memory system called a Sparse Distributed Memory. The memory is "distributed" in the sense that each binary word is stored at many of the memory locations simultaneously and each memory location contains a linear combination of a subset of the stored data words rather than a single word. A brief description of his system follows.
Let the address space S be the set of all possible N-bit binary words. In other words, S is the set of all N-dimensional binary vectors, that is, vectors in which each component is either 0 or 1. In some applications members of this set will be considered both as addresses and as data words. If N is large, say between 100 and 10,000, then the number of possible addresses, $2^N$, is so large that a memory cannot be built with that many locations. Therefore, instead of implementing a memory location for each possible address, a large random sample of the addresses is chosen, say one million of them, and a memory location is implemented for each of these addresses. This is why the memory system is called "sparse".
One way to construct such a memory system is as follows. For each implemented memory location, which will be called a "hard memory location", there is an address decoder that determines whether or not to activate that location during a read or a write operation, and M counters which accumulate a linear combination of the M-dimensional binary data words stored at that location. There is one counter at each hard memory location for each of the M bit positions, or coordinates, of the data vectors to be stored in the memory. In some applications, the data words will be N-dimensional vectors like the address vectors, so that a data word may be thought of as an address, or a pointer to the memory.
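The layout just described (a fixed, randomly chosen address plus M counters per hard memory location) can be sketched as a small data structure. This is a minimal sketch, assuming addresses are Python lists of 0/1 bits drawn uniformly at random; the names `HardLocation` and `make_memory` are hypothetical, and the sizes are deliberately tiny compared with the one million locations mentioned above.

```python
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class HardLocation:
    """One implemented ("hard") memory location: a fixed N-bit address and M counters."""
    address: List[int]                                  # N-bit address, each entry 0 or 1
    counters: List[int] = field(default_factory=list)  # one counter per data-bit position


def make_memory(num_locations: int, n: int, m: int, seed: int = 0) -> List[HardLocation]:
    """Sample `num_locations` random N-bit addresses and give each one M zeroed counters."""
    rng = random.Random(seed)
    return [
        HardLocation(address=[rng.randint(0, 1) for _ in range(n)], counters=[0] * m)
        for _ in range(num_locations)
    ]


memory = make_memory(num_locations=1000, n=256, m=256)
print(len(memory), len(memory[0].address), len(memory[0].counters))  # 1000 256 256
```

Drawing addresses at random, rather than placing them on a grid, is what lets the number of hard locations inside any sphere be estimated from the sphere's volume alone.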
For two binary vectors $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$, let

$$d(x,y) = \sum_{i=1}^{n} |x_i - y_i|.$$

This is known as the Hamming distance between x and y. It is the number of coordinates i for which $x_i \neq y_i$.
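The Hamming distance translates directly into code. A minimal sketch, assuming both vectors are given as equal-length sequences of 0s and 1s; the function name `hamming_distance` is illustrative.

```python
from typing import Sequence


def hamming_distance(x: Sequence[int], y: Sequence[int]) -> int:
    """Number of coordinates i at which x[i] != y[i], for equal-length 0/1 vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    # For binary components, |x_i - y_i| is 1 exactly when the bits differ.
    return sum(abs(xi - yi) for xi, yi in zip(x, y))


print(hamming_distance([1, 0, 1, 1, 0], [1, 1, 0, 1, 0]))  # 2
```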
When a read or a write operation is performed at an address x, all of the hard memory locations within a fixed Hamming distance r of x are activated. This region of activation, which is a subset of S, may be viewed geometrically as a sphere with center at x and radius r. For example, Kanerva (1984) showed that if N=1000 and r=451, then the sphere contains about 1/1000 of the points in S. Therefore, since the hard memory locations are randomly distributed throughout S, the number of hard memory locations in this sphere is approximately 1/1000 of the total number of hard memory locations. The function of the address decoder at each hard memory location is to compute the Hamming distance between the given read or write address and the address of the hard memory location, and to activate the location if the distance is less than or equal to r.
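Both claims in this paragraph can be checked numerically: the fraction of S inside a sphere of radius r is the number of points within distance r, $\sum_{k=0}^{r} \binom{N}{k}$, divided by $2^N$, and the address decoder's job is a distance-and-compare test per hard location. A minimal sketch, assuming Python 3.8+ for `math.comb`; the helper names `sphere_fraction` and `activated` are hypothetical.

```python
import math
from typing import List, Sequence


def sphere_fraction(n: int, r: int) -> float:
    """Fraction of the 2**n points of S lying within Hamming distance r of any fixed center."""
    points_in_sphere = sum(math.comb(n, k) for k in range(r + 1))
    return points_in_sphere / 2 ** n


def activated(reference: Sequence[int], hard_addresses: List[Sequence[int]], r: int) -> List[int]:
    """Indices of hard locations whose address is within Hamming distance r of `reference`."""
    return [
        idx
        for idx, addr in enumerate(hard_addresses)
        if sum(a != b for a, b in zip(reference, addr)) <= r
    ]


print(sphere_fraction(1000, 451))  # close to 1/1000
print(activated([0, 0, 0], [[0, 0, 0], [1, 1, 1], [1, 0, 0]], 1))  # [0, 2]
```

Python's integer division of two large integers returns a correctly rounded float, so the $2^{1000}$ denominator causes no overflow here.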
When a data word (a binary vector) is written to the memory at address x, the word is added to the counters at each of the activated hard memory locations (those within distance r of x) according to the following rule: if the value of the $i$th bit of the data word is 1, the $i$th counter is incremented; if the value of the bit is 0, the counter is decremented. Since each counter has finite capacity, it must have an upper and a lower limit. If a counter's limit has been reached, and the system then tries to add another bit that would take the counter beyond its limit, the counter simply remains at its current value. However, if adding the new bit will keep the counter within its range, the counter will be updated. Eight-bit counters, each one having a range of ±127, should be sufficient for many applications.
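The write rule, including the saturating behaviour at the counter limits, can be sketched as follows. This is a minimal sketch, assuming the counters of all hard locations are held in a list of lists and that the indices of the activated locations have already been computed; the names `saturating_update` and `write` are hypothetical.

```python
from typing import List, Sequence

COUNTER_MIN, COUNTER_MAX = -127, 127  # eight-bit counters with a range of +/-127


def saturating_update(counter: int, bit: int) -> int:
    """Increment for a 1 bit, decrement for a 0 bit; leave unchanged if that would exceed the range."""
    new_value = counter + (1 if bit == 1 else -1)
    if COUNTER_MIN <= new_value <= COUNTER_MAX:
        return new_value
    return counter


def write(counters: List[List[int]], active: Sequence[int], data: Sequence[int]) -> None:
    """Add the data word to the counters of every activated hard location, in place."""
    for loc in active:
        row = counters[loc]
        for i, bit in enumerate(data):
            row[i] = saturating_update(row[i], bit)


counters = [[0] * 4 for _ in range(3)]
write(counters, active=[0, 2], data=[1, 0, 1, 1])
print(counters)  # [[1, -1, 1, 1], [0, 0, 0, 0], [1, -1, 1, 1]]
```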
When a read operation is done at an address x, the following is done separately for each coordinate i: the values stored in the $i$th counters of all of the activated hard memory locations are sent to an accumulator and added. Each of these sums is then compared to a threshold value; if a sum is greater than the threshold, a 1 is recorded for that coordinate, and otherwise a 0 is recorded. These 1s and 0s form an M-dimensional binary vector, which is the result of the read operation.
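The read operation is a per-coordinate sum over the activated locations followed by a threshold test. A minimal sketch, assuming a default threshold of 0 (a natural choice when 1 bits increment and 0 bits decrement, but the text above does not fix the value); the function name `read` is hypothetical.

```python
from typing import List, Sequence


def read(counters: List[List[int]], active: Sequence[int], threshold: int = 0) -> List[int]:
    """Sum the i-th counters over the activated locations; output 1 where the sum exceeds threshold."""
    m = len(counters[0])
    sums = [0] * m
    for loc in active:
        for i, value in enumerate(counters[loc]):
            sums[i] += value
    # A sum equal to the threshold yields 0, since only "greater than" records a 1.
    return [1 if s > threshold else 0 for s in sums]


counters = [[2, -1, 0], [-3, 1, 4]]
print(read(counters, [0, 1]))  # sums [-1, 0, 4] -> [0, 0, 1]
print(read(counters, [0]))     # sums [2, -1, 0] -> [1, 0, 0]
```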
Kanerva (1984) showed that if a word is written at address x, and if a read operation is later done at address y near to x in Hamming distance, then many of the hard memory locations which had been activated by the write operation at x will also be activated by the read at y. Conversely, for a data word stored at an address more distant from y, few or none of the hard memory locations to which it was written will be activated by the read at y. As a result, the sums computed during the read operation (at address y near x) will contain many copies of the data word written at x, one copy for each of the hard memory locations activated by both x and y, along with "random noise" due to small numbers of copies of other data words written at more distant addresses. Consequently, if x is the only write address near y, then the vector obtained from the read operation at y will be close to the data word originally stored at x. Because of the random noise, some of the bits may not be recovered correctly; however, since the memory system is designed to work with approximate information, its goal in many applications will be achieved if it can recover most of the bits in the stored data.
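The recall behaviour described above can be observed in a small simulation that writes several random words and then reads near one of the write addresses. This is a minimal sketch with illustrative parameters chosen here, not taken from the source: N = M = 256, 5000 hard locations, radius 112 (which activates roughly 2.6% of S for N = 256), and a read threshold of 0. Addresses and data words are packed into Python integers for speed.

```python
import random

N = M = 256          # address and data word length in bits (illustrative)
NUM_LOCATIONS = 5000
RADIUS = 112         # activates roughly 2.6% of S when N = 256
LIMIT = 127          # eight-bit saturating counters

rng = random.Random(1)
hard_addresses = [rng.getrandbits(N) for _ in range(NUM_LOCATIONS)]
counters = [[0] * M for _ in range(NUM_LOCATIONS)]


def hamming(a: int, b: int) -> int:
    """Hamming distance between two bit vectors packed into integers."""
    return bin(a ^ b).count("1")


def active(address: int) -> list:
    """Indices of the hard locations within RADIUS of the given address."""
    return [j for j, h in enumerate(hard_addresses) if hamming(address, h) <= RADIUS]


def write(address: int, data: int) -> None:
    for j in active(address):
        row = counters[j]
        for i in range(M):
            step = 1 if (data >> i) & 1 else -1
            if -LIMIT <= row[i] + step <= LIMIT:  # saturate at the counter limits
                row[i] += step


def read(address: int) -> int:
    sums = [0] * M
    for j in active(address):
        for i, value in enumerate(counters[j]):
            sums[i] += value
    return sum(1 << i for i in range(M) if sums[i] > 0)


# Store 20 random words at 20 random addresses.
stored = [(rng.getrandbits(N), rng.getrandbits(M)) for _ in range(20)]
for address, data in stored:
    write(address, data)

# Read at an address exactly 10 bits away from the first write address.
target_address, target_data = stored[0]
probe = target_address
for bit in rng.sample(range(N), 10):
    probe ^= 1 << bit

recovered = read(probe)
correct_bits = M - hamming(recovered, target_data)
print(f"{correct_bits}/{M} bits recovered")
```

The locations shared by the write at `target_address` and the read at `probe` each contribute one copy of `target_data`, while the other 19 words reach the read mostly through a few chance overlaps, which is the "random noise" in the paragraph above.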
Kanerva (1984) computed the expected number of hard memory locations activated by both a write at x and a read at y, as a function of N, r, and d(x,y). Since the region of activation for x is a sphere of radius r centered at x, and the region for y is a similar sphere about y, the hard memory locations activated by both x and y are those whose addresses fall in the intersection of the two spheres. This region will be called the "access overlap". Kanerva derived a formula for the volume of the intersection of two such spheres, that is, the number of points of S in the intersection. Since the hard memory locations are randomly distributed, the expected number of hard memory locations in the intersection is proportional to the volume of the intersection. Some representative values of this expected number are given in Table 1 below.
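The intersection volume can also be obtained by direct counting, which is a convenient cross-check on any closed-form expression (this sketch does not reproduce Kanerva's own formula). Split the coordinates into the d positions where x and y differ and the N - d positions where they agree. If a point z differs from x in a of the first group and b of the second, then d(z,x) = a + b and d(z,y) = (d - a) + b, so the overlap volume is the sum of $\binom{d}{a}\binom{N-d}{b}$ over all a, b satisfying both inequalities. The function names are hypothetical, and the printed values can be compared against Table 1.

```python
import math
from typing import List


def overlap_fraction(n: int, r: int, d: int) -> float:
    """Fraction of the 2**n points of S within distance r of both x and y, where d(x, y) = d."""
    # prefix[b] = number of ways to differ from x in at most b of the n - d agreeing coordinates.
    prefix: List[int] = []
    running = 0
    for b in range(n - d + 1):
        running += math.comb(n - d, b)
        prefix.append(running)

    total = 0
    for a in range(d + 1):             # a = differing coordinates where z sides with y
        b_max = min(r - a, r - d + a)  # enforce a + b <= r and (d - a) + b <= r
        if b_max < 0:
            continue
        total += math.comb(d, a) * prefix[min(b_max, n - d)]
    return total / 2 ** n


def expected_overlap(num_locations: int, n: int, r: int, d: int) -> float:
    """Expected number of randomly placed hard locations in the access overlap."""
    return num_locations * overlap_fraction(n, r, d)


for d in (0, 50, 100, 200, 300, 450):
    print(d, round(expected_overlap(1_000_000, 1000, 451, d), 1))
```

At d = 0 the sum reduces to the full sphere volume, and once d exceeds 2r the two spheres are disjoint and the fraction is 0.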
For a given number of hard memory locations, the performance of the Sparse Distributed Memory may be judged by its ability to recover a stored word with some degree of accuracy when we read from the memory at an address near the address at which the word was written, given that a certain number of other words have been written to the memory at other addresses. For the system to perform well, therefore, the access overlap must be large when d(x,y) is small and small when d(x,y) is large.
A limitation on the performance of Kanerva's design is that the size of the access overlap decreases substantially even when the read address is only a moderately small distance from the write address of the stored data word that is to be recovered. Consequently, it may be difficult to recover the data word if the address is not accurately known. It would be better to have an even greater access overlap for small d(x,y) and a smaller access overlap for large d(x,y), thereby increasing the signal-to-noise ratio.
Another disadvantage of Kanerva's system is that computing the Hamming distance for each hard memory location involves summing a large number of bits, an operation that requires specially designed hardware if it is not to be very time-consuming.
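To see where the cost comes from: for N = 256, each distance is a sum of 256 one-bit differences, repeated for every hard memory location. In software, a common approach is to XOR the two addresses and count the set bits with a tree of partial adds. The sketch below assumes addresses packed into 64-bit words and uses the well-known SWAR ("SIMD within a register") population count; it is illustrative only and does not describe the adder design of any particular system.

```python
from typing import Sequence

MASK64 = (1 << 64) - 1


def popcount64(v: int) -> int:
    """Count the set bits in a 64-bit word with a tree of parallel partial sums (SWAR)."""
    v = v - ((v >> 1) & 0x5555555555555555)                         # 2-bit partial sums
    v = (v & 0x3333333333333333) + ((v >> 2) & 0x3333333333333333)  # 4-bit partial sums
    v = (v + (v >> 4)) & 0x0F0F0F0F0F0F0F0F                         # 8-bit partial sums
    return ((v * 0x0101010101010101) & MASK64) >> 56                # add the eight bytes


def hamming_words(x: Sequence[int], y: Sequence[int]) -> int:
    """Hamming distance between two addresses stored as equal-length lists of 64-bit words."""
    return sum(popcount64((a ^ b) & MASK64) for a, b in zip(x, y))


# A 256-bit address is four 64-bit words.
print(hamming_words([MASK64, 0, 0xF, 0], [0, 0, 0, 0]))  # 68
```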
The Computer Systems Laboratory at Stanford University has constructed a small-scale prototype of Kanerva's Sparse Distributed Memory, referred to below as the "Stanford prototype". It is described fully by Flynn et al. in "Sparse Distributed Memory Prototype: Principles of Operation", Technical Report CSL-TR-87-338, published in February 1988, and in "Sparse Distributed Memory Prototype: Address Module Hardware Guide", Technical Report CSL-TR-88-373, published in December 1988. A brief description of its design is as follows.
The Stanford prototype uses 256-bit addresses and 256-bit data words. It has room for 8192 hard memory locations, with 256 eight-bit counters for each hard memory location. The addresses of the hard memory locations may be set by the user. The address decoding is done by a custom-designed address module. During a read or a write operation, it computes the 256-bit Hamming distance between the "reference address" (the read or write address) and the address of each hard memory location, one location at a time, and compares that distance with a given radius. A specially designed set of adders computes the Hamming distance sum quickly. If the Hamming distance is less than or equal to the radius, which means that the hard memory location is to be activated, a 13-bit "tag" identifying the hard memory location is sent to the "tag cache", a buffer that holds the tags of the activated hard memory locations until the data in their counters can be processed.
The process of updating the counters for the activated hard memory locations during a write, or accumulating the data in those counters during a read, is done by the "stack module", which consists of 256 bytes of memory for each hard memory location, and a processor to do the additions. Since the stack module receives tags from the tag cache, it can begin working while the address module is continuing to determine which locations should be activated. If the tag cache becomes full (an unlikely event), the address module must pause until the stack module can catch up.
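The division of labour between the address module, the tag cache, and the stack module is a producer/consumer pipeline. A minimal sketch using Python threads, assuming a bounded `queue.Queue` stands in for the tag cache (its capacity here is hypothetical), that a write operation is being performed, and that the radius is an illustrative value. It models only the data flow, not the prototype's hardware or timing. Because 8192 = $2^{13}$, every location index fits in a 13-bit tag.

```python
import queue
import random
import threading
from typing import List

NUM_LOCATIONS = 8192  # 2**13, so each location index fits in a 13-bit tag
N = 256
RADIUS = 112          # illustrative radius, not taken from the prototype
TAG_CACHE_SIZE = 64   # hypothetical tag cache capacity
DONE = None           # sentinel marking the end of the address scan

rng = random.Random(0)
hard_addresses = [rng.getrandbits(N) for _ in range(NUM_LOCATIONS)]
counters = [[0] * N for _ in range(NUM_LOCATIONS)]


def address_module(reference: int, tag_cache: queue.Queue) -> None:
    """Scan the hard locations one at a time and enqueue the tag of each one within RADIUS."""
    for tag, address in enumerate(hard_addresses):
        if bin(reference ^ address).count("1") <= RADIUS:
            tag_cache.put(tag)  # blocks, pausing the scan, while the tag cache is full
    tag_cache.put(DONE)


def stack_module_write(data_bits: List[int], tag_cache: queue.Queue, processed: List[int]) -> None:
    """Consume tags as they arrive and update that location's counters."""
    while True:
        tag = tag_cache.get()
        if tag is DONE:
            break
        row = counters[tag]
        for i, bit in enumerate(data_bits):
            step = 1 if bit else -1
            if -127 <= row[i] + step <= 127:
                row[i] += step
        processed.append(tag)


def pipelined_write(reference: int, data_bits: List[int]) -> List[int]:
    """Run both modules concurrently; return the tags in the order they were processed."""
    tag_cache: queue.Queue = queue.Queue(maxsize=TAG_CACHE_SIZE)
    processed: List[int] = []
    producer = threading.Thread(target=address_module, args=(reference, tag_cache))
    consumer = threading.Thread(target=stack_module_write, args=(data_bits, tag_cache, processed))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return processed


reference = rng.getrandbits(N)
data = [rng.randint(0, 1) for _ in range(N)]
tags = pipelined_write(reference, data)
print(len(tags), "locations updated")
```

The stack module starts updating counters as soon as the first tag arrives, while the address module is still scanning; the bounded queue reproduces the "pause until the stack module can catch up" behaviour.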
There is a control module that sends commands and data to the other modules, and there is also an executive module, which functions as a user interface.
The Stanford prototype is designed to perform a read or a write operation in about 1/50 of a second.