The present invention relates generally to memory systems for use in computers and, more specifically, to content addressable memory systems.
Modern computers are remarkably similar to the original Von Neumann designs. The computer is typically divided into a memory system for storing data and instructions and a central processing unit which is responsible for carrying out each instruction using the data stored in the memory. The memory is typically organized into storage slots each having a fixed number of bits. Data is stored in a selected storage slot by specifying the data bits and the location of the selected storage slot with reference to the location of the first storage slot in the memory. The location of a storage slot specified in this manner is normally referred to as the address of the storage slot. For the purposes of this discussion, the contents of one such storage slot will be referred to as a character.
As a result of the address method of specifying data stored in the memory, a typical central processing unit has a maximum efficient size for directly addressable memory. The instructions executed by the central processing unit include information specifying the addresses of the data to be used with those instructions. A central processing unit designed to address a large memory must allocate more bits for the address portion of each instruction than a central processing unit designed to address a small memory, or it must use some form of indirect addressing, which introduces "overhead" calculations that reduce efficiency. That is, the number of bits of information that must be stored for each instruction is greater in a large memory system computer than in a small memory system computer. This requires additional memory space to store these instructions and additional hardware in the central processing unit to process the larger instructions. Hence, the efficiency of a Von Neumann system will, in general, decrease with increasing memory size beyond the above-mentioned maximum efficient memory size. The additional hardware needed for calculating extended addresses does not perform any additional programmed calculations when a given problem is being processed by the computer. Unfortunately, artificial intelligence applications of computers often require very large memory systems; hence, Von Neumann designs are not well suited to such applications.
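The growth of the address portion of an instruction with memory size can be illustrated by a short calculation. The following is a minimal sketch only; the function name address_bits and the example memory sizes are assumptions introduced for illustration and do not appear in the discussion above.

```python
def address_bits(num_slots: int) -> int:
    """Smallest number of address bits able to distinguish `num_slots` storage slots."""
    if num_slots < 1:
        raise ValueError("a memory must contain at least one storage slot")
    # (n - 1).bit_length() equals ceil(log2(n)) for every n >= 1.
    return (num_slots - 1).bit_length()


# Hypothetical memory sizes: each doubling of the memory adds one address bit
# to every instruction that directly addresses it.
for slots in (65536, 131072, 1048576):
    print(slots, "slots need", address_bits(slots), "address bits")
```

Under this simple model, every instruction that directly addresses the larger memory carries the extra bits, whether or not a given program uses the additional storage.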
There is also a limit to the speed at which a Von Neumann computer can run. In spite of the enormous improvements made in the speed of such computers, there is still a large class of problems, particularly in the artificial intelligence area, which require more calculations per unit time than can be carried out on even the fastest Von Neumann computer. This limitation results from the inability of the central processing unit to process more than one instruction at a time.
As a result of these limitations, a number of authors have suggested various forms of concurrent computer architectures to replace, or at least augment, the classical Von Neumann design. In a concurrent computer, the computer program is broken into a number of tasks which are given to each of a plurality of different central processing units to carry out. The various central processing units often have their own memory systems. The central processing units run simultaneously. In a system with N central processing units, the potential throughput of the system is N times that of a single central processing unit. Hence, to the extent that this potential can be realized, a concurrent computer can avoid the speed limitations of the classical Von Neumann design.
The extent to which the theoretical increase in speed of a given concurrent computer may be realized, however, depends upon the extent to which the programmer can break the problem to be solved by the computer into sub-tasks which may be run simultaneously. The types of programs which lend themselves best to this approach are those which can be broken into a number of identical tasks performed on different sets of data. Ideally, the tasks should be capable of being performed without the need to have the results from one of the tasks before proceeding to the next task. If this is not the case, one or more of the central processing units may be idled waiting for the results in question.
It should be noted, however, that even when a substantial increase in speed is obtained by this approach, there may actually be a decrease in the efficiency of the computer as measured in terms of the cost per calculation. At best, the efficiency of the multi-processor system is the same as that of the individual central processing units. In general, it will be less, since additional overhead calculations must be performed to coordinate the actions of the various central processing units. Furthermore, it is difficult to keep all of the central processing units busy all of the time; hence, inefficiencies resulting from idle central processing units are also present.
To obtain a substantial improvement in system efficiency, a central processing unit having a substantially reduced instruction set must, in general, be used. The efficiency of a central processing unit is related to the area of silicon needed to fabricate it. Larger area central processing units are more expensive to fabricate than smaller central processing units. Hence, to reduce the cost of the central processing unit, either the silicon area must be reduced or the utilization of the silicon area must be improved. Since the number of different instructions which a given central processing unit can execute is related to the silicon area, a reduction in area requires a reduction in the number of different instructions executable by the central processing unit. Hence, unless the efficiency of utilization of silicon can be increased, an increase in central processor efficiency requires a decrease in the size of the instruction set.
Hence, to obtain a substantial improvement in both speed and cost per computation using a concurrent computer architecture, the problems which this architecture is designed to solve must be divisible into sub-tasks which can be executed concurrently by a plurality of central processing units. In addition, the central processing units must each be optimized so as to be able to execute only those instructions that are necessary to solve the problem. Such reduced instruction set central processing units will be referred to hereinafter as processing units. For reasons of practicality, such processing units are usually all identical in a given concurrent computer.
One type of problem which satisfies these constraints is that of finding an entry in a table of similar entries. For the purpose of this discussion, each entry will be assumed to be a word in the English language. Each entry in the table consists of a word followed by data specifying where additional information about that word may be found. This data will be referred to as a pointer, since, in general, the data in question specifies the address in the computer's memory at which a contiguous block of characters containing the additional information starts. In the general table look-up problem, one wishes to find one or more entries in the table having words which satisfy a specification, and then to return the table entries found, including the word and pointer for each such entry. This specification is referred to as a specification word, because it usually consists of a sequence of characters which are to match a corresponding sequence of characters, referred to as a field, in the word stored in each table entry. For example, a request could be made for all entries in which the word begins with the sequence of characters "nat".
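The table look-up problem described above may be sketched in software as follows. This is an illustrative sketch only and not a description of any particular content addressable memory; the names Entry, lookup, and TABLE, the offset parameter used to locate the field, and the pointer values are all assumptions introduced for the example.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    word: str     # the word stored in the table entry
    pointer: int  # hypothetical address of the additional information


def lookup(table: List[Entry], spec: str, offset: int = 0) -> List[Entry]:
    """Return every entry whose field starting at `offset` matches `spec`."""
    end = offset + len(spec)
    return [entry for entry in table if entry.word[offset:end] == spec]


# Hypothetical table contents and pointer values.
TABLE = [
    Entry("nation", 0x100),
    Entry("native", 0x140),
    Entry("table", 0x180),
    Entry("natural", 0x1C0),
]

# Request all entries in which the word begins with "nat".
matches = lookup(TABLE, "nat")
print([entry.word for entry in matches])
```

Note that this sequential version examines the entries one after another, so the time it requires grows with the size of the table.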
This type of table look-up task occurs frequently in computer programs which must deal with English or some other human readable language. For example, most computers accept commands consisting of English words. A command, typically, begins with a verb which specifies that a particular program stored in the computer's memory is to be run. Each time the computer receives a word which may be a verb of this type, the program must compare the received word with each of the words in a command word table. If the received word matches one of these command words, control is transferred to the instruction located at a memory location specified in the pointer associated with the command word.
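A command word table of this kind may be sketched as follows. The sketch is illustrative only; the command words "list" and "copy", the handler functions, and the name dispatch are hypothetical, and a reference to a Python function stands in for the pointer to the instruction at which the corresponding program begins.

```python
from typing import Callable, List, Optional, Tuple


def run_list(args: List[str]) -> str:
    return "listing " + " ".join(args)


def run_copy(args: List[str]) -> str:
    return "copying " + " ".join(args)


# Each entry pairs a command word with a "pointer" to the program to be run.
COMMAND_TABLE: List[Tuple[str, Callable[[List[str]], str]]] = [
    ("list", run_list),
    ("copy", run_copy),
]


def dispatch(line: str) -> Optional[str]:
    """Compare the leading verb with each command word and transfer control on a match."""
    parts = line.split()
    if not parts:
        return None
    verb, args = parts[0], parts[1:]
    for word, handler in COMMAND_TABLE:  # one comparison per table entry
        if word == verb:
            return handler(args)
    return None  # the received word is not a command word


print(dispatch("copy a b"))
```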
Similarly, text processing programs are often required to check the spelling of the words in a document. This can be accomplished by comparing each word in the document to a list of words, referred to as a dictionary. If the word is found in the dictionary, it is correctly spelled. The remaining words not found in the dictionary are then collected in a list for an operator to review to determine if they are misspelled.
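The spelling check described above may be sketched as follows. The sketch is illustrative only; the function name unknown_words, the handling of punctuation and letter case, and the sample dictionary are assumptions introduced for the example.

```python
from typing import List, Set


def unknown_words(document: str, dictionary: Set[str]) -> List[str]:
    """Collect, in order of first appearance, the words not found in the dictionary."""
    not_found: List[str] = []
    for raw in document.split():
        # Assumed normalization: drop surrounding punctuation and ignore case.
        word = raw.strip('.,;:!?"()').lower()
        if word and word not in dictionary and word not in not_found:
            not_found.append(word)
    return not_found


# Hypothetical dictionary and document.
DICTIONARY = {"the", "is", "fast", "large"}
review_list = unknown_words("The memmory is fast. The memmory is large.", DICTIONARY)
print(review_list)
```

The resulting list corresponds to the words presented to the operator for review.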
Several authors have proposed concurrent computer architectures for carrying out this type of table look-up. These computers are often referred to as content addressable memories, since they are peripheral devices connected to a host data processing system which retrieve data records based on the content of the data record rather than the address at which a data word is located in the computer's memory.
An ideal content addressable memory has three properties. First, it should retrieve the first desired character of the first data word in a time approximately equal to that required to transfer the specification word to the content addressable memory. In general, the host data processing system can only transfer words one character at a time. Hence, the minimum time needed to find all data words stored in the content addressable memory which match a particular specification word is the time needed to transfer the specification word in question. No substantial improvements in the efficiency of the overall system can be achieved by constructing a content addressable memory which responds faster than the host data processing system can use the results.
Second, the efficiency of the content addressable memory and host data processing system should not be dependent on the size of the content addressable memory as measured by the number of data words stored therein. This property is often referred to as the ability to scale the content addressable memory. If the size of the content addressable memory were, for example, doubled, no increase in the time needed to select and retrieve a data word having a field which matches the specification word should be encountered. Similarly, the host data processing system should not have to be reprogrammed if the size of the content addressable memory is increased. Ideally, the host data processing system should not need to know the size of the content addressable memory.
Finally, the amount of circuitry in the content addressable memory that is devoted to matching the data words against the specification word, i.e., the amount of circuitry in the content addressable memory processing units, should be small compared to the amount of circuitry devoted to storing data words. Since the data words in question must be stored somewhere, the minimum size of the content addressable memory would be the space needed to store the data words. Once the space needed to construct the processing units is made small compared to this space, no substantial improvements in overall system efficiency, as measured by cost per computation, can be made.
Prior art content addressable memories have failed to meet these ideals. The circuitry used to implement the processing units in prior art designs has been a substantial fraction of the circuitry needed to store a single data word. As a result, the third goal could only be met by multiplexing this processing unit circuitry between several data words. This strategy results in a content addressable memory in which the time required to retrieve the first data word having a specified field which matches the specification word is much longer than the time needed to transfer the specification word to the content addressable memory.
The time needed to compare one character of the specification word with the corresponding character in one data word is approximately the same as the time needed to transfer the specification word character in question to the content addressable memory. Hence, if each processing unit must service N data words, the time needed to compare one character of the specification word with each corresponding character of the stored data words will be N times the time needed to transfer the specification word character to the content addressable memory. Hence, prior art systems have been a compromise in which the first and third ideals mentioned above are traded off against one another.
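The trade-off described above can be expressed as a simple timing model. The following is a sketch under stated assumptions, not a measurement of any prior art device: it assumes that comparing one character takes exactly the time t_char needed to transfer one character, and the function names are hypothetical.

```python
def transfer_time(spec_len: int, t_char: float = 1.0) -> float:
    """Time to transfer a specification word of `spec_len` characters."""
    return spec_len * t_char


def match_time(spec_len: int, words_per_unit: int, t_char: float = 1.0) -> float:
    """Time to compare the specification word when each processing unit
    must service `words_per_unit` data words in turn."""
    return spec_len * words_per_unit * t_char


# With one data word per processing unit, matching keeps pace with the transfer;
# with N = 8 data words per processing unit, it takes 8 times as long.
print(match_time(3, 1) / transfer_time(3))
print(match_time(3, 8) / transfer_time(3))
```

In this model, reducing the circuitry by sharing each processing unit among N data words increases the matching time by the same factor N.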
Because of the limitations of current VLSI circuit fabrication techniques, many useful applications for a content addressable memory require so large a memory that the content addressable memory must be constructed on more than one circuit chip. During operations in which a data word is either read out of, or written into, the content addressable memory, only one such chip can be operative. Typically, the active chip in prior art content addressable memories has been specified by the host data processing system using chip select lines. That is, each chip has an input pin which enables it for operation when an appropriate signal is applied thereto. When, for example, a read operation is performed, the host data processing system selects one of the chips to be active and de-activates the remaining chips by an appropriate signal on the chip select lines of these chips. This requires that the host data processing system be able to "address" the various chips.
This addressing operation has all of the limitations and problems associated with a central processing unit addressing its memory. It requires that a fixed number of address lines be run between the central processing unit and the memory and that the central processing unit execute additional software to scan through the content addressable memory by sequentially selecting the appropriate address line. If one expands the content addressable memory beyond the capacity of the address lines originally designed into the system, additional hardware and software must be introduced into the system, which usually results in a decrease in operating efficiency, since the additional software increases the "computational overhead" of the system. That is, the additional content addressable memory chips require the central processing unit to perform additional calculations which do not produce any useful output.
Broadly, it is an object of the present invention to provide an improved content addressable memory system.
It is also an object of the present invention to provide a content addressable memory which can retrieve each data word having a specified field which matches the specification word in a time substantially equal to the time needed to transfer the specification word to the content addressable memory.
It is another object of the present invention to provide a content addressable memory in which the amount of circuitry devoted to processing units is small compared to the total circuitry in the content addressable memory.
It is yet another object of the present invention to provide a content addressable memory which can be increased in size indefinitely without changing either the host data processing system hardware or software used to operate said content addressable memory.
These and other objects of the present invention will become apparent from the following detailed description of the present invention and the accompanying drawings.