The architecture of a typical, single-processor computing system can be viewed as some variation of the von Neumann model of computation. According to this model, instructions and data are stored in the same memory, and the processor fetches instructions one by one from the memory, executing operations on data as specified in the instructions. As the speed of processors has increased, there has been a need to find ways to better match the access time of the main computer memory to the computational speed of the processor. One known way of accomplishing this is through the use of cache memory, which typically has a much faster access time than main memory, but can also be many times more expensive than main memory.
A cache memory contains some subset of the information stored in main memory, and resides between the processing unit and the system bus, which provides the data path between a processor and main memory. When a processor attempts to access a main memory location that is copied in its cache (a cache "hit"), no access to main memory is required to provide the requested value to the CPU, and the processor can resume operation more quickly. On the other hand, when the processor attempts to access a main memory location that is not copied in the cache (a cache "miss"), a main memory access must occur. In this event, the read data is sent to both the processor and to the cache, so that some subsequent attempts to access that memory location will result in cache hits. In this way, the effective memory access time for the processor is reduced to a value somewhere between the fast access time of the cache memory and the slower access time of main memory. Since the cache memory is usually smaller than main memory by an order of magnitude or more, the computer subsystem which controls the cache memory must employ methods of determining which memory location is to correspond to which cache location (the mapping function), and which cache location should be overwritten in the case that a new memory location is to be written into an already full cache (the cache replacement algorithm). Judicious selection of these configuration options can result in a cache "hit ratio" (the percentage of memory access requests that result in cache hits) of 90 to 99 percent.
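The hit/miss behavior and the resulting effective access time described above can be illustrated with a minimal sketch. The sketch assumes a direct-mapped mapping function (address modulo the number of cache lines) and a simplified timing model in which a hit costs the cache access time and a miss costs the main memory access time; the class name, function name, and timing values are hypothetical and are not taken from the system described here.

```python
def effective_access_time(hit_ratio, cache_time, memory_time):
    """Average access time under a simplified model (assumption):
    hits cost cache_time, misses cost memory_time."""
    return hit_ratio * cache_time + (1.0 - hit_ratio) * memory_time


class DirectMappedCache:
    """Toy direct-mapped cache; each address maps to exactly one line."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.tags = [None] * num_lines  # address currently held by each line
        self.hits = 0
        self.misses = 0

    def access(self, address):
        index = address % self.num_lines  # the mapping function (assumed)
        if self.tags[index] == address:
            self.hits += 1
            return True
        # Miss: main memory is accessed and the line is overwritten.
        # Replacement is trivial here because each address has one candidate line.
        self.tags[index] = address
        self.misses += 1
        return False

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


# A small working set accessed repeatedly: only the first pass misses.
cache = DirectMappedCache(num_lines=4)
for address in [0, 1, 2, 3] * 10:
    cache.access(address)
```

With this access pattern, 4 of the 40 accesses miss and the remaining 36 hit. Other mapping functions and replacement algorithms would yield different hit ratios for the same pattern.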
Along with the increase in system efficiency resulting from the use of cache memory, however, comes the problem of data coherence. That is, there must be assurance that a cache location holds the same value as the main memory location to which it corresponds. One way to maintain data coherence is to write modified values of data contained in the cache memory both to the cache memory and to the corresponding main memory location, each time a memory write access to that location is requested. This method is called a "write-through" policy. Another cache coherence technique involves a "write-back" policy, in which a modified data value is not written to the slower main memory until the corresponding cache location must be overwritten. The trade-off between these policies involves the requirement of greater bandwidth at the memory subsystem level in updating main memory for each write access in a write-through policy versus the increased complexity in cache coherence in a write-back policy. In systems with sufficient bandwidth, a write-through policy is often preferred due to its simplicity.
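The bandwidth side of this trade-off can be sketched by counting main memory writes under each policy. The following is an illustrative model only: the class names, the dictionary-based memory, and the explicit `evict` call are assumptions made for the example.

```python
class WriteThroughCache:
    """Every write updates both the cache and main memory."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}
        self.memory_writes = 0

    def write(self, address, value):
        self.lines[address] = value
        self.memory[address] = value  # main memory is updated on every write
        self.memory_writes += 1


class WriteBackCache:
    """Writes stay in the cache until the line must be overwritten."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}
        self.dirty = set()  # addresses modified since they were last written back
        self.memory_writes = 0

    def write(self, address, value):
        self.lines[address] = value
        self.dirty.add(address)

    def evict(self, address):
        # Main memory is updated only when a modified line is displaced.
        if address in self.dirty:
            self.memory[address] = self.lines[address]
            self.memory_writes += 1
            self.dirty.discard(address)
        self.lines.pop(address, None)


wt_memory, wb_memory = {}, {}
wt = WriteThroughCache(wt_memory)
wb = WriteBackCache(wb_memory)
for value in range(5):  # five successive writes to one location
    wt.write(0x100, value)
    wb.write(0x100, value)
wb.evict(0x100)
```

In this run the write-through cache issues five main memory writes and the write-back cache issues one, at the cost of main memory holding an outdated value until the eviction.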
Recent decreases in the cost of processing units have facilitated the advent of a more radical departure from the von Neumann machine organization, in which a plurality of processors operate concurrently with each other, while still accessing a common main memory space via a common system bus. Each processor can have its own private cache which resides between the processor and the system bus. For such multi-processor systems, the use of cache memories is more crucial to system performance than in single-processor systems, since each of the processors is in contention with the others for use of the common system bus in order to access the shared memory. The problem of data coherence is likewise more pronounced, since the value stored in a single main memory location might at one time be replicated in the private cache memory of any or all of the processors. If the local cache memories each employ a write-back policy, the system must somehow ensure that when one processor modifies the value of a memory location and writes that modification into its cache memory, the copies of that memory location in any of the other local caches reflect the change made by that one processor.
The present invention is directed to a multi-processor computer system comprising a plurality of CPU modules which share a common memory space via a time-shared system bus, along with one or more I/O modules. The common memory space can be realized as a plurality of memory modules each containing part of the shared system memory. A CPU module includes a processor on which instructions are executed, a private cache memory unit and possibly additional supporting hardware for efficient control of the CPU module and synchronization of the CPU module with other components of the system. An I/O module interfaces the system bus to an I/O bus to enable transfers to and from input/output devices like disk drives, tape drives, display devices, printers, or modems.
As is common in the art of multi-processor systems, any of the modules interfaced to the system bus can initiate one of four kinds of transactions on the bus: null, read, write and read data transactions. The time during which a single one of these transactions is taking place on the bus is called a bus cycle. A null transaction occurs when no module requires the bus, and is ignored by all modules. A read transaction is one in which a CPU or I/O module sends a request to a memory module to return memory data. A write transaction is one in which a CPU or I/O module sends a request to a memory module to write new memory data. A read data transaction is one in which a memory module returns data to a CPU or I/O module in response to a previous read transaction. Contention for use of the system bus among the various modules is arbitrated in some manner specific to the system bus implementation, and known in the art of arbitration protocols.
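The four transaction kinds can be modeled as follows. This is a sketch only: the field layout of `BusTransaction` (in particular, carrying an address on a read data transaction), the dictionary-based memory, and the default value of 0 for unwritten locations are assumptions for illustration and are not specified above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TransactionKind(Enum):
    NULL = "null"
    READ = "read"
    WRITE = "write"
    READ_DATA = "read data"


@dataclass(frozen=True)
class BusTransaction:
    kind: TransactionKind
    address: Optional[int] = None  # assumed field; unused for null transactions
    data: Optional[int] = None     # assumed field; used by write and read data


def memory_module_respond(memory, transaction):
    """Return the transaction a memory module would issue in response, if any."""
    if transaction.kind is TransactionKind.READ:
        value = memory.get(transaction.address, 0)
        return BusTransaction(TransactionKind.READ_DATA, transaction.address, value)
    if transaction.kind is TransactionKind.WRITE:
        memory[transaction.address] = transaction.data
    # Null transactions are ignored; writes produce no reply in this sketch.
    return None


memory = {}
memory_module_respond(memory, BusTransaction(TransactionKind.WRITE, 0x20, 7))
reply = memory_module_respond(memory, BusTransaction(TransactionKind.READ, 0x20))
```

Here `reply` is a read data transaction carrying the value previously written to address 0x20.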
As part of the support hardware associated with a CPU module, known techniques in the art suggest that a structure called a Read Data Queue may be introduced between the system bus and the CPU module. This structure holds data values that have been returned from a memory module in response to read transactions. The queuing of read data enhances the performance of the system by allowing a processor to accomplish other tasks while main memory access is made, instead of waiting idly for the data to be returned. The Read Data Queue is a first-in-first-out (FIFO) queue containing multiple entries, each of which includes a data field and a valid bit. As used herein, it is understood that when a valid bit is set, it indicates that valid data is resident in that entry, i.e., that that entry is "full". If the valid bit for that entry is not set, that entry is "empty", i.e., contains no data. When the CPU module receives data from main memory via a read data transaction, that data is placed on one end of the Read Data Queue, and the valid bit is set for that entry. When the CPU is ready to accept incoming data to put in its cache memory, the first valid entry is removed from the other end of the queue, and the valid bit is cleared.
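A Read Data Queue of this kind can be sketched as a fixed-depth circular buffer in which each entry pairs a data field with a valid bit. The depth, the method names, and the choice to raise an error when the queue is full are assumptions for the example; the text above does not specify overflow behavior.

```python
class ReadDataQueue:
    """FIFO of (data, valid bit) entries, modeled as a circular buffer."""

    def __init__(self, depth):
        self.data = [None] * depth
        self.valid = [False] * depth  # True means the entry is "full"
        self.head = 0  # end from which the CPU removes entries
        self.tail = 0  # end at which returned read data is placed

    def push(self, value):
        """Place data returned by a read data transaction on the queue."""
        if self.valid[self.tail]:
            raise OverflowError("Read Data Queue is full")  # assumed behavior
        self.data[self.tail] = value
        self.valid[self.tail] = True
        self.tail = (self.tail + 1) % len(self.data)

    def pop(self):
        """Remove the oldest valid entry, or return None if the queue is empty."""
        if not self.valid[self.head]:
            return None
        value = self.data[self.head]
        self.valid[self.head] = False  # entry becomes "empty"
        self.head = (self.head + 1) % len(self.data)
        return value


queue = ReadDataQueue(depth=2)
queue.push(10)
queue.push(20)
first = queue.pop()
second = queue.pop()
third = queue.pop()
```

Entries leave the queue in the order they arrived, and a pop from an empty queue returns `None`.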
Another FIFO structure called an Invalidate Queue may also be introduced between the system bus and the CPU module. The Invalidate Queue also contains multiple entries called "invalidates", each including at least an address field and a valid bit. The CPU monitors the system bus for coherence transactions. In a system employing a write-through policy the CPU module monitors the system bus for write transactions. When any data write transaction is detected on the system bus, the address of that transaction is placed on one end of the CPU module's Invalidate Queue, and the valid bit is set, indicating that that entry is full. When the CPU is able to process an invalidate, the first valid entry is removed from the other end of the Invalidate Queue, and its valid bit is cleared. The address of the write transaction is checked against the contents of the cache, and if present, the entry corresponding to that address is marked as invalid (empty). In this way, the CPU can be prevented from using data values which are outdated.
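The invalidate path can be sketched in the same style. In this simplified model, presence of an address in a `collections.deque` stands in for an entry whose valid bit is set, and the cache is a dictionary keyed by address; both are assumptions for illustration, as are the class and method names.

```python
from collections import deque


class SnoopingCache:
    """Cache that queues snooped write addresses and invalidates matching lines."""

    def __init__(self):
        self.lines = {}                  # address -> cached value
        self.invalidate_queue = deque()  # queued entries are the "valid" ones

    def snoop_write(self, address):
        """Called when a write transaction is detected on the system bus."""
        self.invalidate_queue.append(address)

    def process_one_invalidate(self):
        """Service the oldest invalidate; return False if none is pending."""
        if not self.invalidate_queue:
            return False
        address = self.invalidate_queue.popleft()
        # If the address is cached, mark that line invalid by dropping it.
        self.lines.pop(address, None)
        return True


cpu_cache = SnoopingCache()
cpu_cache.lines[0x10] = 5      # a locally cached copy
cpu_cache.snoop_write(0x10)    # another module writes the same location
cpu_cache.snoop_write(0x20)    # a write to a location that is not cached
cpu_cache.process_one_invalidate()
cpu_cache.process_one_invalidate()
```

After both invalidates are serviced, the outdated copy of address 0x10 is gone, and the invalidate for the uncached address 0x20 has no effect on the cache contents.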
Cache coherency in multi-processor systems is maintained when each cache memory processes transactions in the same order as they occurred on the system bus. The order of invalidates as they appeared on the system bus can be preserved by the FIFO queue that holds them. Similarly, the order of read data transactions can be preserved in their FIFO queue. Unfortunately, however, the order of invalidates in relation to read data transactions, or equivalently, the order of read data transactions relative to invalidates, as they appeared on the system bus, is not preserved by the use of separate Read Data and Invalidate queues.
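The loss of relative ordering can be shown with a two-event example. The scenario below is hypothetical: read data for an address appears on the bus, followed by a write to the same address from another module. The event tuples and function names are assumptions for illustration.

```python
from collections import deque


def apply_in_bus_order(events):
    """Apply read data fills and invalidates to a cache in the order given."""
    cache = {}
    for kind, address, value in events:
        if kind == "read_data":
            cache[address] = value
        elif kind == "invalidate":
            cache.pop(address, None)
    return cache


def apply_with_separate_queues(events, invalidates_first=True):
    """Split events into two FIFOs, then drain one queue before the other.

    Each queue keeps its own internal order, but the interleaving between
    the two queues, as seen on the bus, is lost.
    """
    read_queue = deque(e for e in events if e[0] == "read_data")
    invalidate_queue = deque(e for e in events if e[0] == "invalidate")
    if invalidates_first:
        order = list(invalidate_queue) + list(read_queue)
    else:
        order = list(read_queue) + list(invalidate_queue)
    return apply_in_bus_order(order)


# Bus order: data for 0x40 is returned, then another module writes 0x40.
bus_events = [("read_data", 0x40, 1), ("invalidate", 0x40, None)]

coherent = apply_in_bus_order(bus_events)
stale = apply_with_separate_queues(bus_events, invalidates_first=True)
```

Processed in bus order, the cache ends up without a copy of address 0x40, so a later access misses and fetches the new value. If the invalidate happens to be serviced first, it finds nothing to invalidate, and the subsequent fill leaves the outdated value 1 in the cache.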
Not every technique for serializing the servicing of the two queues adequately solves the coherency problem. For example, one technique is to wait for the Invalidate Queue to become empty before transmitting any read data to the cache memory. This is not a sufficient solution to the problem of cache coherency in multi-processor systems, however, since it is possible to construct a worst-case traffic pattern in which new writes on the system bus are added to the end of the Invalidate Queue as fast as they are processed as invalidates by the cache memory. In such a situation, the read data would never be returned to the cache memory because the Invalidate Queue would never empty. The method of the present invention is not vulnerable to such pathological behavior, since it inhibits the transmission of read data to the cache only until a finite, predetermined number of invalidates have been serviced.
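The difference between the two approaches under worst-case traffic can be sketched with a small simulation. The text above states only that the number of invalidates is finite and predetermined; the sketch assumes, purely for illustration, that this number is the count of invalidates already pending when the read data arrives. The function name, the policy labels, and the one-write-per-step traffic model are likewise hypothetical.

```python
from collections import deque


def steps_until_read_data_delivered(initial_pending, policy, max_steps=1000):
    """Simulate traffic in which one new write is snooped per step.

    Returns the step at which read data is released to the cache, or None
    if it is still blocked after max_steps.
    """
    invalidate_queue = deque(range(initial_pending))
    remaining = initial_pending  # bounded policy: count fixed when data arrives
    next_address = initial_pending
    for step in range(1, max_steps + 1):
        # The cache services one invalidate per step.
        if invalidate_queue:
            invalidate_queue.popleft()
            remaining = max(0, remaining - 1)
        # Decide whether the held read data may now be released.
        if policy == "wait_for_empty" and not invalidate_queue:
            return step
        if policy == "bounded" and remaining == 0:
            return step
        # A new write appears on the bus and is queued as an invalidate.
        invalidate_queue.append(next_address)
        next_address += 1
    return None


bounded_steps = steps_until_read_data_delivered(3, "bounded")
starved = steps_until_read_data_delivered(3, "wait_for_empty")
```

With three invalidates pending when the read data arrives, the bounded policy releases the data after three steps, while the wait-for-empty policy is still blocked when the simulation ends because each serviced invalidate is replaced by a newly snooped write.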