Modern computer systems utilize various technologies and architectural features to achieve high-performance operation. Innovative arrangements of system components can often result in significant improvements in the capabilities and processing power of the computer system.
Such high-performance capabilities can be achieved in computer systems that employ several central processing units (i.e., CPUs or processors) arranged on modules in a multiprocessor system configuration. In addition to CPU modules, such a multiprocessor system can include several I/O modules and memory modules, all coupled to one another by a system bus. The CPUs can perform cooperative or parallel processing, as well as multitasking, to execute several applications simultaneously and thereby achieve dramatically improved processing power. The capabilities of the overall system can also be enhanced by providing a cache memory at each of the CPUs in the computer system.
A cache memory is a relatively small but relatively fast memory device arranged in close physical proximity to a processor. The use of cache memories is based on the principle of locality: it has been found that when a processor accesses a location in memory, there is a high probability that the processor will continue to access memory locations surrounding the accessed location for at least a certain period of time. Thus, a preselected block of data is fetched from a large memory with relatively slow access time, such as a main memory module coupled to the processor via a system bus, and stored in the relatively fast cache memory. As long as the processor continues to access data from the cache memory, it operates significantly faster than would be possible if it had to arbitrate for control of the system bus and then perform a memory READ or WRITE operation with the main memory module for each data access.
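The effect of locality can be illustrated with a minimal Python sketch of a direct-mapped cache. The block size, the number of cache lines, and the class name `DirectMappedCache` are assumptions made for illustration only; the text does not specify a cache organization. On a miss, the whole surrounding block is copied from the slow memory, so subsequent accesses to neighbouring addresses are served from the cache.

```python
# Minimal sketch of a direct-mapped cache illustrating spatial locality.
# Assumptions (not from the text): 16-byte blocks, 8 cache lines, and
# byte-addressed main memory modelled as a bytearray.

BLOCK_SIZE = 16  # bytes per block (assumed)
NUM_LINES = 8    # number of cache lines (assumed)


class DirectMappedCache:
    def __init__(self, memory: bytearray):
        self.memory = memory
        self.tags = [None] * NUM_LINES  # tag per line; None means empty
        self.data = [bytes(BLOCK_SIZE)] * NUM_LINES
        self.hits = 0
        self.misses = 0

    def _split(self, addr: int):
        # Return (line index, tag, byte offset within the block).
        block = addr // BLOCK_SIZE
        return block % NUM_LINES, block // NUM_LINES, addr % BLOCK_SIZE

    def read(self, addr: int) -> int:
        index, tag, offset = self._split(addr)
        if self.tags[index] == tag:
            self.hits += 1
        else:
            # Miss: fetch the whole surrounding block from slow main memory.
            self.misses += 1
            start = (addr // BLOCK_SIZE) * BLOCK_SIZE
            self.data[index] = bytes(self.memory[start:start + BLOCK_SIZE])
            self.tags[index] = tag
        return self.data[index][offset]


memory = bytearray(range(256))
cache = DirectMappedCache(memory)
for a in range(32):  # sequential accesses to neighbouring addresses
    cache.read(a)
print(cache.hits, cache.misses)  # 30 2: one miss per 16-byte block
```

Only the first access to each block goes to main memory; the remaining fifteen accesses to that block hit in the cache.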
The capabilities of the multiprocessor computer system can be further enhanced by sharing main memory among the CPUs and by operating the system bus in accordance with a SNOOPING bus protocol.
In shared memory multiprocessor systems, it is necessary that the system store a single, correct copy of data being processed by the various processors of the system. Thus, when a processor WRITES to a particular data item stored in its cache, that copy of the data item becomes the latest correct value for the data item. The corresponding data item stored in main memory, as well as copies of the data item stored in other caches in the system, becomes outdated or invalid.
In a write-back cache scheme, the data item in main memory is not updated until the processor needs the corresponding cache location to store another data item. Until then, the cached data item that was modified by the processor WRITE remains the latest copy of the data item. It is therefore necessary to implement a scheme that monitors READ and WRITE transactions to ensure that the latest copy of a particular data item is properly identified whenever a processor requires it.
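The write-back behaviour can be sketched in a few lines of Python. This is a simplified model under stated assumptions: a single cache line, word-addressed memory held in a dict, and write-allocate on a WRITE miss. The class name `WriteBackCache` is hypothetical. It shows that main memory stays stale after a WRITE and is updated only when the line is needed for a different data item.

```python
class WriteBackCache:
    """Hypothetical one-line write-back cache over word-addressed memory."""

    def __init__(self, memory: dict):
        self.memory = memory  # address -> value (main memory)
        self.addr = None      # address held in the line; None means empty
        self.value = None
        self.dirty = False

    def _replace(self, addr: int) -> None:
        # Main memory is updated only when the line must hold another item.
        if self.addr is not None and self.dirty:
            self.memory[self.addr] = self.value
        self.addr, self.value, self.dirty = addr, self.memory.get(addr, 0), False

    def read(self, addr: int) -> int:
        if addr != self.addr:
            self._replace(addr)
        return self.value

    def write(self, addr: int, value: int) -> None:
        if addr != self.addr:
            self._replace(addr)  # write-allocate (assumed policy)
        self.value, self.dirty = value, True


memory = {0x100: 1, 0x200: 2}
cache = WriteBackCache(memory)
cache.write(0x100, 42)
print(memory[0x100])  # 1: main memory is stale, the cache holds the latest copy
cache.read(0x200)     # the line is needed for another data item
print(memory[0x100])  # 42: the modified item was written back on replacement
```

Between the WRITE and the replacement, any other reader of address `0x100` in main memory would see the outdated value, which is why a coherency mechanism is required.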
The well-known SNOOPING bus protocol provides such a scheme, maintaining the necessary coherency between the various cache memories and the main memory of the computer system. Under the SNOOPING bus protocol, the system bus interface of each processor, or of any other component in the multiprocessor computer system, monitors the high-performance system bus for bus activity involving addresses of data items currently stored in that processor's cache. Status bits are maintained in TAG stores associated with each cache to indicate the status of each data item currently stored in the cache.
One possible status bit associated with a particular data item is a VALID bit. The VALID bit indicates whether the cache entry holds a valid copy of the data item, i.e., whether the stored data item is coherent with the latest version of the data item, as may have been written by one of the processors of the computer system.
Another possible status bit associated with a particular data item is a SHARED bit. The SHARED bit indicates whether more than one cache in the system contains a copy of the data item. A cache entry transitions into this state if a different processor caches the same data item. That is, if, while SNOOPING on the system bus, a first interface determines that a second interface on the bus is allocating a location for a data item already stored in the cache associated with the first interface, the first interface asserts a SHARED signal on the system bus, signaling the second interface to allocate the location in the shared state. When this occurs, the first interface also updates the state of its own copy of the data item to indicate that it is now in the shared state.
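The SHARED handshake can be modelled as a minimal Python sketch. The class names `BusInterface` and `SnoopingBus` are hypothetical, tag stores are simplified to dicts mapping an address to a set of status names, and the SHARED line on the bus is modelled as the logical OR of every snooping interface's response. A real protocol would also have to handle DIRTY copies and invalidations, which this sketch omits.

```python
class BusInterface:
    """Hypothetical system bus interface with a simplified TAG store."""

    def __init__(self, name: str):
        self.name = name
        self.tags = {}  # address -> set of status names held for that address

    def snoop_allocate(self, addr: int) -> bool:
        """Observe another cache allocating addr; return True to assert SHARED."""
        status = self.tags.get(addr)
        if status is not None and "VALID" in status:
            status.add("SHARED")  # our own copy is now shared as well
            return True
        return False


class SnoopingBus:
    def __init__(self, interfaces: list):
        self.interfaces = interfaces

    def allocate(self, requester: BusInterface, addr: int) -> bool:
        shared = False
        for other in self.interfaces:
            # Every other interface snoops; any one of them may assert SHARED.
            if other is not requester and other.snoop_allocate(addr):
                shared = True
        requester.tags[addr] = {"VALID", "SHARED"} if shared else {"VALID"}
        return shared


cpu0, cpu1 = BusInterface("cpu0"), BusInterface("cpu1")
bus = SnoopingBus([cpu0, cpu1])
bus.allocate(cpu0, 0x40)  # no other copy: cpu0 holds {"VALID"}
bus.allocate(cpu1, 0x40)  # cpu0 asserts SHARED: both now hold {"VALID", "SHARED"}
```

The loop deliberately polls every other interface rather than stopping at the first one that responds, so that every cache already holding the item marks its copy as shared.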
Another possible status bit associated with a particular data item stored in a cache memory is what is generally called a DIRTY bit. A cache entry is dirty if the data item held in that entry has been updated more recently than main memory. Thus, when a processor WRITES to a location in its cache, it sets the DIRTY bit to indicate that the cache entry now holds the latest copy of the data item.
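The three status bits can be represented as bit flags in a TAG store entry. In the following Python sketch the bit positions and the names `Status` and `TagEntry` are assumptions for illustration; it uses the standard library `enum.IntFlag`. A newly allocated entry is VALID (and SHARED if another cache reported a copy), and a processor WRITE sets DIRTY.

```python
from enum import IntFlag


class Status(IntFlag):
    """Status bits kept in a TAG store entry (bit positions are assumed)."""
    NONE = 0
    VALID = 0b001
    SHARED = 0b010
    DIRTY = 0b100


class TagEntry:
    def __init__(self):
        self.tag = None
        self.status = Status.NONE

    def fill(self, tag: int, shared: bool = False) -> None:
        # A newly allocated block is VALID and clean; SHARED if another cache said so.
        self.tag = tag
        self.status = Status.VALID | (Status.SHARED if shared else Status.NONE)

    def processor_write(self) -> None:
        # A WRITE makes the cached copy newer than main memory.
        self.status |= Status.DIRTY

    def is_dirty(self) -> bool:
        return bool(self.status & Status.VALID) and bool(self.status & Status.DIRTY)


entry = TagEntry()
entry.fill(tag=0x12)
entry.processor_write()
print(entry.is_dirty())  # True
```

Packing the bits into one integer mirrors how a hardware TAG store would hold them alongside the tag, and lets several bits be tested with a single mask.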
While the multiprocessor computer system described above, with cache memories and a SNOOPING bus protocol using VALID, SHARED and DIRTY status bits, represents a state-of-the-art model for a high-performance computer system, the art has yet to achieve an optimal level of performance efficiency.
For example, a particular cache location may hold a VALID but DIRTY block of data (i.e., the data at that cache location has been modified and is newer than the copy in main memory). If a processor wants to READ another block of data from main memory and the new block maps to the same cache location as the DIRTY block, the DIRTY block must be moved out of the cache location and back to main memory before it is overwritten by the new block. The DIRTY block of data that must be sent to main memory before it is overwritten is generally known as a VICTIM.
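Detecting a VICTIM amounts to checking, on a miss, whether the resident block is both VALID and DIRTY. The following Python sketch assumes a direct-mapped cache with 64-byte blocks; the names `Line` and `classify_access` are hypothetical. It returns whether the access hits and, on a miss, the block address of any VICTIM that must be written back first.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

BLOCK_SIZE = 64  # bytes per block (assumed)


@dataclass
class Line:
    tag: Optional[int] = None
    valid: bool = False
    dirty: bool = False


def classify_access(lines: list, addr: int) -> Tuple[bool, Optional[int]]:
    """Return (hit, victim_address) for a direct-mapped cache.

    victim_address is the block address that must be written back to main
    memory before the new block can be brought in, or None if there is none.
    """
    block = addr // BLOCK_SIZE
    index = block % len(lines)
    tag = block // len(lines)
    line = lines[index]
    if line.valid and line.tag == tag:
        return True, None
    if line.valid and line.dirty:
        # Rebuild the resident block's address from its tag and line index.
        victim_block = line.tag * len(lines) + index
        return False, victim_block * BLOCK_SIZE
    return False, None


lines = [Line() for _ in range(4)]
lines[1] = Line(tag=0, valid=True, dirty=True)  # holds block 1 (address 64), modified
print(classify_access(lines, 64))   # (True, None): hit
print(classify_access(lines, 320))  # (False, 64): block 5 maps to line 1, block 1 is the VICTIM
print(classify_access(lines, 128))  # (False, None): line 2 is empty
```

A miss on a clean or empty line needs no writeback; only the VALID-and-DIRTY case produces a VICTIM.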
Such a VICTIM is typically handled in the following manner. When the processor executes a LOAD command, the address of the desired data is sent to the cache. The cache returns status information to the processor, from which the processor determines that the block at that cache location is not the block it wishes to READ and that the block resident in that cache location is DIRTY. The resident block will therefore become a VICTIM and must be moved out. To do so, the processor sends a READ_MISS command to its system interface, indicating that the block it wishes to read is not in the cache and must be fetched from main memory. Following the READ_MISS command, the processor sends a VICTIM command to the system interface, indicating that there is a cache VICTIM associated with the prior READ_MISS. As this VICTIM command is issued, the cache is accessed again so that the system data interface may copy the VICTIM data into a buffer. When the new block of data associated with the READ_MISS is returned from main memory, it overwrites the VICTIM data in the cache. Later, after a VICTIM writeback command has been issued to the system bus, the VICTIM data is written from the data interface buffer into main memory.
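The prior-art sequence can be traced with a simplified Python model that records the commands issued and counts cache accesses. Everything here is an illustrative assumption: the class name `PriorArtSystem`, the command labels, the direct-mapped organization, the write-allocate `store`, and the choice to drain the victim buffer at the end of the same `load` call (in hardware the writeback happens later, after the VICTIM writeback command reaches the system bus). The count of three cache accesses on the victim path is a property of this model, not a figure taken from the text.

```python
BLOCK = 64  # bytes per block (assumed)
LINES = 4   # direct-mapped cache lines (assumed)


class PriorArtSystem:
    """Hypothetical model of the prior-art READ_MISS / VICTIM sequence."""

    def __init__(self, memory: dict):
        self.memory = memory         # block address -> block data
        self.cache = [None] * LINES  # entry: [block_addr, data, dirty]
        self.victim_buffer = None    # buffer in the system data interface
        self.commands = []           # commands issued, in order
        self.cache_accesses = 0

    def load(self, addr: int):
        block_addr = addr - addr % BLOCK
        index = (block_addr // BLOCK) % LINES
        self.cache_accesses += 1     # access 1: lookup returns tag and status
        entry = self.cache[index]
        if entry is not None and entry[0] == block_addr:
            return entry[1]
        self.commands.append(("READ_MISS", block_addr))
        if entry is not None and entry[2]:
            # The resident block is DIRTY, so it becomes a VICTIM.
            self.commands.append(("VICTIM", entry[0]))
            self.cache_accesses += 1  # access 2: re-read cache into the buffer
            self.victim_buffer = (entry[0], entry[1])
        data = self.memory[block_addr]
        self.cache_accesses += 1     # access 3: fill overwrites the victim line
        self.cache[index] = [block_addr, data, False]
        if self.victim_buffer is not None:
            # Later, a VICTIM writeback on the system bus drains the buffer.
            victim_addr, victim_data = self.victim_buffer
            self.commands.append(("VICTIM_WRITEBACK", victim_addr))
            self.memory[victim_addr] = victim_data
            self.victim_buffer = None
        return data

    def store(self, addr: int, data) -> None:
        self.load(addr)              # write-allocate: bring the block in first
        entry = self.cache[(addr // BLOCK) % LINES]
        self.cache_accesses += 1     # write the new data and set DIRTY
        entry[1], entry[2] = data, True


memory = {0: "A", 256: "B"}  # blocks 0 and 4 both map to line 0
system = PriorArtSystem(memory)
system.store(0, "A2")        # line 0 now holds a DIRTY copy of block 0
system.cache_accesses = 0
system.commands.clear()
system.load(256)             # conflicts with the DIRTY block 0
print(system.cache_accesses)  # 3
print(system.commands)  # [('READ_MISS', 256), ('VICTIM', 0), ('VICTIM_WRITEBACK', 0)]
print(memory[0])        # A2
```

The trace makes the cost visible: the cache is touched once for the lookup, again to copy the VICTIM into the buffer, and a third time for the fill.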
A problem with the prior-art method described above for handling cache victims is that it requires multiple cache accesses spread over a significant number of system clock cycles and ties up the cache for other activity (e.g., preventing the processor from issuing a second cache lookup), thereby impeding the overall performance of the computer system.
Therefore, a need exists for a method and apparatus for handling cache victims so that victim data can be provided to main memory in a minimal number of cycles, thereby providing a significant gain in the performance of the computer system.