Modern computer systems utilize various technologies and architectural features to achieve high performance operation. Innovative arrangements of system components can often result in significant improvements in the capabilities and processing power of the computer system.
Such high performance capabilities can be achieved in computer systems which employ several computer central processing units (i.e., CPUs or processors) arranged on modules in a multiprocessor system configuration. In addition to CPU modules, such a multiprocessor system can further include several I/O modules and memory modules, all coupled to one another by a system bus. The CPUs can be utilized to perform co-operative or parallel processing, as well as multi-tasking among them, for execution of several applications running simultaneously, thereby achieving dramatically improved processing power. The capabilities of the overall system can also be enhanced by providing a cache memory at each one of the CPUs in the computer system.
A cache memory comprises a relatively small, yet relatively fast memory device arranged in close physical proximity to a processor. The utilization of cache memories is based upon the principle of locality. It has been found, for example, that when a processor accesses a location in memory, there is a high probability that the processor will continue to access memory locations surrounding the accessed location for at least a certain period of time. Thus, a preselected data block of a large, relatively slow access time memory, such as a main memory module coupled to the processor via a system bus, is fetched from main memory and stored in the relatively fast access cache memory. Accordingly, as long as the processor continues to access data from the cache memory, the overall speed of operation of the processor is maintained at a level significantly higher than would be possible if the processor had to arbitrate for control of the system bus and then perform a memory READ or WRITE operation, with the main memory module, for each data access.
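The effect of locality described above can be illustrated with a minimal sketch. It assumes a hypothetical cache of unbounded capacity that fetches one whole block from main memory on each miss; the function name `count_hits` and the block size of 8 are illustrative choices, not details taken from the system described here.

```python
def count_hits(addresses, block_size=8):
    """Count accesses served from the cache rather than from main memory.

    Assumption: the cache never evicts, and a miss fetches the whole
    block that contains the requested address.
    """
    cached_blocks = set()
    hits = 0
    for address in addresses:
        block = address // block_size
        if block in cached_blocks:
            hits += 1                 # served by the fast cache
        else:
            cached_blocks.add(block)  # miss: fetch the block over the bus
    return hits


# 32 sequential accesses touch only 4 blocks, so 4 misses and 28 hits.
sequential_hits = count_hits(range(32), block_size=8)
```

Under these assumptions, only the first access to each block requires a system bus transaction; the remaining accesses to neighboring locations are satisfied by the cache.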
The capabilities of the multiprocessor computer system can be further enhanced by sharing main memory among the CPUs and by operating the system bus in accordance with a SNOOPING bus protocol.
In shared memory multiprocessor systems, it is necessary that the system store a single, correct copy of data being processed by the various processors of the system. Thus, when a processor WRITES to a particular data item stored in its cache, that copy of the data item becomes the latest correct value for the data item. The corresponding data item stored in main memory, as well as copies of the data item stored in other caches in the system, becomes outdated or invalid.
In a write back cache scheme, where processor WRITEs are performed into a processor's cache, the data item in main memory is not updated until the processor requires the corresponding cache location to store another data item. Accordingly, the cached data item that has been modified by the processor WRITE remains the latest copy of the data item until the main memory is updated. It is, therefore, necessary to implement a scheme to monitor READ and WRITE transactions to make certain that the latest copy of a particular data item is properly identified whenever it is required for use by a processor.
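The write back behavior described above can be sketched as follows. This is a simplified model under stated assumptions: the hypothetical `WriteBackCache` class holds a single cache location, and main memory is modeled as a plain dictionary.

```python
class WriteBackCache:
    """Single-location write back cache (illustrative only)."""

    def __init__(self, memory):
        self.memory = memory  # main memory: address -> value
        self.line = None      # (address, value, dirty) or None

    def write(self, address, value):
        # The location is needed for another data item: evict first.
        if self.line is not None and self.line[0] != address:
            self.evict()
        self.line = (address, value, True)  # modified copy is now dirty

    def evict(self):
        address, value, dirty = self.line
        if dirty:
            self.memory[address] = value    # main memory updated only here
        self.line = None
```

In this sketch, main memory holds a stale value from the time of the WRITE until the eviction, which is the window in which a coherency scheme must identify the cached copy as the latest one.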
The well known SNOOPING bus protocol provides such a scheme and the necessary coherency between the various cache memories and the main memory of the computer system. In accordance with the SNOOPING bus protocol, a system bus interface of each processor, or other component in the multiprocessor computer system, monitors the high performance system bus for bus activity involving addresses of data items that are currently stored in the processor's cache. Status bits are maintained in Tag stores associated with each cache to indicate the status of each data item currently stored in the cache.
One possible status bit associated with a particular data item is a VALID bit. The VALID bit identifies if the cache entry has a copy of a valid data item in it, i.e., the stored data item is coherent with the latest version of the data item, as may have been written by one of the processors of the computer system.
Another possible status bit associated with a particular data item is a SHARED bit. The SHARED bit identifies if more than one cache in the system contains a copy of the data item. A cache element will transition into this state if a different processor caches the same data item. That is, if, while SNOOPING on the system bus, a first interface determines that another cache on the bus is allocating a location for a data item that is already stored in the cache associated with the first interface, the first interface notifies the other interface by asserting a SHARED signal on the system bus, signaling the second interface to allocate the location in the shared state. When this occurs, the first interface will also update the state of its copy of the data item to indicate that it is now in the shared state.
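The SHARED signal exchange described above can be sketched as follows. The class `SnoopingCache` and the function `allocate` are hypothetical names, and the system bus is reduced to a direct call to each of the other caches, which is an assumption made for brevity.

```python
class SnoopingCache:
    """Tag store for one cache: address -> set of status names."""

    def __init__(self):
        self.status = {}

    def snoop_allocation(self, address):
        """React to another cache allocating `address`.

        Returns True when this interface asserts the SHARED signal.
        """
        entry = self.status.get(address)
        if entry is not None and "VALID" in entry:
            entry.add("SHARED")  # the local copy becomes shared as well
            return True
        return False


def allocate(requester, other_caches, address):
    """Allocate `address` in `requester` while the other caches snoop."""
    # Every other interface snoops the allocation (no short-circuiting).
    responses = [cache.snoop_allocation(address) for cache in other_caches]
    shared_signal = any(responses)
    entry = {"VALID"}
    if shared_signal:
        entry.add("SHARED")      # allocate in the shared state
    requester.status[address] = entry
    return shared_signal
```

If a first cache allocates an address alone, its entry is VALID only; when a second cache later allocates the same address, both entries end up marked SHARED.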
Another possible status bit associated with a particular data item stored in a cache memory can be what is generally called a DIRTY bit. A cache entry is dirty if the data item held in that entry has been updated more recently than main memory. Thus, when a processor WRITES to a location in its cache, it sets the DIRTY bit to indicate that it is now the latest copy of the data item.
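The three status bits can be pictured as flags in a Tag store entry. The following is a minimal sketch; the bit positions and the helper names `describe` and `on_processor_write` are assumptions for illustration and do not reflect any particular hardware encoding.

```python
from enum import IntFlag


class Status(IntFlag):
    VALID = 0b001   # entry holds a coherent copy of the data item
    SHARED = 0b010  # more than one cache holds a copy
    DIRTY = 0b100   # entry updated more recently than main memory


def describe(status):
    """Render a status word in the VALID/SHARED/DIRTY vocabulary."""
    if not status & Status.VALID:
        return "INVALID"
    sharing = "SHARED" if status & Status.SHARED else "PRIVATE"
    dirtiness = "DIRTY" if status & Status.DIRTY else "CLEAN"
    return f"VALID, {sharing}, {dirtiness}"


def on_processor_write(status):
    """A WRITE into the cache marks the entry as the latest copy."""
    return status | Status.DIRTY
```

For example, `describe(on_processor_write(Status.VALID))` yields the string `VALID, PRIVATE, DIRTY`.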
Also, in such a multiprocessor computer system, for every command/address that some other processor module sends across the system bus, the present processor module would have to look up that address in its primary cache, find out if it is in there and determine what action to take in response to the command/address.
To minimize this additional cache lookup activity, one or more Duplicate Tag (DTAG) stores are provided for each processor module. The DTAG store holds an identical copy of the primary cache memory Tag information. The Tag information in the primary cache is for use in conjunction with its processor. The Tag information in the DTAG cache is for use in conjunction with the system bus.
Therefore, as system bus commands come along the system bus, the present processor module would look up the command/address in its DTAG to find out if the address is there and determine what action to take in response to the command/address coming along the system bus.
Since there is a primary cache Tag store and a DTAG store, it is the goal of the system that each concurrently contain the same information. However, because of time delays in the system processes there may be a time delay between an update of the Status bit in the DTAG cache and the update of the Status bit in the primary cache. Therefore, the overall system protocol uses the DTAG cache lookup to determine the actual state of a cache entry. As such, the DTAG status becomes the overall system's "Point of Coherency".
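The lag between the DTAG and the primary cache Tag store can be sketched as follows. This is an illustrative model only: the class `ProcessorModule`, its method names, and the use of a queue for updates not yet applied to the primary Tag store are all assumptions.

```python
from collections import deque


class ProcessorModule:
    """Primary Tag store plus DTAG, with delayed primary updates."""

    def __init__(self):
        self.primary_tags = {}  # consulted by the processor
        self.dtag = {}          # consulted for system bus commands
        self.pending = deque()  # updates not yet applied to primary_tags

    def fill(self, address, status):
        self.primary_tags[address] = status
        self.dtag[address] = status

    def snoop(self, address, new_status):
        """Apply a bus command to the DTAG first; queue the primary update."""
        if address not in self.dtag:
            return False
        self.dtag[address] = new_status
        self.pending.append((address, new_status))
        return True

    def drain_one(self):
        """Apply the oldest queued update to the primary Tag store."""
        address, status = self.pending.popleft()
        self.primary_tags[address] = status

    def coherent_status(self, address):
        """The DTAG is the Point of Coherency."""
        return self.dtag.get(address, "INVALID")
```

Between `snoop` and `drain_one`, the two stores disagree, and in this model only the DTAG reflects the actual state of the entry.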
In the above described system a processor can issue a WRITE command to SHARED blocks and to PRIVATE (non-SHARED) blocks. Such WRITEs are handled in varying manners depending on the nature of the Status bits.
For example, a block that is not VALID (or that misses in the cache) cannot be written. It must be first read.
It is read via a READ_MISS_MOD command which will leave the block in the VALID, PRIVATE and DIRTY state if the block is not found to be SHARED, or in the VALID, SHARED state if the block is found to be SHARED.
WRITEs to VALID, SHARED blocks result in a WRITE_BLOCK command being issued to the system (updating memory, updating/invalidating other processors).
WRITEs to VALID, PRIVATE and DIRTY blocks result in WRITEs to the processor's cache, which do not update memory until the modified block is evicted or transitioned to VALID, SHARED, DIRTY and then written.
If a block is VALID, PRIVATE and CLEAN (non-DIRTY) it must be transitioned to the VALID, PRIVATE, DIRTY state or the VALID, SHARED state before it can be written.
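The WRITE handling cases above can be summarized as a small decision function. This is a sketch only: the function name `write_action` and the returned labels `WRITE_LOCAL_CACHE` and `TRANSITION_THEN_WRITE` are hypothetical names for the behaviors described in the text, not commands of the system bus.

```python
def write_action(valid, shared, dirty):
    """Return the action a processor WRITE requires for a block.

    The three arguments mirror the VALID, SHARED and DIRTY status bits.
    """
    if not valid:
        # Invalid or missing block: it must be read first.
        return "READ_MISS_MOD"
    if shared:
        # VALID, SHARED: memory and other processors must see the write.
        return "WRITE_BLOCK"
    if dirty:
        # VALID, PRIVATE, DIRTY: write the cache only.
        return "WRITE_LOCAL_CACHE"
    # VALID, PRIVATE, CLEAN: a state transition is required before writing.
    return "TRANSITION_THEN_WRITE"
```

For instance, `write_action(valid=True, shared=False, dirty=False)` selects the case in which the block must change state before the WRITE can proceed.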
When a processor encounters a STORE command to a VALID, PRIVATE, CLEAN block, the processor tries to transition the block to VALID, PRIVATE and DIRTY status by issuing a SET DIRTY command. The SET DIRTY command is issued to update a processor's DTAG (its Point of Coherency) to the new status before the processor's local cache status is updated. This is required to maintain cache coherency. If a READ/WRITE command from another processor to the block being written (causing the SET DIRTY command) has updated a processor's DTAG to SHARED or INVALID before the SET DIRTY command can change the block to VALID, PRIVATE, DIRTY, but has not yet updated the processor's local Tag status, the system must discontinue processing of the SET DIRTY command and allow the processor's local Tag status to be updated according to the system bus command.
If there are no intervening READs or WRITEs to the block being SET DIRTY, the system will update the DTAG status to VALID, PRIVATE and DIRTY and acknowledge the processor's SET DIRTY command.
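The two outcomes of a SET DIRTY command can be sketched as follows. The text does not specify how an intervening command is detected, so this sketch makes the simplifying assumption that the DTAG entry can be inspected directly; the function name `process_set_dirty` and the set-based status encoding (PRIVATE is the absence of SHARED) are likewise hypothetical.

```python
def process_set_dirty(dtag, address):
    """Try to move a block to VALID, PRIVATE, DIRTY in the DTAG.

    Returns True when the command is acknowledged, and False when it is
    discontinued because another processor's command got there first.
    """
    status = dtag.get(address, set())
    if "VALID" not in status or "SHARED" in status:
        # An intervening READ/WRITE already changed the DTAG entry;
        # the local Tag status must follow the bus command instead.
        return False
    dtag[address] = {"VALID", "DIRTY"}  # update the Point of Coherency
    return True                         # acknowledge the SET DIRTY command
```

Only after a True result would the processor's local cache status be updated, which preserves the ordering in which the DTAG changes first.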
While the above described cached, multi-processor computer system with cache memories and SNOOPING bus protocol using VALID, SHARED and DIRTY status bits represents a state-of-the-art model for a high performance computer system, the art has yet to achieve an optimal level of performance efficiency.
For example, a typical means for processing a SET DIRTY command consists of comparing the SET DIRTY address against all outstanding commands/addresses that have been issued to the DTAG but not yet issued to the processor's primary cache Tag. The comparisons involved in this typical handling of a SET DIRTY command require numerous clock cycles, even with an idle system bus. Spending so many clock cycles is wasteful and impedes the overall performance of the computer system.
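The typical comparison described above can be sketched as a scan over the outstanding commands. The function name `set_dirty_conflicts` and the representation of outstanding commands as (address, command) pairs are assumptions; the sketch shows the logical comparison only and makes no claim about how many clock cycles the hardware requires.

```python
def set_dirty_conflicts(set_dirty_address, outstanding):
    """Check a SET DIRTY address against outstanding commands.

    `outstanding` lists (address, command) pairs that were issued to
    the DTAG but not yet to the primary cache Tag store.
    """
    for address, _command in outstanding:
        if address == set_dirty_address:
            return True   # an intervening command targets the same block
    return False          # no conflict: SET DIRTY may proceed
```

In this model, the SET DIRTY command must be checked against every outstanding entry before it can be acknowledged.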
Therefore, a need exists for a method and apparatus for enhancing coherency regarding the setting of Duplicate Tag DIRTY status bits such that cache modification can occur in a minimal number of cycles and thereby provide a significant gain in the overall performance of the computer system.