1. Field of the Invention
The present invention generally relates to microprocessor architectures, and more particularly, the present invention relates to a pipelined snoop bus for maintaining coherence among caches in a multiprocessor configuration.
2. Description of the Related Art
In multiprocessor systems, processor cache memories often hold multiple copies of the same data object. When one processor alters one copy of the data object, all other copies of the object that may reside elsewhere in the multiprocessor system must be updated or invalidated. Thus, to ensure coherence among multiple copies, every valid write to one copy of an object must update or invalidate every other copy of that object.
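The coherence requirement in the preceding paragraph can be illustrated with a minimal write-invalidate sketch. It assumes each cache is modeled as a set of line addresses, and the function name `coherent_write` and the cache names are hypothetical; only the invalidate variant is shown, not the update variant.

```python
from typing import Dict, List, Set


def coherent_write(caches: Dict[str, Set[int]], writer: str, addr: int) -> List[str]:
    """Write-invalidate: the writer keeps (or acquires) its copy of `addr`,
    and every other cache holding a copy drops it.

    Returns the names of the caches whose copies were invalidated.
    """
    invalidated = []
    for name, lines in caches.items():
        if name != writer and addr in lines:
            lines.discard(addr)  # stale copy removed from another processor's cache
            invalidated.append(name)
    caches[writer].add(addr)  # the writer now holds the only valid copy
    return invalidated


caches = {"P0": {0x40, 0x80}, "P1": {0x40}, "P2": {0x80}}
print(coherent_write(caches, "P0", 0x40))  # ['P1']
```

After the write, only P0 holds line 0x40, so no stale copy remains anywhere in the system.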
Consider, for example, the conventional multiprocessor configuration illustrated in FIG. 1. The CPU chip 102 sits to the left of the vertical dashed line, and the external (EXT.) components 104 sit to the right of it. Reference numeral 106 denotes an external cache (e-cache), which is visible to all processors and which interfaces with a main memory (not shown). Main memory can be accessed only through the e-cache 106.
The CPU 102 contains multiple processors, which share the main memory via a common memory bus (not shown) and the e-cache 106. When one processor is granted exclusive use of a data object, the object is placed in the e-cache 106 and used on the CPU chip 102 until it is taken away or evicted from the e-cache 106. Illustrated within the CPU 102 are the on-board caches 108 and 110 associated with one processor. Cache 108 is a data cache (d-cache) that stores data as it passes to and from the execution units of the processor, and cache 110 is an instruction cache (i-cache) that holds instructions prior to their execution by the processor's execution units.
Reference numeral 112 denotes an interface unit. When a processor desires exclusive use of an object from main memory, the corresponding interface unit issues a snoop request. Snooping protocols are generally designed so that each cache observes all memory access requests. In the event of a coherent write, each cache responds to the snoop request by scanning its directory for any copies of the object that may require invalidation or updating. However, to avoid searching every cache directory on every coherent write, conventional systems adopt an "inclusive" approach to cache coherency.
The basic principle underlying cache coherency schemes is that when one processor is granted exclusive use of a data object, all other processors invalidate that data in their own caches. In the conventional inclusive cache coherency structure, the e-cache contains all data held in the other caches on the chip. That is, any data that exists in the on-board caches of the chip must also exist in the e-cache. If a data object is evicted or snooped out of the e-cache, it is also removed from all the on-chip caches.
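The inclusion property described above can be modeled with a small sketch. The class name `InclusiveHierarchy`, its methods, and the set-of-addresses representation are hypothetical; the sketch shows only that a line reaches an on-chip cache through the e-cache and that removing a line from the e-cache back-invalidates every on-chip copy.

```python
from typing import Dict, Iterable, Set


class InclusiveHierarchy:
    """Hypothetical model: one shared e-cache plus per-processor on-chip caches.

    Inclusion invariant: every line held on chip is also held in the e-cache.
    """

    def __init__(self, onchip_names: Iterable[str]):
        self.ecache: Set[int] = set()
        self.onchip: Dict[str, Set[int]] = {name: set() for name in onchip_names}

    def fill(self, cache: str, addr: int) -> None:
        # A line enters an on-chip cache only by way of the e-cache.
        self.ecache.add(addr)
        self.onchip[cache].add(addr)

    def evict_from_ecache(self, addr: int) -> None:
        # Eviction (or a snoop hit) in the e-cache back-invalidates all on-chip copies.
        self.ecache.discard(addr)
        for lines in self.onchip.values():
            lines.discard(addr)

    def inclusion_holds(self) -> bool:
        # Every on-chip cache must be a subset of the e-cache.
        return all(lines <= self.ecache for lines in self.onchip.values())


h = InclusiveHierarchy(["d-cache", "i-cache"])
h.fill("d-cache", 0x100)
h.fill("i-cache", 0x100)
h.evict_from_ecache(0x100)
print(h.onchip)  # {'d-cache': set(), 'i-cache': set()}
```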
Accordingly, referring to the flowchart of FIG. 2, when a snoop arrives from another processor (step 202), the interface unit 112 first scans the contents of the e-cache (step 204). If the data object is not there (NO at step 206), snooping is complete, since the data object cannot exist in the on-board caches of the chip. This follows from inclusion: whenever an object is evicted from the e-cache, it is also invalidated in each of the on-board caches. If the data is found in the e-cache (YES at step 206), the interface unit 112 sends out a signal to invalidate the data in the on-board caches.
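The FIG. 2 flow can be sketched as a single function. The function name `handle_snoop` and the data representation are hypothetical, and the sketch assumes that a snoop hit also invalidates the e-cache copy, since the snooping processor is being granted exclusive use of the object. The return value indicates whether any on-board work was needed.

```python
from typing import Dict, Set


def handle_snoop(ecache: Set[int], onboard: Dict[str, Set[int]], addr: int) -> bool:
    """Sketch of the FIG. 2 snoop flow under an inclusive e-cache.

    Returns True if on-board invalidation was required, False if the
    snoop was filtered out at the e-cache.
    """
    # Step 202: a snoop for `addr` arrives from another processor.
    # Step 204: scan the e-cache first.
    if addr not in ecache:
        # Step 206, NO: by inclusion the line cannot be on chip; snooping is done.
        return False
    # Step 206, YES: drop the e-cache copy (assumed) and signal the on-board caches.
    ecache.discard(addr)
    for lines in onboard.values():
        lines.discard(addr)
    return True


ecache = {0x10, 0x20}
onboard = {"d-cache": {0x10}, "i-cache": set()}
print(handle_snoop(ecache, onboard, 0x30))  # False: filtered at the e-cache
print(handle_snoop(ecache, onboard, 0x10))  # True: on-board invalidate issued
```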
Because snoop processing ends whenever the data is not found in the e-cache, the conventional technique of looking first to the e-cache filters the snoop requests that reach the processors' on-board caches. This in turn reduces the average bandwidth required for on-board snoop processing.
However, the conventional scheme suffers from drawbacks. For example, each time a data object is evicted from the e-cache, it must also be invalidated in each of the on-board caches to preserve inclusion. If the e-cache is a large direct-mapped cache, any object evicted from it must be invalidated in all the on-board caches as well, even when those caches have no need to discard it. This is often inefficient, since the e-cache may suffer collisions that do not occur in the on-board caches, and it ultimately reduces the on-board cache hit rate.
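A hypothetical worked example of this collision problem follows. It assumes a direct-mapped e-cache of 1024 sets with 64-byte lines, and models the on-chip d-cache as a small fully associative cache with room for both lines; the geometry, class, and function names are illustrative only. Addresses A and B map to the same e-cache set, so filling B evicts A from the e-cache and, to preserve inclusion, from the d-cache too, although the d-cache itself has no conflict.

```python
from typing import Dict, Optional, Set

LINE = 64            # hypothetical line size in bytes
ECACHE_SETS = 1024   # hypothetical direct-mapped e-cache geometry


class DirectMappedECache:
    """Direct-mapped cache: each line address maps to exactly one set."""

    def __init__(self, sets: int = ECACHE_SETS):
        self.sets = sets
        self.tags: Dict[int, int] = {}  # set index -> resident line address

    def index(self, addr: int) -> int:
        return (addr // LINE) % self.sets

    def fill(self, addr: int) -> Optional[int]:
        """Install the line holding `addr`; return the evicted line, if any."""
        line = addr - addr % LINE
        idx = self.index(line)
        victim = self.tags.get(idx)
        self.tags[idx] = line
        return victim if victim is not None and victim != line else None


ecache = DirectMappedECache()
dcache: Set[int] = set()  # on-chip d-cache, modeled as fully associative (assumption)


def access(addr: int) -> None:
    line = addr - addr % LINE
    victim = ecache.fill(line)
    if victim is not None:
        dcache.discard(victim)  # inclusion forces a back-invalidate on chip
    dcache.add(line)


A = 0x0000
B = A + LINE * ECACHE_SETS  # same e-cache set as A
access(A)
access(B)
print(A in dcache, B in dcache)  # False True
```

A later access to A therefore misses on chip, even though the d-cache never needed to give up that line.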
Further, a run of consecutive snoop requests may each hit the e-cache and require invalidations in the on-board caches, so the chip must still support this "peak" bandwidth. The filtering is therefore of limited value, since during any given interval it may be necessary to carry out on-board snoop processing at full bandwidth.
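To make the average-versus-peak distinction concrete, the following hypothetical sketch counts on-board snoop operations over a stream in which most snoops miss the e-cache. The function name, window size, and stream are illustrative assumptions. The point is that filtering lowers the average on-board load, while a back-to-back burst of e-cache hits still demands full bandwidth.

```python
from typing import List, Tuple


def onboard_load(ecache_hits: List[bool], window: int) -> Tuple[float, int]:
    """Return (average, peak) on-board snoop operations per `window` consecutive snoops.

    ecache_hits[i] is True when snoop i hits the e-cache and therefore requires
    an on-board invalidate; misses are filtered out entirely.
    Assumes 0 < window <= len(ecache_hits).
    """
    n = len(ecache_hits)
    total = sum(ecache_hits)
    average = total * window / n
    peak = max(sum(ecache_hits[i:i + window]) for i in range(n - window + 1))
    return average, peak


# 100 snoops, only 8 of which hit the e-cache, but all 8 arrive back to back.
stream = [False] * 46 + [True] * 8 + [False] * 46
print(onboard_load(stream, window=8))  # (0.64, 8)
```

The average load is well under one operation per window, yet the peak equals the window size, which is the bandwidth the chip must be designed to sustain.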