This invention relates generally to computer systems and more specifically to coherency protocols and inclusion in cache memory systems.
Most computer systems employ a multilevel hierarchy of memory systems, with relatively fast, expensive, limited-capacity memory at the highest level of the hierarchy and proceeding to relatively slower, lower cost, higher-capacity memory at the lowest level of the hierarchy. Typically, the hierarchy includes a small fast memory called a cache, either physically integrated within a processor integrated circuit or mounted physically close to the processor for speed. There may be separate instruction caches and data caches. There may be multiple levels of caches.
The minimum amount of memory that can be transferred between a cache and a next lower level of the memory hierarchy is called a line, or block, or page. The present patent document uses the term “line,” but the invention is equally applicable to systems employing blocks or pages.
In most multilevel caches, each cache level has a copy of every line of memory residing in every cache level higher in the hierarchy, a property called inclusion. For example, in an inclusive two-level cache system, every entry in the primary cache is also in the secondary cache. Typically, when a line is evicted from an upper level cache, the line is permitted to remain in lower level caches. Conversely, in order to maintain coherency, if a line is evicted from a lower level cache, the lower level cache must issue a bus transaction, called a back-invalidate transaction, to flush any copies of the evicted line out of upper levels of the cache hierarchy. Back-invalidate transactions occur frequently and have a significant impact on overall performance, because of increased bus utilization between the caches and increased bus monitoring (snoop) traffic. There is a need for reducing the number of back-invalidate transactions in order to improve performance.
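The inclusion property and the cost of back-invalidate transactions can be illustrated with a minimal sketch. It assumes a two-level hierarchy with least-recently-used replacement at both levels; the class name, the capacities, and the counter are hypothetical and are not taken from any particular system. Because the lower level cache in this sketch has no record of what the upper level holds, it issues a back-invalidate on every eviction, whether or not the upper level actually holds the line.

```python
from collections import OrderedDict


class InclusiveTwoLevelCache:
    """Toy model of an inclusive hierarchy: every L1 line is also in L2."""

    def __init__(self, l1_lines, l2_lines):
        self.l1 = OrderedDict()    # upper level, kept in LRU order
        self.l2 = OrderedDict()    # lower level, kept in LRU order
        self.l1_lines = l1_lines
        self.l2_lines = l2_lines
        self.back_invalidates = 0  # back-invalidate transactions issued

    def access(self, addr):
        if addr in self.l1:
            self.l1.move_to_end(addr)          # L1 hit
            return
        if addr in self.l2:
            self.l2.move_to_end(addr)          # L1 miss, L2 hit
        else:
            if len(self.l2) >= self.l2_lines:
                victim, _ = self.l2.popitem(last=False)
                # L2 cannot tell whether L1 holds the victim, so it must
                # always issue a back-invalidate to preserve inclusion.
                self.back_invalidates += 1
                self.l1.pop(victim, None)
            self.l2[addr] = None
        if len(self.l1) >= self.l1_lines:
            self.l1.popitem(last=False)        # L1 victim may stay in L2
        self.l1[addr] = None


cache = InclusiveTwoLevelCache(l1_lines=2, l2_lines=3)
for line_address in (0, 1, 2, 3):
    cache.access(line_address)
```

In this run, line 0 has already left the upper level by the time the lower level evicts it, so the single back-invalidate that is issued does no useful work. Transactions of this kind are the ones the need stated above concerns.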
Many computer systems employ multiple processors, each of which may have multiple levels of caches. All processors and caches may share a common main memory. A particular line may simultaneously exist in shared memory and in the cache hierarchies for multiple processors. All copies of a line in the caches must be identical, a property called coherency. The copy of a line in shared memory may be “stale” (not updated). If any processor changes the contents of a line, only the one changed copy is then valid, and all other copies must then be updated or invalidated. The protocols for maintaining coherence for multiple processors are called cache-coherence protocols. In some protocols, the status of a line of physical memory is kept in one location, called the directory. In other protocols, every cache that has a copy of a line of physical memory also has a copy of the sharing status of the line. When no centralized state is kept, all caches monitor, or “snoop,” a shared bus to determine whether or not they have a copy of a line that is requested on the bus. The present patent document is relevant to any multi-level cache system, but is particularly relevant to multi-processor systems, with each processor having a hierarchy of caches, all sharing a main memory, in a snooping based system.
FIG. 1 illustrates a state diagram for an example prior-art multi-processor cache-coherency protocol in a snooping based system. FIG. 1 illustrates four possible states for each line in a cache. Before any lines are placed into the cache, all entries are at a default state called “invalid” (100). When an uncached physical line is placed into the cache, the state of the entry in the cache is changed from invalid to “exclusive” (102). The word “exclusive” means that exactly one cache hierarchy has a copy of the line. If a line is in an exclusive state in a cache hierarchy for a first processor, and if a second processor requests the same line, the line will then be copied into two cache hierarchies, and the state of the entry in each cache is set to “shared” (104). If a line is modified in a cache, it may also be immediately modified in shared memory (called write through). Alternatively, a cache may write a modified line to shared memory only when the modified line in the cache is invalidated or replaced (called write back). FIG. 1 assumes that the cache is a write-back cache, and accordingly when a line in the cache is modified, the state of the entry in the cache is changed to “modified” (106). The protocol of FIG. 1 is sometimes called a MESI protocol, referring to the first letter of each of the four states.
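The four states described above can be sketched as a small transition table. This is a sketch of the commonly described MESI transitions for a write-back cache, not a reproduction of FIG. 1; the event names (fill_unshared, fill_shared, local_write, snoop_read, snoop_write) are hypothetical labels introduced only for illustration.

```python
from enum import Enum


class State(Enum):
    INVALID = "I"
    EXCLUSIVE = "E"
    SHARED = "S"
    MODIFIED = "M"


# (current state, event) -> next state.
# Event names are illustrative assumptions, not terms from the figure.
TRANSITIONS = {
    (State.INVALID, "fill_unshared"): State.EXCLUSIVE,  # no other hierarchy has the line
    (State.INVALID, "fill_shared"): State.SHARED,       # another hierarchy has the line
    (State.INVALID, "local_write"): State.MODIFIED,
    (State.EXCLUSIVE, "snoop_read"): State.SHARED,      # second processor requests the line
    (State.EXCLUSIVE, "local_write"): State.MODIFIED,
    (State.EXCLUSIVE, "snoop_write"): State.INVALID,
    (State.SHARED, "local_write"): State.MODIFIED,      # other copies are invalidated
    (State.SHARED, "snoop_write"): State.INVALID,
    (State.MODIFIED, "snoop_read"): State.SHARED,       # line is written back first
    (State.MODIFIED, "snoop_write"): State.INVALID,     # line is written back first
}


def next_state(state, event):
    """Return the next state; events not listed leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)
```

For example, next_state(State.EXCLUSIVE, "snoop_read") yields State.SHARED, matching the case above in which a second processor requests a line held exclusively by a first processor.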
In the protocol of FIG. 1, the modified state (106) is effectively an exclusive modified state, meaning that only one cache hierarchy in the system has a copy of the modified line. Some systems add an additional modified state to enable multiple caches to hold a copy of modified data. FIG. 2 illustrates a prior art protocol in which an additional state has been added, called “owned” (208). States 200, 202, and 206 in FIG. 2 have the same function as the identically named states for FIG. 1. In contrast, in the protocol of FIG. 2, other cache hierarchies may be holding copies of a modified line in the shared state (204), but only one cache hierarchy can hold a modified line in an owned state (208). Only the one cache holding a modified line in the owned state can write the modified line back to shared memory.
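The constraints just described for a single line can be expressed as a short check. The sketch below is illustrative only: it represents each cache hierarchy's state for one line as a one-letter string, and the function names is_legal and write_back_owner are hypothetical. It also assumes, as in a write-back cache, that a line held in the modified state is written back by the cache holding it.

```python
def is_legal(states):
    """Check one line's per-hierarchy states ("M", "O", "E", "S", "I")."""
    holders = [s for s in states if s != "I"]
    # Modified and exclusive both mean that exactly one hierarchy has a copy.
    if any(s in ("M", "E") for s in holders):
        return len(holders) == 1
    # Any number of shared copies, but at most one owned copy.
    return holders.count("O") <= 1


def write_back_owner(states):
    """Return the index of the hierarchy responsible for the write back."""
    for index, s in enumerate(states):
        if s in ("M", "O"):
            return index
    return None  # no hierarchy holds a modified copy of the line
```

Under these assumptions, is_legal(["O", "S", "S"]) is true, is_legal(["O", "O", "I"]) is false, and write_back_owner(["S", "O", "S"]) returns 1.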
New additional cache coherency states are provided that indicate that a line is not cached in higher levels of the cache hierarchy, and therefore no back-invalidate transaction is required when the line is evicted. One new state is called Mu, for modified and uncached, meaning that the modified line is uncached in a higher level of the cache hierarchy. Similarly, Su (shared and uncached) and Eu (exclusive and uncached) states may be provided.
For the Mu state, the system snoops evictions from upper levels of the cache hierarchy. More specifically, any write-back from an upper level cache is written through the lower level cache. On receiving a write-back from an upper level, the lower level cache changes the state of the relevant entry to Mu. If a line having a state of Mu is evicted, no back-invalidate transaction is generated. If a line in the Mu state is subsequently read, the state is switched to M (modified).
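The Mu behavior described above can be sketched from the point of view of the lower level cache. This is a minimal illustration, not an implementation of any embodiment: the class LowerLevelCache, its method names, and the use of plain strings for states are assumptions made for the example, and only a single upper level cache is modeled.

```python
class LowerLevelCache:
    """Toy lower level cache that tracks a coherency state per line."""

    def __init__(self):
        self.state = {}            # line address -> "M", "Mu", ...
        self.back_invalidates = 0  # back-invalidate transactions issued

    def upper_write_back(self, addr):
        # A write-back from the upper level is written through this cache,
        # so the modified line is known to be uncached above.
        self.state[addr] = "Mu"

    def upper_read(self, addr):
        # The upper level takes a copy again, so the line is cached above.
        if self.state.get(addr) == "Mu":
            self.state[addr] = "M"

    def evict(self, addr):
        """Evict a line; return True if a back-invalidate was generated."""
        state = self.state.pop(addr, None)
        if state is None or state == "Mu":
            return False           # nothing above to invalidate
        self.back_invalidates += 1
        return True


cache = LowerLevelCache()
cache.state[0x40] = "M"
cache.upper_write_back(0x40)       # M -> Mu
silent = cache.evict(0x40)         # evicted without a back-invalidate

cache.state[0x80] = "M"
cache.upper_write_back(0x80)       # M -> Mu
cache.upper_read(0x80)             # Mu -> M
noisy = cache.evict(0x80)          # a back-invalidate is generated
```

Of the two evictions in this usage, only the second generates a back-invalidate transaction, because only that line was read again by the upper level after its write-back.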
Su and Eu states may be provided whenever the system provides a hint that a line is not cached in higher levels of the cache hierarchy. For example, if a system provides a transaction to inform a lower level cache or directory when a clean line is displaced from an upper level cache, then the Su and Eu states may be provided.
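The handling of such a hint can be sketched as a pair of state mappings. The function names below are hypothetical, a single upper level cache is assumed, and the return from Su or Eu to S or E on a subsequent upper level read is an assumption made by analogy with the Mu state rather than something stated above.

```python
# Clean-eviction hint from the upper level: the line is now uncached above.
UNCACHED_VARIANT = {"S": "Su", "E": "Eu"}
# Assumed by analogy with Mu -> M: a later upper level read undoes the hint.
CACHED_VARIANT = {"Su": "S", "Eu": "E"}


def on_clean_eviction_hint(state):
    """State after the upper level reports displacing a clean line."""
    return UNCACHED_VARIANT.get(state, state)


def on_upper_level_read(state):
    """State after the upper level reads the line again (assumption)."""
    return CACHED_VARIANT.get(state, state)


def needs_back_invalidate(state):
    """True if evicting a line in this state requires a back-invalidate."""
    return state not in ("I", "Mu", "Su", "Eu")
```

For instance, a line in the E state that receives the hint moves to Eu, and needs_back_invalidate("Eu") is false, so its later eviction from the lower level cache generates no back-invalidate transaction.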