Chip multiprocessors (CMPs), which include multiple processing cores on a single die, can improve system performance. Such CMPs and other multiprocessor systems may be used for highly threaded (or parallel) applications and to support throughput computing. To support high-performance throughput computing, an on-die cache/memory hierarchy must serve many cores/threads efficiently. In a multiple-core platform, the cache space available per hardware thread is not growing at nearly the same rate as compute density, due to die area and cost constraints. Further, a large number of cores, e.g., in-order cores, increases memory pressure. Cache hierarchies provide lower access latencies to the most recently used data, but they also introduce the possibility of redundant information, wasting cache space. While a CMP architecture enables the use of multiple levels of shared caches, traditional policies such as inclusive caches and central directories are not satisfactory.
There are typically three inclusion policies for a cache hierarchy: inclusive, non-inclusive, and exclusive. Inclusive caches may store redundant information across the cache hierarchy, but they avoid snooping lower-level caches when misses occur in higher-level caches (lower-level caches are closer to the cores; higher-level caches are closer to main memory). However, when a line is replaced in a higher-level cache, back-invalidation messages are sent to evict the corresponding lines from the lower-level caches. These back-invalidation messages increase traffic, consuming bandwidth. In addition, replacing a least recently used line in a last-level cache (LLC) or other higher-level cache causes the eviction of the same line from the lower-level caches, even if that line is the most recently used line in those lower-level caches. Invalidating such a line may thus increase the miss rate. Non-inclusive caches may avoid back-invalidation messages because they do not have to enforce inclusion, but they send snoops to lower-level caches even when a line does not exist in the higher-level caches. While an exclusive cache hierarchy may avoid both back-invalidation messages and redundant information, such arrangements can send snoops all the way to the LLC and can consume higher communication bandwidth because data moves constantly between the multiple cache levels.
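The inclusion pathology described above can be sketched with a toy two-level model. The following is a minimal illustration, not a description of any particular implementation: it assumes fully associative caches with LRU replacement, and names such as `Cache` and `inclusive_access` are hypothetical. It shows how an LLC eviction of its least recently used line back-invalidates that same line out of the L1, even though the line is the most recently used line there.

```python
from collections import OrderedDict

class Cache:
    """Toy fully associative cache with LRU replacement (illustrative only)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> present, ordered by recency

    def access(self, addr):
        """Touch a line; return the address evicted to make room, if any."""
        if addr in self.lines:
            self.lines.move_to_end(addr)            # refresh LRU position
            return None
        victim = None
        if len(self.lines) >= self.capacity:
            victim, _ = self.lines.popitem(last=False)  # evict LRU line
        self.lines[addr] = True
        return victim

    def invalidate(self, addr):
        self.lines.pop(addr, None)

l1, llc = Cache(capacity=2), Cache(capacity=4)

def inclusive_access(addr):
    """Model an inclusive hierarchy: every L1 line must also be in the LLC."""
    if addr in l1.lines:
        l1.lines.move_to_end(addr)   # L1 hit: LLC recency is NOT updated
        return
    l1.access(addr)                  # L1 miss: fill into L1
    victim = llc.access(addr)        # ...and into the LLC
    if victim is not None:
        l1.invalidate(victim)        # back-invalidation to enforce inclusion

for a in "ABCD":                     # fill the LLC; L1 ends up holding C, D
    inclusive_access(a)
for new in "EFGH":
    inclusive_access("D")            # D stays hot (MRU) in L1...
    inclusive_access(new)            # ...while streaming misses age it in the LLC

# The LLC eventually evicts D as its LRU line; inclusion forces D out of L1
# too, even though D was the most recently reused line there.
assert "D" not in l1.lines and "D" not in llc.lines
```

Because L1 hits do not refresh the line's recency in the LLC, a line that is heavily reused from the L1 can still age out of the LLC, and the resulting back-invalidation discards it from the L1, raising the miss rate exactly as described above.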