Chip multiprocessors (CMPs) that include multiple processing cores on a single die can improve system performance. Such CMPs and other multiprocessor systems are often used for highly threaded (or parallel) applications and to support throughput computing. To support high-performance throughput computing, an on-die cache/memory hierarchy should support many cores/threads efficiently. In a multiple-core platform, the cache space available per hardware thread is not growing at nearly the same rate as compute density because of die area and cost constraints. Further, a large number of cores, e.g., in-order cores, results in increased memory pressure. Cache hierarchies allow for lower access latencies to the most recently used data, but also introduce the possibility of redundant information, thereby wasting cache space. While a CMP architecture enables usage of multiple levels of shared caches, traditional policies such as inclusive caches and central directories are not satisfactory.
There are typically three inclusion policies for a cache hierarchy: inclusive, non-inclusive, and exclusive. Inclusive caches cause redundant information to be stored across the cache hierarchy, which leads to inefficient space usage. Non-inclusive caches do not have to enforce inclusion; however, such policies send snoop traffic to lower-level caches even when the line does not exist in a higher-level cache (here, lower-level caches are those close to the cores and higher-level caches are those close to main memory). In an exclusive cache hierarchy, data is present in only a single cache. While efficient in its use of space, such a policy increases the number of coherency messages and causes data to be moved constantly between multiple levels of caches.
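The space and data-movement trade-offs among the three policies can be illustrated with a minimal sketch. It is a toy model under stated assumptions, not an implementation taken from this document: two levels only, fully associative LRU replacement, a single core, and no modeling of snoop or coherency traffic. The names TwoLevelCache, run, distinct_lines, duplicated_lines, and migrations are hypothetical and chosen only for this illustration.

```python
from collections import OrderedDict


class TwoLevelCache:
    """Toy two-level hierarchy with LRU replacement (illustrative assumption).

    l1 is the lower level (near the core); l2 is the higher level (near memory).
    """

    POLICIES = ("inclusive", "non-inclusive", "exclusive")

    def __init__(self, policy, l1_size, l2_size):
        if policy not in self.POLICIES:
            raise ValueError(f"unknown policy: {policy}")
        if l1_size < 1 or l2_size < 1:
            raise ValueError("cache sizes must be at least 1")
        self.policy = policy
        self.l1_size, self.l2_size = l1_size, l2_size
        self.l1, self.l2 = OrderedDict(), OrderedDict()  # oldest entry first
        self.migrations = 0  # lines moved (not copied) between levels

    def _fill_l2(self, addr):
        self.l2[addr] = True
        self.l2.move_to_end(addr)
        if len(self.l2) > self.l2_size:
            victim, _ = self.l2.popitem(last=False)
            if self.policy == "inclusive":
                # Back-invalidate so that L1 stays a subset of L2.
                self.l1.pop(victim, None)

    def _fill_l1(self, addr):
        self.l1[addr] = True
        self.l1.move_to_end(addr)
        if len(self.l1) > self.l1_size:
            victim, _ = self.l1.popitem(last=False)
            if self.policy == "exclusive":
                # The L1 victim migrates to L2 instead of being dropped.
                self.migrations += 1
                self._fill_l2(victim)

    def access(self, addr):
        if addr in self.l1:
            self.l1.move_to_end(addr)  # L1 hit: refresh LRU position only
            return
        if self.policy == "exclusive":
            if addr in self.l2:
                # L2 hit: the line leaves L2 so that only one copy exists.
                del self.l2[addr]
                self.migrations += 1
        elif addr in self.l2:
            self.l2.move_to_end(addr)  # L2 hit: the copy stays in L2
        else:
            self._fill_l2(addr)  # miss in both levels: allocate in L2 as well
        self._fill_l1(addr)

    def distinct_lines(self):
        return len(set(self.l1) | set(self.l2))

    def duplicated_lines(self):
        return len(set(self.l1) & set(self.l2))


def run(policy, trace, l1_size=2, l2_size=4):
    """Replay a sequence of line addresses and return the resulting cache."""
    cache = TwoLevelCache(policy, l1_size, l2_size)
    for addr in trace:
        cache.access(addr)
    return cache


if __name__ == "__main__":
    for policy in TwoLevelCache.POLICIES:
        c = run(policy, range(6))
        print(policy, c.distinct_lines(), c.duplicated_lines(), c.migrations)
```

With a two-line lower cache, a four-line higher cache, and a trace that touches six distinct lines once each, the inclusive configuration in this model ends up holding four distinct lines, two of them duplicated, whereas the exclusive configuration holds all six with no duplicates but has already moved four lines between levels. Re-accessing a line that resides only in the higher cache moves two more lines in the exclusive case, which reflects the constant data movement described above.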