1. Field of the Invention
This invention relates generally to the field of computer processors. More particularly, the invention relates to an apparatus and method for achieving non-inclusive cache performance with inclusive caches.
2. Description of the Related Art
As chip-level multiprocessing (CMP) becomes widespread and the gap between processor and memory speeds continues to widen, it is imperative that processor architects design an efficient and high-performing cache hierarchy. One of the key design choices for a multi-level cache hierarchy is whether or not to enforce inclusion. See, e.g., J. L. Baer and W. Wang, “On the Inclusion Properties for Multi-level Cache Hierarchies,” ISCA, 1988; M. Zahran, “Non-inclusion property in multi-level caches revisited,” IJCA'07, 2007; N. Jouppi and S. E. Wilton, “Tradeoffs in two-level on-chip caching,” ISCA, 1994. While inclusion greatly simplifies the cache coherence protocol (see, e.g., X. Chen, Y. Yang, G. Gopalakrishnan, and C. Chou, “Reducing verification complexity of a multicore coherence protocol using assume/guarantee,” FMCAD, 2006; J. L. Baer and W. Wang, “On the Inclusion Properties for Multi-level Cache Hierarchies,” ISCA, 1988), it limits performance when the size of the largest cache is not significantly larger than the sum of the sizes of the smaller caches. In such scenarios, CPU architects resort to non-inclusive cache hierarchies (see, e.g., M. Zahran, “Non-inclusion property in multi-level caches revisited,” IJCA'07, 2007) or exclusive cache hierarchies (see, e.g., N. Jouppi and S. E. Wilton, “Tradeoffs in two-level on-chip caching,” ISCA, 1994).
The inclusion property requires that the contents of all the smaller caches of a multi-level cache hierarchy be a subset of the last-level cache (LLC). See, e.g., J. L. Baer and W. Wang, “On the Inclusion Properties for Multi-level Cache Hierarchies,” ISCA, 1988. When a line is evicted from the LLC, inclusion is enforced by removing that line from all the caches in the hierarchy. Cache lines invalidated in the small caches as a result of inclusion are referred to herein as “inclusion victims.” The small caches, sometimes referred to herein as “core caches,” hide temporal locality from the LLC when they service requests. Because the LLC replacement state of a line is updated to most recently used (MRU) only on hits in the LLC, the replacement state of “hot” lines that are constantly serviced by the core caches decays to least recently used (LRU) in the LLC. As a result, the “hot” lines become candidates for eviction in the LLC. The number of inclusion victims increases dramatically when multiple applications compete for the LLC or when the LLC is not significantly larger than the sum of all the core caches.
A straightforward mechanism to eliminate inclusion victims would be to remove the requirement that core caches be a subset of the LLC. Such a “non-inclusive” cache (see, e.g., M. J. Mayfield, T. H. Nguyen, R. J. Reese, and M. T. Vaden, “Modified L1/L2 cache inclusion for aggressive prefetch,” U.S. Pat. No. 5,740,399) allows cache lines to reside in the core cache(s) without also being duplicated in the LLC. In doing so, non-inclusion increases the effective capacity of the cache hierarchy (see, e.g., M. Zahran, “Non-inclusion property in multi-level caches revisited,” IJCA'07, 2007; Y. Zheng, B. T. Davis, and M. Jordan, “Performance Evaluation of Exclusive Cache Hierarchies,” ISPASS, 2004). Unfortunately, non-inclusion eliminates the natural snoop filter benefit that an inclusive LLC provides, thus forfeiting the coherence benefits that come with inclusivity (see, e.g., J. L. Baer and W. Wang, “On the Inclusion Properties for Multi-level Cache Hierarchies,” ISCA, 1988). While snoop filters (see, e.g., A. Agarwal, R. Simoni, J. Hennessy, and M. Horowitz, “An evaluation of directory schemes for cache coherence,” ISCA, 1988; V. Salapura, M. Blumrich, and A. Gara, “Design and implementation of the Blue Gene/P snoop filter,” HPCA, 2008; R. Simoni, “Cache Coherence Directories for Scalable Multiprocessors,” PhD thesis, Stanford University, October 1992) can be used in addition to the LLC, such structures increase the hardware overhead (see, e.g., Y. Zheng, B. T. Davis, and M. Jordan, “Performance Evaluation of Exclusive Cache Hierarchies,” ISPASS, 2004) and verification complexity (see, e.g., X. Chen, Y. Yang, G. Gopalakrishnan, and C. Chou, “Reducing verification complexity of a multicore coherence protocol using assume/guarantee,” FMCAD, 2006). It would be beneficial to design a cache hierarchy that reduces (if not eliminates) the frequency of inclusion victims while providing the coherence benefits of inclusion. Thus, one goal discussed below is to bridge the performance gap between inclusion and non-inclusion by improving the management of an inclusive LLC.
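By way of illustration only, the inclusion-victim behavior described above may be modeled with a short simulation. The following Python sketch is not part of any cited work; it assumes a single core cache and a single LLC, both fully associative with true LRU replacement, and the names `LRUCache` and `simulate` are hypothetical. A “hot” line that always hits in the core cache is never seen by the LLC, so its LLC replacement state decays to LRU and the line is eventually evicted and back-invalidated.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative cache with true LRU replacement (simplifying assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # first key is LRU, last key is MRU

    def hit(self, addr):
        """Return True on a hit and promote the line to MRU."""
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return True
        return False

    def fill(self, addr):
        """Insert addr at MRU; return the evicted address, or None."""
        victim = None
        if len(self.lines) >= self.capacity:
            victim, _ = self.lines.popitem(last=False)  # evict the LRU line
        self.lines[addr] = True
        return victim

    def invalidate(self, addr):
        """Remove addr if present; return True if a line was removed."""
        return self.lines.pop(addr, None) is not None


def simulate(trace, core_size, llc_size, inclusive=True):
    """Return (core_cache_misses, inclusion_victims) for an address trace."""
    core, llc = LRUCache(core_size), LRUCache(llc_size)
    core_misses = 0
    inclusion_victims = 0
    for addr in trace:
        if core.hit(addr):
            continue  # serviced by the core cache; the LLC never sees this access
        core_misses += 1
        if not llc.hit(addr):
            victim = llc.fill(addr)
            # Inclusion: a line evicted from the LLC must leave the core cache too.
            if inclusive and victim is not None and core.invalidate(victim):
                inclusion_victims += 1
        core.fill(addr)
    return core_misses, inclusion_victims


if __name__ == "__main__":
    # Hot line "A" interleaved with a stream of lines that are each used once.
    trace = []
    for i in range(8):
        trace += ["A", f"s{i}"]
    print("inclusive:    ", simulate(trace, core_size=2, llc_size=4, inclusive=True))
    print("non-inclusive:", simulate(trace, core_size=2, llc_size=4, inclusive=False))
```

On this trace the inclusive configuration reports inclusion victims and additional core cache misses for the hot line, whereas the non-inclusive configuration reports none. The model omits coherence traffic, set associativity, and multiple cores, so it indicates only the direction of the effect, not its magnitude.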