Field of the Disclosure
The present disclosure relates generally to multiple-core processing systems and, more particularly, to caching in multiple-core processing systems.
Description of the Related Art
Larger caches, such as last-level caches, are typically implemented as a collection of several smaller, separate cache “slices.” Each slice has a corresponding set of cache lines and access circuitry for accessing that set of cache lines. In conventional processing systems, the cache may be configured as either an address-interleaved cache or a per-core cache. In a conventional address-interleaved cache, each memory address of the address space associated with the cache is mapped to only a single cache slice. This approach reduces or eliminates the overhead involved in maintaining coherence within the cache, because only one cache slice can contain a valid copy of the data associated with a given memory address. However, this approach can also increase cache latency, because a cache access initiated by a processor core may need to be routed to a physically distant slice, and this latency can significantly impact the performance of the processor core.
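The single-slice mapping of an address-interleaved cache can be illustrated with a minimal sketch. The 64-byte line size, the eight-slice count, the modulo interleave, and the ring-distance latency proxy are all assumptions chosen for illustration and are not taken from the disclosure; actual hardware may use a hash function or a different interconnect topology.

```python
# Illustrative sketch only; all constants and the ring topology are assumed.
LINE_SIZE_BYTES = 64   # assumed cache-line size
NUM_SLICES = 8         # assumed number of cache slices


def slice_for_address(addr: int) -> int:
    """Map a memory address to exactly one slice (simple modulo interleave)."""
    line_index = addr // LINE_SIZE_BYTES   # drop the byte offset within the line
    return line_index % NUM_SLICES         # consecutive lines go to consecutive slices


def hop_distance(core_id: int, slice_id: int) -> int:
    """Hypothetical latency proxy: hops on a bidirectional ring in which
    core i is assumed to sit next to slice i."""
    d = abs(core_id - slice_id) % NUM_SLICES
    return min(d, NUM_SLICES - d)


if __name__ == "__main__":
    for addr in (0x0000, 0x0040, 0x0100, 0x01C0):
        s = slice_for_address(addr)
        print(f"addr {addr:#06x} -> slice {s}, hops from core 0: {hop_distance(0, s)}")
```

Because every address resolves to one slice, no second copy exists to keep coherent, but a core may need to traverse several hops to reach the slice that owns a given line.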
Conversely, in a per-core cache, each cache slice is assigned to only one corresponding processor core, or to only a corresponding small cluster of cores, and thus operates to maintain the cached data for the corresponding processor core or core cluster. In effect, each slice operates as a private cache for a single processor core or small cluster of processor cores. This results in reduced cache access latency, as there is minimal communication distance between the processor core or cluster and the corresponding cache slice. However, the trade-off is that the mechanism for maintaining coherency within the cache is considerably more complex, because all slices associated with the same address space must remain coherent with one another. Consequently, each access to a local cache slice that impacts the coherency of the other cache slices causes numerous coherency transactions (invalidations, for example) to be transmitted. Moreover, the total effective storage capacity of the cache is diminished in per-core cache configurations because the same data can be stored redundantly in multiple slices.
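The two costs of a per-core cache noted above, coherency traffic and redundant storage, can be illustrated with a minimal sketch. The `PerCoreCache` class, its write-invalidate and write-through behavior, and the four-core configuration are hypothetical simplifications assumed for illustration; they do not describe any particular coherency protocol.

```python
# Illustrative sketch only; the class and its protocol are assumed, not from the disclosure.
class PerCoreCache:
    """Toy model: one private slice per core with write-invalidate coherency."""

    def __init__(self, num_cores: int):
        # Each slice maps a line address to its cached data.
        self.slices = [dict() for _ in range(num_cores)]
        self.invalidations_sent = 0

    def read(self, core: int, line_addr: int, memory: dict) -> int:
        local = self.slices[core]
        if line_addr not in local:
            # Fill the local slice; this may duplicate a line held by other slices.
            local[line_addr] = memory[line_addr]
        return local[line_addr]

    def write(self, core: int, line_addr: int, value: int, memory: dict) -> None:
        # Invalidate every other slice's copy before updating the local one.
        for other, s in enumerate(self.slices):
            if other != core and line_addr in s:
                del s[line_addr]
                self.invalidations_sent += 1
        self.slices[core][line_addr] = value
        memory[line_addr] = value  # write-through, assumed for simplicity

    def copies(self, line_addr: int) -> int:
        """Number of slices currently holding the line (redundant storage)."""
        return sum(line_addr in s for s in self.slices)


if __name__ == "__main__":
    memory = {0x40: 1}
    cache = PerCoreCache(num_cores=4)
    for core in range(4):
        cache.read(core, 0x40, memory)
    print("copies after reads:", cache.copies(0x40))
    cache.write(0, 0x40, 2, memory)
    print("invalidations after one write:", cache.invalidations_sent)
```

In this model, a line read by all four cores occupies four slices at once, and a single write by one core sends an invalidation to each of the other three, which is the overhead the address-interleaved arrangement avoids.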