Processor power dissipation has become an issue for processors of all types, from low-end mobile processors to high-end server processors. Among processor components, a cache memory accounts for a major portion of a processor's area and transistor count, and consumes significant leakage power. For example, in a typical commercially available multicore processor, 40% of total leakage power is due to a last level cache (LLC) and interconnect.
While turning off portions of a cache memory to reduce its leakage power may reduce processor power consumption, it is difficult in practice to turn off even portions of an LLC, as it is typically implemented as a shared memory structure in which a portion of all memory addresses of a system is statically mapped to each LLC portion, or slice. As such, even if only one core of a multicore processor is operating, all LLC slices remain active to service memory requests mapped to the slices. Thus there are limited power saving opportunities for this type of cache memory in many current processors.
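The effect of static address mapping can be illustrated with a minimal sketch. The slice count, line size, and modulo interleave below are assumptions chosen for illustration only; actual processors may use different parameters and hash functions. The sketch shows that the addresses touched by a single active core spread across every slice, so no slice can be turned off.

```python
# Hypothetical parameters for illustration; not taken from any particular processor.
NUM_SLICES = 8    # assumed number of LLC slices
LINE_BYTES = 64   # assumed cache line size in bytes


def slice_for_address(addr: int) -> int:
    """Statically map a physical address to an LLC slice.

    A simple modulo interleave on the cache line index is assumed here.
    """
    return (addr // LINE_BYTES) % NUM_SLICES


def slices_touched(addresses) -> set:
    """Return the set of slices that must be active to serve the given addresses."""
    return {slice_for_address(a) for a in addresses}


# A single core streaming through one 4 KiB region, one access per cache line.
single_core_addrs = range(0, 4096, LINE_BYTES)
touched = slices_touched(single_core_addrs)
```

Under these assumptions, `touched` contains every slice index, which is why a shared, statically mapped LLC offers little opportunity to power down individual slices.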
A clustered (non-uniform) LLC organization may provide power management and performance benefits. In a clustered LLC, each LLC cluster holds data only for its associated core(s). As such, hit latency and interconnect traffic are reduced. Moreover, when the cores associated with a cluster enter a low power state, that LLC cluster is not needed to service active cores associated with other clusters. Thus the cluster associated with the low power core(s) can potentially be flushed and turned off to save significant leakage power. However, inefficiencies still exist due to sub-optimal scheduling.
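The power-off condition for a clustered LLC can be sketched as follows. The `Cluster` type, the `update_cluster_power` helper, and the core and address values are hypothetical names and data introduced only for illustration; the flush is modeled simply as discarding the cluster's tracked lines, whereas real hardware would write back modified data first.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical model of one LLC cluster and the cores it serves."""
    cores: set                                 # IDs of the cores associated with this cluster
    powered: bool = True                       # whether the cluster is currently powered on
    lines: set = field(default_factory=set)    # addresses of lines currently held


def update_cluster_power(cluster: Cluster, sleeping_cores: set) -> int:
    """Flush and power off the cluster if all of its cores are in a low power state.

    Returns the number of lines flushed (0 if the cluster stays on).
    """
    if cluster.cores <= sleeping_cores:
        flushed = len(cluster.lines)
        cluster.lines.clear()      # stands in for writeback and invalidation
        cluster.powered = False
        return flushed
    cluster.powered = True
    return 0


# Two clusters of two cores each; cores 2 and 3 enter a low power state.
clusters = [
    Cluster(cores={0, 1}, lines={0x100, 0x140}),
    Cluster(cores={2, 3}, lines={0x200}),
]
sleeping = {2, 3}
flushed_counts = [update_cluster_power(c, sleeping) for c in clusters]
powered_states = [c.powered for c in clusters]
```

In this sketch only the second cluster is flushed and powered off, because the first cluster still has active cores to serve.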