1. Technical Field
Generally, the disclosed embodiments relate to integrated circuits, and, more particularly, to power management of a compute unit comprising a cache.
2. Description of the Related Art
A computer system comprising a compute unit (e.g., a core of a multi-core central processing unit (CPU)) can place the compute unit into a lower power state when it is not needed to perform user- or system-requested operations. Placing an unneeded compute unit into a lower power state may reduce power consumption and heat generation by the computer system, thereby reducing operating expenses of the computer system and extending the service life of the computer system or components thereof. It is common for a computer system to contain a central power management unit (PMU) to orchestrate the low-power transitions for compute unit(s) and/or other components within the system. Typically, the PMU can make requests directly to a compute unit to power down and power up.
A compute unit may have a cache. Typically, a cache is used to store copies of data from frequently used locations of main memory. It is generally quicker for the compute unit to access data in the cache than to access the corresponding copies in main memory, so the compute unit typically reads and writes the cached copy rather than main memory. As a result, data stored in a cache may come to differ from the corresponding copy in main memory. A cache line containing differing data may be termed "modified" or "dirty," depending on, among other considerations, which cache coherency protocol is implemented by the computer system comprising the compute unit, whereas a cache line containing data identical to the corresponding copy in main memory may be termed "unmodified" or "clean."
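The distinction between clean and dirty lines can be illustrated with a toy write-back cache. This is a minimal sketch under stated assumptions, not a description of any particular design: the class names `CacheLine` and `WriteBackCache` are hypothetical, main memory is modeled as a Python dict, and "differing" data is tracked with a single dirty bit, a simplification of the richer states used by real coherency protocols.

```python
from dataclasses import dataclass


@dataclass
class CacheLine:
    address: int
    data: int
    dirty: bool = False  # set once the cached copy may differ from main memory


class WriteBackCache:
    """Hypothetical toy cache: writes update only the cached copy."""

    def __init__(self, memory: dict[int, int]) -> None:
        self.memory = memory                    # stands in for main memory
        self.lines: dict[int, CacheLine] = {}   # address -> cached line

    def read(self, address: int) -> int:
        line = self.lines.get(address)
        if line is None:
            # Miss: fill the line with a clean copy from main memory.
            line = CacheLine(address, self.memory[address])
            self.lines[address] = line
        return line.data

    def write(self, address: int, value: int) -> None:
        self.read(address)            # make sure the line is present
        line = self.lines[address]
        line.data = value
        line.dirty = True             # main memory still holds the old value

    def is_clean(self, address: int) -> bool:
        return not self.lines[address].dirty
```

After `cache.write(0x80, 7)`, the line at `0x80` is dirty while main memory still holds the previous value; a line that has only been read remains clean.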
When a compute unit is directed to power down, one issue to be addressed is the status of the cache. Commonly, when a compute unit is directed to power down, the compute unit saves off its architectural state to some memory retention area, flushes its caches of all differing data (i.e., completes any writing of differing data from cache locations to main memory and evicts the differing data from the cache), and then signals its low-power readiness to the PMU. However, depending on the amount of differing data in the cache, flushing the cache may be the most time-consuming part of the power-down process. For example, saving off the compute unit's architectural state to memory (such as, but not necessarily, main memory) may take 3000-5000 clock cycles, whereas flushing a cache may take about two to five clock cycles per differing cache line. It is not uncommon for a cache to hold tens of thousands of differing cache lines and to require ~50,000 clock cycles (~50 μsec in contemporary desktop computer processors) to be completely flushed, an order of magnitude more than saving the architectural state. The time spent on cache flushing is time during which the compute unit is powered up but relatively inactive, and thus wastes power and generates unnecessary heat.
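The conventional sequence and its cost can be sketched as a small model. This is an illustrative sketch only: the function `conventional_power_down`, the `PowerDownReport` type, and all three constants are hypothetical. The cycle costs are picked from the ranges quoted above, and the 1 GHz clock is an assumption chosen because it is the rate at which 50,000 cycles take 50 μsec.

```python
from dataclasses import dataclass

# Hypothetical cost parameters chosen from the ranges quoted in the text.
STATE_SAVE_CYCLES = 4_000        # assumed; text quotes 3000-5000 cycles
FLUSH_CYCLES_PER_DIRTY_LINE = 5  # assumed; text quotes about 2-5 cycles per line
CLOCK_HZ = 1_000_000_000         # assumed 1 GHz, implied by 50,000 cycles ~ 50 usec


@dataclass
class PowerDownReport:
    state_save_cycles: int
    flush_cycles: int
    ready: bool  # stands in for the readiness signal sent to the PMU

    @property
    def total_cycles(self) -> int:
        return self.state_save_cycles + self.flush_cycles

    def microseconds(self, clock_hz: int = CLOCK_HZ) -> float:
        return self.total_cycles * 1e6 / clock_hz


def conventional_power_down(dirty_lines: dict[int, int],
                            memory: dict[int, int]) -> PowerDownReport:
    """Model the conventional sequence: save state, flush, then signal the PMU."""
    # 1. Save architectural state to a retention area (modeled as a fixed cost).
    state_cycles = STATE_SAVE_CYCLES

    # 2. Flush: write each dirty line back to main memory, then evict it.
    flush_cycles = len(dirty_lines) * FLUSH_CYCLES_PER_DIRTY_LINE
    for address, value in dirty_lines.items():
        memory[address] = value
    dirty_lines.clear()

    # 3. Only after the flush completes can low-power readiness be reported.
    return PowerDownReport(state_cycles, flush_cycles, ready=True)
```

With 10,000 dirty lines under these assumptions, the flush accounts for 50,000 of the 54,000 modeled cycles (about 54 μsec at 1 GHz), which illustrates why the flush, not the state save, tends to dominate the power-down latency.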