The time it takes to access data from memory is often a constraint on the performance of computer systems. Processors such as central processing units (CPUs) and graphics processors typically run at much higher speeds than the main memory, so the processor is frequently left waiting on the memory. Retrieving data from the main memory typically takes many clock cycles, which tends to decrease the overall system performance.
Most computer systems use one or more caches to store some data closer to the processor, so that the data can be accessed faster than from the main memory. Computer systems may include a hierarchy of cache memories with the smallest and fastest cache located closest to the CPU, and with each higher-level cache containing all of the data in the level below. In the present disclosure, lower level caches will refer to those caches closer to the processor. Retrieving data from a cache, instead of from the main memory, reduces the number of clock cycles a processor must wait for data and tends to increase the throughput of the computer system. However, given the relatively small size of the cache, or caches, compared to the main memory, it is not always possible to retrieve data from the cache and avoid reading it from the main memory. In multiple level cache systems the data might only be available in a second or third level cache, in which case data retrieval is typically still faster than from the main memory, although not as fast as from a lower level cache.
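By way of illustration only, the effect of a cache hierarchy on access time can be sketched as follows. The latency figures, the level names and the function `access_cycles` are assumptions chosen for the example and do not describe any particular processor; the sketch also assumes that each level is probed in turn, so that a miss at one level adds that level's latency to the total.

```python
# Hypothetical latencies in clock cycles, ordered from the cache closest to
# the processor to the cache farthest from it. Illustrative values only.
LEVELS = [("L1", 4), ("L2", 12), ("L3", 40)]
MAIN_MEMORY_LATENCY = 200  # assumed, for illustration


def access_cycles(address, cache_contents):
    """Return (where_found, total_cycles) for a single read.

    cache_contents maps a level name to the set of addresses it holds.
    Each level is probed in order; a miss still costs that level's latency.
    """
    cycles = 0
    for name, latency in LEVELS:
        cycles += latency
        if address in cache_contents.get(name, set()):
            return name, cycles
    # Missed in every cache level: the data must come from main memory.
    return "memory", cycles + MAIN_MEMORY_LATENCY


if __name__ == "__main__":
    # Each higher level contains everything held by the level below it.
    contents = {
        "L1": {0x10},
        "L2": {0x10, 0x20},
        "L3": {0x10, 0x20, 0x30},
    }
    for addr in (0x10, 0x20, 0x30, 0x40):
        print(hex(addr), access_cycles(addr, contents))
```

Under these assumed figures, a read satisfied by the lowest level cache costs a small fraction of the cycles needed when the read falls through to the main memory.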
An inherent problem in computer systems using multiple levels of cache is ensuring that any modifications or “writes” to the lower level cache are consistently propagated to the higher level caches and to the main memory, so that if the data is needed later only the latest version of the data is retrieved. Two general approaches to the problem of cache writing are often used: write-through and write-back. Write-through involves writing the changed data to main memory and to all levels of the cache hierarchy. Write-back is a technique in which any modifications or “writes” are made only to the lower level cache and tracked there, and are not written to higher levels of cache or to main memory until the cache entry is replaced. The write-back technique offers some economies by reducing write data traffic to main memory, such as when a particular cache entry is changed many times, at the cost of incurring the additional overhead needed to track changes in the cache data.
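The difference between the two approaches can be illustrated with a minimal sketch. The sketch assumes a single level of cache in front of main memory, and the class names `WriteThroughCache` and `WriteBackCache`, the `evict` method and the `memory_writes` counter are hypothetical names introduced only for this example.

```python
class WriteThroughCache:
    """Every write updates the cache line and main memory immediately."""

    def __init__(self):
        self.lines = {}         # address -> cached value
        self.memory = {}        # address -> value held in main memory
        self.memory_writes = 0  # count of writes that reached main memory

    def write(self, address, value):
        self.lines[address] = value
        self.memory[address] = value
        self.memory_writes += 1


class WriteBackCache:
    """Writes stay in the cache and are tracked with a dirty marker."""

    def __init__(self):
        self.lines = {}
        self.dirty = set()      # addresses modified since last write-back
        self.memory = {}
        self.memory_writes = 0

    def write(self, address, value):
        self.lines[address] = value
        self.dirty.add(address)  # the tracking overhead of write-back

    def evict(self, address):
        """Replace the entry; write to memory only if it was modified."""
        if address in self.dirty:
            self.memory[address] = self.lines[address]
            self.memory_writes += 1
            self.dirty.discard(address)
        self.lines.pop(address, None)


if __name__ == "__main__":
    wt, wb = WriteThroughCache(), WriteBackCache()
    for value in range(100):  # the same entry is changed many times
        wt.write(0x40, value)
        wb.write(0x40, value)
    wb.evict(0x40)
    print(wt.memory_writes, wb.memory_writes)
```

In this sketch, one hundred changes to the same entry produce one hundred memory writes under write-through but a single memory write under write-back, at the cost of maintaining the set of dirty addresses.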
There are situations in which all, or a substantial portion of, the cache entries need to be replaced, which may be referred to as flushing the cache. Such flushing requires writing any changed cache entries back to main memory. In a system with multiple levels of write-back cache hierarchy, flushing of the caches starts at the lowest level of cache, followed by the next higher level of cache, until all the caches in the hierarchy have been “flushed”. At each level, all modified data is written back to the higher levels of cache and to main memory, and the corresponding line in the cache being flushed is then invalidated. Many existing cache flushing techniques require two or three cache request micro-operations per line of cache, regardless of whether or not there have been any modifications to the data in a particular line of cache. It would be advantageous to have a method or apparatus to efficiently flush the cache, such as by not executing micro-operations for cache data that has not been modified and does not need to be written to main memory. For example, systems using fewer than three cache request micro-operations per line being flushed would be desirable. As the size and speed of caches increase, which is a trend that seems likely to continue, the advantage of efficiently flushing becomes even more pronounced.
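The flushing order described above, together with the idea of doing no write-back work for unmodified lines, can be sketched as follows. This is an illustration under stated assumptions and not the method of the present disclosure: the `Line` record, the `flush_hierarchy` function and the use of a per-line dirty flag are hypothetical, and the returned count is a count of write-backs, not of micro-operations on any real processor.

```python
from dataclasses import dataclass


@dataclass
class Line:
    data: int
    dirty: bool = False  # set when the line has been modified
    valid: bool = True


def flush_hierarchy(levels, memory):
    """Flush every cache level, lowest (closest to the processor) first.

    levels: list of dicts mapping address -> Line, lowest level first.
    memory: dict mapping address -> value, standing in for main memory.
    Returns the number of write-backs performed.
    """
    write_backs = 0
    for i, level in enumerate(levels):
        next_level = levels[i + 1] if i + 1 < len(levels) else None
        for address, line in level.items():
            if line.valid and line.dirty:
                if next_level is not None:
                    # Push the modified data up one level and mark it dirty
                    # there so that level's flush carries it onward.
                    next_level[address] = Line(line.data, dirty=True)
                else:
                    memory[address] = line.data
                write_backs += 1
            # Clean lines need no write-back; every line is invalidated.
            line.valid = False
            line.dirty = False
    return write_backs


if __name__ == "__main__":
    l1 = {0: Line(7, dirty=True), 1: Line(5)}
    l2 = {0: Line(1), 1: Line(5), 2: Line(9, dirty=True)}
    memory = {0: 1, 1: 5, 2: 3}
    print(flush_hierarchy([l1, l2], memory), memory)
```

In this sketch the unmodified line at address 1 is invalidated at both levels without any write-back, while the modified lines are carried level by level until they reach main memory.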
As will be discussed more fully below, there are many ways of configuring a cache of a given size through the degree of associativity. It would be advantageous to exploit the associativity of a cache to increase the efficiency of cache flushing. Even better would be to allow designers and system architects to select among a range of possible cache flushing efficiencies for a given degree of associativity in a cache, effectively allowing for explicit tradeoffs between the speed of cache flushing and the hardware dedicated to achieving that performance.
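As a brief illustration of how the degree of associativity configures a cache of a given size, the following sketch computes the number of sets from the total capacity, the line size and the number of ways, using the standard relation that the number of sets equals the number of lines divided by the number of ways. The function name `cache_geometry` and the example sizes are assumptions made for the example only.

```python
def cache_geometry(cache_bytes, line_bytes, ways):
    """Return the line, way and set counts for a set-associative cache.

    Raises ValueError if the sizes do not divide evenly.
    """
    if cache_bytes % line_bytes:
        raise ValueError("cache size must be a multiple of the line size")
    lines = cache_bytes // line_bytes
    if ways < 1 or lines % ways:
        raise ValueError("number of lines must be a multiple of the ways")
    return {"lines": lines, "ways": ways, "sets": lines // ways}


if __name__ == "__main__":
    size, line = 32 * 1024, 64  # assumed 32 KiB cache with 64-byte lines
    for ways in (1, 2, 8, 512):  # direct-mapped through fully associative
        print(cache_geometry(size, line, ways))
```

With these assumed sizes the same 512 lines can be arranged as 512 sets of one way (direct-mapped), 64 sets of eight ways, or a single set of 512 ways (fully associative).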