1. Technical Field
The invention relates generally to computer systems, and more particularly relates to cache memory systems. In even greater particularity, the invention relates to cache flush mechanisms.
In an exemplary embodiment, the invention is used in connection with the internal L1 (level 1) cache on an x86 processor.
2. Related Art
Processors (such as microprocessors) commonly include an internal L1 (level 1) cache. The L1 cache is typically operated in either of two modes: write-through or write-back (copy-back).
In write-through mode, each write to a cache line also results in an external bus cycle to write the corresponding data through to system DRAM. As a result, the cache and system DRAM always have the same data (are always coherent). In write-back mode, to reduce external bus traffic, writes to the cache are not automatically written back to system DRAM. Instead, external write-back cycles are run to update system DRAM only if a cache line containing "dirty" data is replaced, invalidated, or exported (without invalidation) in response to a cache inquiry. In particular, a cache coherency protocol including cache inquiry cycles is required to ensure memory coherency during DMA (direct memory access) operations, in which an external device (such as a disk drive) may directly access system DRAM (including locations that are also in the L1 cache).
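The difference in external bus traffic between the two modes can be illustrated with a minimal sketch. The `Cache` class, its dictionary-based storage, and the `bus_writes` counter are hypothetical names introduced only for illustration; they do not describe any particular processor's implementation.

```python
class Cache:
    """Toy cache model: one value per address, no capacity limit (assumption)."""

    def __init__(self, write_back):
        self.write_back = write_back
        self.lines = {}       # address -> (value, dirty flag)
        self.dram = {}        # models system DRAM
        self.bus_writes = 0   # external write cycles issued so far

    def write(self, addr, value):
        if self.write_back:
            # Write-back: update the cache only and mark the line dirty.
            self.lines[addr] = (value, True)
        else:
            # Write-through: update the cache and DRAM together.
            self.lines[addr] = (value, False)
            self._bus_write(addr, value)

    def evict(self, addr):
        # Replacing a line forces a write-back cycle only if it is dirty.
        value, dirty = self.lines.pop(addr)
        if dirty:
            self._bus_write(addr, value)

    def _bus_write(self, addr, value):
        self.dram[addr] = value
        self.bus_writes += 1


wt = Cache(write_back=False)
wb = Cache(write_back=True)
for cache in (wt, wb):
    for i in range(10):
        cache.write(0x100, i)   # ten writes to the same location
```

In this sketch the write-through cache issues one bus write per store, while the write-back cache issues none until the dirty line is evicted, at which point DRAM is out of date only in the interim.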
In addition, under certain conditions, the entire L1 cache is invalidated or exported. If the cache is operated in write-back mode, then cache invalidation is implemented as a "flush" (export-then-invalidate): each line of the cache is scanned for dirty data, and any dirty data is written back before that cache line is invalidated.
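The export-then-invalidate flush described above can be sketched as a simple loop. The `Line` record and the `flush` function are hypothetical names for illustration, and the `write_back` callback stands in for an external write-back bus cycle.

```python
from dataclasses import dataclass


@dataclass
class Line:
    tag: int
    data: int
    valid: bool = True
    dirty: bool = False


def flush(lines, write_back):
    """Export-then-invalidate every line; return how many lines were exported."""
    exported = 0
    for line in lines:                 # every line is visited, dirty or not
        if line.valid and line.dirty:
            write_back(line.tag, line.data)   # export dirty data first
            line.dirty = False
            exported += 1
        line.valid = False             # then invalidate the line
    return exported


dram = {}
# Eight lines, of which those with tags 0 and 4 are marked dirty.
cache = [Line(tag=t, data=t * 10, dirty=(t % 4 == 0)) for t in range(8)]
exported = flush(cache, dram.__setitem__)
```

Note that the loop touches all eight lines even though only two of them hold dirty data.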
Without limiting the scope of the invention, this background information is provided in the context of a specific problem to which the invention has application: reducing the time required to export or flush the entire internal L1 cache of a processor. More generally, the problem is to reduce the time to export or flush any cache, internal or external, operating in write-back mode.
A common goal of processor design is to increase cache size. As caches become larger, the time to flush/export the entire cache increases. Typically, merely scanning the cache and checking dirty bits to identify cache lines (or data) that must be exported requires one clock cycle per line. The number of additional clocks required to complete the flush depends on the number of dirty lines and on whether only the dirty data in a cache line or the entire cache line is exported.
Thus, for an 8K cache organized into 4 sets of 128 lines per set for a total of 512 cache lines, over 500 clocks will be required to complete a flush/export, while for a 16K cache organized into 4 sets of 256 lines per set for a total of 1024 cache lines, this flush/export penalty jumps to over a thousand clocks.
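The arithmetic behind these figures can be written out as a short sketch. The function names are hypothetical, and the `writeback_clocks` parameter is an assumed per-line write-back cost that the text does not specify.

```python
def scan_clocks(sets, lines_per_set, clocks_per_line=1):
    """Clocks needed just to scan every line and check its dirty bit."""
    return sets * lines_per_set * clocks_per_line


def flush_clocks(sets, lines_per_set, dirty_lines, writeback_clocks):
    """Scan cost plus an assumed fixed cost for each dirty line written back."""
    return scan_clocks(sets, lines_per_set) + dirty_lines * writeback_clocks


small = scan_clocks(4, 128)   # 8K cache: 512 lines
large = scan_clocks(4, 256)   # 16K cache: 1024 lines
```

Under the one-clock-per-line assumption, the scan alone costs 512 clocks for the 8K cache and 1024 clocks for the 16K cache, regardless of how few lines are dirty; any write-back cycles add to that floor.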
In the typical case, most of the data in a cache will be coherent with system DRAM, so that only a subset of the cache actually needs to be exported. However, for current cache designs, each flush/export operation still requires a full scan of the cache.