1. Field of the Invention
The present invention relates, in general, to cache memory, and, more particularly, to a cache memory design using speculative or preemptive write backs of dirty cache lines to main memory to regulate memory bus traffic volume.
2. Relevant Background
The ability of processors to execute instructions has typically outpaced the ability of memory subsystems to supply instructions and data to the processors. As used herein the terms "microprocessor" and "processor" include complex instruction set computers (CISC), reduced instruction set computers (RISC) and hybrids. Most processors use a cache memory system to speed memory access. Cache memory comprises one or more levels of dedicated high-speed memory holding recently accessed data, designed to speed up subsequent access to the same data. Cache sizes of high-performance processors are continuously growing.
Cache technology is based on a premise that programs frequently reuse the same instructions and data. When data is read from main system memory, a copy is also saved in the cache memory, along with its tag. The cache then monitors subsequent requests for data to see if the information needed has already been stored in the cache. If the data has indeed been stored in the cache, it is delivered with low latency to the processor while the attempt to fetch the information from main memory is aborted (or not started). If, on the other hand, the data has not previously been stored in the cache, it is fetched from main memory and also saved in the cache for future access.
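By way of illustration only, the lookup described above may be sketched as follows. The sketch assumes a direct-mapped organization and models main memory as a dictionary keyed by line address; the class name `DirectMappedCache` and its fields are hypothetical and are not taken from any particular processor design.

```python
class DirectMappedCache:
    """Toy direct-mapped cache: one line per index, identified by its tag."""

    def __init__(self, num_lines, line_size, memory):
        self.num_lines = num_lines
        self.line_size = line_size
        self.memory = memory              # dict: line address -> data (stand-in for main memory)
        self.tags = [None] * num_lines    # tag stored alongside each cached line
        self.data = [None] * num_lines
        self.hits = 0
        self.misses = 0

    def read(self, address):
        line_addr = address // self.line_size   # which memory line holds this byte
        index = line_addr % self.num_lines      # where that line may live in the cache
        tag = line_addr // self.num_lines       # distinguishes lines sharing an index
        if self.tags[index] == tag:
            # Hit: deliver from cache, no access to main memory.
            self.hits += 1
            return self.data[index]
        # Miss: fetch from main memory and keep a copy, with its tag, for later.
        self.misses += 1
        value = self.memory.get(line_addr, 0)
        self.tags[index] = tag
        self.data[index] = value
        return value


memory = {0: "line0", 1: "line1"}
cache = DirectMappedCache(num_lines=4, line_size=64, memory=memory)
first = cache.read(0)    # miss, fetched from memory
second = cache.read(8)   # same 64-byte line, served from cache
```

In this example the second read falls within the same 64-byte line as the first, so it is satisfied from the cache without consulting main memory.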
In superscalar processors, multiple instructions are executed each clock cycle, possibly leading to multiple requests per clock cycle for data stored in main memory. This is particularly true during events such as context switching, where a new process or thread is started and a previously executing process is stalled, slowed, or aborted. In these cases, the cache(s) will be filled with data and instructions associated with the waning process that need to be replaced by data and instructions associated with the newly started process. Each time a cache line is replaced (i.e., overwritten or evicted), however, if it is dirty (i.e., differs from the corresponding data line in main memory) it must be written back to main memory before it is replaced. In these cases, processor performance depends heavily on the speed with which the instructions and data from the waning process can be evicted from cache.
Often, as in the case of the context switch described above, peak write back traffic occurs simultaneously with peak read traffic. In these cases, memory bandwidth is preferably allocated to read traffic first to ensure that instruction execution does not stall. To expedite eviction of cache lines, write buffers are used to temporarily hold the evicted data until it can be written back. Unfortunately, the write buffers are either sized to handle peak loads, in which case they are space inefficient, or they are smaller than peak load capacity, in which case processor performance is compromised. A write buffer can cause a processor stall when it is full, when it contends with a cache miss for access to the next level of cache or memory hierarchy, and when it contains the freshest copy of data needed by a load operation. Hence, it is desirable to regulate the memory bandwidth required by write operations in a manner that limits the use of write buffers.
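The three stall conditions listed above may be illustrated with a minimal software model. The sketch below is an assumption-laden simplification: the names `WriteBuffer` and `stall_reasons`, the single shared port to the next level, and the dictionary standing in for main memory are all hypothetical.

```python
from collections import OrderedDict


class WriteBuffer:
    """Toy write buffer holding evicted lines until they reach memory."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # line address -> data, oldest entry first

    def is_full(self):
        return len(self.entries) >= self.capacity

    def holds(self, line_addr):
        return line_addr in self.entries

    def push(self, line_addr, data):
        """Accept an evicted line; return False (a stall) when no slot is free."""
        if line_addr in self.entries:
            self.entries[line_addr] = data   # merge with the pending entry
            return True
        if self.is_full():
            return False
        self.entries[line_addr] = data
        return True

    def drain_one(self, memory):
        """Write the oldest pending line to memory, freeing one slot."""
        if self.entries:
            addr, data = self.entries.popitem(last=False)
            memory[addr] = data


def stall_reasons(buffer, new_eviction, load_addr, miss_pending, drain_requested):
    """List which of the three stall conditions apply in the current cycle."""
    reasons = []
    if new_eviction and buffer.is_full():
        reasons.append("buffer full")
    if miss_pending and drain_requested:
        reasons.append("port contention")        # assumed single port to next level
    if load_addr is not None and buffer.holds(load_addr):
        reasons.append("load needs buffered data")
    return reasons


buf = WriteBuffer(capacity=2)
accepted = [buf.push(1, "a"), buf.push(2, "b"), buf.push(3, "c")]
reasons = stall_reasons(buf, new_eviction=True, load_addr=2,
                        miss_pending=True, drain_requested=True)
main_memory = {}
buf.drain_one(main_memory)          # frees a slot by writing line 1 to memory
accepted_after_drain = buf.push(3, "c")
```

A two-entry buffer rejects the third eviction until one pending line has drained, which is the space-versus-performance trade-off described above.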
Write policy refers to whether modified cache lines are written to main memory as soon as they are changed (i.e., write through) or at a time determined by the replacement algorithm (i.e., write back or lazy write back). A write through policy ensures that data altered by the executing program is copied to main memory and non-volatile storage as soon as possible to maximize data integrity. A pure write through policy, however, commits significant memory bandwidth to writing data out of cache and negatively impacts system performance. A lazy write back policy waits to initiate a write back until system resources require a write back (e.g., upon filling all possible cache lines in a cache level) and so increases system performance at the expense of increasing burstiness of the write back traffic. Write back policy is established to meet the needs of a particular application.
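The difference in memory traffic between the two policies may be illustrated by counting, for each store, how many lines are written to main memory. The sketch below assumes a small fully associative cache with least recently used replacement and a stream consisting only of stores; the function names are hypothetical.

```python
from collections import OrderedDict


def memory_writes_write_through(stores):
    """Write through: every store is copied to main memory immediately."""
    return [1 for _ in stores]


def memory_writes_write_back(stores, capacity):
    """Lazy write back: memory is written only when a dirty line is evicted.

    `stores` is a sequence of line addresses being written by the program.
    Returns the number of lines written to memory at each step.
    """
    lines = OrderedDict()   # line address -> dirty flag, least recently used first
    writes = []
    for addr in stores:
        written = 0
        if addr in lines:
            lines.move_to_end(addr)              # hit: refresh recency
        elif len(lines) >= capacity:
            _victim, dirty = lines.popitem(last=False)
            if dirty:
                written = 1                      # dirty victim goes to memory
        lines[addr] = True                       # the store leaves the line dirty
        writes.append(written)
    return writes


stores = [0, 0, 0, 1, 1, 2, 3]
through = memory_writes_write_through(stores)
back = memory_writes_write_back(stores, capacity=2)
```

For this store sequence the write through policy issues seven memory writes, one per store, while the lazy write back policy issues two, both deferred until the cache is full and lines must be replaced.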
Memory management techniques typically use a precise or pseudo least recently used (LRU) algorithm to select cache lines for replacement. The LRU technique monitors addresses of cache lines that are accessed and selects lines for replacement based upon frequency of accesses or how recently a cache line was accessed. LRU techniques typically initiate replacement and the associated write backs on demand when all possible cache lines of one cache level are filled (i.e., all n ways of an n-way set associative cache, for example). Once all possible cache lines are full, replacement of a cache line is necessary before new data can be transferred into the cache. Waiting for a cache to fill before initiating a write back exacerbates the peak load problem by deferring the write back until it is critical. In the case of a context switch, for example, much of the cache data may be dirty and require write back before new data can be loaded.
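On-demand replacement of this kind may be sketched for a single set of an n-way set associative cache. The sketch assumes precise LRU ordering and one dirty bit per line; the class name `LRUSet` and the `write_backs` list are hypothetical conveniences for observing the behavior.

```python
from collections import OrderedDict


class LRUSet:
    """One set of an n-way set associative cache with precise LRU replacement."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()   # tag -> dirty flag, least recently used first
        self.write_backs = []        # tags written back to memory, in order

    def access(self, tag, is_write=False):
        if tag in self.lines:
            self.lines.move_to_end(tag)                       # now most recently used
            self.lines[tag] = self.lines[tag] or is_write
            return "hit"
        if len(self.lines) == self.ways:
            # All ways are full: replacement happens only now, on demand.
            victim, dirty = self.lines.popitem(last=False)
            if dirty:
                self.write_backs.append(victim)               # must precede the fill
        self.lines[tag] = is_write
        return "miss"


s = LRUSet(ways=2)
results = [
    s.access("A", is_write=True),   # miss, A is now dirty
    s.access("B"),                  # miss, set is now full
    s.access("A"),                  # hit, B becomes least recently used
    s.access("C"),                  # miss, evicts clean B, no write back
    s.access("D"),                  # miss, evicts dirty A, write back required
]
```

No write back is attempted until a miss finds every way occupied, so when many resident lines are dirty each incoming miss must wait behind a write back.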
There is a trend to use larger cache lines (i.e., cache lines that each hold more data) to take advantage of spatial locality in data storage. Larger cache lines require a smaller tag storage area because fewer address tags are needed to cover the same amount of data (i.e., each tag has a coarser granularity). However, larger cache lines may result in loading data into cache that is never used, because an entire cache line is filled even for a small memory request. Likewise, an entire cache line can be replaced (and require write back) to load in only a small amount of new data. Hence, coarser granularity results in frequent high volume write backs.
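The effect of line size on tag storage may be worked through numerically. The sketch below assumes a set associative cache with power-of-two sizes and a 32-bit address; the function name `tag_storage_bits` and the example parameters (32 KB, 4-way) are chosen for illustration only, and valid, dirty, and replacement-state bits are not counted.

```python
def tag_storage_bits(cache_bytes, line_bytes, ways, address_bits=32):
    """Total bits of address tag storage (excluding valid/dirty/LRU state).

    Assumes cache_bytes, line_bytes, and the resulting set count are powers of two.
    """
    num_lines = cache_bytes // line_bytes
    num_sets = num_lines // ways
    offset_bits = line_bytes.bit_length() - 1   # log2(line_bytes)
    index_bits = num_sets.bit_length() - 1      # log2(num_sets)
    tag_bits = address_bits - index_bits - offset_bits
    return num_lines * tag_bits


cache_bytes = 32 * 1024
sizes = {line: tag_storage_bits(cache_bytes, line, ways=4) for line in (32, 64, 128)}
```

Under these assumptions each tag remains 19 bits wide regardless of line size, so doubling the line size halves the number of tags and therefore halves the tag storage (19456, 9728, and 4864 bits for 32-, 64-, and 128-byte lines respectively).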
Using a technique called sub-blocking, larger cache levels have a higher granularity than smaller cache levels. The higher sub-blocked cache levels have fewer tag entries as each entry represents a larger number of data bytes in the sub-blocked cache as compared to the lower level cache(s). For example, each line in the sub-blocked cache may hold two, four, or more lines of data from the lower cache level(s). Each lower-level cache line is referred to as a block within the higher level cache line. The tag information in the sub-blocked cache is augmented with more valid bits where each valid bit indicates whether a specific block is valid. Hence, sub-blocking is a compromise that improves cache efficiency of the lower cache levels while reducing the tag size and data transfer requirements of higher cache levels. Sub-blocking increases complexity of cache management, however, and in particular makes replacement more difficult. For example, evicting a single cache line in a low level cache might result in evicting (and writing back) multiple lines in the higher level cache. In a typical example, replacement of a 64-byte level one cache line may result in write back of 512 bytes of data (i.e., the equivalent of eight cache lines) to main memory from a higher level cache. When address conflicts (collisions) occur, sub-blocking increases the frequency of write backs and greatly taxes memory bandwidth. Both increasing cache line size and sub-blocking take advantage of spatial locality to reduce the size of the tag storage area, but a need exists for a cache system and method for operating a cache that takes advantage of spatial locality while regulating peak memory traffic during write backs.
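A sub-blocked line and the write back volume it can generate on eviction may be sketched as follows. The sketch uses the 64-byte block and eight-block line of the example above; the per-block dirty bits are an assumption added for illustration (the description above mentions only per-block valid bits), and the class name `SubBlockedLine` is hypothetical.

```python
class SubBlockedLine:
    """One line of a sub-blocked cache: a single tag covering several blocks."""

    def __init__(self, tag, blocks_per_line=8, block_bytes=64):
        self.tag = tag
        self.block_bytes = block_bytes
        self.valid = [False] * blocks_per_line   # one valid bit per block
        self.dirty = [False] * blocks_per_line   # assumption: one dirty bit per block

    def fill(self, block):
        """Load a block from memory; it is valid but clean."""
        self.valid[block] = True

    def write(self, block):
        """Modify a block; it is valid and now differs from memory."""
        self.valid[block] = True
        self.dirty[block] = True

    def evict(self):
        """Replace the whole line; return the number of bytes written back."""
        dirty_blocks = sum(1 for v, d in zip(self.valid, self.dirty) if v and d)
        self.valid = [False] * len(self.valid)
        self.dirty = [False] * len(self.dirty)
        return dirty_blocks * self.block_bytes


worst = SubBlockedLine(tag=0x1)
for b in range(8):
    worst.write(b)                 # every block modified
worst_bytes = worst.evict()

partial = SubBlockedLine(tag=0x2)
partial.fill(0)                    # clean block, nothing to write back
partial.write(3)
partial.write(4)
partial_bytes = partial.evict()
```

When every block of the line is dirty, a single replacement writes back all 512 bytes; with only two dirty blocks the same replacement writes back 128 bytes under the per-block dirty bit assumption.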