1. Field of the Invention
The present invention generally relates to a method for exchanging cached data within multiprocessor systems and, more particularly, to the interaction between cache memory components in the memory hierarchy in multiprocessor systems such as Chip Multiprocessor (CMP) systems and Symmetric Multiprocessor (SMP) systems. The invention comprises a prediction mechanism for determining whether a volatile or non-volatile data copy should be provided when a cache needs to supply data to another cache.
2. Background Description
Memory access latency has been a serious performance bottleneck in modern computer systems. As processor speeds continue to increase at a much higher rate than memory speeds, memory access latency may soon approach a thousand processor cycles.
Caching is a common technique to reduce effective memory access latency. A processor can access a cache faster than the main memory because, compared with the main memory, a cache generally is closer to the accessing processor, usually has a smaller size, and typically uses faster device technology. Traditionally, the main memory is implemented using dynamic random access memory (DRAM), and a cache is implemented using static random access memory (SRAM). In recent years, embedded DRAM (eDRAM) has also been used in cache implementations (e.g., off-chip L3 caches in the IBM Power4 multiprocessor system).
Conceptually, a cache can reduce memory access latency by taking advantage of temporal and spatial locality in programs. To exploit spatial locality, a cache is typically organized in multi-byte cache lines, so that an access to one byte also brings the neighboring bytes of the same line into the cache. To exploit temporal locality, a cache usually employs an appropriate replacement algorithm, such as the least-recently-used (LRU) or pseudo-LRU replacement policy, to keep recently used data in the cache.
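By way of illustration only, the following minimal sketch shows how multi-byte cache lines and LRU replacement may operate together. It is not the mechanism of the present invention. The 64-byte line size, the fully associative organization, the two-line capacity, and the names LINE_SIZE, LRUCache and access are assumptions chosen for the example.

```python
from collections import OrderedDict

LINE_SIZE = 64  # bytes per cache line (assumed value for illustration)


class LRUCache:
    """Hypothetical fully associative cache of fixed-size lines with LRU replacement."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        # Maps line address -> placeholder; ordered from least to most recently used.
        self.lines = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, byte_address):
        """Simulate one memory access; return True on a hit, False on a miss."""
        # Spatial locality: all bytes within the same line share one cache entry.
        line_address = byte_address // LINE_SIZE
        if line_address in self.lines:
            # Temporal locality: mark the line as most recently used.
            self.lines.move_to_end(line_address)
            self.hits += 1
            return True
        self.misses += 1
        if len(self.lines) >= self.num_lines:
            # Evict the least recently used line to make room.
            self.lines.popitem(last=False)
        self.lines[line_address] = True
        return False


# Example trace of byte addresses against a two-line cache.
cache = LRUCache(num_lines=2)
trace = [0, 8, 64, 0, 128, 64]
results = [cache.access(address) for address in trace]
print(results, cache.hits, cache.misses)
```

In this trace, the access to address 8 hits because it falls in the same line as address 0 (spatial locality), and the second access to address 0 hits because that line was recently used (temporal locality). The access to address 128 evicts the line containing address 64, which was the least recently used at that point, so the final access to address 64 misses.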
A modern computer system typically uses a memory hierarchy that comprises the main memory and multiple levels of caches. For a processor, an L1 (level 1) cache is at the lowest level of the memory hierarchy and is closest to the processor. An L1 cache is almost always on the same chip as the CPU (central processing unit) so that it can be accessed by the CPU with very short access latency. Sometimes an L1 cache is partitioned into an instruction cache and a data cache.