1. Field of the Invention
This disclosure generally relates to techniques for reducing pre-fetching overhead for processors in computer systems. More specifically, this disclosure relates to techniques for filtering pre-fetch requests to reduce cache and memory pre-fetching overhead.
2. Related Art
To achieve high instruction throughput rates, the memory subsystem of a processor typically includes multiple levels of cache memories. Accesses to such cache memories generally operate as follows. During execution, a processor may execute an instruction that references a memory location. If the referenced memory location is not available in a level one (L1) cache, a cache miss causes the L1 cache to send a corresponding request to a level two (L2) cache. Next, if the referenced memory location is also not available in the L2 cache, additional requests may need to be sent to lower levels of the processor's memory hierarchy.
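By way of illustration only, the lookup sequence described above can be sketched in a few lines of Python. The sketch is a simplified model under stated assumptions, not a description of any particular processor: each cache level is modeled as an unbounded set of line numbers (no capacity limits, associativity, or eviction), the 64-byte line size is an assumed value, and the names `LINE_SIZE`, `line_of`, and `access` are hypothetical.

```python
from typing import List, Set

LINE_SIZE = 64  # bytes per cache line (assumed value for illustration)


def line_of(addr: int) -> int:
    """Map a byte address to the number of the cache line that holds it."""
    return addr // LINE_SIZE


def access(addr: int, levels: List[Set[int]]) -> int:
    """Look up addr in each cache level in order (index 0 is the L1 cache).

    Returns the index of the level that supplied the line, or len(levels)
    if every level missed and the request went to memory.
    """
    line = line_of(addr)
    hit_level = len(levels)  # assume a miss everywhere until a hit is found
    for i, cache in enumerate(levels):
        if line in cache:
            hit_level = i
            break
    # Fill the line into every level that missed on the way down.
    for cache in levels[:hit_level]:
        cache.add(line)
    return hit_level


l1: Set[int] = set()
l2: Set[int] = set()
first = access(0x1000, [l1, l2])   # misses in L1 and L2, served by memory
second = access(0x1008, [l1, l2])  # same cache line, now hits in L1
```

In this model the first access returns 2 (both caches missed) and the second returns 0 (an L1 hit), which mirrors the miss-then-forward behavior described above.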
In a typical high-performance processor, off-chip memory latency (e.g., to a DRAM memory) is often an order of magnitude or more greater than on-chip memory latency. Pre-fetching techniques try to hide this latency by predicting which cache lines might be needed in the future and pre-fetching those cache lines before they are requested. Pre-fetching operations may, for instance, be initiated on a cache miss: when a load instruction misses in the cache, the pre-fetch unit can predict the next few lines that might be needed, and can issue pre-fetches for those lines.
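As a non-limiting sketch of miss-triggered pre-fetching, the following Python fragment models a simple sequential ("next-line") predictor. The sequential prediction policy, the pre-fetch degree of two lines, the 64-byte line size, and the names `PREFETCH_DEGREE` and `load` are all assumptions chosen for illustration; actual pre-fetch units may use other predictors.

```python
from typing import List, Set

LINE_SIZE = 64       # bytes per cache line (assumed)
PREFETCH_DEGREE = 2  # lines pre-fetched after each miss (assumed)


def load(addr: int, cache: Set[int], prefetched: List[int]) -> bool:
    """Model one load. Returns True on a cache hit, False on a miss.

    On a miss, the demanded line is filled and the next PREFETCH_DEGREE
    sequential lines are pre-fetched if they are not already cached.
    """
    line = addr // LINE_SIZE
    if line in cache:
        return True
    cache.add(line)  # demand fill for the missing line
    for k in range(1, PREFETCH_DEGREE + 1):
        nxt = line + k
        if nxt not in cache:
            cache.add(nxt)          # pre-fetched line lands in the cache
            prefetched.append(nxt)  # record it for later inspection
    return False


cache: Set[int] = set()
prefetched: List[int] = []
# Sequential loads, one per cache line.
results = [load(a, cache, prefetched) for a in (0, 64, 128, 192)]
```

With this access pattern the first load misses and triggers pre-fetches for lines 1 and 2, so the next two loads hit; the fourth load misses again and triggers pre-fetches for lines 4 and 5.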
Unfortunately, while pre-fetching techniques generally reduce cache miss delays, they also involve additional overhead. Not all cache lines that are pre-fetched will be used, and such superfluous cache line reads consume memory bandwidth and can cause unnecessary energy consumption in the off-chip memory, the on-chip caches, and the memory network. Hence, what is needed are techniques for pre-fetching cache lines without the above-described problems.
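The overhead described above can be made concrete with a short, purely illustrative calculation. The fragment below counts pre-fetched lines that are never referenced by a demand access and converts that count to bytes of memory traffic. It assumes a 64-byte line and ignores timing (it does not check whether the demand access occurred after the pre-fetch); the function name `prefetch_waste` and the example inputs are hypothetical.

```python
from typing import Iterable, Tuple

LINE_SIZE = 64  # bytes per cache line (assumed)


def prefetch_waste(prefetched_lines: Iterable[int],
                   demand_addrs: Iterable[int]) -> Tuple[int, int]:
    """Return (unused pre-fetched lines, bytes read for those lines)."""
    used = {addr // LINE_SIZE for addr in demand_addrs}
    issued = set(prefetched_lines)
    unused = issued - used  # pre-fetched but never demanded
    return len(unused), len(unused) * LINE_SIZE


# Lines 1, 2, 4, and 5 were pre-fetched; the program then touched
# byte addresses 64, 128, and 192 (lines 1, 2, and 3).
unused_count, wasted_bytes = prefetch_waste([1, 2, 4, 5], [64, 128, 192])
```

Here two of the four pre-fetched lines (lines 4 and 5) are never used, so 128 bytes of memory bandwidth were spent on superfluous reads.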