Field
The disclosed embodiments relate to instruction-prefetching mechanisms in computer processors. More specifically, the disclosed embodiments relate to the design of a correlation-based instruction prefetcher, which uses an outer-level cache (e.g., an L2 cache) to store the correlation keys (instruction miss addresses).
Related Art
For processors running commercial applications, instruction cache misses can significantly degrade system performance. This is because the large instruction working sets of such applications can rapidly overwhelm the processor's first-level instruction cache, thereby causing numerous cache misses that can stall the processor's fetch unit. Even processors that use techniques such as chip multi-threading (CMT) to deal with the performance problems caused by cache misses are susceptible to this problem, because the large number of threads that share the same instruction cache in such systems causes even higher instruction cache miss rates.
To reduce instruction cache miss rates, processors commonly perform sequential prefetching to prefetch cache lines that sequentially follow a current cache line. Although sequential instruction prefetching is simple to implement and effectively removes instruction cache misses arising from sequential instruction fetches, sequential prefetching is not effective at removing instruction cache misses that arise from large discontinuities in instruction fetch addresses. For example, such large discontinuities are frequently caused by taken branches, jumps, function calls or function returns.
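The limitation described above can be illustrated with a minimal simulation sketch. The sketch is not taken from the disclosed embodiments. The 64-byte line size, the unbounded cache model (which counts only cold misses), the function name simulate, and the example address streams are all assumptions chosen for illustration.

```python
LINE_SIZE = 64  # bytes per cache line (assumed for illustration)


def simulate(fetch_addrs, prefetch_next_line):
    """Count demand misses for a stream of instruction fetch addresses.

    The cache is modeled as an unbounded set of line numbers, so only
    cold misses are counted. This is a simplifying assumption.
    """
    cached = set()
    misses = 0
    for addr in fetch_addrs:
        line = addr // LINE_SIZE
        if line not in cached:
            misses += 1
            cached.add(line)
        if prefetch_next_line:
            # Sequential prefetch: bring in the line that follows the current one.
            cached.add(line + 1)
    return misses


# Straight-line code: eight consecutive cache lines.
sequential = [i * LINE_SIZE for i in range(8)]

# Code with large discontinuities (hypothetical branch or call targets).
discontinuous = [0x0000, 0x4000, 0x8000, 0xC000]

if __name__ == "__main__":
    for name, stream in (("sequential", sequential), ("discontinuous", discontinuous)):
        without = simulate(stream, prefetch_next_line=False)
        with_pf = simulate(stream, prefetch_next_line=True)
        print(f"{name}: {without} misses without prefetch, {with_pf} with prefetch")
```

Under these assumptions, next-line prefetching removes all but the first miss in the sequential stream, but leaves the miss count for the discontinuous stream unchanged, because each prefetched line is never the target of the next fetch.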
Hence, what is needed is a method and an apparatus for reducing the number of instruction cache misses that are caused by large discontinuities in instruction fetch addresses.