1. Field of the Invention
This invention relates generally to the field of computer processors. More particularly, the invention relates to an apparatus and method for improving data prefetching efficiency with history based prefetching.
2. Description of the Related Art
Many modern microprocessors have large instruction pipelines that facilitate high speed operation. “Fetched” program instructions enter the pipeline, undergo operations such as decoding and executing in intermediate stages of the pipeline, and are “retired” at the end of the pipeline. When the pipeline receives a valid instruction and the data needed to process the instruction each clock cycle, the pipeline remains full and performance is good. When valid instructions are not received each cycle and/or when the necessary data is not available the pipeline may stall and performance can suffer. For example, performance problems can result from branch instructions in program code. If a branch instruction is encountered in the program and the processing branches to the target address, a portion of the instruction pipeline may have to be flushed, resulting in a performance penalty. Moreover, even with sequentially executed (i.e., non-branch) instructions, modern microprocessors are much faster than the memory where the program is kept, meaning that the program's instructions and data cannot be read fast enough to keep the microprocessor busy.
System performance may be enhanced and effective memory access latency may be reduced by anticipating the needs of a processor. If the data and instructions needed by a processor in the near future are predicted, then the data and instructions can be fetched in advance or “prefetched”, such that the data/instructions are buffered/cached and available to the processor with low latency. A prefetcher that accurately predicts a READ request (such as, for example, for a branch instruction) and issues it in advance of an actual READ can thus, significantly improve system performance. Prefetchers can be implemented in a CPU or in a chipset, and prefetching schemes have been routinely used for both.
Prefetching may be performed at various levels of a CPU's cache hierarchy. For example, some current x86-based processors include a Level 2 (L2) cache stream prefetcher to reduce the number of L2 and lower level (e.g., L3) cache misses. The stream prefetcher predicts future accesses within a memory page based on the order of accesses within that page and the distance between subsequent accesses. However, current prefetching techniques do not retain a history of past accesses within a memory page and use this information to predict and prefetch data and/or instructions.
Thus, what is needed is an improved prefetching technique which relies (at least in part) on the history of past accesses within a memory page.