The present disclosure is generally directed to instruction prefetching and, more specifically, to techniques for dynamic sequential instruction prefetching in a data processing system.
In general, a processor is much faster than the main memory that stores programs and, as such, main memory may not be able to supply program instructions fast enough to keep the processor busy. Incorporating a cache memory (cache) within a data processing system has been used to provide faster processor access to program instructions. As is known, a cache is physically located closer to a processor than main memory and is usually faster than main memory. In computer architecture, instruction prefetching is also used by processors to speed up program execution by reducing processor wait states. Instruction prefetching occurs when a processor requests that an instruction from lower level memory (e.g., main memory) be loaded into cache before the instruction is actually needed. With instruction prefetching, an instruction can be accessed more quickly from cache than if a processor had to request the instruction from main memory when actually needed, thus preventing a processor stall while awaiting receipt of the instruction from main memory.
Sequential prefetching refers to a cache requesting a number of sequential cache lines from lower level memory when one or more instructions at a particular location are anticipated to be executed. For example, a sequential prefetcher may statically prefetch two additional cache lines when a given cache line is prefetched. As one example, if a cache line at address ‘N’ is prefetched, cache lines at addresses ‘N+1’ and ‘N+2’ would also be prefetched by a sequential prefetcher that statically prefetches two additional cache lines. Unfortunately, sequentially prefetching additional cache lines statically may result in cache pollution when the additional cache lines are not utilized prior to eviction from the cache. Sequentially prefetching too many instruction cache lines may also reduce processor performance by causing thrashing in an instruction cache. Conversely, sequentially prefetching too few instruction cache lines may reduce processor performance due to the latency of fetching instructions that were not prefetched.
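By way of illustration only, the static behavior described above may be sketched in software as a toy model. The class name StaticSequentialPrefetcher, its parameters, the least-recently-used replacement policy, and the choice to trigger prefetching on a demand miss are all assumptions made for this sketch and are not taken from the disclosure; an actual prefetcher would be implemented in hardware.

```python
from collections import OrderedDict


class StaticSequentialPrefetcher:
    """Toy instruction cache (hypothetical) that, on a miss for line N,
    also loads lines N+1 .. N+depth. Replacement is LRU (an assumption)."""

    def __init__(self, capacity_lines=4, depth=2):
        self.capacity_lines = capacity_lines
        self.depth = depth
        # Maps line address -> True once the line has actually been used.
        self.lines = OrderedDict()
        # Counts prefetched lines evicted before use (cache pollution).
        self.unused_evictions = 0

    def _install(self, line, used):
        if line in self.lines:
            # Already resident: refresh recency, keep or set the used flag.
            self.lines[line] = self.lines[line] or used
            self.lines.move_to_end(line)
            return
        if len(self.lines) >= self.capacity_lines:
            # Evict the least recently used line to make room.
            _, was_used = self.lines.popitem(last=False)
            if not was_used:
                self.unused_evictions += 1
        self.lines[line] = used

    def fetch(self, line):
        """Demand-fetch a cache line; return True on a hit."""
        hit = line in self.lines
        self._install(line, used=True)
        if not hit:
            # Static policy: always prefetch the same number of next lines.
            for offset in range(1, self.depth + 1):
                self._install(line + offset, used=False)
        return hit


if __name__ == "__main__":
    # Straight-line code benefits from the prefetched lines.
    sequential = StaticSequentialPrefetcher()
    print([sequential.fetch(n) for n in (10, 11, 12)])

    # Code that branches far away never uses them, so they pollute the cache.
    branchy = StaticSequentialPrefetcher()
    for n in (10, 50, 90):
        branchy.fetch(n)
    print(branchy.unused_evictions)
```

In this sketch, a fixed depth helps when execution proceeds sequentially but wastes cache capacity when execution branches elsewhere, which is the trade-off between prefetching too many and too few cache lines noted above.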