To facilitate access to memory data, processors often include one or more small, fast caches that hold memory data likely to be needed again soon. When the processor needs to access memory, it first checks the cache for the data and accesses main memory only if the required data is not in the cache. Thus, the processor may often avoid the performance penalty of accessing main memory.
Typically, caches are configured to store blocks of memory data that were accessed recently. If a processor accesses data stored at a given memory address, the cache may read a block of memory within which the address falls. The block may comprise a contiguous set of memory addresses, including the accessed address. Thus, the cache may leverage temporal and spatial locality properties of a memory access stream.
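The block-granular behavior described above can be sketched in a few lines. This is a minimal illustration, not a model of any particular processor: the 64-byte block size, the `BlockCache` class, and the `block_base` helper are all assumptions made for the example, and capacity limits and eviction are deliberately left out.

```python
BLOCK_SIZE = 64  # bytes per block; an assumed value for illustration


def block_base(address, block_size=BLOCK_SIZE):
    """Return the first address of the block that contains `address`."""
    return address - (address % block_size)


class BlockCache:
    """Toy cache that tracks which blocks are present (no eviction)."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.blocks = set()  # base addresses of cached blocks
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Check the cache first; on a miss, bring in the whole block."""
        base = block_base(address, self.block_size)
        if base in self.blocks:
            self.hits += 1
            return True
        self.misses += 1
        self.blocks.add(base)  # stands in for a fetch from main memory
        return False


cache = BlockCache()
# Addresses 100, 104, and 127 share one block; 128 starts the next block.
results = [cache.access(a) for a in (100, 104, 127, 128, 100)]
```

In this run the first access to address 100 misses and loads its block, so the nearby addresses 104 and 127 hit (spatial locality) and the repeated access to 100 hits as well (temporal locality).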
Some caches employ prefetching optimizations. A prefetching optimization uses a hardware and/or software prefetcher to cache blocks of memory data before those blocks have been accessed, often in response to the processor accessing memory data outside of those blocks. For example, in response to detecting that the processor is accessing the data in a given memory block sequentially, a prefetcher may predict that the sequential access pattern will continue onto the next memory block. In anticipation of that access, the prefetcher brings the next memory block into the cache before the processor requests it. If the prediction is correct and the processor does subsequently access the next memory block, the prefetching will have hidden some or all of the latency associated with fetching that block from main memory.
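As a rough sketch of the sequential case, the following toy prefetcher watches the address stream and requests the next block once it has seen a short run of consecutive word accesses. The block size, word size, run-length threshold, and the `NextBlockPrefetcher` class are hypothetical choices made for illustration; real prefetchers use a variety of detection heuristics.

```python
BLOCK_SIZE = 64  # bytes per block (assumed)
WORD_SIZE = 8    # bytes per access (assumed)
THRESHOLD = 3    # sequential steps required before prefetching (assumed)


class NextBlockPrefetcher:
    """Toy detector: prefetch the next block after a sequential run."""

    def __init__(self):
        self.last_address = None
        self.run_length = 0
        self.prefetched = []  # base addresses of prefetched blocks, in order

    def observe(self, address):
        # Extend the run only if this access directly follows the last one.
        if (self.last_address is not None
                and address == self.last_address + WORD_SIZE):
            self.run_length += 1
        else:
            self.run_length = 0
        self.last_address = address

        if self.run_length >= THRESHOLD:
            next_base = (address // BLOCK_SIZE + 1) * BLOCK_SIZE
            if next_base not in self.prefetched:
                self.prefetched.append(next_base)


sequential = NextBlockPrefetcher()
for addr in range(0, BLOCK_SIZE, WORD_SIZE):  # walk block 0 word by word
    sequential.observe(addr)

scattered = NextBlockPrefetcher()
for addr in (0, 200, 8, 512):  # no sequential run
    scattered.observe(addr)
```

Here the sequential walk through the block at address 0 causes the block at address 64 to be requested before any address in it is touched, while the scattered stream triggers no prefetch.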
Traditional prefetch architectures sometimes include multiple requestors and an arbiter. Each prefetch requestor may employ a respective algorithm for generating prefetch requests in response to various events. The requests are queued by an arbiter, which then issues each prefetch to memory in the order received if memory resources are available and the request is still relevant.
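The requestor and arbiter arrangement can be illustrated with a small queue-based sketch. The `Arbiter` class, the requestor names, and the slot count are hypothetical. "Still relevant" is interpreted here as "the block is not already in the cache", which is an assumption about one possible relevance test and not something the description specifies.

```python
from collections import deque


class Arbiter:
    """Toy arbiter: issues queued prefetch requests in arrival order."""

    def __init__(self, cached_blocks):
        self.queue = deque()
        self.cached_blocks = cached_blocks  # blocks already in the cache

    def submit(self, requestor_name, block):
        """Called by a requestor to enqueue a prefetch request."""
        self.queue.append((requestor_name, block))

    def issue(self, available_slots):
        """Issue requests in FIFO order while memory slots remain."""
        issued = []
        while self.queue and available_slots > 0:
            name, block = self.queue.popleft()
            if block in self.cached_blocks:
                continue  # assumed relevance test: drop if already cached
            self.cached_blocks.add(block)
            issued.append((name, block))
            available_slots -= 1
        return issued


arbiter = Arbiter(cached_blocks={128})
arbiter.submit("next_line", 64)
arbiter.submit("stride", 128)  # already cached, so it will be dropped
arbiter.submit("stride", 192)
arbiter.submit("next_line", 256)

first = arbiter.issue(available_slots=2)
second = arbiter.issue(available_slots=1)
```

With two slots available, the arbiter issues the requests for blocks 64 and 192, silently drops the stale request for block 128, and leaves block 256 queued until the next call.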
Traditional prefetchers (requestors) attempt to increase performance by maximizing hit rates in the cache. For example, next-line prefetchers attempt to detect sequential access patterns and prefetch the next cache line. Stride-pattern prefetchers may detect more sophisticated access patterns, which may span multiple memory blocks. For example, if the processor accesses every other memory block, a stride-pattern prefetcher may detect the pattern and prefetch accordingly. Other prefetchers may detect that a group of memory blocks is typically accessed together in close temporal proximity and, in response to detecting an access to one of the memory blocks, prefetch any number of the other memory blocks in the group.
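The "every other memory block" example can be made concrete with a minimal stride detector. This sketch assumes a single access stream, block numbers in place of addresses, and a simple rule that a stride is confirmed once it repeats; the `StridePrefetcher` class and its `degree` parameter are hypothetical names introduced for illustration.

```python
class StridePrefetcher:
    """Toy stride detector operating on block numbers."""

    def __init__(self, degree=1):
        self.last_block = None
        self.last_stride = None
        self.degree = degree  # how many blocks ahead to predict (assumed)

    def observe(self, block):
        """Record an access and return the block numbers to prefetch."""
        predictions = []
        if self.last_block is not None:
            stride = block - self.last_block
            # Predict only when the same nonzero stride is seen twice in a row.
            if stride != 0 and stride == self.last_stride:
                predictions = [block + stride * i
                               for i in range(1, self.degree + 1)]
            self.last_stride = stride
        self.last_block = block
        return predictions


prefetcher = StridePrefetcher()
# Blocks 0, 2, 4, 6 follow a stride of two; block 7 breaks the pattern.
predictions = [prefetcher.observe(b) for b in (0, 2, 4, 6, 7)]
```

The detector stays silent for the first two accesses, predicts blocks 6 and 8 once the stride of two has repeated, and stops predicting when the access to block 7 breaks the pattern.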