A computer processor typically uses cache prefetching to boost execution performance by fetching instructions or data from their original storage in slower memory into a faster local memory before they are actually needed. For example, a processor requests an instruction or data block from main memory ahead of its use and places the block in a cache. When the instruction or data block is actually needed, it can be accessed much more quickly from the cache than if it had to be requested from main memory. Prefetching thereby hides memory access latency. Because data access patterns show less regularity than instruction access patterns, accurate data prefetching is generally more challenging than instruction prefetching.
In the context of prefetching, the degree is the number of cache lines prefetched or predicted ahead of time in a single prefetching operation. The prefetch distance indicates how far ahead of the demand access stream the data blocks are prefetched. A prefetch operation is useful if the item it brings in prevents a future cache miss, and useless if it does not. A prefetch operation is harmful if the item it brings in replaces a useful block and thus possibly increases the number of cache misses. Harmful prefetches lead to cache pollution. A prefetch operation is redundant if the data block it brings in is already present in the cache.
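The terms above can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not a description of any particular hardware: it assumes a sequential prefetcher that, on each demand access to a cache line, requests `degree` consecutive lines starting `distance` lines ahead, and a small fully associative cache with least-recently-used replacement. The names `prefetch_targets` and `simulate` are hypothetical. Harmful prefetches are not tracked, since detecting them requires knowing whether an evicted block would have been reused.

```python
from collections import OrderedDict


def prefetch_targets(line, degree, distance):
    """Lines an assumed sequential prefetcher requests on a demand access.

    `distance` is how far ahead of the demand line the first prefetch lands;
    `degree` is how many consecutive lines are requested.
    """
    return [line + distance + i for i in range(degree)]


def simulate(accesses, capacity, degree, distance):
    """Count misses and useful, useless, and redundant prefetches."""
    # Maps line -> True while the line was prefetched and not yet demanded.
    cache = OrderedDict()
    stats = {"misses": 0, "useful": 0, "useless": 0, "redundant": 0}

    def insert(line, prefetched):
        if len(cache) >= capacity:
            # Evict the least recently used line.
            _, still_unused = cache.popitem(last=False)
            if still_unused:
                stats["useless"] += 1  # prefetched but never demanded
        cache[line] = prefetched

    for line in accesses:
        if line in cache:
            if cache[line]:
                stats["useful"] += 1  # the prefetch prevented this miss
                cache[line] = False
            cache.move_to_end(line)
        else:
            stats["misses"] += 1
            insert(line, False)

        for target in prefetch_targets(line, degree, distance):
            if target in cache:
                stats["redundant"] += 1  # block is already present
            else:
                insert(target, True)

    # Prefetched lines still unused when the trace ends never prevented a miss.
    stats["useless"] += sum(cache.values())
    return stats


sequential = simulate([0, 1, 2, 3], capacity=8, degree=2, distance=1)
scattered = simulate([0, 100, 200], capacity=8, degree=1, distance=1)
```

On the regular trace, most prefetches are either useful or redundant, whereas on the scattered trace every prefetch is useless, which reflects why irregular data access patterns are harder to prefetch accurately.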
To cover dynamic random access memory (DRAM) access latency, a data prefetcher often runs many accesses ahead of the demand stream. However, issuing many incorrect prefetches may overload the memory system and reduce throughput. Furthermore, a confirmation queue for a computer processor may require a large content-addressable memory (CAM) structure that consumes high power, occupies a large area, and is timing critical. Generally, a confirmation queue is a CAM that checks incoming virtual or memory addresses against all of its entries. Such a large CAM is slower. A larger CAM structure also requires more power and can cause speed-path issues, since multi-cycle operation adds complexity.
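The cost of the all-entries comparison can be made concrete with a software model. This is a hedged sketch only: the class name `ConfirmationQueue`, the first-in-first-out replacement of old predictions, and the removal of an entry once it is confirmed are all assumptions made for illustration. A hardware CAM performs the comparisons in parallel; the `comparisons` counter here simply tallies how many comparators would be active per lookup.

```python
from collections import deque


class ConfirmationQueue:
    """Software model of a CAM-style confirmation queue (illustrative only)."""

    def __init__(self, num_entries):
        # Assumed policy: the oldest prediction is dropped when the queue is full.
        self.entries = deque(maxlen=num_entries)
        self.comparisons = 0

    def push(self, predicted_addr):
        """Record an address the prefetcher predicted."""
        self.entries.append(predicted_addr)

    def confirm(self, addr):
        """Check an incoming address against every stored entry."""
        # Each stored entry needs its own comparator in a CAM.
        self.comparisons += len(self.entries)
        for index, entry in enumerate(self.entries):
            if entry == addr:
                del self.entries[index]  # assumed: a confirmed entry is retired
                return True
        return False


queue = ConfirmationQueue(num_entries=4)
for predicted in (0x100, 0x140, 0x180):
    queue.push(predicted)

first = queue.confirm(0x140)   # matches a stored prediction
second = queue.confirm(0x140)  # entry was retired, so no match
```

Because every lookup touches all stored entries, the comparator count grows linearly with the queue size, which is the source of the power, area, and timing pressure described above.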