Multiprocessor systems integrate increasing numbers of processor cores onto a single integrated circuit chip. While a larger number of processor cores may give the integrated circuit chip more processing capability, the bandwidth available for off-chip resources such as memory (i.e., off-chip bandwidth) may not scale as quickly as the number of cores. Off-chip bandwidth is often limited by the number of pins available for interfacing between the integrated circuit chip and its socket or printed circuit board. This limitation can manifest as latency between the time a processor core requests access to external memory and the time it receives that access.
Some processors rely on prefetching to mitigate the latency of accesses to external memory. In prefetching, blocks of data are loaded from the external memory into a cache in anticipation that the data will be requested by the processor in the near future. While prefetching can reduce the execution time of programs on average, some prefetches are wasted because the prefetched blocks are not used before they are evicted from the cache. With multiple cores sharing off-chip bandwidth, wasted prefetches may represent non-optimal use of limited off-chip bandwidth resources.
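By way of illustration only, the distinction between useful and wasted prefetches may be sketched with a small software model. The model below is an assumption made for explanatory purposes and does not describe any particular processor: it supposes a fully associative cache with least-recently-used eviction and a simple next-block prefetcher, and the names `simulate`, `capacity`, and `prefetch_degree` are hypothetical. A prefetch is counted as wasted when the prefetched block is evicted before any demand access touches it.

```python
from collections import OrderedDict


def simulate(accesses, capacity, prefetch_degree=1):
    """Toy model (illustrative assumption): LRU cache plus next-block prefetcher.

    accesses        -- sequence of integer block addresses requested by a core
    capacity        -- number of blocks the cache can hold
    prefetch_degree -- how many following blocks are prefetched per access
    """
    # Maps block -> True while the block was prefetched and not yet used.
    cache = OrderedDict()
    stats = {"demand_misses": 0, "prefetches": 0, "useful": 0, "wasted": 0}

    def insert(block, prefetched):
        # Evict the least recently used block when the cache is full.
        if len(cache) >= capacity:
            _, unused_prefetch = cache.popitem(last=False)
            if unused_prefetch:
                # Off-chip bandwidth was spent on a block that was never used.
                stats["wasted"] += 1
        cache[block] = prefetched

    for block in accesses:
        if block in cache:
            if cache[block]:
                # First demand hit on a prefetched block: the prefetch paid off.
                stats["useful"] += 1
                cache[block] = False
            cache.move_to_end(block)  # mark as most recently used
        else:
            stats["demand_misses"] += 1
            insert(block, False)

        # Next-block prefetch: fetch the following block(s) if absent.
        for k in range(1, prefetch_degree + 1):
            nxt = block + k
            if nxt not in cache:
                stats["prefetches"] += 1
                insert(nxt, True)

    return stats


if __name__ == "__main__":
    # Sequential pattern: the prefetcher's guesses match later requests.
    print(simulate(range(8), capacity=4))
    # Strided pattern: prefetched blocks are evicted without being used.
    print(simulate([0, 10, 20, 30, 40, 50], capacity=4))
```

Under these assumptions, a sequential access pattern turns nearly every prefetch into a later cache hit, whereas a strided pattern causes prefetched blocks to be evicted unused, so each such prefetch consumes off-chip bandwidth without reducing latency.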