Computer processors perform cache prefetching to boost execution performance by fetching instructions or data from their original storage in slower memory (i.e., memory having slower read/write times) to a faster local memory (i.e., memory having faster read/write times and often located nearer to the instruction/data pipelines) before they are actually needed. Most modern computer processors have one or more fast, local cache memories in which prefetched data and/or instructions are held until required.
However, prefetching works by guessing. To put it more technically, prefetching uses the current series of memory demands for data/instructions by the processing engine to predict, based on, e.g., past performance, probability models, and/or algorithms, what data/instructions the processing engine will demand next. Accordingly, inaccurate prefetches are problematic: the wrong data must be removed from the faster local memory, and the correct data must be accessed and moved into the faster local memory. Inaccurate prefetches thus unnecessarily increase power consumption, produce system congestion (caused at least in part by the added movement/exchange of the wrong data for the correct data), and pollute and destabilize the normal functioning of the caches.
There are different methods of prefetching, often distinguished by their patterns for prefetching data/instructions, such as sequential prefetching and stride prefetching. Although somewhat oversimplified, sequential prefetching can be thought of as prefetching successive contiguous memory blocks, while stride prefetching can be thought of as jumping ahead (or "striding") by a number s of blocks between prefetched memory blocks.
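The distinction between the two patterns can be sketched as follows. This is an illustrative model only, not a description of any particular processor's prefetcher; the 64-byte cache line size, the prefetch depth, and the function names are assumptions made for the example.

```python
LINE_SIZE = 64  # bytes per cache line (a common size, assumed for illustration)

def sequential_prefetch(miss_addr, depth=2):
    """Sequential prefetching: fetch the next `depth` contiguous cache
    lines following the line containing the missed address."""
    base = (miss_addr // LINE_SIZE) * LINE_SIZE
    return [base + i * LINE_SIZE for i in range(1, depth + 1)]

def stride_prefetch(miss_addr, stride_lines, depth=2):
    """Stride prefetching: fetch lines spaced `stride_lines` lines apart,
    i.e., "striding" by s = stride_lines blocks between prefetches."""
    base = (miss_addr // LINE_SIZE) * LINE_SIZE
    return [base + i * stride_lines * LINE_SIZE for i in range(1, depth + 1)]

print([hex(a) for a in sequential_prefetch(0x1000)])           # ['0x1040', '0x1080']
print([hex(a) for a in stride_prefetch(0x1000, stride_lines=4)])  # ['0x1100', '0x1200']
```

As the example shows, the sequential pattern walks contiguous lines, whereas the stride pattern skips ahead by a fixed interval, which suits array traversals whose accesses land every s-th line.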
There is also a more specialized scheme related to striding known as spatial memory streaming. See, e.g., Somogyi et al., Spatial Memory Streaming, 33rd Int'l Symposium on Computer Architecture (ISCA 2006), pp. 252-263 (hereinafter, "Somogyi 2006"); and Somogyi et al., Spatial Memory Streaming, Journal of Instruction-Level Parallelism 13 (2011), pp. 1-26 (hereinafter, "Somogyi 2011"), both of which are incorporated herein by reference in their entireties. In spatial memory streaming (SMS), strong correlations between code and access patterns are detected and exploited to predict memory access patterns that recur in groups with similar relative spacing ("spatial correlation"). In Somogyi's specific design, SMS is implemented entirely in hardware separate from the processor, although an SMS may be implemented in other ways, as would be understood by those of skill in the art.
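The core spatial-correlation idea can be sketched as follows: within a fixed-size spatial region of memory, the set of cache lines touched is recorded as a bit vector associated with the triggering instruction. This is a simplified, hedged sketch of the general idea only; the region size, the keying by (PC, region), and all names are assumptions for illustration and do not reproduce Somogyi's exact hardware design.

```python
REGION_LINES = 8   # cache lines per spatial region (assumed for illustration)
LINE_SIZE = 64     # bytes per cache line (assumed)

def region_and_line(addr):
    """Split an address into its spatial region and its line offset
    (bit position) within that region."""
    region = addr // (REGION_LINES * LINE_SIZE)
    line = (addr // LINE_SIZE) % REGION_LINES
    return region, line

def record_patterns(accesses):
    """Accumulate, per (pc, region) key, a bit vector marking which
    lines of the region were touched -- the "spatial pattern"."""
    patterns = {}
    for pc, addr in accesses:
        region, line = region_and_line(addr)
        key = (pc, region)
        patterns[key] = patterns.get(key, 0) | (1 << line)
    return patterns

# A load at (hypothetical) PC 0x400 touching lines 0, 2, and 3 of region 0:
pats = record_patterns([(0x400, 0x000), (0x400, 0x080), (0x400, 0x0C0)])
print(bin(pats[(0x400, 0)]))  # 0b1101
```

When the same code later touches a new region, a predictor could replay the stored bit vector to prefetch the corresponding lines; note that such a bit vector records only *which* lines were touched, not the order in which they were touched, which relates to the temporal-order limitation discussed below.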
However, SMS schemes suffer from a variety of weaknesses. SMS cannot handle the shifting alignment of access patterns with respect to cache line boundaries. Furthermore, the spatial bit vectors typically used to represent spatial patterns force a larger granularity per access and cannot track temporal order. SMS also lacks robust confidence mechanisms and is not dynamically adaptive, i.e., SMS is unable to adapt to program phase changes, such as when dynamic branch behavior changes offset patterns. These weaknesses result in reduced coverage, reduced accuracy, and loss of timeliness, thereby reducing performance and increasing power consumption.