In current systems, when a cache miss occurs, the time required for a microprocessor to access system memory can be one or two orders of magnitude greater than the time required to access the cache memory. For this reason, microprocessors incorporate prefetching techniques to improve their cache hit rate: these techniques examine recent data access patterns and attempt to predict which data the program will access next. The benefits of prefetching are well known.
However, the present inventors have observed that conventional microprocessor prefetch units do not detect the access patterns of some programs. For example, the graph shown in FIG. 1 illustrates the pattern of accesses presented to a level-2 (L2) cache memory while executing a program that includes a sequence of store operations through memory. The graph plots the memory address as a function of time. As the graph shows, although memory addresses generally increase over time (i.e., trend upward), the memory address of a given access is in many cases lower than that of its temporal predecessor, contrary to the general trend. This makes it highly unlikely that conventional prefetchers will prefetch effectively.
There are at least two reasons why the memory accesses presented to a cache memory of a microprocessor may exhibit a general trend in one direction when viewed as a relatively large sample, yet appear chaotic when viewed by a conventional prefetcher in a small sample. The first reason is that the program is constructed to access memory in this manner, whether because of the nature of its algorithms or because of poor programming. The second reason is that out-of-order execution microprocessor cores, through the normal operation of their pipelines and queues when running at capacity, often reorder memory accesses relative to the order in which the program generated them.
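The effect described above can be illustrated with a small toy model. The sketch below is hypothetical and is not the prefetcher contemplated by this disclosure: it assumes a program that stores to consecutive cache lines, and models the out-of-order core as issuing each group of four accesses in a fixed scrambled order (`REORDER`, an illustrative assumption). The `stride_confirmed` function is a simplified stand-in for a conventional prefetcher, assumed to confirm a stride only after seeing two equal consecutive address deltas. All names here are hypothetical.

```python
# Hypothetical toy model: a program stores to consecutive cache lines
# 0, 1, 2, ..., but an out-of-order core issues each group of four accesses
# in a scrambled order. REORDER is an illustrative assumption.
REORDER = [2, 0, 3, 1]


def observed_sequence(num_lines):
    """Cache-line indices as seen by the cache (num_lines assumed a multiple of 4)."""
    seq = []
    for base in range(0, num_lines, 4):
        seq.extend(base + offset for offset in REORDER)
    return seq


def deltas(seq):
    """Signed difference between each access and its temporal predecessor."""
    return [b - a for a, b in zip(seq, seq[1:])]


def stride_confirmed(seq):
    """Simplified stand-in for a conventional stride prefetcher: it confirms
    a stride only when two consecutive deltas are equal."""
    d = deltas(seq)
    return any(x == y for x, y in zip(d, d[1:]))


def window_means(seq, window):
    """Mean line index of each non-overlapping window of accesses."""
    return [sum(seq[i:i + window]) / window
            for i in range(0, len(seq) - window + 1, window)]


seq = observed_sequence(64)
d = deltas(seq)
print(d[:8])                                    # [-2, 3, -2, 5, -2, 3, -2, 5]
print(sum(1 for x in d if x < 0), len(d))       # 32 63
print(stride_confirmed(seq))                    # False
print(window_means(seq, 16))                    # [7.5, 23.5, 39.5, 55.5]
```

In this model, roughly half of the adjacent deltas point downward and no stride is ever repeated, so the small-window detector never locks on, while the means of larger windows of 16 accesses rise monotonically, revealing the upward trend.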
Therefore, what is needed is a prefetcher capable of effectively prefetching data for programs whose memory accesses exhibit no clear trend within relatively small time windows but present a clear trend when examined in relatively large samples.