The growing disparity between the time a processor takes to access its cache memories and the time it takes to access system memory highlights the need for good prefetching approaches by the processor. Mowry, for example, described a modification to a compiler to exploit exclusive-mode prefetching. The compiler performs locality analysis to partition memory references into “equivalence classes, which are sets of references that can be treated as a single reference” and inserts “an exclusive-mode prefetch rather than a shared-mode prefetch for a given equivalence class if at least one member of the equivalence class is a write.” See Mowry, Todd Carl, Tolerating Latency Through Software-Controlled Data Prefetching, Ph.D. dissertation, Stanford University, 1994, page 89.
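By way of illustration only, the selection rule quoted above may be sketched in C as follows. The sketch assumes that locality analysis has already assigned each memory reference to an equivalence class; the type and function names (mem_ref, prefetch_mode, choose_prefetch_modes) are hypothetical and are not taken from Mowry's compiler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names below are hypothetical and chosen for illustration. */
typedef enum { PREFETCH_SHARED, PREFETCH_EXCLUSIVE } prefetch_mode;

typedef struct {
    size_t class_id;  /* equivalence class assumed to come from locality analysis */
    bool   is_write;  /* true if this reference stores to memory */
} mem_ref;

/*
 * Fills modes[0..n_classes-1]. A class receives an exclusive-mode prefetch
 * if at least one of its member references is a write; otherwise it
 * receives a shared-mode prefetch.
 */
void choose_prefetch_modes(const mem_ref *refs, size_t n_refs,
                           prefetch_mode *modes, size_t n_classes)
{
    for (size_t c = 0; c < n_classes; c++)
        modes[c] = PREFETCH_SHARED;

    for (size_t i = 0; i < n_refs; i++) {
        assert(refs[i].class_id < n_classes);
        if (refs[i].is_write)
            modes[refs[i].class_id] = PREFETCH_EXCLUSIVE;
    }
}
```

In this sketch, a class containing one read and one write is marked exclusive, whereas a class containing only reads remains shared.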
A disadvantage of a software-based prefetch approach such as Mowry describes is that it increases code size because prefetch instructions are added to the program. The larger program may require more space both on the main storage (e.g., disk drive) of the system and in the system memory as it runs. The additional instructions also consume resources in the processor, such as instruction dispatch slots, reservation station slots, reorder buffer slots, and execution unit slots, all of which may negatively impact the performance of the processor, in particular by reducing the effective lookahead within the instruction window, which is crucial to exploiting instruction level parallelism. Another disadvantage is that the software-based approach does not benefit all programs that run on the processor, but only those programs that have been profiled and compiled using the optimized compiler.
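The added instructions may be seen in a minimal C sketch of a loop into which a write-intent prefetch has been inserted. The sketch assumes a GCC or Clang compiler, whose `__builtin_prefetch` builtin accepts a read/write hint (1 for write) and a locality hint; whether the write hint results in a true exclusive-mode prefetch depends on the target processor. The function name increment_all and the prefetch distance of 16 elements are illustrative assumptions, not tuned values.

```c
#include <assert.h>
#include <stddef.h>

#define PREFETCH_DISTANCE 16  /* elements ahead; illustrative, not tuned */

/* Hypothetical example: adds 1 to each element of a[0..n-1]. */
void increment_all(int *a, size_t n)
{
    assert(a != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        /*
         * Software prefetch added to the loop body. The second argument
         * (1) hints that the line will be written; the third (3) hints
         * high temporal locality. These are extra instructions that the
         * processor must fetch, dispatch, and retire on every iteration.
         */
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 1, 3);

        a[i] += 1;
    }
}
```

The loop computes the same result with or without the prefetch; the prefetch and its bounds check only add instructions to the program, which is the cost described above.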