The disclosed subject matter relates generally to cache memory systems and, more particularly, to a multi-level cache memory system that reduces a performance hit associated with wasted cache prefetching.
Modern microprocessors are much faster than the memory in which the program is stored. That is, the microprocessor can execute instructions at a rate that is faster than the rate at which the instructions can be retrieved from memory, and thus, the program's instructions cannot be read fast enough to keep the microprocessor busy.
Prefetching the instructions before they are actually needed by the microprocessor is a useful mechanism to overcome the relatively slow response of memory and allow the processor to operate at its substantially higher speed. When the instruction is prefetched from memory, it is placed in a cache where it may be accessed very quickly when the processor is ready to execute that particular instruction.
One problem with prefetching is that software programs are not always executed in the order in which they are stored. In fact, many instructions cause the software program to branch or jump to another location in the program. Thus, the accuracy with which such branches or jumps are predicted can dramatically affect the quality of the prefetching and, consequently, the speed at which the processor can execute the software program. Many mechanisms have been proposed to improve these predictions so as to allow more continuous, speedy operation of the processor. However, these predictors have at least one thing in common: they are at least occasionally wrong, and instructions are prefetched that the processor never uses. That is, a prefetch algorithm may prove beneficial for some applications but ineffective for others. When prefetched code or data is not consistently accessed during the execution of the program, a prefetcher can actually hurt the performance of the processor.
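By way of illustration only, the effect of branches on a simple prefetch prediction may be sketched as follows. The sketch assumes a hypothetical next-line predictor that always guesses the instruction at the next sequential address; the function name `next_line_accuracy` and the two address traces are invented for this example and are not part of the disclosed subject matter.

```python
def next_line_accuracy(trace):
    """Fraction of next-line guesses (addr + 1) that match the address
    actually executed next. `trace` is a list of instruction addresses
    in execution order (an assumed, simplified model)."""
    if len(trace) < 2:
        return 0.0
    pairs = zip(trace, trace[1:])
    correct = sum(1 for cur, nxt in pairs if nxt == cur + 1)
    return correct / (len(trace) - 1)


# Straight-line code: every instruction follows its predecessor.
straight = list(range(8))

# Code with taken branches after addresses 1, 6, and 3.
branchy = [0, 1, 5, 6, 2, 3, 9, 10]

print(next_line_accuracy(straight))  # all 7 guesses correct
print(next_line_accuracy(branchy))   # 4 of 7 guesses correct
```

In this simplified model, each taken branch turns one sequential guess into a prefetch of an instruction that is not the next one executed.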
A ‘wasted’ prefetch is a memory access that fills a cache with one or more anticipated instructions that then age out of the cache before they are accessed. Wasted prefetches consume system and memory bandwidth and pollute both the processor core's private caches and shared Chip-Multi-Processor (CMP) caches.
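The notion of a wasted prefetch may be made concrete with a minimal sketch, assuming a small, fully associative cache with least-recently-used (LRU) replacement. The class `PrefetchCache`, its counters, and the addresses used are hypothetical and serve only to illustrate the definition above; an actual cache may track prefetched lines differently.

```python
from collections import OrderedDict


class PrefetchCache:
    """Toy LRU cache that counts useful and wasted prefetches."""

    def __init__(self, capacity):
        self.capacity = capacity
        # addr -> True while the line was prefetched and not yet accessed
        self.lines = OrderedDict()
        self.useful = 0
        self.wasted = 0

    def _insert(self, addr, prefetched):
        if addr in self.lines:
            self.lines.move_to_end(addr)  # already cached: refresh its age
            return
        if len(self.lines) >= self.capacity:
            # Evict the least recently used line.
            _, unused_prefetch = self.lines.popitem(last=False)
            if unused_prefetch:
                self.wasted += 1  # aged out before any demand access
        self.lines[addr] = prefetched

    def prefetch(self, addr):
        self._insert(addr, prefetched=True)

    def access(self, addr):
        """Demand access; returns True on a cache hit."""
        if addr in self.lines:
            if self.lines[addr]:
                self.useful += 1          # prefetched line was used
                self.lines[addr] = False
            self.lines.move_to_end(addr)
            return True
        self._insert(addr, prefetched=False)
        return False


cache = PrefetchCache(capacity=2)
cache.prefetch(10)
cache.prefetch(11)
cache.access(10)   # hit: the prefetch of 10 was useful
cache.access(20)   # miss: evicts 11, which was never accessed
print(cache.useful, cache.wasted)
```

In this sketch, a prefetched line still resident at the end of the run is counted as neither useful nor wasted; only lines evicted before any demand access are counted as wasted.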
Modern prefetchers can be very aggressive in that they prefetch code and data at high rates, and a high proportion of those prefetches may be wasted.