Computer systems typically include a processing unit and one or more cache memories. A cache memory is a high-speed memory that acts as a buffer between the processor and main memory. Although smaller than the main memory, the cache memory typically has an appreciably faster access time. Memory subsystem performance can therefore be increased by storing the most commonly used information in the smaller but faster cache memories.
When the processor accesses a memory address, the cache memory determines whether the data associated with that address is stored in the cache memory. If the data is stored in the cache memory, a cache hit results and the data is provided to the processor from the cache memory. If the data is not in the cache memory, a cache miss results and a lower level in the memory hierarchy must be accessed. Because accessing lower-level memory takes additional time, data cache misses can account for a significant portion of an application program's execution time.
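The hit and miss behavior described above can be illustrated with a minimal sketch. It assumes a direct-mapped organization, a 64-byte line size, and four cache lines; none of these parameters comes from the text, and the class name `DirectMappedCache` is hypothetical. Only tags are tracked, since the point is the hit or miss decision rather than the data itself.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that tracks tags only (no data payload).

    The organization and sizes are illustrative assumptions.
    """

    def __init__(self, num_lines=4, line_size=64):
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines  # one tag slot per cache line
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Return True on a cache hit, False on a cache miss."""
        block = address // self.line_size   # which memory block holds the address
        index = block % self.num_lines      # which cache line the block maps to
        tag = block // self.num_lines       # identifies the block within that line
        if self.tags[index] == tag:
            self.hits += 1
            return True
        # Miss: the line would be filled from a lower level of the hierarchy.
        self.misses += 1
        self.tags[index] = tag
        return False


cache = DirectMappedCache()
results = [cache.access(a) for a in (0, 8, 64, 0, 256, 0)]
print(results, cache.hits, cache.misses)
```

In this trace, address 256 maps to the same line as address 0 and evicts it, so the final access to address 0 misses again even though that address was referenced earlier.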
To reduce cache miss rates, various hardware prefetching techniques have been developed. Prefetching involves fetching data or instructions from lower levels in the memory hierarchy into the cache memory before the processor would ordinarily request that the data be fetched. By anticipating processor access patterns, prefetching helps reduce average memory service time. The effectiveness of prefetching is limited by the ability of a particular prefetching method to predict the addresses from which the processor will need to access data. Hardware prefetching methods typically attempt to take advantage of patterns in memory accesses by observing all, or a particular subset of, memory transactions and prefetching as-yet-unaccessed data for anticipated memory accesses. The memory transactions observed can include read and/or write accesses or cache miss transactions.
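As a rough illustration of how a lower miss rate reduces average memory service time, the sketch below uses the standard average memory access time relation (hit time plus miss rate times miss penalty). The function name and all cycle counts and miss rates are assumed values chosen for the example, not figures from the text.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average memory access time = hit time + miss rate * miss penalty."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time + miss_rate * miss_penalty


# Illustrative numbers only: 1-cycle hit, 100-cycle miss penalty.
baseline = average_access_time(hit_time=1, miss_rate=0.10, miss_penalty=100)
with_prefetch = average_access_time(hit_time=1, miss_rate=0.05, miss_penalty=100)
print(baseline, with_prefetch)
```

Under these assumed numbers, halving the miss rate brings the average access time from about 11 cycles to about 6 cycles, which is why accurate address prediction matters so much.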
A given method of hardware prefetching is typically beneficial for some applications or workloads and may be detrimental for others. One type of prefetching is known as next line data prefetching. Client applications and the SPEC CPU benchmark applications may benefit from next line prefetching because their memory access patterns are typically sequential, but database and server applications often do not benefit because their memory access patterns are typically non-sequential. Another type of prefetching involves training a prefetch table (PT) based on the level-one (L1) cache miss data. However, training a data prefetcher with the previous cache miss address stream does not always generate accurate prefetch requests for the next level cache. For example, when the L1 cache miss data overflows the PT, it may be difficult to detect patterns and accurately predict what data to bring into the cache.
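The workload dependence of next line prefetching can be sketched with a small simulation. It assumes a fully associative cache of eight blocks with least-recently-used replacement and a prefetcher that fetches block N+1 on every demand miss to block N; the function name `count_misses` and both access streams are hypothetical and chosen only to contrast a sequential pattern with a scattered one.

```python
from collections import OrderedDict


def count_misses(blocks, capacity=8, next_line=False):
    """Count demand misses for a stream of block numbers in a small LRU cache."""
    cache = OrderedDict()  # keys are block numbers, ordered oldest to newest

    def fill(block):
        cache[block] = None
        cache.move_to_end(block)
        if len(cache) > capacity:
            cache.popitem(last=False)  # evict the least recently used block

    misses = 0
    for block in blocks:
        if block in cache:
            cache.move_to_end(block)  # refresh recency on a hit
        else:
            misses += 1
            fill(block)
            if next_line:
                fill(block + 1)  # speculatively fetch the adjacent block
    return misses


sequential = list(range(32))                      # blocks 0, 1, 2, ...
scattered = [(i * 37) % 101 for i in range(32)]   # no two accesses adjacent in order

print(count_misses(sequential), count_misses(sequential, next_line=True))
print(count_misses(scattered), count_misses(scattered, next_line=True))
```

With these streams, next line prefetching cuts the sequential stream's demand misses from 32 to 16, while the scattered stream still incurs 32 misses. In the scattered case every prefetch is wasted work that consumes bandwidth and cache space, which is one way a prefetcher can become detrimental.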
Hardware prefetchers are typically configured statically in an operating system when the system boots. As a result, the prefetcher configuration cannot respond to changing operating conditions, and performance may decrease under some applications or workloads at various times between system boots.