1. Technical Field
The present invention relates to data processing systems and, in particular, to hardware cache prefetch in single-processor and multiple-processor data processing systems. Still more particularly, the present invention provides runtime selective control of hardware prefetch in a data processing system.
2. Description of Related Art
Many current processor architectures implement hardware prefetch. The prefetch works as follows: upon detecting a sequential memory access pattern in an executing program, the computer hardware starts to prefetch cache lines from main memory into the L1/L2 caches. The purpose is to make the data available to the executing program in the low-latency cache when the data is actually accessed later, thereby reducing the average memory access time.
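The mechanism described above can be illustrated with a toy software model. The following is a minimal sketch only, not a description of any actual processor: the line size, the trigger threshold, the prefetch depth, and all names (ToyCache, scan) are assumptions chosen for illustration.

```python
from collections import OrderedDict

LINE_SIZE = 128  # bytes per cache line (assumed value)


class ToyCache:
    """LRU cache of line numbers with a simple sequential prefetcher (hypothetical model)."""

    def __init__(self, capacity_lines, prefetch_enabled, trigger=2, depth=2):
        self.lines = OrderedDict()          # line number -> True, ordered oldest to newest
        self.capacity = capacity_lines
        self.prefetch_enabled = prefetch_enabled
        self.trigger = trigger              # sequential steps needed before prefetching starts
        self.depth = depth                  # how many lines ahead to fetch once triggered
        self.last_line = None
        self.run = 0                        # length of the current sequential run
        self.demand_misses = 0
        self.prefetches = 0

    def _install(self, line):
        """Bring a line into the cache, evicting the least recently used line if full."""
        if line in self.lines:
            self.lines.move_to_end(line)
            return
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)
        self.lines[line] = True

    def access(self, address):
        line = address // LINE_SIZE
        if line in self.lines:
            self.lines.move_to_end(line)    # hit: refresh LRU position
        else:
            self.demand_misses += 1         # miss: fetch on demand
            self._install(line)

        # Detect a sequential pattern: each access touches the next line.
        if self.last_line is not None and line == self.last_line + 1:
            self.run += 1
        elif line != self.last_line:
            self.run = 0
        self.last_line = line

        # Once the pattern is established, fetch the following lines early.
        if self.prefetch_enabled and self.run >= self.trigger:
            for ahead in range(1, self.depth + 1):
                if line + ahead not in self.lines:
                    self.prefetches += 1
                    self._install(line + ahead)


def scan(prefetch_enabled, num_lines=64):
    """Read num_lines consecutive cache lines once each and return the cache model."""
    cache = ToyCache(capacity_lines=32, prefetch_enabled=prefetch_enabled)
    for i in range(num_lines):
        cache.access(i * LINE_SIZE)
    return cache


if __name__ == "__main__":
    print("misses without prefetch:", scan(False).demand_misses)
    print("misses with prefetch:   ", scan(True).demand_misses)
```

Under these assumed parameters, a purely sequential scan of 64 lines incurs 64 demand misses with prefetch disabled and 3 with prefetch enabled, since only the accesses needed to establish the pattern miss in the cache.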
Hardware prefetch unfortunately does not always help processor performance. Prefetch may be falsely triggered, for example, by a short stretch of fixed-stride accesses, or by any other pattern that satisfies the hardware prefetch triggering scheme. As a result, most of the prefetched data are never used by the program. In this case the performance can actually be worse due to cache pollution, because prefetched data may displace useful data in the cache and, thus, increase the cache miss ratio. Also, the large number of falsely triggered prefetches may consume a significant amount of memory bandwidth, thereby increasing the queuing delay of every memory access and resulting in a higher average memory access time.
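The two effects named above, a higher miss ratio and a longer queuing delay, both feed into the average memory access time. The sketch below uses the standard textbook relation (hit time plus miss ratio times miss cost); every number in it is a hypothetical value chosen for illustration, not a measurement, and the function name is an assumption.

```python
def average_access_time(hit_time, miss_ratio, miss_penalty, queuing_delay=0.0):
    """Average memory access time in cycles.

    Each miss is assumed to pay the base miss penalty plus any queuing
    delay caused by contention for memory bandwidth.
    """
    return hit_time + miss_ratio * (miss_penalty + queuing_delay)


# Hypothetical baseline: few misses, no contention in the memory system.
baseline = average_access_time(hit_time=2, miss_ratio=0.05, miss_penalty=300)

# Hypothetical falsely triggered prefetch: pollution raises the miss ratio,
# and the extra traffic adds queuing delay to every miss.
polluted = average_access_time(hit_time=2, miss_ratio=0.20, miss_penalty=300,
                               queuing_delay=900)

if __name__ == "__main__":
    print(f"baseline: {baseline:.1f} cycles, polluted: {polluted:.1f} cycles")
```

Because the miss ratio multiplies the full miss cost, the two effects compound rather than simply add, which is why falsely triggered prefetch can degrade performance so sharply.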
Note that this problem has serious negative implications in a multi-user multi-processor (MP) environment. Falsely triggered prefetch by one application may flood the memory system, which is shared by all applications running at the same time. These applications may suffer a tremendous negative performance impact from longer memory access time, even though they themselves may not engage in any prefetching activity.
The problem becomes more complicated with the advent of logical partition (LPAR) and shared processor logical partition (SPLPAR), where multiple different and unrelated business customers may share an MP system. In this case one rogue application that generates a high volume of falsely triggered prefetching requests in one partition will likely affect all of the applications running in the other partitions that belong to the other business customers.
There is strong evidence that prefetch can significantly degrade the performance of some real applications. In some tests, measurements from hardware performance counters have shown average memory access times of a few thousand cycles, instead of the normal figure of less than one hundred cycles. This is a fairly good indication that prefetching has overwhelmed the memory system.
The main cause of the dilemma in hardware prefetching is that the prefetch policy is set for the whole system and remains fixed for the entire time the system is operating. Prefetch is turned on or turned off at system boot time. Once prefetch is turned on, hardware prefetch is active in all processors with all applications, opening the possibility that one application can significantly degrade the performance of all applications, including itself, as described above.
Turning off prefetch for the whole system may not be a good option because there are many applications, especially scientific applications, that may benefit enormously from hardware prefetching. This is the main reason that many computer manufacturers currently ship systems with prefetch turned on by default.