1. Technical Field
The present invention relates to data processing and, in particular, to cache prefetching in data processing systems. Still more particularly, the present invention provides a method, apparatus, and program to efficiently calculate cache prefetching patterns for loops.
2. Description of Related Art
Many current software runtime environments use cache prefetching. Prefetching works as follows: upon detecting a sequential memory access pattern in an executing program, the software environment starts to prefetch cache lines from main memory into the L1/L2 caches. The purpose is to make the data available to the executing program in the low-latency cache by the time the data is actually accessed, thereby reducing the average memory access time.
An example of a software runtime environment is a Java™ virtual machine (JVM). There are loops in the Java™ programming language, for example, that iterate over a data structure in such a way that access to storage follows a predictable pattern. If a JVM had knowledge of such a pattern, it could insert cache prefetch instructions into the code stream or determine unroll factors that would speed up execution of the loop.
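As an illustration of such a loop, the following minimal Java sketch walks an array with a fixed step. The class and method names (RegularLoopSketch, sumWithStep, byteOffset) are hypothetical and appear nowhere in the original text. The sketch also assumes that the elements of an int array are stored contiguously in memory, which is typical of JVM implementations but is not guaranteed by the language.

```java
public class RegularLoopSketch {

    /**
     * Sums every step-th element of data. The index advances by a fixed
     * amount on each iteration, so the sequence of accesses is predictable.
     */
    public static long sumWithStep(int[] data, int step) {
        long sum = 0;
        for (int i = 0; i < data.length; i += step) {
            sum += data[i];
        }
        return sum;
    }

    /**
     * Byte offset of element index from the start of the array payload,
     * ASSUMING contiguous storage of int elements (an illustrative assumption).
     */
    public static long byteOffset(int index) {
        return (long) index * Integer.BYTES;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5, 6};
        System.out.println(sumWithStep(data, 2));
        // Under the contiguity assumption, successive accesses are this many bytes apart.
        System.out.println(byteOffset(2) - byteOffset(0));
    }
}
```

Under the stated assumption, each iteration of the loop touches storage a fixed number of bytes past the previous access. That regularity is the kind of pattern a JVM could exploit.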
Thus, while interpreting or just-in-time (JIT) compiling bytecode, the JVM may look for access patterns that are regular. The JVM may then leave a record of such patterns, such as expected stride, for exploitation by a JIT compiler and may possibly insert prefetch instructions or determine loop unrolling factors.
Stride is the distance between successive memory accesses. Current methods for determining stride use well-known static compiler techniques to evaluate the variables that are used to index a loop. To augment this analysis, the JVM compiler may profile a given routine and collect data. This requires either building a bytecode interpreter before generating binary code, generating bytecode to perform the profiling, or generating a binary version with profiling hooks and then later recompiling the routine utilizing the profiling information. All of these approaches incur a high processing overhead. Additionally, if data structures other than the profile information are being manipulated, such as B-trees for example, then analysis becomes even more difficult.
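To make the definition of stride concrete, the following Java sketch checks whether a recorded sequence of access addresses has a constant stride. It is only an illustration under stated assumptions: the class StrideSketch, the method constantStride, and the idea of having the addresses available as a long array are all hypothetical. It does not represent the static or profiling techniques described above.

```java
import java.util.OptionalLong;

public class StrideSketch {

    /**
     * Returns the common distance between successive addresses, or an empty
     * result if fewer than two addresses are given or the distances differ.
     * The address trace is a hypothetical input used only for illustration.
     */
    public static OptionalLong constantStride(long[] addresses) {
        if (addresses.length < 2) {
            return OptionalLong.empty(); // at least two accesses are needed to measure a distance
        }
        long stride = addresses[1] - addresses[0];
        for (int i = 2; i < addresses.length; i++) {
            if (addresses[i] - addresses[i - 1] != stride) {
                return OptionalLong.empty(); // irregular pattern: no single stride
            }
        }
        return OptionalLong.of(stride);
    }

    public static void main(String[] args) {
        long[] regular = {0x1000L, 0x1008L, 0x1010L, 0x1018L};
        long[] irregular = {0x1000L, 0x1008L, 0x1040L, 0x1018L};
        System.out.println(constantStride(regular));
        System.out.println(constantStride(irregular));
    }
}
```

A trace whose successive addresses are always the same distance apart yields that distance as the stride. A trace with any irregular distance yields no stride, which corresponds to an access pattern that is harder to exploit for prefetching.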