1. Field
Embodiments of the present invention relate generally to the field of compilers. More particularly, embodiments of the present invention relate to compiler-implemented prefetching.
2. Description of the Related Art
A compiler is software that translates a computer program written in a high-level language, such as C++ or FORTRAN, into machine language. As shown in FIG. 1, the compiler 100 takes code as input and generates a machine-executable binary file. FIG. 2 shows some of the modules that can be implemented by compiler 100.
Compiler 100 can include a front-end module 11. The front-end module 11 can perform the front-end processing for the programming language (for example, as a C/C++ front end or a Fortran front end), such as translating the high-level code written by a programmer into an intermediate representation (IR) used inside the compiler 100, and other such processing. IRs are compiler-specific and are well known in the art. Some examples are the RTL and tree SSA representations used by GCC, the Stanford University Intermediate Format (SUIF) representation, the Pegasus intermediate representation, and the WHIRL intermediate representation of the MIPSpro Compiler from Silicon Graphics Incorporated.
Compiler 100 can also include a prefetch optimization module 12. One optimization performed by some compilers is prefetch optimization. Prefetch optimization masks memory access latency by issuing a memory request before the requested value is used. While the value is retrieved from memory, which can take 300 or more cycles, the processor can execute other instructions, effectively hiding the memory access latency. Data prefetching, or simply “prefetching,” is well known in the art.
The compiler 100 can perform other optimizations represented by block 13 in FIG. 2. These include inter-procedural optimizations, redundancy elimination, dead code elimination, constant propagation, copy propagation, loop transformations, and other such optimizations. The compiler then generates the binary executable from the optimized IR using the binary generation module 14.
A prior art method of prefetching is now described with reference to FIG. 3. In block 302 the parser performing the prefetch optimization arrives at a loop. A loop is generally defined as a sequence of instructions that repeats either a specified number of times or until a particular condition is met. Some example loop instructions in C++ are “while” and “for” loops.
In block 304, a decision is made as to whether the identified loop is appropriate for prefetching. A loop may be too short, not include a load to prefetch, or have some other property that eliminates it from prefetch optimization. If the loop is not appropriate for prefetching, then the process terminates and the parser continues searching for the next loop. If, however, the loop is a good candidate for prefetching, then, in block 306, the prefetch distance to be used for the prefetch is calculated.
The prefetch distance can be generally defined as the number of loop iterations by which a prefetch is issued ahead of the actual load. At compile time, in block 306, a prior art compiler determines the prefetch distance using various factors, such as the amount of memory latency the prefetch needs to cover, the amount of work done inside the loop, and the value of the trip count (how many times the loop is repeated). Once the compiler calculates the prefetch distance, a prefetch instruction is inserted into the loop in block 308. The prefetch instruction applies the calculated prefetch distance by identifying a memory location that is the appropriate number of iterations ahead of the current load. At execution time, the processor will perform prefetching according to this distance.
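As a rough illustration of blocks 306 and 308, the following sketch computes a static prefetch distance and then shows a loop as it might look after the prefetch has been inserted. The formula used (memory latency divided by the cycles per iteration, rounded up and capped by the trip count) is a commonly used heuristic and is an assumption here, not the calculation of any particular compiler. The names `static_prefetch_distance` and `sum_with_prefetch` are hypothetical, and `__builtin_prefetch` is the GCC/Clang builtin, so the sketch assumes one of those compilers.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical heuristic for block 306: cover `latency_cycles` of memory
// latency with loop iterations that each take `cycles_per_iter` cycles,
// never exceeding the trip count.
std::size_t static_prefetch_distance(std::size_t latency_cycles,
                                     std::size_t cycles_per_iter,
                                     std::size_t trip_count) {
    if (cycles_per_iter == 0 || trip_count == 0) return 0;
    // Ceiling division: iterations needed to hide the full latency.
    std::size_t d = (latency_cycles + cycles_per_iter - 1) / cycles_per_iter;
    return std::min(d, trip_count);
}

// Loop as it might look after block 308: each iteration requests the element
// `distance` iterations ahead, then performs the actual load of a[i].
long sum_with_prefetch(const std::vector<int>& a, std::size_t distance) {
    long sum = 0;
    const std::size_t n = a.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (i + distance < n) {
            // Prefetch is only a hint; it does not change the computed result.
            __builtin_prefetch(&a[i + distance]);
        }
        sum += a[i];
    }
    return sum;
}
```

Under these assumptions, a 300-cycle latency and a loop body of 10 cycles give a distance of 30 iterations when the trip count is large, while a trip count of 8 caps the distance at 8.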
There are situations, however, in which the compile-time static prefetch distance calculation described with reference to FIG. 3 is not optimal and may even degrade performance. This generally occurs when information needed by the compiler to calculate the appropriate prefetch distance is not available at compile time. For example, the trip count affects the prefetch distance, as described above. However, a loop may have a trip count that is not known until run time. The prior art method in this situation would either use a guessed or default trip count (which may often be wrong) or forgo prefetching altogether.
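The effect of a wrong trip-count guess can be made concrete with a small model. The sketch below assumes a loop that issues one prefetch per iteration at a fixed distance, and counts how many of those prefetches target an element the loop later loads versus how many fall past the end of the iteration space. The type `PrefetchStats` and the function `classify_prefetches` are hypothetical names introduced only for this illustration.

```cpp
#include <cstddef>

// Hypothetical tally of prefetch outcomes for one execution of a loop.
struct PrefetchStats {
    std::size_t useful;  // prefetched element is later loaded by the loop
    std::size_t wasted;  // prefetched element lies beyond the last iteration
};

// Models a loop of `trip_count` iterations that unconditionally prefetches
// `distance` iterations ahead in every iteration.
PrefetchStats classify_prefetches(std::size_t trip_count, std::size_t distance) {
    PrefetchStats s{0, 0};
    for (std::size_t i = 0; i < trip_count; ++i) {
        if (i + distance < trip_count) {
            ++s.useful;
        } else {
            ++s.wasted;
        }
    }
    return s;
}
```

In this model, a distance of 30 chosen for a guessed trip count of 1000 yields 970 useful and 30 wasted prefetches when the guess is right. If the loop actually runs only 20 times, all 20 prefetches are wasted, and the loop pays the cost of the extra instructions without hiding any latency.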