Loop distribution divides a large loop in an application program into a plurality of smaller loops. For example, high performance computing (HPC) application programs executed by supercomputers contain large loops, and such a loop often accounts for most of the execution time. A large loop generally incurs more cache misses, for two reasons. First, a large loop contains many instructions and thus tends to cause misses in the instruction cache. Second, a large loop often references many variables and, in that case, tends to cause many misses in the data cache.
Thus, performing appropriate loop distribution on a large loop may reduce cache misses and the execution time of the program. However, depending on how the loop is divided, loop distribution may instead result in more cache misses than when loop distribution is not performed.
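As an illustration only, the following C sketch shows one large loop and a distributed version of it. The function names `fused` and `distributed` and the array names are hypothetical and do not come from the cited references. The two statements are independent, so dividing the loop does not change the result.

```c
#include <stddef.h>

/* Original loop: every iteration touches six arrays (a, b, c, d, e, f). */
void fused(size_t n, const double *a, const double *b, double *c,
           const double *d, const double *e, double *f)
{
    for (size_t i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
        f[i] = d[i] * e[i];
    }
}

/* Distributed loops: each loop touches only three arrays, so the
 * working set per loop is smaller. The statements do not depend on
 * each other, so the results match those of fused(). */
void distributed(size_t n, const double *a, const double *b, double *c,
                 const double *d, const double *e, double *f)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];

    for (size_t i = 0; i < n; i++)
        f[i] = d[i] * e[i];
}
```

If both statements read the same array, the distributed version would stream that array through the cache twice, which is one way in which loop distribution may increase cache misses rather than reduce them.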
Meanwhile, a compiler has a function to optimize application programs. Known optimizations include, for example: parallelization of instructions; single instruction multiple data (SIMD) conversion, in which a plurality of identical operations on different data is replaced with a single instruction; software pipelining, in which instructions are reordered so that they are pipelined across a plurality of computing units; and loop unrolling, in which the loop body is replicated so that a plurality of iterations is executed in a single iteration, reducing the loop-control overhead.
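As commonly understood, loop unrolling replicates the loop body so that several original iterations are executed per pass, which reduces the number of loop-counter updates and branches. The following C sketch illustrates this by hand for an unroll factor of 4. The function names `axpy_rolled` and `axpy_unrolled4` are hypothetical, and a compiler would normally perform this transformation automatically.

```c
#include <stddef.h>

/* Rolled form: one element per iteration, one branch per element. */
void axpy_rolled(size_t n, double alpha, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += alpha * x[i];
}

/* Unrolled by 4: four elements per iteration, so the loop counter is
 * updated and tested about one quarter as often. */
void axpy_unrolled4(size_t n, double alpha, const double *x, double *y)
{
    size_t i = 0;

    for (; i + 3 < n; i += 4) {
        y[i]     += alpha * x[i];
        y[i + 1] += alpha * x[i + 1];
        y[i + 2] += alpha * x[i + 2];
        y[i + 3] += alpha * x[i + 3];
    }

    /* Remainder loop for the last n % 4 elements. */
    for (; i < n; i++)
        y[i] += alpha * x[i];
}
```

The unrolled body is larger than the rolled one, so unrolling an already large loop adds to instruction-cache pressure. This is one reason the optimization may work better on the smaller loops produced by loop distribution.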
Such optimization reduces the execution time of the program. In addition, loop distribution may allow each of the distributed loops to be optimized more appropriately. Moreover, the program execution time may be further reduced by applying the optimization to a program in which cache misses have already been reduced by loop distribution.
Thus, loop distribution alone cannot minimize the execution time of the program. Likewise, searching for the best optimization alone does not yield a solution that minimizes the program execution time.
The loop distribution is disclosed in Japanese Patent Application Laid-open No. 2009-104422, WO 98/19249, Japanese Patent Application Laid-open No. H6-250846, Japanese Patent Application Laid-open No. 2001-5792, and Ikuo Nakano, Structure of Compiler and Optimization (second edition), Asakura Publishing Co., Ltd., 2009.