Embodiments of the present invention relate to the field of computer systems. In particular, embodiments of the present invention relate to a method, apparatus, and computer program product for reducing the execution time of instructions in a software loop.
A compiler is a program that reads source code written in a source language and translates it into target code in a machine language. While producing the target code, the compiler forms intermediate code in a machine-independent form. For example, a FORTRAN compiler translates high-level source code written in the FORTRAN programming language into machine-language target code that can be executed by a computer processor, forming intermediate code along the way. Machine-independent optimizations may be performed on the intermediate code.
Conventional compilers include three stages: a front end, a middle end, and a back end. The front end translates the source code into the intermediate code. The middle end optimizes the intermediate code by using machine-independent optimizations. The back end generates the target code, which is optimized by using machine-dependent optimizations.
Optimization of the intermediate code refers to the transformation of the intermediate code into alternative, functionally equivalent code with reduced execution time. The execution time of the program compiled from the source code depends on a number of factors. These factors include the number of instructions executed, the average number of processor cycles required to execute an instruction, and the processor cycle time.
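The relationship among these three factors can be sketched as a simple product, which is the standard processor performance equation. The following Python fragment is an illustration only; the function name, parameter names, and the numbers in the usage lines are hypothetical and do not come from any particular compiler or processor.

```python
def execution_time_seconds(instruction_count: int,
                           cycles_per_instruction: float,
                           cycle_time_seconds: float) -> float:
    """Estimate execution time as the product of the three factors.

    instruction_count      -- number of instructions executed
    cycles_per_instruction -- average processor cycles per instruction
    cycle_time_seconds     -- duration of one processor cycle
    """
    return instruction_count * cycles_per_instruction * cycle_time_seconds


# Hypothetical figures: an optimization that removes a quarter of the
# executed instructions, with the other two factors left unchanged.
baseline = execution_time_seconds(1_000_000, 2.0, 1e-9)
optimized = execution_time_seconds(750_000, 2.0, 1e-9)
```

Under these assumed figures, the optimized code runs in three quarters of the baseline time, because only the instruction count changed. An optimization may equally target the average cycles per instruction, for example by shortening chains of dependent instructions.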
Various methods have been used for optimizing intermediate code in the machine-independent form in the compiler. These methods facilitate reduction of the height of a computation, that is, the length of the longest chain of dependent instructions, in a basic block of a software loop. A basic block is a straight-line sequence of code with no jumps into or out of the middle of the block.
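As an illustration of what reducing height means, the following Python sketch measures the height of an expression and rebalances a chain of additions. Reassociation is used here only because it is easy to show; it is not necessarily one of the methods referred to above. The tuple representation and the function names are hypothetical, and the rebalancing is valid only where the operation is associative (for example, integer addition without overflow checks).

```python
def height(expr) -> int:
    """Length of the longest dependence chain in an expression.

    An expression is either a variable name (str) or a tuple
    (operator, left_operand, right_operand).
    """
    if isinstance(expr, str):
        return 0  # a leaf operand needs no computation
    _, left, right = expr
    return 1 + max(height(left), height(right))


def balanced_sum(operands):
    """Rebuild a sum of operands as a balanced tree of additions."""
    if len(operands) == 1:
        return operands[0]
    mid = len(operands) // 2
    return ("+", balanced_sum(operands[:mid]), balanced_sum(operands[mid:]))


# ((a + b) + c) + d: every addition waits for the previous one.
chain = ("+", ("+", ("+", "a", "b"), "c"), "d")
# (a + b) + (c + d): the two inner additions are independent.
tree = balanced_sum(["a", "b", "c", "d"])

chain_height = height(chain)
tree_height = height(tree)
```

Both forms perform three additions, but the chained form has a height of three and the balanced form a height of two, so the balanced form can finish one step earlier on a processor that can issue two additions at once.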
The bulk of the execution time in a program is usually spent in software loops. Therefore, speeding up the execution of these loops can save execution time. Some loops are resource-bound, i.e., they are bound by the number of issue and instruction slots available for their instructions. However, many other loops are recurrence-bound, i.e., they are limited in performance by the availability of results from an earlier iteration. Traditionally, predicate promotion was used to reduce the height of a computation, that is, to reorder the computation within a basic block so that a given instruction could be executed earlier than before. Such techniques, though useful for speeding up computation in acyclic regions, may not always speed up the execution of loops. What is important is to shorten the critical recurrence cycles themselves, and not just to reduce the height of their computations within individual basic blocks.
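The distinction between resource-bound and recurrence-bound loops can be sketched with the lower bounds commonly used in software pipelining (modulo scheduling). These bounds are an assumption brought in for illustration and are not taken from the text above. The function names, the single issue-slot resource model, and the example figures are likewise hypothetical.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for positive b."""
    return -(-a // b)


def resource_bound(num_instructions: int, issue_slots_per_cycle: int) -> int:
    """Fewest cycles per iteration permitted by the issue slots."""
    return ceil_div(num_instructions, issue_slots_per_cycle)


def recurrence_bound(recurrence_cycles) -> int:
    """Fewest cycles per iteration permitted by loop-carried dependences.

    Each entry is (latency, distance): the total latency around one
    dependence cycle, and the number of iterations that cycle spans.
    """
    return max((ceil_div(latency, distance)
                for latency, distance in recurrence_cycles), default=0)


def cycles_per_iteration_lower_bound(num_instructions, issue_slots_per_cycle,
                                     recurrence_cycles) -> int:
    """A loop can run no faster than the larger of the two bounds."""
    return max(resource_bound(num_instructions, issue_slots_per_cycle),
               recurrence_bound(recurrence_cycles))


# Hypothetical loop: 6 instructions on a machine with 6 issue slots, and
# one recurrence of latency 4 carried to the next iteration (distance 1).
before = cycles_per_iteration_lower_bound(6, 6, [(4, 1)])
# The same loop after the recurrence cycle is shortened to latency 2.
after = cycles_per_iteration_lower_bound(6, 6, [(2, 1)])
```

In this assumed example the resource bound is one cycle per iteration, so the loop is recurrence-bound: its lower bound is four cycles per iteration before the change and two after. Reordering instructions that lie outside the recurrence cycle would leave the bound at four, which is why shortening the critical recurrence cycle is what matters.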