Optimizing compilers are software systems that translate programs from higher-level languages into equivalent object or machine language code for execution on a computer. Optimization generally requires finding computationally efficient translations that reduce program runtime. Such optimizations may include improved loop handling, dead code elimination, software pipelining, better register allocation, instruction prefetching, and/or reduction in the communication cost associated with bringing data to the processor from memory.
Certain programs would be more useful if appropriate compiler optimizations were performed to decrease program runtime. A number of compilation techniques have been developed to improve the efficiency of loop computations by increasing instruction-level parallelism (ILP). One such method is software pipelining (SWP), which improves the performance of a loop by overlapping the execution of several independent iterations. The number of cycles between the start of successive iterations in SWP is called the Initiation Interval (II), which is the greater of the resource II and the recurrence II. The resource II is based on the resource usage of the loop and the available processor resources. The recurrence II of the loop is based on the cycles in the dependence graph for the loop and the instruction latencies of the processor. Maximum instruction-level parallelism for the loop is realized if the recurrence II of the loop is less than or equal to its resource II.
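The relationship between the resource II, the recurrence II, and the II can be sketched as follows. This is a minimal illustration using the common textbook formulation of these bounds, not a method stated in this document: the function names, the resource classes, and the example loop are all hypothetical, and each dependence cycle is supplied by hand as a list of (latency, distance) edges rather than discovered from a real dependence graph.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)

def resource_ii(uses, units):
    """Resource bound: for each resource class, the operations that need it
    in one iteration divided by the units available per cycle, rounded up."""
    return max(ceil_div(uses[r], units[r]) for r in uses)

def recurrence_ii(cycles):
    """Recurrence bound: for each dependence cycle, total latency divided by
    the total iteration distance around the cycle, rounded up."""
    if not cycles:
        return 1  # no loop-carried recurrence constrains the loop
    return max(
        ceil_div(sum(lat for lat, _ in c), sum(dist for _, dist in c))
        for c in cycles
    )

def initiation_interval(uses, units, cycles):
    """The II is the greater of the resource II and the recurrence II."""
    return max(resource_ii(uses, units), recurrence_ii(cycles))

# Hypothetical loop: 6 memory operations on 2 memory units and
# 4 floating-point operations on 2 floating-point units.
uses = {"mem": 6, "fp": 4}
units = {"mem": 2, "fp": 2}
# One hypothetical recurrence: an operation with a latency of 4 cycles
# whose result is consumed by the same operation in the next iteration.
cycles = [[(4, 1)]]

res_ii = resource_ii(uses, units)
rec_ii = recurrence_ii(cycles)
ii = initiation_interval(uses, units, cycles)
print(res_ii, rec_ii, ii)
```

In this example the recurrence II exceeds the resource II, so the recurrence, rather than the available hardware, limits how closely successive iterations can be started.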
A processor architecture, such as the Intel® Itanium® architecture, may provide special features for SWP of loops, such as large sets of rotating registers, special branch instructions, and other instructions that enable efficient SWP of loops. Rotating integer and floating-point registers enable SWP without the need to generate MOV instructions. A MOV instruction or operation is used to move a value from one register to another register. Rotating predicate registers help in the generation of compact code for software-pipelined loops, without the need to generate explicit prologs and epilogs for pipelined loops. Such SWP features can be useful for improving the performance of loop-intensive applications.
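How register rotation removes the need for MOV instructions can be illustrated with a small model. The sketch below is a simplification under stated assumptions, not a description of the actual hardware: the class name, the single register base, and the file size of four registers are hypothetical, and real processors tie rotation to specific loop branch instructions. The only behavior modeled is that, after each rotation, a value written to logical register r becomes visible as logical register r + 1.

```python
class RotatingRegisterFile:
    """Simplified model of a rotating register file (hypothetical)."""

    def __init__(self, size):
        self.phys = [None] * size  # physical registers
        self.rrb = 0               # rotating register base

    def _index(self, logical):
        # Logical register names are renamed through the rotating base.
        return (logical + self.rrb) % len(self.phys)

    def write(self, logical, value):
        self.phys[self._index(logical)] = value

    def read(self, logical):
        return self.phys[self._index(logical)]

    def rotate(self):
        # Decrementing the base makes the value held in logical register r
        # appear as logical register r + 1 in the next iteration.
        self.rrb = (self.rrb - 1) % len(self.phys)

rf = RotatingRegisterFile(4)
consumed = []
for i in range(5):
    rf.write(0, i * 10)              # each iteration defines a new value in r0
    if i >= 2:
        consumed.append(rf.read(2))  # r2 holds the value from two iterations ago
    rf.rotate()                      # performed once per loop iteration
print(consumed)
```

Every iteration writes the same logical register and reads the same logical register, yet each read sees the value produced two iterations earlier. Without rotation, the loop body would need explicit register-to-register copies to keep those values apart.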
However, in spite of these features and the large sets of integer, floating-point, and predicate registers available in such processors, there are instances in which large loops may run out of rotating registers. In such situations, these loops are not pipelined and performance suffers. One possible solution to this problem is to change the schedule and increase the scheduled II. This can reduce the number of required rotating registers and thus allow a successful allocation of rotating registers in such loops. A possible drawback is that this approach increases the II, the compile time of SWP, and the complexity of the SWP implementation.
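The effect of a larger II on rotating register demand can be approximated with a short calculation. This is only a rough sketch under simplifying assumptions: it treats each value's lifetime as fixed while the II changes (in practice rescheduling also changes the lifetimes), it estimates demand as one register per overlapping instance of each value, and the lifetimes and function name are hypothetical.

```python
def rotating_registers_needed(lifetimes, ii):
    """Rough estimate of rotating register demand. A new instance of each
    value is produced every `ii` cycles, so a value that stays live for
    `life` cycles overlaps with up to ceil(life / ii) instances of itself."""
    return sum(-(-life // ii) for life in lifetimes)

# Hypothetical lifetimes, in cycles, of three values defined in the loop body.
lifetimes = [9, 5, 3]

# Estimated demand at several candidate initiation intervals.
demand = {ii: rotating_registers_needed(lifetimes, ii) for ii in (2, 3, 5)}
print(demand)
```

Under these assumptions the estimated demand falls as the II grows, which is why increasing the scheduled II can make allocation succeed, at the cost of starting iterations less frequently.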
Another approach to solving the above-described problem is to spill and fill rotating registers in software-pipelined loops. The traditional approach to spilling and filling non-rotating registers is to use a unique memory location for each register that is to be spilled and filled. Spilling and filling non-rotating registers reduces the live ranges and the register pressure, thus enabling register allocation. A live range is the span over which a computed value is held in a register until it is needed. Unfortunately, this approach does not work for spilling and filling rotating registers, because the lifetimes of rotating registers are typically greater than the II (lifetimes less than the II are usually assigned to non-rotating registers). Accordingly, there is a need for efficient spilling and filling of rotating registers in software-pipelined loops.
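Why a single memory location per register fails when a lifetime exceeds the II can be seen in a small timeline simulation. The sketch below is illustrative only and rests on stated assumptions: iteration i spills its value at cycle i * II and fills it again `lifetime` cycles later, all iterations share one spill slot, and the function name and cycle counts are hypothetical.

```python
def fills_with_single_slot(ii, lifetime, iterations):
    """Return, for each iteration, which iteration's value its fill reads
    when every iteration spills to the same memory location."""
    events = []
    for i in range(iterations):
        events.append((i * ii, 0, "spill", i))            # value is defined
        events.append((i * ii + lifetime, 1, "fill", i))  # value is needed
    events.sort()  # process events in cycle order

    slot = None    # the single shared spill location
    seen = {}
    for _cycle, _order, kind, i in events:
        if kind == "spill":
            slot = i       # overwrites whatever was stored before
        else:
            seen[i] = slot
    return seen

# Lifetime shorter than the II: each fill reads its own iteration's value.
short_lifetime = fills_with_single_slot(ii=4, lifetime=3, iterations=4)

# Lifetime longer than the II: later iterations overwrite the slot first.
long_lifetime = fills_with_single_slot(ii=2, lifetime=5, iterations=4)
print(short_lifetime, long_lifetime)
```

In the second case several instances of the value are live at the same time, so a later iteration's spill destroys the stored value before the earlier iteration's fill can read it.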