1. Field of the Invention
The present invention relates generally to compiler software pipelining. More particularly, the present invention relates to enhanced modulo scheduling techniques for software pipelining.
2. The Prior Art
A recent trend in processor design is to build processors with increasing instruction issue capability and many functional units. At the same time, the push toward higher clock frequencies has resulted in deeper pipelines and longer instruction latencies. To utilize the resources available in such processors, it is important to employ scheduling techniques that can extract sufficient instruction-level parallelism (ILP) from programs. Modulo scheduling is a known technique for extracting ILP from inner loops by overlapping the execution of successive iterations.
Modulo scheduling is a well-known compiler optimization technique that calculates a theoretical minimum initiation interval (minimum II), which is a measure of the execution time, and then attempts to produce an instruction schedule using a modulo reservation table that is II cycles in length. If a schedule can be found at the minimum II, it is known to be optimal.
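The role of the modulo reservation table can be illustrated with a minimal sketch. It assumes the usual convention that an operation issued at cycle c occupies slot c mod II of the table. The class name, the method name, and the unit names are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


class ModuloReservationTable:
    """Tracks functional-unit usage in each of the II slots (illustrative only)."""

    def __init__(self, ii: int, units: Dict[str, int]) -> None:
        self.ii = ii
        self.units = units  # unit name -> copies available per cycle
        # One usage counter per unit for each of the II slots.
        self.used: List[Dict[str, int]] = [dict.fromkeys(units, 0) for _ in range(ii)]

    def try_reserve(self, cycle: int, unit: str) -> bool:
        """Reserve `unit` at `cycle`; fail if slot (cycle mod II) is already full."""
        slot = cycle % self.ii
        if self.used[slot][unit] >= self.units[unit]:
            return False
        self.used[slot][unit] += 1
        return True


# Hypothetical machine with a single ALU, scheduled at II = 2.
mrt = ModuloReservationTable(ii=2, units={"alu": 1})
first = mrt.try_reserve(0, "alu")   # slot 0 is free
clash = mrt.try_reserve(2, "alu")   # cycle 2 maps to slot 0 again, so it conflicts
other = mrt.try_reserve(3, "alu")   # cycle 3 maps to slot 1, which is free
```

Because every iteration repeats the same pattern II cycles later, two operations whose issue cycles are congruent modulo II compete for the same slot, which is why the table needs only II rows.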
In standard modulo scheduling, if an acceptable schedule cannot be found for the minimum II, the value of II is incremented until a schedule can be found. As chips become faster, with more pipeline stages and higher latencies in terms of cycles, the limited number of registers available in the instruction set becomes problematic. Fewer loops can be scheduled with the minimum II, and the minimum acceptable II increases for those loops that cannot be scheduled with the minimum II. It therefore becomes harder and more time consuming for the compiler to find the minimum acceptable II, or best practical II, resulting in an increase in compilation time.
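The conventional incremental search described above can be sketched as follows. This is an illustration rather than a real scheduler: `try_schedule` stands in for a full scheduling attempt at a given II, and the toy register model (a loop needing ceil(960 / II) registers on a machine with 32 registers) is an assumption made only to give the search something to test.

```python
from typing import Callable, Optional, Tuple


def incremental_ii_search(min_ii: int, max_ii: int,
                          try_schedule: Callable[[int], bool]
                          ) -> Tuple[Optional[int], int]:
    """Try II = min_ii, min_ii + 1, ... and return (first acceptable II, attempts)."""
    attempts = 0
    for ii in range(min_ii, max_ii + 1):
        attempts += 1
        if try_schedule(ii):
            return ii, attempts
    return None, attempts  # nothing in the range could be scheduled


# Hypothetical stand-in for a scheduling attempt: register demand falls as II grows.
def toy_try_schedule(ii: int, work: int = 960, regs: int = 32) -> bool:
    needed = -(-work // ii)  # ceil(work / ii)
    return needed <= regs


found_ii, attempts = incremental_ii_search(4, 64, toy_try_schedule)
print(found_ii, attempts)  # prints "30 27"
```

In this toy case the minimum II is 4 but the first acceptable II is 30, so the compiler performs 27 scheduling attempts. The cost grows linearly with the gap between the minimum II and the minimum acceptable II.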
To overcome these and other shortcomings of the prior art, disclosed herein is an enhanced modulo scheduling technique. Register pressure, or the number of registers required for a given loop schedule, tends to decrease monotonically with increasing II. Hence, it is possible to apply a binary search method to locate the minimum acceptable II in an amount of time that is proportional to the logarithm of the size of the range of attempted IIs, rather than directly proportional to the size of the range itself.
Although the aforementioned monotonically decreasing condition does not necessarily hold for all loops, it nearly always does in practice. When the condition does not hold, the method still produces an acceptable schedule, although it may be for an II that is larger than the minimum acceptable II. Usually the result is the same as with the conventional iterative method, i.e., the same II is obtained, but the new method arrives at it in less compilation time.
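The binary search over II, and its behavior when the monotonic condition fails, can be sketched as follows. This is a sketch under stated assumptions, not the disclosed implementation: `try_schedule` stands in for a full scheduling attempt, the range bounds are supplied by the caller, and both example predicates are hypothetical.

```python
from typing import Callable, Optional, Tuple


def binary_ii_search(min_ii: int, max_ii: int,
                     try_schedule: Callable[[int], bool]
                     ) -> Tuple[Optional[int], int]:
    """Binary search for an acceptable II in [min_ii, max_ii].

    Returns (smallest acceptable II found, number of scheduling attempts).
    The result is the true minimum only if acceptability is monotone in II.
    """
    best, attempts = None, 0
    lo, hi = min_ii, max_ii
    while lo <= hi:
        mid = (lo + hi) // 2
        attempts += 1
        if try_schedule(mid):
            best, hi = mid, mid - 1  # acceptable: look for a smaller II
        else:
            lo = mid + 1             # not acceptable: a larger II is needed
    return best, attempts


# Hypothetical monotone loop: every II >= 30 can be scheduled.
def monotone(ii: int) -> bool:
    return ii >= 30


# Hypothetical non-monotone loop: II = 10 also happens to be schedulable.
def non_monotone(ii: int) -> bool:
    return ii == 10 or ii >= 30


mono_result = binary_ii_search(4, 64, monotone)
odd_result = binary_ii_search(4, 64, non_monotone)
true_min = min(ii for ii in range(4, 65) if non_monotone(ii))
print(mono_result, odd_result, true_min)  # prints "(30, 6) (30, 6) 10"
```

For the monotone loop the search finds II = 30 in 6 attempts, where stepping upward from 4 one II at a time would take 27. For the non-monotone loop it still returns an acceptable II of 30, but it never probes II = 10, so the result is larger than the true minimum acceptable II.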