The present invention relates generally to computer systems and programs, and more specifically to an improved scheduling technique for software pipelining.
Software pipelining is a compiler optimization technique for reordering hardware instructions within a given loop of a computer program being compiled, so as to minimize the number of cycles required to execute each iteration of the loop. More specifically, software pipelining attempts to optimize the scheduling of such hardware instructions by overlapping the execution of instructions from multiple iterations of the loop.
For the purposes of the present discussion, it may be helpful to introduce some commonly used terms in software pipelining. As is well known in the art, individual machine instructions in a computer program may be represented as “nodes” having assigned node numbers, and the dependencies and latencies between the various instructions may be represented as “edges” between nodes in a data dependency graph (“DDG”). A grouping of related instructions, as represented by a grouping of interconnected nodes in a DDG, is commonly known as a “sub-graph”. If the nodes of one sub-graph have no dependencies on nodes of another sub-graph, these two sub-graphs may be said to be “independent” of each other.
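By way of illustration only, the terminology above may be sketched in Python as follows. The function name `independent_subgraphs`, the `(source, destination, latency)` edge tuples, and the sample graph are assumptions introduced for this example and do not appear in the discussion above. The sketch treats each group of interconnected nodes as one sub-graph and reports the groups that share no edges.

```python
from collections import defaultdict

def independent_subgraphs(num_nodes, edges):
    """Group DDG nodes into sub-graphs that share no dependency edges.

    edges is a list of (source, destination, latency) tuples; the latency
    is carried along for completeness but does not affect the grouping.
    """
    neighbours = defaultdict(set)
    for src, dst, _latency in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)

    seen = set()
    groups = []
    for start in range(num_nodes):
        if start in seen:
            continue
        # Depth-first walk over every node reachable from `start`.
        stack = [start]
        seen.add(start)
        group = []
        while stack:
            node = stack.pop()
            group.append(node)
            for other in neighbours[node]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        groups.append(sorted(group))
    return groups

# Hypothetical DDG: two identical three-instruction chains with no edges between them.
edges = [(0, 1, 2), (1, 2, 1), (3, 4, 2), (4, 5, 1)]
groups = independent_subgraphs(6, edges)
print(groups)
```

In this hypothetical graph, nodes 0 to 2 and nodes 3 to 5 form two sub-graphs that are independent of each other in the sense described above.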
Software pipelining techniques may be used to attempt to optimally schedule the nodes of the sub-graphs found in a DDG. A well known technique for performing software pipelining is “modulo scheduling”. Based on certain calculations, modulo scheduling selects a likely minimum number of cycles in which a loop of a computer program will execute, usually called the initiation interval (“II”), and attempts to place all of the instructions of the loop into a schedule of that size, that is, a schedule consisting of a number of cycles equal to the II. If, while scheduling, some instructions do not fit within II cycles, then these instructions are wrapped around the end of the schedule into the next iteration, or iterations, of the schedule. If an instruction is wrapped into a successive iteration, the instruction executes and consumes machine resources as though it were placed in the cycle equal to its placed cycle modulo the II (i.e. placed cycle % II). Thus, for example, if an instruction is placed in cycle “10”, and the II is 7, then the instruction would execute and consume resources at cycle “3” in another iteration of the scheduled loop. When some instructions of a loop are placed in successive iterations of the schedule, the result is a schedule that overlaps the execution of instructions from multiple iterations of the original loop. If the scheduling fails to place all of the instructions for a given II, the modulo scheduling technique increases the II of the schedule and tries to complete the schedule again. This is repeated until the scheduling is completed.
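The wrap-around placement and the retry on failure described above may be sketched, in greatly simplified form, as follows. This is an illustrative sketch only and not the method of any particular compiler. It assumes a single kind of machine resource with `units` issue slots per cycle, an acyclic DDG whose nodes are supplied in dependency order, and a search that starts from a caller-supplied `start_ii` rather than from a computed lower bound. The names `modulo_schedule`, `slot`, `units`, `start_ii`, and `max_ii` are hypothetical.

```python
def slot(cycle, ii):
    """Row of the II-cycle schedule in which an instruction placed at `cycle` consumes resources."""
    return cycle % ii

def modulo_schedule(nodes, edges, units, start_ii=1, max_ii=64):
    """Return (ii, placed) where placed maps each node to its placed cycle.

    nodes : node numbers in dependency (topological) order
    edges : (source, destination, latency) tuples of an acyclic DDG
    units : assumed number of instructions that may issue per cycle
    """
    preds = {n: [] for n in nodes}
    for src, dst, latency in edges:
        preds[dst].append((src, latency))

    ii = start_ii
    while ii <= max_ii:
        used = [0] * ii      # issue slots consumed in each row of the schedule
        placed = {}
        complete = True
        for n in nodes:
            # Earliest cycle at which all predecessors' results are available.
            earliest = max((placed[s] + lat for s, lat in preds[n]), default=0)
            # Trying II consecutive cycles visits every row exactly once.
            for cycle in range(earliest, earliest + ii):
                if used[slot(cycle, ii)] < units:
                    used[slot(cycle, ii)] += 1
                    placed[n] = cycle
                    break
            else:
                complete = False   # no free row: this II is too small
                break
        if complete:
            return ii, placed
        ii += 1                    # enlarge the schedule and try again
    raise RuntimeError("no schedule found up to max_ii")

# The example from the text: placed cycle 10 with an II of 7 lands in row 3.
print(slot(10, 7))

# Hypothetical four-instruction chain on a machine issuing one instruction per cycle.
chain = [(0, 1, 4), (1, 2, 4), (2, 3, 2)]
ii, placed = modulo_schedule([0, 1, 2, 3], chain, units=1)
print(ii, placed)
```

With these hypothetical inputs, the attempts at II values of 1, 2 and 3 each fail to place every instruction, and the schedule completes at an II of 4. Instructions placed beyond cycle 3 wrap into later iterations of the schedule, which is the overlap of iterations referred to above.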
As is also known in the art, swing modulo scheduling (“SMS”) is a specific modulo scheduling technique designed to improve upon other known modulo scheduling techniques in terms of the number of cycles, length of the schedule, and registers used. For a more detailed description of SMS, the reader is directed to a paper entitled “Lifetime-Sensitive Modulo Scheduling in a Production Environment” by Josep Llosa et al., IEEE Transactions on Computers, Vol. 50, No. 3, March 2001, pp. 234-249. SMS has some distinct features. For example, SMS allows scheduling of instructions (i.e. nodes in a DDG) in a prioritized order, and it allows placement of the instructions in the schedule to occur in both “forward” and “backward” directions.
In certain situations, SMS and other known software pipelining techniques may fail to find an optimal schedule. In particular, finding the optimal schedule may be difficult when there are multiple groups of instructions (i.e. sub-graphs) which are independent and substantially identical in structure. This may result, for example, from “unrolling” a loop of a computer program where there are no dependencies between the unrolled iterations. Attempted scheduling of such independent and substantially identical groups of instructions using known scheduling techniques may result in a cumulative bunching of instructions at various spots within the schedule. This can lead to less than optimal scheduling of loops in terms of the number of execution cycles (i.e. the II). Regions of high register pressure (i.e. register pressure hot spots) also may result.
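One way to picture the bunching described above is to count how many instructions would prefer each cycle of the schedule if every identical sub-graph were started at the same point. The following sketch is an illustration under stated assumptions only: each sub-graph is assumed to be a simple chain, each instruction is assumed to prefer the earliest cycle permitted by its latencies, and resource conflicts are not resolved. The name `preferred_row_demand` and the sample latencies are hypothetical.

```python
from collections import Counter

def preferred_row_demand(chain_latencies, copies, ii):
    """Count instructions preferring each row of an II-cycle schedule.

    chain_latencies : latencies along one chain-shaped sub-graph
    copies          : number of identical, independent copies of that sub-graph
    ii              : initiation interval of the schedule being attempted
    """
    # Earliest cycle for each instruction of a single chain.
    cycles = [0]
    for latency in chain_latencies:
        cycles.append(cycles[-1] + latency)

    demand = Counter()
    for _ in range(copies):
        for cycle in cycles:
            demand[cycle % ii] += 1
    return [demand[row] for row in range(ii)]

# Four hypothetical copies of a three-instruction chain, attempted at an II of 6.
demand = preferred_row_demand([2, 2], copies=4, ii=6)
print(demand)
```

Under these assumptions all twelve instructions compete for rows 0, 2 and 4 while rows 1, 3 and 5 attract none, which is the kind of uneven demand that may lead to the bunching and register pressure hot spots noted above.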
Thus, an improved scheduling technique which may lower the number of cycles for execution and reduce register pressure hot spots would be desirable.