1. Field of the Invention
The present invention relates generally to a software compiler system, and more particularly to a system and method in a software compiler system for aggregate instruction movement using a target description table.
2. Related Art
Most modern processors benefit from (or require) the rearrangement of operations to avoid interlocks between instructions. For earlier processors, applying such scheduling rearrangement to a single basic block at a time was adequate. But with the advent of superscalar and VLIW (very long instruction word) architectures, the instruction level parallelism (ILP) available at the basic block level is not sufficient to fully exploit the available hardware resources. Instead, code must be rearranged beyond basic block boundaries to achieve higher ILP. This activity is called global scheduling, since most such algorithms incorporate scheduling.
Global scheduling specifically for inner loop bodies has been studied extensively, with software pipelining in particular being an acceptable solution in some cases. This is described, for example, in Charlesworth, A. E., "An Approach to Scientific Array Processing: The Architectural Design of the AP-120B/FPS-164," IEEE Computer 14(9):18 (1981). This is also described in Dehnert, J. C. and Towle, R. A., "Compiling for the Cydra 5," J. Supercomputing 7(1/2):181-227 (1993), which is herein incorporated by reference in its entirety. But global scheduling is also important outside loop bodies, and for dealing with those inner loop bodies which cannot be pipelined. Several approaches to this problem have been described in the literature, as will now be described.
Trace scheduling reduces the problem to a local scheduling problem by scheduling a trace (an acyclic path) in the flowgraph and allowing operations to move past branches or labels within the trace. Fix-up code is then inserted in the basic blocks that branch into the middle of traces (or are branched to from within traces) to correct for changes due to such movement past the branches. Trace scheduling handles loops by breaking traces at back arcs, depending on unrolling to mitigate the resultant inability to move code past those arcs. Trace scheduling is described, for example, in Lowney, P. G. et al., "The Multiflow Trace Scheduling Compiler," J. Supercomputing 7(1/2):51-142 (1993); Ellis, J., Bulldog: A Compiler for VLIW Architectures, MIT Press, Cambridge, Mass. (1986); and Fisher, J. A., "Trace Scheduling: A Technique for Global Microcode Compaction," IEEE Transactions on Computers C-30(7):478-490 (1981).
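By way of illustration only, the trace selection step described above might be sketched as follows. This is a minimal sketch and not the implementation of any of the cited compilers: the function name select_traces, the dictionary-based flowgraph representation, and the assumption that execution counts and the set of back arcs are supplied by an earlier analysis are all hypothetical. The sketch seeds each trace with the most frequently executed unplaced block, extends it along the likeliest edges in both directions, and breaks the trace at back arcs.

```python
def select_traces(freq, succs, back_arcs):
    """Partition a flowgraph into traces (acyclic paths), hottest first.

    freq      -- dict: block name -> estimated execution count
    succs     -- dict: block name -> list of (successor, edge count)
    back_arcs -- set of (source, destination) edges that close loops
    All three inputs are assumed to come from earlier analysis passes.
    """
    # Build predecessor lists so that a trace can also grow upward.
    preds = {block: [] for block in freq}
    for src, edges in succs.items():
        for dst, count in edges:
            preds[dst].append((src, count))

    placed = set()

    def likeliest(edges, is_back_arc):
        # Most frequently taken edge whose far end is still unplaced and
        # which is not a back arc (traces are broken at back arcs).
        best, best_count = None, -1
        for block, count in edges:
            if block in placed or is_back_arc(block):
                continue
            if count > best_count:
                best, best_count = block, count
        return best

    traces = []
    # sorted() is stable, so equally hot blocks keep the order of freq.
    for seed in sorted(freq, key=lambda b: -freq[b]):
        if seed in placed:
            continue
        trace = [seed]
        placed.add(seed)

        # Grow the trace downward from the seed.
        tail = seed
        while True:
            nxt = likeliest(succs.get(tail, []),
                            lambda dst, src=tail: (src, dst) in back_arcs)
            if nxt is None:
                break
            trace.append(nxt)
            placed.add(nxt)
            tail = nxt

        # Grow the trace upward from the seed.
        head = seed
        while True:
            prv = likeliest(preds[head],
                            lambda src, dst=head: (src, dst) in back_arcs)
            if prv is None:
                break
            trace.insert(0, prv)
            placed.add(prv)
            head = prv

        traces.append(trace)
    return traces
```

The sketch covers trace selection only. Scheduling each trace as a single block and inserting fix-up code at the side entrances and exits are separate steps that it does not attempt.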
Percolation scheduling is a greedy algorithm which increases ILP by moving operations upward as much as possible. Because it ignores resource requirements, operations that are executed with small probability consume resources that could otherwise perform useful work. Percolation scheduling is described, for example, in Nicolau, A., A Fine-Grain Parallelizing Compiler, Tech. Report No. 86-792, Cornell Univ. (1986).
Enhanced percolation scheduling addresses this problem by delaying movement of operations until scheduling time. This postpones the movement decisions until actual machine resource requirements are known, restraining movement of operations that would exceed resource availability. Enhanced percolation scheduling is described, for example, in Ebcioglu, K. and Nicolau, A., "A Global Resource-Constrained Parallelization Technique," Proceedings 3rd Int'l Conf. Supercomputing, pp. 154-163 (1989).
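By way of illustration only, the contrast between unconstrained upward movement and resource-constrained movement might be sketched with a toy model. The sketch below is not the published percolation or enhanced percolation algorithm: it assumes a straight-line sequence of instruction nodes, operations listed in order of their original node, a dependent operation that must sit in a strictly later node than its producers, and a single hypothetical resource limit (width, the number of operations a node may hold). The name hoist and all parameters are assumptions made for the example, and execution probabilities are not modeled.

```python
def hoist(ops, deps, home, width=None):
    """Greedily move each operation to the earliest node it may occupy.

    ops   -- operation names, ordered by original node (top to bottom)
    deps  -- dict: op -> set of ops whose results it needs
    home  -- dict: op -> index of its original node (0 is the top)
    width -- max operations per node, or None to ignore resources
    Returns a dict: op -> node index after upward movement.
    """
    placed = {}   # op -> node chosen so far
    load = {}     # node -> number of operations already placed there
    for op in ops:
        # A consumer must sit strictly below every producer it depends on.
        earliest = max((placed[d] + 1 for d in deps.get(op, ())), default=0)
        # Scan upward-most first; never move an operation below its home.
        for node in range(earliest, home[op] + 1):
            if width is None or load.get(node, 0) < width:
                break
        else:
            node = home[op]  # no free slot found: leave the op in place
        placed[op] = node
        load[node] = load.get(node, 0) + 1
    return placed


ops = ["a", "b", "c", "d"]
deps = {"b": {"a"}}
home = {"a": 0, "b": 1, "c": 2, "d": 3}
print(hoist(ops, deps, home))           # resources ignored
print(hoist(ops, deps, home, width=2))  # at most two operations per node
```

With resources ignored, the independent operations c and d both move to node 0 alongside a. With width=2, node 0 fills after a and c, so d is restrained to node 1.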
Global instruction scheduling permits the equivalent and speculative movement of operations beyond basic block boundaries within the scope of an enclosing loop. An enhanced block scheduler improves the ILP of a basic block by considering operations from its "neighbor and peer" blocks. Code duplication is avoided in the initial implementation, but loops are handled by copying the first basic block to the end of the loop. Global instruction scheduling is described, for example, in Bernstein, D. and Rodeh, M., "Global Instruction Scheduling for Superscalar Machines," Proc. SIGPLAN '91 Conf. Programming Language Design & Implementation, pp. 241-255 (1991).
All of these techniques move a single operation at a time, either explicitly or implicitly by scheduling it outside its original basic block. This limits their ability to make truly global tradeoffs in deciding where to place operations. It increases the cost of compiler decision-making by requiring a separate choice for each operation, and often requires updating dependency information after each decision. It also introduces significant biases into the decisions that are made.
In particular, trace scheduling optimizes the first traces scheduled at the cost of fix-up code on the side traces, even if they have equal execution frequency. Percolation scheduling moves operations upward even if doing so is detrimental. Enhanced percolation scheduling suppresses motion that exceeds resource availability, but still performs useless motion that increases resource requirements. Global instruction scheduling, like enhanced percolation scheduling, constrains motion based on available resources, but cannot balance resource usage.