1. Field of the Invention
This invention relates to the field of optimizing compilers for computer systems. Specifically, this invention is a new and useful optimization method, apparatus, system and computer program product for optimizing the order of computer operation codes resulting from the compilation of a program loop.
2. Background
Early computers were programmed by rewiring them. Modern computers are programmed by arranging a sequence of bits in the computer's memory. These bits perform a function similar to (but much more useful than) that of the wiring in early computers. Thus, a modern computer operates according to the binary instructions resident in the computer's memory. These binary instructions are termed operation codes (opcodes). The computer fetches an opcode from the memory location pointed to by a program counter. The computer's central processing unit (CPU) evaluates the opcode and performs the particular operation associated with that opcode. Directly loading binary values in memory to program a computer is both time consuming and mind numbing. Programming languages simplify this problem by enabling a programmer to use a symbolic textual representation (the source code) of the operations that the computer is to perform. This symbolic representation is converted into binary opcodes by compilers or assemblers. By processing the source code, compilers and assemblers create an object file (or object module) containing the opcodes corresponding to the source code. This object module, when linked to other object modules, results in executable instructions that can be loaded into a computer's memory and executed by the computer.
A target program's source consists of an ordered grouping of strings (statements) that are converted into a binary representation (including both opcodes and data) suitable for execution by a target computer architecture. A source program provides a symbolic description of the operations that a computer will perform when executing the binary instructions resulting from compilation and linking of the source. The conversion from source to binary is performed according to the grammatical and syntactical rules of the programming language used to write the source. This conversion from source to binary is performed by both compilers and assemblers.
One significant difference between assemblers and compilers is that assemblers translate source code statements into binary opcodes in a one-to-one fashion (although some "macro" capabilities are often provided). On the other hand, compilers transform source code statements into sequences of binary opcodes (object code) that, when executed in a computer, perform the operation described by the source. Some compilers also provide an option to output the assembler source that represents the object code.
The symbolic statements processed by a compiler are more general than those processed by an assembler, and each compiled statement can produce a multitude of opcodes that, when executed by a computer, implement the operation described by the symbolic statement. Unlike an assembler, which maintains the essential structural organization of the source code when producing binary opcode sequences, a compiler may significantly change the structural organization represented by the source when producing the compiled binary. However, no matter how much the compiler changes this organization, the compiler is restricted in that the compiled binary, when executed by a computer, must provide the same result as the programmer described using the source language--regardless of how this result is obtained.
Many modern compilers can optimize the binary opcodes resulting from the compilation process. Due to the design of programming languages, a compiler can determine structural information about the program being compiled. This information can be used by the compiler to generate different versions of the sequence of opcodes that perform the same operation (for example, a version that enables debugging capability, or a version optimized for the particular version of the target processor for which the source code is compiled). Some optimizations minimize the amount of memory required to hold the instructions; other optimizations reduce the time required to execute the instructions.
One advantage of optimization is that the optimizing compiler frees the programmer from the time-consuming task of manually tuning the source code, which increases programmer productivity. Optimizing compilers also encourage a programmer to write maintainable code because manual tuning often makes the source code less understandable to other programmers. Finally, an optimizing compiler improves portability of code because source code tuned to one computer architecture may be inefficient on another computer architecture. A general discussion of optimizing compilers and the related techniques used can be found in Compilers: Principles, Techniques and Tools by Alfred V. Aho, Ravi Sethi and Jeffrey D. Ullman, Addison-Wesley Publishing Co. 1988, ISBN 0-201-10088-6, in particular chapters 9 and 10, pages 513-723.
FIG. 1 illustrates the general structure of a modern compiler as indicated by a general reference character 100. In such a compiler 100, a compiler front end segment 103 consumes a target program's source information 101. This compiler front end segment 103 processes the syntax and semantics of the source information 101 according to the rules of the programming language applicable to the source information 101. The compiler front end segment 103 generates at least one version of an "intermediate" code representation 104 of the source information 101. For loop constructs, the intermediate code representation generally includes data structures that either represent, or can be used to create, data dependency graphs (DDGs). This intermediate representation 104 is then optimized by an intermediate representation optimizer segment 105. The intermediate representation optimizer segment 105 operates on, and adjusts, the intermediate code representation 104 of the source information 101 to optimize the execution of a program in a variety of ways well understood in the art. The intermediate representation optimizer segment 105 generates an optimized intermediate representation 106. A code generator segment 107 consumes the optimized intermediate representation 106, performs low level optimizations, allocates physical registers and generates an assembler source code and/or object code module 109 from the optimized intermediate representation 106. The object code comprises binary computer instructions (opcodes) in an object module. The assembler source code is a series of symbolic statements in an assembler source language. Both the assembler source code and the object code are targeted to a particular computer architecture (for example, SPARC, X86, IBM, etc.).
DDGs embody the information required for an optimizer to determine which statements are dependent on other statements. The nodes in the graph represent statements in the loop and arcs represent the data dependencies between nodes. In particular, the scope of a variable extends from a "def" of the variable to a "use" of the variable. A def corresponds to an instruction that modifies a variable (an instruction "defines" a variable if the instruction writes into the variable). A use corresponds to an instruction that uses the contents of the variable. For example, the instruction "x=y+1;" "def"s x and "use"s y. An arc in the DDG extends from the def of a variable to the use of the variable. DDGs are described in chapter 4 of Supercompilers for Parallel and Vector Computers, by Hans Zima, ACM Press, ISBN 0-201-17560-6, 1991.
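The def/use relationship described above can be illustrated with a minimal sketch. The statement encoding (a target variable plus a list of source variables) and the function name build_ddg are assumptions made for illustration only. The sketch records only def-to-use arcs within a single pass through the loop body and omits loop-carried arcs.

```python
def build_ddg(stmts):
    """Return def-to-use arcs as a set of (def_index, use_index, variable).

    stmts is a list of (target, sources) pairs: the statement "defs"
    target and "uses" every variable in sources. This encoding is a
    hypothetical one chosen for the sketch.
    """
    arcs = set()
    last_def = {}  # variable -> index of the most recent statement that defs it
    for index, (target, sources) in enumerate(stmts):
        for var in sources:
            if var in last_def:
                # Arc from the def of var to this use of var.
                arcs.add((last_def[var], index, var))
        last_def[target] = index
    return arcs


# Hypothetical loop body: x = y + 1; z = x * w; y = z - 1
body = [("x", ["y"]), ("z", ["x", "w"]), ("y", ["z"])]
ddg = build_ddg(body)
```

In this example the arcs run from statement 0 to statement 1 (through x) and from statement 1 to statement 2 (through z). The def of y in statement 2 also reaches the use of y in statement 0 on the next iteration; a fuller model would record that as a loop-carried arc.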
As mentioned above, the code generator segment 107 performs low level optimizations and generates either (or both) object code (in the form of object modules) or assembler source code. The intermediate representation of the program generally references virtual registers. That is, the intermediate representation optimizer assumes that the target computer contains an unlimited number of registers. During the operation of the code generator segment 107, these virtual registers are assigned to the physical registers of the target computer. This resource management is performed in the code generator segment 107 by a register allocation (expansion) process. One aspect of the register allocation process is that the contents of physical registers are often "spilled" to memory at various points during the execution of the program so that the limited number of physical registers can be used to hold values of more immediate relevance to the program at those various points. Those values that are spilled to memory are often restored to the registers when the program advances to different points of execution.
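The mapping of virtual registers onto a limited set of physical registers, with spilling, can be sketched as follows. This is a simplified allocator in the linear-scan style, chosen for brevity; it is not asserted to be the allocation process of any particular code generator. The live-interval representation and all names (allocate, intervals, num_physical) are assumptions of the sketch.

```python
def allocate(intervals, num_physical):
    """Assign virtual registers to physical registers, spilling when full.

    intervals maps a virtual register name to its (start, end) live range.
    Returns (assignment, spilled): assignment maps virtual registers to
    physical register numbers; spilled is the set of virtual registers
    whose values are kept in memory instead.
    """
    free = list(range(num_physical))
    active = []       # (end, vreg) pairs currently holding a physical register
    assignment = {}
    spilled = set()
    for vreg, (start, end) in sorted(intervals.items(), key=lambda kv: kv[1][0]):
        # Release physical registers whose live ranges ended before this start.
        for item in [a for a in active if a[0] < start]:
            active.remove(item)
            free.append(assignment[item[1]])
        if free:
            assignment[vreg] = free.pop()
            active.append((end, vreg))
            continue
        # No physical register is free: spill the value that lives longest.
        victim = max(active + [(end, vreg)])
        if victim[1] == vreg:
            spilled.add(vreg)
        else:
            active.remove(victim)
            spilled.add(victim[1])
            assignment[vreg] = assignment.pop(victim[1])
            active.append((end, vreg))
    return assignment, spilled


# Four hypothetical virtual registers competing for two physical registers.
assignment, spilled = allocate(
    {"v0": (0, 10), "v1": (1, 3), "v2": (2, 4), "v3": (5, 6)}, 2
)
```

Here v0 has the longest live range, so it is the value spilled to memory when v2 needs a register; v3 later reuses a register released by v1 or v2.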
One programming construct that can be significantly optimized is the single-basic-block loop (SBB loop). SBB loops have a determinable number of iterations (for example, a compile-time computable or known symbolic tripcount). SBB loops do not contain any control flow structures, functions, procedures, or other constructs that change the flow of execution within the loop. Such loops have only one entry point, one exit point, and no branches within the loop.
Software pipelining is a technique for scheduling the execution of instructions in SBB loops. The software pipelining technique schedules different overlapping iterations of the loop body to exploit the computer's underlying parallel computation units. The execution schedule consists of a prologue, a kernel, and an epilogue. The prologue initiates the first p iterations. A steady state is reached after the first p*II cycles, where II is the initiation interval, that is, the number of cycles between the initiation of successive iterations. In this steady state, or kernel, the initiated iterations execute instructions in parallel and one iteration of the loop is completed every II cycles. Once the kernel initiates the last iteration in the loop, the epilogue completes the last p iterations of the loop that were initiated by the kernel.
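The prologue, kernel, and epilogue structure can be made concrete with a small sketch. It assumes that each iteration is divided into a fixed number of stages, each lasting II cycles, and that p equals the number of stages minus one; the text above does not define p this way, so this is an assumption of the sketch, as are the names pipeline_schedule, n_iters, and n_stages. Each row of the result covers one time slot of II cycles.

```python
def pipeline_schedule(n_iters, n_stages):
    """Return one (phase, active) row per time slot of II cycles.

    active lists the (iteration, stage) pairs that execute in parallel
    during that slot. Iteration i is initiated in slot i, so stage s of
    iteration i runs in slot i + s.
    """
    if n_iters < n_stages:
        raise ValueError("sketch assumes enough iterations to reach steady state")
    p = n_stages - 1  # assumed: slots needed to fill the pipeline
    rows = []
    for slot in range(n_iters + p):
        active = [
            (i, slot - i) for i in range(n_iters) if 0 <= slot - i < n_stages
        ]
        if slot < p:
            phase = "prologue"   # pipeline is still filling
        elif slot < n_iters:
            phase = "kernel"     # one iteration starts and one completes per slot
        else:
            phase = "epilogue"   # pipeline drains the last p iterations
        rows.append((phase, active))
    return rows


# Five iterations of a hypothetical three-stage loop body.
schedule = pipeline_schedule(5, 3)
```

With five iterations and three stages, the schedule has two prologue slots, three kernel slots in which three iterations overlap, and two epilogue slots that complete the last two iterations.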
Some computers contain predicate instructions. Predicate instructions can be used to convert a loop that contains branching opcodes into a SBB loop. For example, a floating point conditional evaluation instruction sets a predicate condition. A floating point "move on predicate condition" instruction evaluates the condition and executes accordingly--but without any branching operation.
FIGS. 2a and 2b illustrate the concepts of SBB loops, and the advantages of using predicate instructions to convert non-SBB loops into SBB loops. FIG. 2a illustrates a non-SBB loop as indicated by a general reference character 200. The loop initiates at a code block 201. At the "bne" instruction of the block 201, execution can continue either at a code block 203 or at a code block 205 depending on how the "bne" instruction of the block 201 evaluates its arguments. This branch within the loop violates the SBB loop requirements. If the execution continues to the code block 203, execution must jump past the code in the code block 205. This jump is another violation of the SBB loop requirements. Regardless of which path is taken at the "bne" instruction of the block 201, execution continues at a code block 207. The code block 207 includes instructions that determine whether another iteration of the loop should be executed or whether the loop completes.
FIG. 2b illustrates how predicate instructions can convert the non-SBB loop 200 into a SBB loop as illustrated by a general reference character 210. A code block 211 that is similar to the code block 201 is modified to define a predicate p that is associated with a condition (here the condition is that r1 is not equal to zero). The instructions within a code block 213 are assigned a predicate. A predicate includes an identifier and a type. The predicate for the code block 213 is id=p and type=F (false). Thus, while the instructions in the code block 213 will execute only if the predicate condition is false, there is no branching within the loop. The same occurs for a code block 215, except that the required predicate condition for execution is true instead of false. Thus execution sequentially continues through the basic blocks 211, 213, 215 where instructions are conditionally executed dependent on whether the predicate is satisfied. Execution completes at a code block 217 where the predicate p is consumed and the loop is conditionally iterated again. Together, the basic blocks 211, 213, 215, 217 now comprise a SBB loop 219 that can be optimized using existing modulo scheduling methods for SBB loops.
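The effect of predication can be sketched with a small interpreter that walks a straight-line instruction list and nullifies any instruction whose predicate does not match. The operations attributed to the blocks below (incrementing and decrementing r2) are hypothetical stand-ins, because the instructions shown in the figures are not reproduced in this text; only the predicate definition (r1 not equal to zero) and the F and T predicate types come from the description above.

```python
def run_predicated(instrs, regs):
    """Execute predicated instructions in order; control never branches.

    Each entry is (guard, op). guard is None for an unpredicated
    instruction, or (predicate_id, required_value). op is a callable
    that updates the register and predicate dictionaries.
    """
    preds = {}
    for guard, op in instrs:
        if guard is not None:
            pred_id, required = guard
            if preds[pred_id] != required:
                continue  # nullified: execution falls through to the next slot
        op(regs, preds)
    return regs


def define_p(regs, preds):
    preds["p"] = regs["r1"] != 0      # as in block 211: p is true when r1 != 0


def false_path(regs, preds):
    regs["r2"] = regs["r2"] + 1       # hypothetical work standing in for block 213


def true_path(regs, preds):
    regs["r2"] = regs["r2"] - 1       # hypothetical work standing in for block 215


loop_body = [
    (None, define_p),
    (("p", False), false_path),       # id=p, type=F
    (("p", True), true_path),         # id=p, type=T
]

result_zero = run_predicated(loop_body, {"r1": 0, "r2": 10})
result_nonzero = run_predicated(loop_body, {"r1": 5, "r2": 10})
```

Both runs visit the same three instruction slots in the same order; only the predicate value decides which of the guarded instructions takes effect.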
A difficulty with predicate instructions is that there are a limited number of predicate registers and often these registers cannot be spilled to memory and restored. Predicate registers are an example of unspillable resources. Thus, these predicate registers are a resource limitation on the scheduling process of the compiler.