Early computers were programmed by rewiring them. Modern computers are programmed by arranging a sequence of bits in the computer's memory. These bits perform a similar (but much more useful) function as the wiring in early computers. Thus, a modern computer operates according to the binary instructions resident in the computer's memory. These binary instructions are termed operation codes (opcodes). The computer fetches an opcode from the memory location pointed to by a program counter. The computer's central processing unit (CPU) evaluates the opcode and performs the particular operation associated with that opcode. Directly loading binary values in memory to program a computer is both time-consuming and mind-numbing. Programming languages simplify this problem by enabling a programmer to use a symbolic textual representation (the source code) of the operations that the computer is to perform. This symbolic representation is converted into binary opcodes by compilers or assemblers. By processing the source code, compilers and assemblers create an object file containing the opcodes corresponding to the source code. This object file, when linked to others, results in executable instructions that can be loaded into a computer's memory and executed by the computer.
A source program consists of an ordered grouping of strings (statements) that are converted into a binary representation (including both opcodes and data) suitable for execution by a target computer architecture. A source program provides a symbolic description of the operations that a computer will perform when executing the binary instructions resulting from compilation and linking of the source. The conversion from source to binary is performed according to the grammatical and syntactical rules of the programming language used to write the source. This conversion from source to binary is performed by both compilers and assemblers.
One significant difference between assemblers and compilers is that assemblers translate source code statements into binary opcodes in a one-to-one fashion (although some "macro" capabilities are often provided). Compilers, on the other hand, transform source code statements into sequences of binary opcodes that, when executed in a computer, perform the operation described by the source. The symbolic statements processed by a compiler are more general than those processed by an assembler, and each compiled statement can produce a multitude of opcodes that, when executed by a computer, implement the operation described by the symbolic statement. Unlike an assembler, which maintains the essential structural organization of the source code when producing binary opcode sequences, a compiler may significantly change the structural organization represented by the source when producing the compiled binary. However, no matter how much the compiler changes this organization, the compiler is restricted in that the compiled binary, when executed by a computer, must provide the same result as the programmer described using the source language, regardless of how this result is obtained.
Many modern compilers can optimize the binary opcodes resulting from the compilation process. Due to the design of programming languages, a compiler can determine structural information about the program being compiled. This information can be used by the compiler to generate different versions of the sequence of opcodes that perform the same operation (for example, a version that enables debugging capability, or a version with instructions optimized for the particular version of the target processor that the source code is compiled for). Some optimizations minimize the amount of memory required to hold the instructions; other optimizations reduce the time required to execute the instructions. The invention disclosed herein optimizes so as to maximize execution speed for a particular type of loop operation.
One advantage of optimization is that the optimizing compiler frees the programmer from the time-consuming task of manually tuning the source code, which increases programmer productivity. Optimizing compilers also encourage a programmer to write maintainable code, because manual tuning often makes the source code less understandable to other programmers. Finally, an optimizing compiler improves portability of code, because source code tuned to one computer architecture may be inefficient on another computer architecture.
Compilers generally have three segments: (1) a front-end that processes the syntax and semantics of the language and generates at least one version of an "intermediate" code representation of the source; (2) a back-end that converts the intermediate code representation into binary computer instructions (opcodes) for a particular computer architecture (e.g., SPARC, X86, IBM, etc.); and (3) various code optimization segments between the front- and back-ends of the compiler. These optimization segments operate on, and adjust, the intermediate code representation of the source. For loops, the intermediate code representation generally includes data structures that either represent, or can be used to create, data dependency graphs (DDGs). DDGs embody the information required for an optimizer to determine which statements are dependent on other statements. The nodes in the graph represent statements in the loop and arcs represent the data dependencies between nodes. Data dependency graphs are described in chapter 4 of Supercompilers for Parallel and Vector Computers, by Hans Zima, ACM Press, ISBN 0-201-17560-6, 1991.
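The role of a DDG can be illustrated with a minimal sketch in C. It is not drawn from any particular compiler: the `Ddg` type, the `MAX_STMTS` capacity, and the functions `ddg_init`, `ddg_add_arc`, `ddg_depends_on`, and `ddg_example` are all hypothetical names chosen for this illustration, and an adjacency matrix is assumed only because it keeps the example short. Nodes are statement numbers, and an arc from one statement to another records that the second uses a value produced by the first.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_STMTS 8  /* hypothetical capacity, chosen for this sketch */

/* arc[from][to] is true when statement `to` uses a value that
   statement `from` produces. */
typedef struct {
    int  stmt_count;
    bool arc[MAX_STMTS][MAX_STMTS];
} Ddg;

void ddg_init(Ddg *g, int stmt_count) {
    assert(stmt_count >= 0 && stmt_count <= MAX_STMTS);
    g->stmt_count = stmt_count;
    for (int i = 0; i < MAX_STMTS; i++)
        for (int j = 0; j < MAX_STMTS; j++)
            g->arc[i][j] = false;
}

void ddg_add_arc(Ddg *g, int from, int to) {
    assert(from >= 0 && from < g->stmt_count);
    assert(to >= 0 && to < g->stmt_count);
    g->arc[from][to] = true;
}

/* True when `stmt` depends on `on`, directly or through a chain of
   intermediate statements (depth-first walk along the arcs). */
bool ddg_depends_on(const Ddg *g, int stmt, int on) {
    bool seen[MAX_STMTS] = { false };
    int  stack[MAX_STMTS];
    int  top = 0;

    stack[top++] = on;
    seen[on] = true;
    while (top > 0) {
        int cur = stack[--top];
        for (int next = 0; next < g->stmt_count; next++) {
            if (!g->arc[cur][next])
                continue;
            if (next == stmt)
                return true;
            if (!seen[next]) {       /* each node is pushed at most once */
                seen[next] = true;
                stack[top++] = next;
            }
        }
    }
    return false;
}

/* Graph for an illustrative three-statement loop body:
     s0: t = i * 2;
     s1: a = t + 3;   (uses t from s0)
     s2: b = a * a;   (uses a from s1) */
Ddg ddg_example(void) {
    Ddg g;
    ddg_init(&g, 3);
    ddg_add_arc(&g, 0, 1);
    ddg_add_arc(&g, 1, 2);
    return g;
}
```

With the graph returned by `ddg_example`, `ddg_depends_on(&g, 2, 0)` is true because s2 reaches s0 through s1, while `ddg_depends_on(&g, 0, 2)` is false. An optimizer can ask this kind of question before moving a statement.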
One example of a prior art optimization is for the compiler to process the source code as if the programmer had written the source in a more efficient manner. For example, common subexpression elimination replaces subexpressions that are used more than once with a temporary variable set to the subexpression's value. Thus:
    a=i*2+3;
    b=sqrt(i*2);
compiles as if written as:
    temp=i*2;
    a=temp+3;
    b=sqrt(temp);
Another optimization is by code motion. This optimization hoists, from out of the enclosing loop, expressions that are loop-invariant for each iteration. Thus:
    while (!feof(fp)) DoSomething(fp, x*5);
compiles as if written as:
    temp=x*5;
    while (!feof(fp)) DoSomething(fp, temp);
Yet another optimization (trading memory for speed) is to expand the expressions contained in the loop so that less time is spent performing loop overhead operations. For example:
    a=0;
    for (i=0; i<100; i++) a=a+i;
can be compiled as if written as:
______________________________________
a = 0;
for (i = 0; i < 100; i += 5) {
    a = a + i;
    a = a + i + 1;
    a = a + i + 2;
    a = a + i + 3;
    a = a + i + 4;
}
______________________________________
Finally, the compiler could just unwind iterations from the loop so that there would be no loop overheads. Thus,
    a=0;
    for (i=1; i<6; i++) a=a+i;
can be compiled as if written as:
______________________________________
a = 0;
{
    a = a + 1;
    a = a + 2;
    a = a + 3;
    a = a + 4;
    a = a + 5;
}
______________________________________
In the above examples, the plus "+" is used to indicate any language operator. Further, there are other optimizations that could be performed on each example. Each of the above examples illustrates the operation of a single optimization in isolation. In an optimizing compiler, many optimizations are applied together to best optimize the resultant executable instructions.
Of course, the original source code is not modified by the compiler. Rather, the compiler sufficiently understands the program structure, as described by the source, to make these optimizations for the programmer without changing the programmer's intended result. These optimizations result in the production of binary instructions that execute faster in the target computer than the non-optimized instructions that perform the same operations.
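The requirement that an optimization leave the programmer's intended result unchanged can be checked directly for the loop expansion and loop unwinding examples. The following C sketch writes each loop both ways and compares the results. The function names (`sum_rolled`, `sum_unrolled`, `short_rolled`, `short_unwound`, `check_same_results`) are hypothetical and introduced only for this illustration; a compiler performs these transformations on its intermediate representation, not on source text.

```c
#include <assert.h>

/* The 100-iteration loop, as the programmer wrote it. */
int sum_rolled(void) {
    int a = 0;
    for (int i = 0; i < 100; i++)
        a = a + i;
    return a;
}

/* The same loop expanded five-fold: one fifth of the compare-and-branch
   overhead, at the cost of a larger loop body. */
int sum_unrolled(void) {
    int a = 0;
    for (int i = 0; i < 100; i += 5) {
        a = a + i;
        a = a + i + 1;
        a = a + i + 2;
        a = a + i + 3;
        a = a + i + 4;
    }
    return a;
}

/* The five-iteration loop (i = 1 .. 5), as written. */
int short_rolled(void) {
    int a = 0;
    for (int i = 1; i < 6; i++)
        a = a + i;
    return a;
}

/* The same loop fully unwound: no loop overhead remains. */
int short_unwound(void) {
    int a = 0;
    a = a + 1;
    a = a + 2;
    a = a + 3;
    a = a + 4;
    a = a + 5;
    return a;
}

/* Each transformed form must deliver the result of its original. */
void check_same_results(void) {
    assert(sum_rolled() == sum_unrolled());
    assert(short_rolled() == short_unwound());
}
```

Calling `check_same_results()` from a test program passes silently when both pairs agree. The five-fold expansion is valid here only because the trip count (100) is a multiple of five; a general transformation would also need a remainder loop.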
It is well understood in the art how to hoist loop-invariant operations out of a loop operation. A general discussion of optimizing compilers and the related prior art techniques can be found in Compilers: Principles, Techniques, and Tools by Alfred V. Aho, Ravi Sethi and Jeffrey D. Ullman, Addison-Wesley Publishing Co., 1988, ISBN 0-201-10088-6, hereinafter Aho. Optimization of loop-invariant computations is described in Aho at pages 638-642.
However, it is not known to the art how to hoist certain statements that are almost loop-invariant. In particular, some source statement sequences are not initially loop-invariant, but become invariant after a given number of iterations. For example:
______________________________________
x = 1;
for (i = 0; i <= 10; i++) {
    y = x * 5;    //inst 1
    x = 2;        //inst 2
}
______________________________________
Here, on the first iteration of the loop, y receives the value 5 (1*5), and x receives the value 2. During the second iteration, y receives the value 10 (2*5), and x again receives the value 2. On the third iteration and continuing until the end of the loop, y receives the same value (10). In this example, the second statement of the loop, "x=2;", is not loop-invariant because the value of x that reaches "inst 1" in the first iteration is different from the value of x that reaches "inst 1" in subsequent iterations. Thus, the prior art cannot hoist the operations representing this statement from the loop's kernel. The same applies to variable y. However, x does not change subsequent to the first iteration over the loop. Thus, x and y become invariant after the first iteration. Keeping the operations represented by these statements within the loop's kernel, after the first iteration, slows execution of the loop because each execution of an instruction takes time and computer resources. After the initial variant iterations of the loop, these statements have no further utility. Thus the time spent processing these statements is wasted and the loop is correspondingly inefficient.
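The wasted work can be made concrete with a hand transformation of the example loop. The C sketch below is an illustration under stated assumptions, not the method of the invention: it peels the first iteration off the loop by hand, after which the two statements can be computed once and kept out of the remaining iterations. The running `total` of y is an addition made for this sketch so that the value of y in every iteration is observable, and the function names `total_original` and `total_peeled` are hypothetical.

```c
#include <assert.h>

/* The example loop, with a running total of y added for this sketch.
   `trips` is the iteration count (11 for i = 0 .. 10). */
int total_original(int trips) {
    int x = 1, y = 0, total = 0;
    for (int i = 0; i < trips; i++) {
        y = x * 5;   /* inst 1 */
        x = 2;       /* inst 2 */
        total += y;
    }
    return total;
}

/* Hand-peeled form. Requires at least two iterations, so that the
   stabilized value of y is actually reached in the original loop. */
int total_peeled(int trips) {
    assert(trips >= 2);
    int x = 1, y, total = 0;

    /* First iteration, peeled out of the loop: y = 5, x = 2. */
    y = x * 5;
    x = 2;
    total += y;

    /* From here on x stays 2 and y stays 10, so both statements are
       executed once and left out of the remaining iterations. */
    y = x * 5;
    for (int i = 1; i < trips; i++)
        total += y;

    return total;
}
```

For the 11 iterations of the example, both functions return 105 (5 from the first iteration plus ten iterations of 10), but the peeled form performs the multiplication twice instead of eleven times.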
Statements that have assignments that are variant in the first omega iterations, but that become invariant after omega iterations, are termed "omega-invariant". Omega-invariant statements "stabilize" after "omega" iterations. Hence, the value delivered by the statement stabilizes after omega iterations and becomes invariant for subsequent iterations. In the example above, the omega for "inst 1" is one because the value of y remains the same (10) after one iteration of the loop. The omega for "inst 2" is zero because x is assigned the value of 2 in the first and all subsequent iterations of the loop.
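The omega values stated above can be reproduced by tracing the example loop. The C sketch below records the value each statement delivers in every iteration and then finds the number of leading iterations that must be skipped before the delivered value stops changing. This is an empirical check on one finite run, offered only to illustrate the definition; it is an assumption of this sketch and not the compile-time analysis an optimizer would use. The names `TRIPS`, `stabilization_point`, and `trace_example` are hypothetical.

```c
#include <assert.h>

#define TRIPS 11  /* i = 0 .. 10, as in the example loop */

/* Returns the smallest k such that values[k], values[k+1], ...,
   values[count-1] are all equal, i.e. the number of leading
   iterations after which the traced value no longer changes. */
int stabilization_point(const int *values, int count) {
    assert(count >= 1);
    int k = count - 1;
    while (k > 0 && values[k - 1] == values[count - 1])
        k--;
    return k;
}

/* Runs the example loop and records what each statement delivers
   in every iteration. */
void trace_example(int y_trace[TRIPS], int x_trace[TRIPS]) {
    int x = 1, y;
    for (int i = 0; i < TRIPS; i++) {
        y = x * 5;          /* inst 1 */
        y_trace[i] = y;
        x = 2;              /* inst 2 */
        x_trace[i] = x;
    }
}
```

After `trace_example(ys, xs)`, the y trace is 5 followed by ten 10s, so `stabilization_point(ys, TRIPS)` is 1, and the x trace is 2 throughout, so `stabilization_point(xs, TRIPS)` is 0. These match the omega values given for "inst 1" and "inst 2".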
The prior art does not optimize this sequence of statements. Thus, prior art compilers generate less efficient (slower) loops than compilers practicing the invention. The invention described herein optimizes the execution speed of SBBN loops containing omega-invariant statements with a determinable number of iterations.