1. Field of the Invention
This invention relates in general to compiler algorithms for reducing address computation overhead and, in particular, to compiler algorithms for reducing address computation overhead for microprocessor or computer system architectures that support an auto-increment feature for memory access instructions.
2. Description of the Prior Art
Conventional microprocessor and computer architectures typically provide a base+displacement addressing mode. A limited number of computer architectures provide an addressing mode that supports a base register auto-increment feature. Examples of computer architectures that support a base-register auto-increment addressing mode include the IBM 370, the Digital PDP-11/VAX architectures, the Apollo PRISM architecture, the IBM RS/6000 architecture, and the PA-RISC architecture.
FIG. 1 shows two examples of memory access instructions that specify the auto-increment addressing mode. Example 1 of FIG. 1 is an auto-increment memory access instruction where the base register auto-increment occurs after the memory access. The load instruction results in the contents of the memory location specified by base register R10 being loaded into register R11. Only after the load into register R11 are the contents of base register R10 "post-incremented" by 4, hence the ",MA" (modify-after) completer on the load opcode. In contrast, the instruction in Example 2 is a memory access instruction where the base register auto-increment occurs before the memory access. In Example 2, the contents of base register R9 are "pre-incremented" by -4 before the store occurs. After R9 is pre-incremented by -4, the contents of register R8 are stored in the memory location specified by base register R9, hence the ",MB" (modify-before) completer on the store opcode. Without loss of generality, we will refer to the post-increment addressing mode in the remainder of this application. The observations and algorithms herein apply equally well to the use of the pre-increment addressing mode.
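The ordering of the memory access and the base register update in the two examples can be illustrated with a small C model. The following is a minimal sketch only: the Machine structure, the 64-byte memory, and the function names load_word_ma and store_word_mb are hypothetical and are not taken from FIG. 1 or from any instruction set manual. The sketch assumes four-byte words and a byte-addressed memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical machine model: a small byte-addressed memory and a
   register file. All names here are illustrative assumptions. */
enum { MEM_BYTES = 64, NUM_REGS = 32 };

typedef struct {
    uint8_t mem[MEM_BYTES];
    int32_t reg[NUM_REGS];
} Machine;

/* Post-increment ("modify-after") load: the access uses the old value
   of the base register, and only then is the base register updated. */
void load_word_ma(Machine *m, int target, int base, int32_t disp)
{
    int32_t addr = m->reg[base];
    assert(addr >= 0 && addr + 4 <= MEM_BYTES);
    memcpy(&m->reg[target], &m->mem[addr], 4);
    m->reg[base] = addr + disp;
}

/* Pre-increment ("modify-before") store: the base register is updated
   first, and the access uses the updated value. */
void store_word_mb(Machine *m, int source, int base, int32_t disp)
{
    int32_t addr = m->reg[base] + disp;
    assert(addr >= 0 && addr + 4 <= MEM_BYTES);
    m->reg[base] = addr;
    memcpy(&m->mem[addr], &m->reg[source], 4);
}
```

Under this model, calling load_word_ma(&m, 11, 10, 4) corresponds to Example 1 and store_word_mb(&m, 8, 9, -4) corresponds to Example 2.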
The compiler is software that translates source code written in a high-level programming language, such as C, BASIC, or FORTRAN, into a binary image that runs on the computer hardware. It is the compiler's job to take advantage of the powerful features of the computer architecture, such as the post-increment addressing mode, to reduce the address computation overhead in order to increase computer performance. FIG. 2 shows a block schematic diagram of a software compiler. Referring to FIG. 2, the front end is responsible for checking the syntactic correctness of the source code. For example, if the compiler is a C compiler, it is necessary to make sure that the code is legal C code. The compiler front end component 200 reads a source code file 202 and translates it into a high level intermediate representation 210. A high level optimizer 222 may be optionally invoked to optimize the high level intermediate representation 210 into a more efficient form.
The code generator 230 translates the high level intermediate representation 210 to a low level intermediate representation 232. The low level intermediate representation generated by a code generator is typically fed into a low level optimizer. The low level optimizer 234 converts the low level intermediate representation 232 into a more efficient (machine-executable) form. The object file generator 250 writes out the optimized low-level intermediate representation into an object file 252. The object file 252 is processed along with other object files 254 by a linker 260 to produce an executable file 262 which can run on the computer 264.
Both the high level and low level optimizer components 222, 234 of the compiler must preserve the program semantics (i.e. the meaning of the instructions that are translated from source code to a high level intermediate representation, and thence to a low level intermediate representation and ultimately an executable file), but may transform the code in a way that allows the computer to execute an "equivalent" set of instructions in less time. Modern compilers are structured with a high level optimizer (HLO) that typically operates on a high level intermediate representation and substitutes in its place a more efficient high level intermediate representation of a particular program. For example, an HLO might eliminate redundant computations. The over-arching objectives of the low level optimizer (LLO) are largely the same as those of the HLO, except that the LLO operates on a representation of a program that is much closer to what the machine actually understands.
Memory accesses that occur in loops and that specify an address that is a linear function of a loop induction variable are prime candidates for exploiting a post-increment addressing mode. FIG. 3A shows an example of C source code where the memory accesses occur in a loop and where the address is a linear function of a loop induction variable. The memory accesses reference a 10-element global array variable, A, where each element is assumed to be four bytes in size. FIG. 3B shows a series of memory access operations for implementing the C source code shown in FIG. 3A. The numbers to the left of the memory operations indicate the program order of the memory operations.
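By way of illustration only, the kind of loop described above can be sketched in C as follows. This sketch is not a reproduction of FIG. 3A or FIG. 3B; the function names and the summation performed in the loop body are assumptions chosen for the example. In the indexed form, the address of each access is the base address of A plus 4 times the induction variable i. In the pointer form, the same addresses are produced by stepping a pointer after each access, which is the source-level analogue of a post-increment load.

```c
#include <assert.h>
#include <stdint.h>

enum { N = 10 };

/* A 10-element global array of four-byte elements, as in the text. */
int32_t A[N];

/* Indexed form: the address of A[i] is (address of A) + 4 * i, a
   linear function of the loop induction variable i. */
int32_t sum_indexed(void)
{
    int32_t sum = 0;
    for (int i = 0; i < N; i++) {
        sum += A[i];
    }
    return sum;
}

/* Pointer form: the access is followed by a step of one element
   (four bytes), mirroring a post-increment load. */
int32_t sum_post_increment(void)
{
    int32_t sum = 0;
    const int32_t *p = A;
    for (int i = 0; i < N; i++) {
        sum += *p++;
    }
    assert(p == A + N); /* the pointer ends one element past the array */
    return sum;
}
```

Both functions visit the same ten addresses in the same order; only the way each address is formed differs.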
Typically, compiler algorithms attempt to exploit the auto-increment capabilities of memory operations that occur in loops in the loop optimization phase. FIG. 3C shows the result of transforming the memory operations in FIG. 3B to use a post-increment addressing mode where a different base register is used for each memory access instruction. FIG. 3D shows a hypothetical preferred scheduling order for the operations shown in FIG. 3B. Note that the preferred order of instructions is different from the order of the instructions shown in FIG. 3B. FIG. 3E shows the instructions shown in FIG. 3C in the preferred scheduling order shown in FIG. 3D. Although the code transformation shown in FIG. 3C (using the post-increment addressing mode with a different base register for each memory access instruction) allows the instructions to be reordered into the preferred sequence shown in FIG. 3D, it does so at the cost of using an increased number of base registers.
FIG. 3F shows the result of transforming the memory operations of FIG. 3B using post-increment memory access instructions where the same base register is used for each memory access instruction. The transformed instructions shown in FIG. 3F cannot be reordered to achieve the preferred scheduling order due to data dependencies on the common post-incremented base register R.sub.p.
FIG. 4 is a block diagram showing a low level optimizer 234. In a computer architecture that supports an auto-increment addressing mode, the loop optimization phase of a compiler includes the steps of (1) identifying opportunities for post-increment synthesis 272 and (2) transforming candidate memory instructions into auto-increment memory operations 274. Post-increment synthesis is performed as part of a loop optimization phase that operates on a low-level representation of the source program. The synthesis performed in the loop optimization phase typically precedes the instruction scheduling phase 278 of the optimizer.
A problem with synthesizing post-increment instructions prior to instruction scheduling is that register data dependencies are introduced between memory access instructions that refer to shared (post-incremented) base register operands. (See the code sequence shown in FIG. 3F.) These register data dependencies can adversely affect the quality of the instruction schedule that can be achieved when those memory access instructions are not otherwise dependent on each other. Specifically, the order in which memory access instructions that share a common post-incremented base register appear in the final schedule is constrained. In addition, for instructions that end up sharing a common post-incremented base register, opportunities to schedule otherwise independent memory access instructions in the same processor clock cycle may be lost. These problems are exacerbated in loops that are subject to software pipelining, since the sequence of base-register post-increment operations will typically form a cycle and thereby constrain the intermingling of memory operations from different loop iterations in the modulo-scheduled kernel.
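The dependency described above can be illustrated with a small C sketch in which integer offsets stand in for base registers. The sketch is an assumption made for illustration and does not reproduce the instruction sequences of FIG. 3C or FIG. 3F; the names PairSums, sums_shared_base, and sums_separate_bases are hypothetical. With one shared offset, the second access in each iteration cannot form its address until the update from the first access has completed. With one offset per access stream, the two accesses are independent and may be issued in either order, at the cost of a second offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { PAIRS = 5, ELEMS = 2 * PAIRS };

typedef struct {
    int32_t even; /* sum of elements at even indices */
    int32_t odd;  /* sum of elements at odd indices */
} PairSums;

/* Shared base: each access depends on the increment performed by the
   access before it, so the order of the two accesses is fixed. */
PairSums sums_shared_base(const int32_t a[ELEMS])
{
    PairSums s = {0, 0};
    size_t r = 0; /* stands in for one post-incremented base register */
    for (int i = 0; i < PAIRS; i++) {
        s.even += a[r]; r += 1;
        s.odd  += a[r]; r += 1; /* needs the updated r from above */
    }
    assert(r == ELEMS);
    return s;
}

/* Separate bases: the two accesses share no offset, so they are written
   here in the opposite order without changing the result. */
PairSums sums_separate_bases(const int32_t a[ELEMS])
{
    PairSums s = {0, 0};
    size_t r_even = 0; /* base for the even-indexed stream */
    size_t r_odd = 1;  /* base for the odd-indexed stream */
    for (int i = 0; i < PAIRS; i++) {
        s.odd  += a[r_odd];  r_odd  += 2;
        s.even += a[r_even]; r_even += 2;
    }
    return s;
}
```

Both functions compute the same sums. The difference lies only in which orderings of the accesses are permitted and in how many offsets are kept live across the loop.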
Address computation overhead can be a performance limiter, especially for an instruction-set architecture where memory addresses are specified for loads and stores by a single base register operand. The performance impact is more pronounced for implementations of such architectures with modest amounts of ALU bandwidth that can be used for address computation. Some architectures provide a simple base register-indirect addressing mode, with an optional post-increment feature. Thus, for such architectures, and for code that has a rich mixture of integer ALU operations, it is important for the compiler to reduce ALU use for address computation by making effective use of the post-increment feature. Further, the compiler algorithm for exploiting the post-increment addressing mode should operate in a way that does not unduly constrain the instruction scheduler.
A compiler optimization algorithm that reduces address computation overhead for architectures that support an auto-increment addressing mode for memory access instructions is needed. Further, the compiler algorithm should avoid the drawbacks of traditional approaches to post-increment synthesis and, in addition, should provide support for memory references occurring in software-pipelined loops.