Scheduling, or instruction rearrangement, has long been a part of optimizing code. It typically involves rearranging instruction order to increase processor performance. Scheduling has the secondary effect of altering which registers can be used without changing the program behavior. Calling conventions require values to be in specific registers. If the instruction that produces a subroutine parameter cannot write directly to the required register, a COPY instruction is needed just prior to the subroutine call. This COPY instruction increases code size and therefore decreases performance. Scheduling can move the defining instruction to just prior to the subroutine, produce the result directly in the required register and avoid the COPY instruction.
Note that the expected savings do not come from avoiding the execution of the COPY instruction, because a COPY instruction itself uses few CPU resources. Most modern CPUs can issue two integer operations per cycle, and the COPY instruction has only a one-cycle latency. Further, the COPY instruction's one input will have been computed many cycles previously (as there has to be an intervening subroutine call, or else the required register would be legal and the COPY instruction could be avoided).
For larger integer-intensive codes like operating systems, databases, compilers, and complex GUIs, subroutine calls are a large part of the static instruction count. For example, in many such codes, about 10% of the static instruction count will be COPY instructions. The large number of COPY instructions generated uses valuable space in the instruction cache. This in turn causes I-cache misses, which are very expensive. Some of these generated COPY instructions can be avoided by rescheduling.
Scheduling has a long history with much research. This research has been focused on raw performance, generally considering only instruction latency and available parallelism (i.e., superscalar or VLIW). Simplifying assumptions utilized during scheduling include assuming that all instructions are in I-cache and that subroutine calls are not that interesting. The latter assumption is made because subroutine calls typically contain far more work than is being saved by good scheduling in the one block containing the call. Coalescing COPY instructions has not been a focus of this work.
COPY instruction coalescing has typically been done either as a separate compiler pass (called Copy Propagation or Copy Forwarding) or as part of the register allocator in a compiler. Both of these techniques attempt to change any registers defined by a COPY instruction to be the same registers that the COPY instruction is using. If successful, the COPY instruction will move a register onto itself, and can be easily removed. These techniques are global since they can change register assignments across a whole program. However, these techniques do not reorder instructions. If a value is defined before a subroutine CALL, is required in a specific register after the subroutine is entered, and the subroutine "clobbers" that register, then these techniques will require a COPY instruction. This invention attempts to move defining instructions past subroutine CALLs so that their values can be defined directly into the required registers. FIG. 3 below is an example of a COPY instruction that cannot be removed by prior art, but can be removed by this invention.
FIGS. 1 and 2 are dataflow charts used to illustrate COPY instruction elimination and Register Allocation techniques for eliminating COPY instructions. FIG. 1 is an example where COPY Propagation can successfully remove a COPY instruction. Instruction 70 defines register RA. In the example, an ADD instruction is shown as instruction 70. However, this technique works with any instruction that defines register RA. Instruction 72 copies register RA to register RB. Instruction 74 utilizes register RB. In this instance, an ADD instruction is shown as instruction 74 that adds register RB to another register. However, this technique works with instructions other than ADD instructions.
COPY elimination operates by looking at a COPY instruction such as shown in instruction 72, and determining whether the source register in COPY instruction 72 is clobbered between instruction 72 and the instruction that uses the COPY instruction result in instruction 74. In the example in FIG. 1, register RA is not clobbered between instruction 72 where it is copied to register RB, and instruction 74 where register RB is utilized. In this example, there is no other use for register RB, and COPY instruction 72 therefore becomes unused. Unused or "Dead" COPY instructions can therefore be eliminated. The resulting code is shown on the right hand side of FIG. 1, where instruction 70 stays the same, generating an output value in register RA. Instruction 74 is modified to replace register RB with register RA resulting in instruction 76. COPY instruction 72 has been eliminated.
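The forward COPY Propagation just described can be sketched in a few lines of Python. This is only an illustrative model, not the implementation of any particular compiler: the `Instr` tuple, the `propagate_copies` function, the `fixed` parameter (registers pinned by a calling convention), and the input registers X, Y and Z are all hypothetical names introduced here. The sketch also assumes a single straight-line block in which the COPY result is dead at the end of the block.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str                # e.g. "ADD", "COPY", "CALL" (hypothetical opcodes)
    dest: Optional[str]    # register written, or None
    srcs: Tuple[str, ...]  # registers read, including implicit call arguments


def _forward(block, i, src, dst):
    """Try to drop the COPY at index i by renaming later uses of dst to src.

    Returns the rewritten block, or None if src is clobbered before a use of dst.
    """
    new = list(block[:i])  # instructions before the COPY; the COPY is dropped
    src_valid = True       # False once src has been overwritten
    for j in range(i + 1, len(block)):
        ins = block[j]
        if dst in ins.srcs:
            if not src_valid:
                return None  # the COPY is still needed
            ins = ins._replace(
                srcs=tuple(src if r == dst else r for r in ins.srcs))
        new.append(ins)
        if ins.dest == dst:
            # dst is redefined: later uses refer to the new value, so stop here.
            new.extend(block[j + 1:])
            return new
        if ins.dest == src:
            src_valid = False
    return new


def propagate_copies(block, fixed=frozenset()):
    """Remove COPYs whose destination can be renamed to the COPY source.

    Assumption: COPY destinations are dead at the end of the block.
    Registers in `fixed` cannot be renamed, so COPYs into them are left alone.
    """
    result = list(block)
    i = 0
    while i < len(result):
        ins = result[i]
        if ins.op == "COPY" and ins.dest not in fixed:
            rewritten = _forward(result, i, ins.srcs[0], ins.dest)
            if rewritten is not None:
                result = rewritten
                continue  # index i now holds the instruction after the COPY
        i += 1
    return result


# FIG. 1 modelled with hypothetical inputs X, Y, Z.
fig1 = [
    Instr("ADD", "RA", ("X", "Y")),   # instruction 70
    Instr("COPY", "RB", ("RA",)),     # instruction 72
    Instr("ADD", "RC", ("RB", "Z")),  # instruction 74
]
for ins in propagate_copies(fig1):
    print(ins)
```

Applied to the FIG. 1 sequence, the sketch drops the COPY and rewrites the second ADD to read RA, which corresponds to instruction 76. A COPY whose destination is listed in `fixed` is left untouched.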
Register Allocation generates the same results as COPY instruction elimination in this example, but with a different algorithm. This technique looks for a "live range" for register RA, and a live range for register RB. If the live ranges for register RA and register RB do not interfere, then the two registers can be coalesced. Thus, in the FIG. 1 example, the live range of register RA is between instruction 70 and instruction 72, and the live range of register RB is between instruction 72 and instruction 74. Since there is no interference between the two live ranges, the registers can be coalesced, and the COPY instruction 72 can be eliminated.
The way that COPY instruction coalescing works is that a single combined register RAB is created, replacing registers RA and RB. This turns COPY instruction 72 into a COPY of register RAB into itself, which can be easily eliminated.
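The live-range test and the renaming to a combined register can be sketched as follows. This is a simplified model under stated assumptions rather than a real register allocator: it handles one straight-line block, assumes each coalesced register is defined exactly once in that block, and represents a live range as a pair of instruction indices. The names `Instr`, `live_range`, `interferes` and `coalesce`, and the input registers X, Y and Z, are hypothetical.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str                # e.g. "ADD", "COPY" (hypothetical opcodes)
    dest: Optional[str]    # register written, or None
    srcs: Tuple[str, ...]  # registers read


def live_range(block, reg):
    """Return (definition index, last-use index) of reg.

    Assumption: reg is defined exactly once in this straight-line block.
    """
    start = next(i for i, ins in enumerate(block) if ins.dest == reg)
    end = max((i for i, ins in enumerate(block) if reg in ins.srcs),
              default=start)
    return start, end


def interferes(a, b):
    """Ranges interfere when they overlap in more than a shared endpoint."""
    return a[0] < b[1] and b[0] < a[1]


def coalesce(block, a, b, merged):
    """Rename a and b to `merged` if their live ranges do not interfere,
    then delete any COPY that has become a copy of a register onto itself."""
    if interferes(live_range(block, a), live_range(block, b)):
        return list(block)  # cannot coalesce; leave the code unchanged

    def rename(reg):
        return merged if reg in (a, b) else reg

    out = []
    for ins in block:
        dest = rename(ins.dest) if ins.dest is not None else None
        ins = Instr(ins.op, dest, tuple(rename(r) for r in ins.srcs))
        if ins.op == "COPY" and ins.srcs == (ins.dest,):
            continue  # self-copy: trivially removable
        out.append(ins)
    return out


# FIG. 1 modelled with hypothetical inputs X, Y, Z.
fig1 = [
    Instr("ADD", "RA", ("X", "Y")),   # instruction 70
    Instr("COPY", "RB", ("RA",)),     # instruction 72
    Instr("ADD", "RC", ("RB", "Z")),  # instruction 74
]
for ins in coalesce(fig1, "RA", "RB", "RAB"):
    print(ins)
```

In this model the live range of RA ends at the COPY and the live range of RB begins there, so the two ranges only share an endpoint and are treated as non-interfering.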
FIG. 2 is an example where COPY Propagation in a forward direction fails to eliminate a COPY instruction, but Register Allocation does remove it. Register RA is defined in instruction 80. In this example, an ADD instruction is shown, wherein the result is placed in register RA. However, this technique works with other instructions. The resulting value in register RA is copied to register R3 in instruction 82. This resulting value in register R3 is utilized in the call to subroutine FOO in instruction 84. Note that the use of register R3 is implicit here. Whenever subroutine CALLs are generated by higher level language compilers, calling conventions are used. These conventions dictate what registers are used to communicate between the calling and called routines, and what registers are clobbered by the called routine.
In this example, COPY Propagation fails because register R3, which plays the role of register RB in FIG. 1, cannot be renamed to register RA: register R3 is fixed by the calling convention. R3 is shown in this instance because it is part of the PowerPC™ calling convention; other computer architectures utilize other calling conventions. Note that COPY Propagation has been shown working in the forward direction. In this example, COPY Propagation operating in the backward direction would eliminate the COPY instruction by changing the register defined in instruction 80 from RA to R3. However, a similar counterexample that fails in the backward direction is easily generated.
Register Allocation is similar in operation to that shown in FIG. 1. A live range for register RA is identified between instructions 80 and 82. Likewise, a live range is identified for register R3 between instructions 82 and 84. Since the two live ranges do not interfere, the two registers can be coalesced, and a combined register RA3 can be utilized in instructions 82 and 84. Instruction 82 then becomes a copy of register RA3 to register RA3, which can be easily eliminated. The result is that instruction 80 with the result in RA is replaced by instruction 86, which is identical except that its result is left in register R3. The result left in R3 can then be used correctly by the call to subroutine FOO in instruction 84.
FIG. 3 is a dataflow chart illustrating an example where COPY instructions cannot be eliminated by either COPY Propagation or Register Allocation. Register RA is defined in instruction 90. As with instruction 70, an ADD instruction is shown in instruction 90. However, this technique will work with other instructions. Subroutine FOO is called in instruction 92. This has the result of clobbering all the registers in the calling convention. This is followed by a copy of register RA to register R3 in instruction 94. Finally, register R3 is used in a call to BAR in instruction 96.
COPY Propagation fails with the FIG. 3 example for the same reason that it failed with the FIG. 2 example, which is that register R3 cannot be renamed, due to its being part of the calling convention. Register allocation fails in this example because the live range for register RA interferes with the live range for register R3. This is because the live range for register RA is between instructions 90 and 94 but the live range of R3 is not just between instructions 94 and 96, but also includes the call to subroutine FOO at instruction 92. This is because register R3 is clobbered by the calling convention in the call to subroutine FOO at instruction 92. Due to this interference between the live ranges of the two registers, register allocation cannot coalesce register R3 and RA.
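The difference between FIG. 2 and FIG. 3 can be made concrete with a small interference check that treats a subroutine call as an implicit write to every register the calling convention lets it clobber. This is a sketch under stated assumptions, not a description of a real allocator: it covers one straight-line block, assumes the virtual register is defined once and used at least once, and uses hypothetical names (`Instr`, `writes`, `can_coalesce`, the `call_clobbers` set, and inputs X and Y).

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str                # "ADD", "COPY" or "CALL" (hypothetical opcodes)
    dest: Optional[str]    # register written explicitly, or None
    srcs: Tuple[str, ...]  # registers read, including implicit call arguments


def writes(ins, reg, call_clobbers):
    """True if ins overwrites reg, explicitly or as a call clobber."""
    return ins.dest == reg or (ins.op == "CALL" and reg in call_clobbers)


def can_coalesce(block, virt, phys, call_clobbers):
    """Check whether virt can share the fixed register phys.

    Assumption: virt is defined once and used at least once in the block.
    phys interferes if anything strictly between the definition of virt and
    its last use overwrites phys.
    """
    start = next(i for i, ins in enumerate(block) if ins.dest == virt)
    end = max(i for i, ins in enumerate(block) if virt in ins.srcs)
    return not any(writes(block[i], phys, call_clobbers)
                   for i in range(start + 1, end))


CLOBBERED = {"R3"}  # assumed subset of the registers a call may overwrite

fig2 = [
    Instr("ADD", "RA", ("X", "Y")),  # instruction 80
    Instr("COPY", "R3", ("RA",)),    # instruction 82
    Instr("CALL", None, ("R3",)),    # instruction 84: call FOO
]
fig3 = [
    Instr("ADD", "RA", ("X", "Y")),  # instruction 90
    Instr("CALL", None, ()),         # instruction 92: call FOO, clobbers R3
    Instr("COPY", "R3", ("RA",)),    # instruction 94
    Instr("CALL", None, ("R3",)),    # instruction 96: call BAR
]
print(can_coalesce(fig2, "RA", "R3", CLOBBERED))
print(can_coalesce(fig3, "RA", "R3", CLOBBERED))
```

For the FIG. 2 sequence nothing lies between the definition of RA and the COPY, so the check succeeds. For the FIG. 3 sequence the call to FOO sits inside the live range of RA and clobbers R3, so the check reports interference and the COPY must remain.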
It would be advantageous to be able to remove COPY instructions in instances where neither COPY Propagation nor Register Allocation is effective in removing such COPY instructions.