von Neumann architecture digital computers have a register set for holding various values during operation. The size of the register set may vary. All von Neumann machines have at least a program counter (PC). Generally, there are also several registers for holding operands and results ("operational registers"). RISC (reduced instruction set computer) machines generally have only register-to-register instructions (as distinguished from instructions that directly access memory) except for LOAD and STORE instructions, which read from memory or write to memory but do not operate on the data. They tend to have larger register sets, numbering for example 32 or more registers. Registers are used for holding intermediate results, address indexing, and passing data (parameters) between calling and called procedures. Some processors have floating-point registers in addition to general registers. CISC architectures may have evaluation stacks, thus providing for 0-address operations in which the operands are implicit. RISC architectures usually do not have evaluation stacks. The compiler normally keeps a stack in memory on RISC architectures, primarily for parameter passing and register spills rather than for computation. In most architectures, the overhead of saving and restoring registers on procedure calls is burdensome; it can account for 5% to 40% of main memory references.
More specifically, the hardware registers of many computer systems are, by convention, partitioned into two sets: (1) callee save registers; and (2) caller save registers. When callee save registers are used by a callee procedure, that procedure is responsible for ensuring that those registers' values appear unchanged to the calling procedure. In other words, the callee procedure is free to use the callee save registers, provided they are restored to their original state before the callee procedure returns. This may be accomplished in a straightforward manner by saving and restoring all of the callee save registers' values at the callee procedure entry and exit points, respectively. If the callee code referencing any of those registers is conditionally executed, however, that code may not be exercised on certain invocations of the callee procedure. For those invocations, the associated save and restore operations are redundant, and hence represent unnecessary overhead and a potential performance penalty.
To illustrate, FIG. 1 shows a block of lines of code, i.e., instructions in a selected computer programming language. The program 10 includes an instruction CALL A that invokes a subroutine A. Subroutine A is a sequence of lines of code indicated generally by bracket 20. In this system, for purposes of illustration, we will assume a total of 32 hardware registers, of which registers R3, R4, R5, R10, R11 and R12 are designated as callee save registers. The compiler determines from examining the code in subroutine A that, of the callee save registers, registers R3, R4 and R5 are potentially referenced in subroutine A. We say they are "potentially referenced" because references to each of registers R3, R4 and R5 appear somewhere in subroutine A, but it cannot be determined in advance of execution whether or not each of those registers will actually be modified, because one or more of the register references may occur in a conditional section of code, such as code block 24 in FIG. 1, which will not be executed under certain circumstances. Since callee save register R3, for example, is set within conditional block 24, it would not be modified unless code block 24 actually executes.
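The compiler's determination described above can be sketched as a simple set intersection. The following is a minimal illustration only: the instruction encoding, the opcode names, and the body of subroutine A are hypothetical (FIG. 1 is not reproduced here); only the callee save set and the fact that R3 is set inside conditional block 24 come from the text.

```python
# Callee save registers assumed in the example above.
CALLEE_SAVE = {"R3", "R4", "R5", "R10", "R11", "R12"}

# Hypothetical body of subroutine A: (opcode, registers named, conditional?).
# The third field marks instructions inside a conditional block such as block 24.
SUBROUTINE_A = [
    ("LOAD", ("R1",), False),
    ("SET", ("R3",), True),        # inside conditional block 24
    ("ADD", ("R4", "R1"), True),   # assumed, for illustration
    ("MOVE", ("R5", "R2"), False), # assumed, for illustration
]


def potentially_referenced(instructions, callee_save):
    """Return the callee save registers named anywhere in the instructions.

    Conditional instructions count too: the compiler cannot know in advance
    whether they will execute, so the reference is only "potential".
    """
    referenced = set()
    for _opcode, regs, _conditional in instructions:
        referenced.update(regs)
    # Sort by register number so the result is stable and readable.
    return sorted(referenced & callee_save, key=lambda r: int(r[1:]))


print(potentially_referenced(SUBROUTINE_A, CALLEE_SAVE))
```

Note that the conditional flag is deliberately ignored by the analysis; that is precisely why the resulting save and restore operations may turn out to be unnecessary at run time.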
Nonetheless, to ensure that these registers are properly preserved, subroutine A saves each of them, as indicated at reference 30, at the beginning of the subroutine. Correspondingly, subroutine A restores each of registers R3, R4 and R5 to its original state, as indicated at reference 32. This series of restores is the last operation before subroutine A returns control to the caller procedure. It should be noted that each individual register save requires a memory access operation, as does each individual register restore. Accordingly, in the example of subroutine A, a total of 6 memory access operations are executed to save and restore the callee save registers, even though it may be that none of those registers is actually modified during execution of subroutine A.
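The cost of this entry/exit convention can be made concrete with a small simulation. This is a sketch under stated assumptions, not a model of any particular processor: the `Machine` class, the `run` helper, and the values written to the registers are all hypothetical, and each save or restore is simply counted as one memory access, as in the text.

```python
class Machine:
    """Toy register machine (hypothetical) that counts save/restore traffic."""

    def __init__(self):
        self.regs = {f"R{i}": 0 for i in range(32)}  # 32 hardware registers
        self.stack = []                              # save area in memory
        self.memory_accesses = 0

    def save(self, reg):
        self.stack.append(self.regs[reg])
        self.memory_accesses += 1

    def restore(self, reg):
        self.regs[reg] = self.stack.pop()
        self.memory_accesses += 1


def subroutine_a(m, take_block_24):
    # Reference 30: save every potentially referenced callee save register.
    for reg in ("R3", "R4", "R5"):
        m.save(reg)
    if take_block_24:
        # Conditional block 24 (contents assumed for illustration).
        m.regs["R3"] = 111
        m.regs["R4"] = 222
        m.regs["R5"] = 333
    # Reference 32: restore in reverse order, last thing before returning.
    for reg in ("R5", "R4", "R3"):
        m.restore(reg)


def run(take_block_24):
    """Return (memory accesses, whether the caller sees unchanged registers)."""
    m = Machine()
    m.regs.update(R3=7, R4=8, R5=9)
    before = dict(m.regs)
    subroutine_a(m, take_block_24)
    return m.memory_accesses, m.regs == before


print(run(True))   # block 24 executes
print(run(False))  # block 24 is skipped
```

In both runs the caller's view of the registers is preserved and six memory accesses are spent; in the second run all six are pure overhead.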
To reduce the likelihood of executing save and restore operations unnecessarily, Fred Chow describes a technique called "shrink-wrapping," which uses data flow information to guide the placement of a register's save and restore operations. Referring to FIG. 2, subroutine B potentially references callee save register R3. Notice, however, that register R3 is used only within a conditional branch delineated by the IF and ENDIF statements. If the IF statement condition is not met, the block of code in which R3 is used will not be executed. Hence there would be no reason to save that register. According to Chow's procedure, the save and restore operations are moved closer to the actual use of the registers. In FIG. 2, the SAVE R3 operation immediately precedes the SET R3 statement. ("SET" generically refers to any statement or operation that potentially modifies the value stored in R3.) The restore operation RESTORE R3 appears a little later, but prior to the ENDIF statement. Thus the save and restore operations have been "shrink-wrapped" more tightly around the use of register R3. Since the save and restore operations are now contained within the conditional branch of subroutine B, there will be no wasted cycles to save and restore R3 if this branch is not executed. On the other hand, shrink-wrapping within a loop must be avoided, as the save and restore penalty would be paid on every iteration of the loop. This shrink-wrap procedure is described in considerable detail in "Minimizing Register Usage Penalty at Procedure Calls" by Fred C. Chow, Proceedings of the SIGPLAN '88 Conference on Programming Language Design and Implementation, Atlanta, Ga., Jun. 22-24, 1988, pp. 85-94.
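The effect of the placement described for subroutine B can be sketched as follows. This is an illustration of the resulting code shape only, not of Chow's data flow analysis itself; the function name, the value written by SET, and the access counter are hypothetical.

```python
def subroutine_b(regs, condition):
    """Shrink-wrapped use of R3, following the IF ... ENDIF shape of FIG. 2.

    Returns the number of memory accesses spent saving and restoring R3.
    """
    stack = []      # save area in memory (assumed)
    accesses = 0

    if condition:                   # IF
        stack.append(regs["R3"])    # SAVE R3, immediately before the SET
        accesses += 1
        regs["R3"] = 42             # SET R3 (value assumed for illustration)
        _ = regs["R3"] + 1          # some use of R3 inside the branch
        regs["R3"] = stack.pop()    # RESTORE R3, before the ENDIF
        accesses += 1
    # ENDIF

    return accesses


regs = {"R3": 7}
print(subroutine_b(regs, False), regs["R3"])  # branch not taken
print(subroutine_b(regs, True), regs["R3"])   # branch taken
```

When the branch is not taken, no save or restore is executed at all; when it is taken, the cost is the same two accesses that entry/exit placement would have paid for R3. If the IF block sat inside a loop body, those two accesses would instead recur on each iteration, which is the case the text says to avoid.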
The procedure described by Chow, hereinafter referred to as "basic shrink-wrapping," is not always successful, however. Often, there are code structures in which advantageous save and restore points cannot be found. For example, referring briefly to FIG. 3, if a register is set in code block B and used in code block D, one might attempt to save the register at the beginning of block B and restore it at the end of block D. That strategy will fail, however, because if control flows from block B through block C (instead of D), the register will have been saved and set but will never be restored. In such cases, basic shrink-wrapping fails, and the compiler has no choice but to save and restore callee save registers at the entry and exit points of the called procedures, respectively, as described with reference to FIG. 1. The need remains, therefore, to reduce procedure call overhead in cases where known techniques fail or leave room for further improvement.
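The failure just described can be checked mechanically by walking every path through the control flow graph and confirming that each save is matched by a restore. The sketch below is an assumption-laden illustration: FIG. 3 is not reproduced here, so the graph shape (block A branching to B or to the exit, block B branching to C or D) is hypothetical, as are the function names, and the graph is assumed to be acyclic.

```python
# Hypothetical control flow graph; the real FIG. 3 may differ in shape.
CFG = {
    "ENTRY": ["A"],
    "A": ["B", "EXIT"],
    "B": ["C", "D"],
    "C": ["EXIT"],
    "D": ["EXIT"],
    "EXIT": [],
}


def all_paths(cfg, node="ENTRY", path=()):
    """Yield every ENTRY-to-EXIT path (assumes the graph has no loops)."""
    path = path + (node,)
    if node == "EXIT":
        yield path
        return
    for successor in cfg[node]:
        yield from all_paths(cfg, successor, path)


def unbalanced_paths(cfg, save_at, restore_at):
    """Return the paths on which a save is never restored, or a restore
    executes without a preceding save.

    The save is placed at the start of block `save_at`; the restore is
    placed at the end of block `restore_at`.
    """
    bad = []
    for path in all_paths(cfg):
        saved = False
        ok = True
        for block in path:
            if block == save_at:
                saved = True
            if block == restore_at:
                if not saved:
                    ok = False  # restore with nothing saved
                saved = False
        if saved or not ok:
            bad.append(path)
    return bad


print(unbalanced_paths(CFG, "B", "D"))         # attempted shrink-wrap
print(unbalanced_paths(CFG, "ENTRY", "EXIT"))  # entry/exit fallback
```

With the save in B and the restore in D, the path through C is reported as unbalanced, whereas entry/exit placement is balanced on every path, which is why the compiler falls back to it.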