1. Field of the Invention
The present invention generally relates to computer program code optimization, and more particularly to a system and method for supporting code optimization through the deferral of exceptions generated during speculative execution.
2. Discussion of the Related Art
As is known, the performance of a computer system may be enhanced by optimizing the code of a computer program so that the computer can execute the program more quickly. One of the steps in optimizing a program is a process called scheduling. Scheduling is a process where the series of computer operations that comprise a program are organized for execution. During the scheduling process, operations of the program may be arranged, eliminated, or moved to make the program run more efficiently for a particular CPU design. Generally, there are two forms of scheduling: dynamic scheduling performed by the hardware during execution of a program, and static scheduling performed by a compiler before execution. Either of these techniques, or a combination of both, may be used to schedule operations in a computer program for processing by a computer system.
A computer program consists of a series of instructions to be carried out by a central processing unit (CPU) in the computer system. A typical program is written in a high level language and then compiled into a series of instructions compatible with the instruction set architecture of the CPU. A program, however, may also be directly written in "machine language" according to the instruction set architecture of the computer. The instruction set architecture defines the format or encoding of operations, including operators and operands in an instruction. Depending on the structure of the CPU and the scheduling techniques involved, each instruction may have one or more operations. An operation includes an operator encoded in an opcode representing functions such as add, subtract, load, store, branch, etc. Additionally, an operation identifies the operands and the results of the operation. To accomplish this, the operation typically includes a code identifying the location such as a register of an operand or operands. It is these operations that are organized for execution by the CPU using the optimization techniques.
There are different levels of optimization. One level of optimization is local optimization where code within a straight-line code fragment or "basic block" is manipulated to run more efficiently. By way of definition, a "basic block" is a contiguous set of instructions bounded by branches and/or branch targets, containing no branches or branch targets. This implies that if any instruction in a basic block is executed, then all instructions in the basic block will be executed, i.e. the instructions contained within any basic block are executed on an all-or-nothing basis. The instructions within a basic block are enabled for execution when control is passed to the basic block by an earlier branch targeting the basic block ("targeting" as used here includes both explicit targeting via a taken branch as well as implicit targeting via a not taken branch). The foregoing implies that if control is passed to a basic block, then all instructions in the basic block must be executed; if control is not passed to the basic block, then all instructions in the basic block must not be executed. The act of executing, or specifying the execution of, an instruction before control has been passed to the instruction is called "speculation." Speculation performed by the processor at program runtime is called "dynamic speculation" while speculation specified by the compiler is called "static speculation." Dynamic speculation is known in the prior art.
Two instructions are deemed "independent" when neither requires the result of the other; when one instruction does require the result of the other, they are termed "dependent" instructions. Independent instructions may be executed in parallel, while dependent instructions must be executed serially. Program performance is improved by identifying independent instructions and executing as many of them in parallel as possible. Experience indicates that more independent instructions can be found by searching across multiple basic blocks than by searching only within individual basic blocks. However, simultaneously executing instructions from multiple basic blocks generally requires speculation. Identifying and scheduling independent instructions, and thereby increasing performance, is one of the primary tasks of compilers and processors.
The trend in compiler and processor design has been to increase the scope of the search for independent instructions in each successive generation. In prior art instruction sets, an instruction that may generate an exception cannot be speculated by the compiler because, if the speculated instruction causes an exception, the program may erroneously raise an exception that it would not have raised had the instruction remained in its original position. This restricts the useful scope of the compiler's search for independent instructions and makes it necessary for speculation to be performed at program runtime by the processor via dynamic speculation. However, dynamic speculation entails a significant amount of hardware complexity. Furthermore, the complexity increases exponentially with the number of basic blocks over which dynamic speculation is applied, which places a practical limit on the scope of dynamic speculation. By contrast, the scope over which the compiler can search for independent instructions is much larger, potentially the entire program. Moreover, once the compiler has been designed to perform static speculation across a single basic block boundary, very little additional complexity is incurred by statically speculating across several basic block boundaries.
Examples of local optimization techniques are common subexpression elimination and constant propagation. Another level of optimization is global optimization, which includes extending local optimization techniques across conditional branches in a program and further includes transformations for optimizing loops. One form of global optimization is code motion. An example of code motion is moving out of a loop any code that computes the same value on every iteration of the loop. A third level of optimization is machine dependent optimization. Machine dependent optimization involves manipulation of code to take advantage of specific architectural attributes of the CPU. For example, if the CPU has a pipelined functional unit for executing instructions concurrently, then code can be reordered to improve pipeline performance.
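By way of illustration only, the loop-oriented code motion described above can be sketched in Python. The function names scale_naive and scale_hoisted and their parameters are hypothetical and chosen solely for this example; the sketch shows the effect of the transformation on source code and is not a description of any particular compiler.

```python
def scale_naive(values, a, b):
    """Recomputes a * b on every iteration even though a and b never change."""
    out = []
    for v in values:
        factor = a * b  # loop-invariant: same value on each iteration
        out.append(v * factor)
    return out


def scale_hoisted(values, a, b):
    """Same result after code motion: the invariant product is computed once."""
    factor = a * b  # moved out of the loop
    out = []
    for v in values:
        out.append(v * factor)
    return out
```

Both functions return the same list for the same inputs; the second simply performs the multiplication of a and b once rather than once per element.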
To optimize a program, code may be moved above a conditional branch in a scheduling process called speculative code motion. Speculative code motion refers to the movement of an instruction above a conditional branch that controls its execution. The execution of a "speculative" instruction may be referred to as speculative or anticipatory execution because the instruction is executed before it is known whether its result will actually be used in the program. Speculative code motion can enhance instruction level parallelism. Because many instructions have a long latency, meaning they take several clock cycles to execute, it is advantageous to execute such instructions speculatively. The delay that an instruction would otherwise cause can be minimized by issuing the instruction in advance. Speculative code motion may also be useful in other optimizations such as redundancy elimination.
If static speculation is to be undertaken, then several problems must be solved, one of the most important of which is the handling of exceptional conditions encountered by statically speculated instructions.
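The exception problem can be illustrated with a minimal Python sketch, offered only as an analogy. Memory is modeled as a dictionary, and a load from an address that is not present raises KeyError, standing in for a hardware fault. The names original, speculated, memory, addr, and use_it are hypothetical.

```python
def original(memory, addr, use_it):
    """The load is guarded by the branch, so it runs only when its result is needed."""
    if use_it:
        return memory[addr] + 1
    return 0


def speculated(memory, addr, use_it):
    """The load has been moved above the branch that controls it."""
    value = memory[addr]  # may fault even when use_it is False
    if use_it:
        return value + 1
    return 0
```

With a valid address, the two versions agree. With an invalid address and use_it false, original returns 0 while speculated raises an exception that the original program would never have produced, which is the erroneous behavior that a deferral mechanism is intended to prevent.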
Since, as noted above, exceptions on speculative instructions cannot be delivered at the time of execution of the instructions, a compiler-visible mechanism is needed to defer the delivery of the exceptions until control is passed to the basic block from which the instructions were speculated (known as the "originating basic block"). Mechanisms that perform a similar function exist in the prior art for deferring and later delivering exceptions on dynamically speculated instructions. By definition, however, those mechanisms are not visible to the compiler and therefore cannot be manipulated by the compiler into playing a role in compiler-directed speculation. No known method or apparatus for deferring and later delivering exceptions on statically speculated instructions has been enabled in the prior art. Limited forms of static speculation do exist in the prior art; however, (1) the forms do not involve deferral and later recovery of exceptional conditions, and (2) the forms do not enable static speculation over the breadth and scope of the present invention.
One example of prior art limited static speculation is the speculation of instructions that do not cause exceptions. For example, the compare instruction is typically defined such that it does not generate any exceptions. A properly designed compiler may then speculate the compare since the only side effect is the writing of a destination. In the event that control is not passed to the compare's originating basic block, the destination is simply discarded. Another example is a load instruction from an address that is known to be valid at compile time and known to remain constant during runtime, e.g. a global variable. These conditions guarantee that if any exceptions do occur, they will not be fatal and can be handled speculatively without side effects, although the handling of the speculative exceptions may reduce overall performance. Again it should be noted that the limited forms of speculation just described do not involve or allow deferral and apply only to a restricted class of instructions.
Therefore, when undertaking static speculation, there is a need in the art for a mechanism to defer exceptions on speculative instructions that applies to as many forms of speculation as possible. The mechanism must possess very low latency; otherwise, the performance of a program compiled with speculation may actually be lower than that of the same program compiled without speculation. The mechanism must also place minimal restrictions on the form and the construction of software in order to allow the execution of legacy software, to minimize the impact on software developers, and to maximize the range of software implementation choices. A desired characteristic of the mechanism is to allow the computer system to dynamically adapt to program behavior in order to maximize performance over the broadest possible range of software.
Other methods are also known to deal with exceptions generated during speculative execution. One conservative approach is referred to as "safe speculation." In this approach, only operations that do not generate exceptions are moved speculatively. This approach does not improve instruction level parallelism sufficiently because it precludes speculative motion of many operations. Moreover, it does not allow load operations to be executed speculatively, and therefore, does not have the benefit of hiding memory latency.
Another alternative approach is referred to as boosting. In this approach, a speculative operation is tagged with the path back to its home basic block. To defer an exception, this state information must be saved until the processor takes a different execution path or it uses the result of the operation in a non-speculative operation.
The need to save this state information is a drawback of the boosting technique. Additional memory is required to store this state information. This gives rise to a trade-off between the extent to which boosting can be achieved and the additional opcode bits required to store the branch directions. The number of branches that an operation can be moved across is limited by the memory available to store the state information.
Another approach involves the use of a poison bit to defer exceptions. In this approach, the processor marks the result register of a speculative operation with a poison bit when an exception has been generated. When another speculative operation uses the result of this operation, the processor can propagate the exception by setting a poison bit in the result register of the operation. Processing of the exception is deferred until a non-speculative operation consumes the poison bit. At that point, the processor can report or process the exception.
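The poison bit behavior just described can be modeled, purely as a sketch, in Python. The names Reg, DeferredFault, spec_load, spec_add, and nonspec_use are hypothetical, memory is modeled as a dictionary, and a missing address stands in for a faulting load. The sketch is a software analogy and does not describe any particular processor.

```python
from dataclasses import dataclass


class DeferredFault(Exception):
    """Raised when a non-speculative operation consumes a poisoned register."""


@dataclass
class Reg:
    value: int = 0
    poison: bool = False  # the extra bit carried alongside the data


def spec_load(memory, addr):
    """Speculative load: on a fault, poison the result instead of raising."""
    if addr not in memory:
        return Reg(0, True)
    return Reg(memory[addr], False)


def spec_add(a, b):
    """Speculative add: the result is poisoned if either source is poisoned."""
    return Reg(a.value + b.value, a.poison or b.poison)


def nonspec_use(r):
    """Non-speculative consumer: the deferred exception is delivered here."""
    if r.poison:
        raise DeferredFault("deferred exception from a speculative operation")
    return r.value
```

In this model, a faulting speculative load followed by a speculative add produces a poisoned register and no exception. The exception surfaces only if nonspec_use is later called on that register, that is, only if the result is actually needed on the executed path.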
The poison bit approach typically requires that an extra bit be added to the opcode of speculative operations in the instruction set architecture to differentiate between speculative and non-speculative operations. This is a drawback because it increases the complexity of the instruction set and requires additional memory in the register file. In addition, the poison bit must be saved when a register is spilled at a function call or context switch. It is difficult to save the poison bit because a register that holds 64 bits of data, for example, needs to be spilled to 65 bits of memory.
Yet another approach is referred to as tagging. In this approach, each operation has a tag associated with it. Typically, a tag of zero indicates that the operation is non-speculative. For speculative operations, the tag refers to memory in the processor such as a tag table that stores information about deferred exceptions. In this scheme, a commit operation is inserted at the home block of an operation to check for a deferred exception.
One problem with the tagging approach is that the amount of speculation is typically limited by the number of opcode bits available for tags. When more bits are needed to encode the tags, fewer bits are available to enhance the repertoire of operations in the instruction set architecture. Another problem is the need to explicitly clear the information stored in the tag when the branch direction skips the commit operation.
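The tagging scheme, including the commit operation and the explicit clearing step noted above, can be sketched in Python as follows. The class TagTable and its methods load, commit, and clear are hypothetical names introduced for this illustration; memory is modeled as a dictionary, and KeyError stands in for a faulting load.

```python
class TagTable:
    """Hypothetical tag table; tag 0 is reserved for non-speculative operations."""

    def __init__(self, num_tags):
        self.deferred = [None] * (num_tags + 1)

    def load(self, memory, addr, tag):
        """Load with a tag: tag 0 faults immediately, other tags defer the fault."""
        if addr in memory:
            return memory[addr]
        fault = KeyError(addr)
        if tag == 0:
            raise fault  # non-speculative: deliver at once
        self.deferred[tag] = fault  # speculative: record for a later commit
        return 0  # placeholder result

    def commit(self, tag):
        """Inserted at the home block: deliver any exception deferred under tag."""
        fault, self.deferred[tag] = self.deferred[tag], None
        if fault is not None:
            raise fault

    def clear(self, tag):
        """Needed when the taken path skips the commit for this tag."""
        self.deferred[tag] = None
```

The size of the table bounds the number of distinct tags, which mirrors the encoding limit discussed above, and a path that bypasses commit must call clear so that a stale entry is not delivered when the tag is reused.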
Accordingly, it is desired to provide a system and method for deferring exceptions generated during speculative execution that overcomes the shortcomings of the prior art.