In the field of computer processor design, developers are always looking for ways to increase the rate at which the processor executes instructions. To accomplish this goal, the processor can be designed to execute several operations at once, or the cycle time of the processor can be reduced. One type of processor, referred to as a superscalar processor, includes special hardware to identify operations in the instruction stream that can be executed simultaneously, and improves performance by executing those operations in parallel.
Another type of processor, referred to as superparallel or Very Long Instruction Word (VLIW), relies on the compiler to schedule operations in bundles that can be executed in parallel. Since the hardware is simpler than in superscalar processors, the cycle time can be reduced further.
One problem with processors that can execute more than one operation in parallel is that there often are not enough independent operations to keep the hardware resources busy. The phrase commonly used to refer to the extent to which operations can be executed in parallel is "Instruction Level Parallelism." Programs executed on VLIW processors are typically optimized to improve instruction level parallelism. This optimization can be performed in the compiler, in the hardware, by hand, or using some combination of these techniques.
Speculative code motion is a form of optimization that can improve instruction level parallelism. In general, it involves moving an operation across a conditional branch that controls its execution. In speculative code motion, one or more operations are moved from their home basic block to a previous basic block in the program. A "basic block" is a straight line sequence of operations followed by a branch. The home block is the basic block in which the speculative operation originally resides in the program. The previous basic blocks for a given basic block include all the basic blocks that can branch to the given basic block or that sequentially precede the basic block.
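The definition of "previous basic blocks" above can be illustrated with a small sketch. The block names, the tuple layout, and the `predecessors` helper are assumptions made for illustration only; they do not come from any particular compiler.

```python
from collections import defaultdict

def predecessors(blocks):
    """Map each block name to the set of blocks that can reach it directly.

    `blocks` is an ordered list of (name, branch_targets, falls_through)
    tuples; this layout is hypothetical and chosen only for the example.
    """
    preds = defaultdict(set)
    for index, (name, targets, falls_through) in enumerate(blocks):
        # Blocks that can branch to a target are previous blocks of it.
        for target in targets:
            preds[target].add(name)
        # A block that falls through sequentially precedes the next block.
        if falls_through and index + 1 < len(blocks):
            preds[blocks[index + 1][0]].add(name)
    return dict(preds)

# Assumed three-block layout: a conditional branch, a home block, and a join.
blocks = [
    ("entry", ["X"], True),   # branches to X or falls through to "home"
    ("home", [], True),       # home block of the operation to be moved
    ("X", [], False),
]
preds = predecessors(blocks)
print(preds["home"])  # the only block an operation in "home" can move into
```

In this layout an operation whose home block is "home" could be moved speculatively only into "entry", since that is its sole previous basic block.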
An operation moved in this manner is referred to as "speculative" or "anticipatory" because it is executed before it is known whether the operation will be used in the program. The result of a speculative operation may never be used because a conditional branch that leads to the home block of the operation may take a different path.
While speculative code motion can improve the performance of VLIW and superscalar processors, a problem can arise when a speculative operation generates a fault. Consider, for example, the following source code:
if (A != 0) B = *A
A non-speculative version of this code would be:
. . (some code here)
branch to instruction X if register A holds a 0
load register B from the address in register A
X: . . .
The speculative version of this code would be:
load register C speculatively from the address in register A
. . (some code here)
branch to instruction X if register A holds a 0
copy the contents of register C to register B
X: . . .
In this example, the speculative code motion improves the instruction level parallelism, and has the additional benefit of reducing the impact of the latency incurred in the load operation. However, a speculative operation may generate a fault even if the result of the operation is never used in the program. For instance, in this example, the speculative load operation may generate a fault when register A holds a zero. If a speculative operation generates a fault, it should not be reported or processed immediately. Instead, processing of the fault should be deferred until it is known that the result of the operation will actually be used in the program. This point is sometimes referred to as the commit point.
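The hazard can be made concrete with a small simulation of the two schedules. This is only a sketch: the `Fault` class, the dictionary used as memory, and the function names are illustrative assumptions, and the speculative version deliberately reports faults immediately to show why deferral is needed.

```python
class Fault(Exception):
    """Stands in for a hardware memory fault (illustrative only)."""

def load(memory, address):
    # Any address missing from the dictionary is treated as invalid.
    if address not in memory:
        raise Fault(f"invalid address {address}")
    return memory[address]

def run_non_speculative(memory, a, b):
    # Branch around the load if register A holds a 0.
    if a != 0:
        b = load(memory, a)
    return b

def run_speculative_eager(memory, a, b):
    # The load is hoisted above the branch; faults are NOT deferred here.
    c = load(memory, a)
    if a != 0:
        b = c
    return b

memory = {100: 42}
print(run_non_speculative(memory, 100, 0))    # 42
print(run_non_speculative(memory, 0, 7))      # 7, the load is skipped
print(run_speculative_eager(memory, 100, 0))  # 42
try:
    run_speculative_eager(memory, 0, 7)
except Fault as fault:
    print("spurious fault:", fault)
```

When register A holds zero, the original program never performs the load, yet the hoisted load faults. Deferring the fault until the commit point is what removes this discrepancy.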
There are a number of possible approaches to deal with exceptions generated during speculative execution. One conservative approach is referred to as "safe speculation." In this approach, only operations that do not generate exceptions are moved speculatively. This approach does not improve instruction level parallelism sufficiently because it precludes speculative motion of many operations. Moreover, it does not allow load operations to be executed speculatively, and therefore, does not have the benefit of hiding memory latency.
Another approach is referred to as boosting. In this approach, a speculative operation is tagged with the path back to its home basic block. To defer an exception, this state information must be saved until the processor takes a different execution path or uses the result of the operation in a non-speculative operation.
The need to save this state information is a drawback of the boosting technique, because additional memory is required to store it. This gives rise to a trade-off between the extent to which boosting can be achieved and the additional memory required to store the state information. The number of branches that an operation can be moved across is limited by the memory available to store the state information.
Another approach involves the use of a poison bit to defer exceptions. In this approach, the processor marks the result register of a speculative operation with a poison bit when an exception has been generated. When another speculative operation uses the result of this operation, the processor propagates the exception by setting a poison bit in the result register of the consuming operation. Processing of the exception is deferred until a non-speculative operation consumes a register whose poison bit is set. At that point, the processor reports or processes the exception.
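The poison bit scheme can be sketched as a small register model. The `Reg` class and the `spec_load`, `spec_add`, and `commit` helpers are hypothetical names chosen for this example; a real processor would implement the same behavior in hardware.

```python
from dataclasses import dataclass

class Fault(Exception):
    """Stands in for a reported exception (illustrative only)."""

@dataclass
class Reg:
    value: int = 0
    poison: bool = False  # set when a deferred exception is pending

def spec_load(memory, addr):
    """Speculative load: on a fault, poison the result instead of raising."""
    if addr.poison or addr.value not in memory:
        return Reg(poison=True)
    return Reg(memory[addr.value])

def spec_add(x, y):
    """Speculative consumer: propagates the poison bit to its result."""
    if x.poison or y.poison:
        return Reg(poison=True)
    return Reg(x.value + y.value)

def commit(src):
    """Non-speculative consumer: reports the deferred exception, if any."""
    if src.poison:
        raise Fault("deferred exception reached a non-speculative operation")
    return src.value

memory = {100: 42}
good = spec_add(spec_load(memory, Reg(100)), Reg(1))
bad = spec_add(spec_load(memory, Reg(0)), Reg(1))
print(commit(good))  # 43
print(bad.poison)    # True; nothing has been reported yet
```

Calling `commit(bad)` raises `Fault`, which models the point at which the processor finally reports or processes the exception. In this sketch the poison bit records only that an exception occurred, not which operation caused it.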
Yet another approach is referred to as tagging. In this approach, each operation has a tag associated with it. Typically, a tag of zero indicates that the operation is non-speculative. For speculative operations, the tag refers to memory in the processor such as a tag table that stores information about deferred exceptions. In this scheme, a commit operation is inserted at the home block of an operation to check for a deferred exception.
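A tag table and its commit operation might be modeled as follows. The `TagTable` class, its method names, and the use of a dictionary for the table are assumptions made for illustration; the text above does not specify how the table is organized.

```python
class Fault(Exception):
    """Stands in for a reported exception (illustrative only)."""

class TagTable:
    def __init__(self):
        # tag -> description of the deferred exception (hypothetical layout)
        self.deferred = {}

    def load(self, memory, address, tag):
        """Tagged load; a tag of zero means the operation is non-speculative."""
        if address in memory:
            return memory[address]
        if tag == 0:
            raise Fault(f"invalid address {address}")
        # Speculative: record the exception in the table instead of raising.
        self.deferred[tag] = f"invalid address {address}"
        return 0  # placeholder value; it must not be trusted before commit

    def commit(self, tag):
        """Commit operation inserted at the home block of the operation."""
        reason = self.deferred.pop(tag, None)
        if reason is not None:
            raise Fault(reason)

memory = {100: 42}
table = TagTable()
print(table.load(memory, 100, tag=1))  # 42, nothing is deferred
table.commit(1)                        # no deferred exception, no effect
table.load(memory, 0, tag=2)           # faults, but the fault is deferred
print(table.deferred)                  # {2: 'invalid address 0'}
```

A subsequent `table.commit(2)` raises `Fault`, corresponding to the commit operation in the home block detecting the deferred exception.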
While these approaches to deferring exceptions improve instruction level parallelism by increasing the number of operations that can be executed speculatively, the processor needs a method for processing deferred exceptions when they are detected. The process of handling a deferred exception includes re-executing one or more of the speculative operations in a non-speculative manner. This process is generally referred to as "recovery," while the process of re-executing operations during recovery is referred to as "re-execution."
One way of performing recovery is to branch to a block of fix-up code when a deferred exception is detected. Fix-up code is a block of code added to the program by the compiler to handle an exception from a speculative operation. In this approach, the compiler is responsible for adding a block of fix-up code for every chain of speculative operations in the program. The fix-up code includes each of the operations in the speculative chain, but in non-speculative form, so that any exceptions generated while the processor recovers from the exception are handled immediately. When the processor detects an exception, it branches to the fix-up code, executes the fix-up code, and then resumes processing at the point where it detected the exception.
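The control flow of fix-up based recovery can be sketched as below. Everything here is a simplified assumption: the speculative chain is a load followed by an add, the fault is a recoverable one that a hypothetical `handle_fault` routine resolves by copying data from a backing store, and the deferred-exception flag stands in for whichever deferral mechanism the processor uses.

```python
class Fault(Exception):
    """Stands in for a memory fault (illustrative only)."""

def load(memory, address):
    """Non-speculative load: faults are raised immediately."""
    if address not in memory:
        raise Fault(address)
    return memory[address]

def spec_load(memory, address):
    """Speculative load: returns (value, deferred) instead of raising."""
    if address not in memory:
        return 0, True
    return memory[address], False

def handle_fault(memory, backing_store, address):
    """Hypothetical handler for a recoverable fault: bring the data in."""
    memory[address] = backing_store[address]

def fixup(memory, backing_store, a):
    """Fix-up code: the speculative chain, re-executed non-speculatively."""
    try:
        c = load(memory, a)
    except Fault as fault:
        handle_fault(memory, backing_store, fault.args[0])
        c = load(memory, a)
    return c + 1

def run(memory, backing_store, a):
    c, deferred = spec_load(memory, a)  # speculative chain: load ...
    d = c + 1                           # ... followed by its consumer
    if deferred:                        # check where the result is needed
        d = fixup(memory, backing_store, a)
    return d                            # resume after the detection point

memory = {100: 42}
backing_store = {200: 9}
print(run(memory, backing_store, 100))  # 43, no recovery needed
print(run(memory, backing_store, 200))  # 10, computed by the fix-up code
```

Note that `fixup` duplicates both operations of the chain. That duplication is the source of the code growth discussed next.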
If a program is scheduled with several speculative operations or chains of operations, then a great deal of fix-up code has to be generated. As a result, the fix-up code can cause the size of the program to balloon. The need to generate fix-up code also complicates the compiler design. A sequence of code can contain several interspersed operations that should not be re-executed. For example, if an operation is not dependent on a speculative operation or any consumer of the results of the speculative operation, it should not be re-executed in the event that the speculative operation generates an exception. Because of operations like this, the compiler has to ensure that the fix-up code only includes the operations necessary to recover from the exception. As such, the compiler design is complicated by the need to compute the fix-up code for each speculative chain, whether or not the results of the speculative chain are actually used by non-speculative operations during execution of the program.
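The selection of operations for a block of fix-up code can be sketched as a forward dependence walk. This is a minimal illustration that assumes straight-line code and a hypothetical `(destination, sources)` encoding of each operation; an actual compiler would also have to account for control flow and memory dependences.

```python
def fixup_operations(ops, speculative_index):
    """Return the indices of the operations that fix-up code must re-execute.

    `ops` is a list of (dest_register, source_registers) tuples in program
    order. The result holds the speculative operation and every later
    operation that depends on it, directly or through another consumer.
    """
    tainted = {ops[speculative_index][0]}  # registers with suspect values
    chain = [speculative_index]
    for index in range(speculative_index + 1, len(ops)):
        dest, sources = ops[index]
        if tainted & set(sources):
            chain.append(index)    # consumer of a suspect value
            tainted.add(dest)
        else:
            tainted.discard(dest)  # register overwritten with a clean value
    return chain

ops = [
    ("c", ["a"]),        # 0: c = load [a]   (speculative)
    ("e", ["f", "g"]),   # 1: e = f + g      (independent, not re-executed)
    ("d", ["c"]),        # 2: d = c + 1      (consumer of c)
    ("h", ["d", "e"]),   # 3: h = d * e      (consumer of d)
]
print(fixup_operations(ops, 0))  # [0, 2, 3]
```

Operation 1 is interspersed in the sequence but is left out of the result, because it does not depend on the speculative load or on any consumer of its result.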