The present invention relates generally to a method and apparatus for the speculative execution of instructions in a computer central processing unit (CPU). More specifically, the present invention provides for the management of exceptions caused by resequenced CPU instructions.
As CPU designs have developed over the years, CPU designers have added more functional units to CPU architectures. For instance, modern superscalar CPUs can execute multiple integer, floating-point, and memory operations in a single cycle. CPU efficiency increases when the program being executed keeps a higher percentage of these units busy at any one time. Many modern computing systems thus benefit from their ability to execute more than one instruction at a time. Very long instruction word (VLIW) and superscalar CPUs represent two of the more popular architectures of this kind. However, taking full advantage of the computing power they offer can prove difficult. To fully exploit that power, the programmer must either hand-code routines, use routines hand-coded by others, or use an advanced compiler. The first two methods are labor-intensive and expensive, and are therefore often impractical. The preferred method uses a compiler written to take advantage of the given CPU's capabilities. While VLIW and superscalar techniques offer many features and advantages, one feature employed by both is the speculative execution of instructions (or simply "speculative execution").
Speculative execution is the execution of instructions before or during the evaluation of the branch that controls their execution. It is an important technique for enhancing instruction-level parallelism (the execution of more than one instruction at a time, also known as "ILP") in compiled software. In superscalar and VLIW CPUs it is advantageous to maximize CPU utilization by identifying instructions that may be grouped together and executed simultaneously on the CPU's various execution units. To facilitate this grouping, it is further advantageous to resequence instructions whose execution depends on the outcome of a branch instruction. This resequencing is known as "speculative code motion": "speculative" because the results of the moved instructions may never be used, and "code motion" because the instructions are moved to a position before the branch instruction. The resequenced instructions (known as "speculated instructions") may come from the instructions that sequentially follow the branch (termed the "fall-through stream") or from the instructions at the branch's target (termed the "target stream").
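As a rough illustration of speculative code motion from the fall-through stream, the following Python sketch models instructions as records with a destination register and source registers, and hoists the leading fall-through instructions above the branch. The `Instr` record, the `hoist_fall_through` function, and the legality test (a hoisted instruction must not write a register the branch reads, nor one that is live on entry to the target stream) are simplifying assumptions for illustration, not a description of any particular compiler.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Instr:
    text: str                    # readable form, e.g. "r4 = load [r5]"
    dest: str = ""               # destination register ("" if none)
    srcs: tuple = ()             # source registers read by the instruction
    speculative: bool = False    # set once the instruction sits above its controlling branch


def hoist_fall_through(above, branch, fall_through, target_live_in):
    """Move leading fall-through instructions above `branch` (speculative code motion).

    Hoisting stops at the first instruction whose destination is read by the
    branch itself or is live on entry to the target stream, so the moved
    instructions keep their original relative order.
    """
    moved = []
    for ins in fall_through:
        if ins.dest in branch.srcs or ins.dest in target_live_in:
            break
        moved.append(replace(ins, speculative=True))
    return above + moved + [branch] + fall_through[len(moved):]


# Hypothetical example: r7 is read by the target stream, so its definition stays put.
above = [Instr("r1 = r2 - r3", "r1", ("r2", "r3"))]
branch = Instr("if r1 == 0 goto T", srcs=("r1",))
fall_through = [
    Instr("r4 = load [r5]", "r4", ("r5",)),
    Instr("r6 = r4 + 1", "r6", ("r4",)),
    Instr("r7 = r6 * 2", "r7", ("r6",)),
]
result = hoist_fall_through(above, branch, fall_through, target_live_in={"r7"})
for ins in result:
    print(("spec  " if ins.speculative else "      ") + ins.text)
```

In this example the load and the add move above the branch and are tagged speculative, while `r7 = r6 * 2` stays below it. A hoisted load like this one is exactly the kind of instruction that may fault on a path the original program would never have taken.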
Instruction-level parallelism, and thus CPU efficiency, may be increased by using idle execution units to execute these instruction sequences before and during the branch's evaluation. At a minimum, the CPU executes the instructions on the path the branch is most likely to take, as selected by a branch prediction method. This is known as "partial speculation," because only one of the possible instruction streams is speculatively executed. More desirable is a CPU able to execute instructions from both the fall-through stream and the target stream (known as "full speculation"). Given the overhead involved in evaluating a branch, speculative execution is gaining in popularity.
Speculative execution thus involves the execution of one or more instructions before the evaluation of the preceding branch has been completed. The CPU executes instructions in advance, using otherwise idle instruction processing units. If the branch resolves in the predicted direction, parallelism is increased by the early execution of the speculated instructions. If it does not, the results of the speculated instructions are simply discarded. Compiler control of such speculative code motion is known as "static scheduling" because the execution order is determined by the compiler before the program runs. This is in contrast to "dynamic scheduling," in which path prediction and execution order are determined by the processor during program execution (i.e., the prediction is made at runtime and the selected instruction stream is speculatively executed).
Some currently available compilers are capable of scheduling the simultaneous execution of instructions on the various execution units within a CPU. When such a compiler schedules instructions, however, the scope of scheduling is limited to basic blocks (straight-line blocks of code containing no internal control-flow instructions, i.e., branches). Because branches are common throughout most software, the basic blocks scheduled by compilers tend to be small; a typical basic block contains about 5 instructions. Speculative execution relaxes this constraint by permitting the compiler to place speculated instructions before their controlling branch, effectively enlarging the blocks available for scheduling and thus increasing ILP and computational efficiency. Using full speculation, this is achieved by speculative code motion from both the fall-through and target streams to a point above the controlling branch. Currently, no commercially available CPUs implement full speculation.
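To make the notion of a basic block concrete, the sketch below partitions a linear instruction list into basic blocks using the standard "leader" rule: the first instruction, every branch target, and every instruction immediately after a branch each begin a new block. The `Op` record and the eight-instruction example are hypothetical, and branch targets are assumed to be valid indices into the list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Op:
    name: str                       # mnemonic, e.g. "load" or "br"
    target: Optional[int] = None    # branch target index; None for non-branches


def basic_blocks(code):
    """Split a linear instruction list into basic blocks (lists of indices)."""
    if not code:
        return []
    leaders = {0}                           # the first instruction starts a block
    for i, op in enumerate(code):
        if op.target is not None:           # a control-flow instruction
            leaders.add(op.target)          # its target starts a block...
            if i + 1 < len(code):
                leaders.add(i + 1)          # ...and so does the instruction after it
    starts = sorted(leaders)
    ends = starts[1:] + [len(code)]
    return [list(range(s, e)) for s, e in zip(starts, ends)]


code = [Op("load"), Op("add"), Op("br", target=5), Op("mul"),
        Op("sub"), Op("store"), Op("br", target=0), Op("nop")]
blocks = basic_blocks(code)
print(blocks)                                # [[0, 1, 2], [3, 4], [5, 6], [7]]
print(sum(map(len, blocks)) / len(blocks))   # 2.0 instructions per block
```

Two branches are enough to cut these eight instructions into four blocks averaging two instructions each, which is why a compiler confined to basic blocks finds so few independent instructions to group.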
Speculative execution must not change a program's behavior. To be viable, an architecture supporting full speculation must therefore handle exceptions caused by speculated instructions correctly. If a speculated instruction's execution causes an exception, the exception must be deferred until the point at which that instruction would have executed in the original program. If the instruction would not have executed at all, because the preceding branch went the other way, the CPU may ignore the exception. This delayed exception processing is now explained in greater detail.
To support exception handling with speculative execution, an architecture must provide a speculative bit for each of the CPU's general purpose registers; each speculative bit is simply a one-bit field associated with its register. To explain exception handling in CPUs supporting speculative execution clearly, the terms "generating" and "signaling" (of an exception) must be understood. Generation is the detection and logging of an exception condition resulting from an instruction's execution. A generated exception is signaled once it is known that the instruction would have executed in the original (non-speculative) code sequence. Signaling causes the CPU to handle the exception condition by invoking exception processing, which may result in abnormal program termination, invocation of an exception handler, or other special actions.
For instruction streams on which the compiler has performed no speculative code motion, exception generation and signaling occur simultaneously, because the program's structure is unchanged. Speculative code motion, in contrast, may separate the generation of an exception from its signaling. This separation is accomplished through the use of a place-holder instruction (referred to as a "check_exception instruction" or "sentinel").
When an instruction is speculatively moved above its controlling branch, the compiler determines whether the instruction could cause an exception. If the speculated instruction's result (i.e., its destination register) is not used by any other speculated instruction, a check_exception instruction is placed at the speculated instruction's original position to signal any exception it causes. If the result is used by another speculated instruction, a single check_exception instruction may be used to signal exceptions caused by either speculated instruction.
This method may be applied recursively, so that a single check_exception instruction suffices to signal an exception caused by any instruction in a group of speculated instructions linked through their use of a given result (register). For this to work, each subsequent use of that result (register) by another speculated instruction must propagate the exception condition.
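The placement rule can be sketched as follows, assuming all hoisted instructions come from the same instruction stream and are listed in program order. A hoisted instruction needs its own check_exception only if no later hoisted instruction reads its result before that register is overwritten; otherwise the reader inherits the speculative bit and its check covers both. The `Hoisted` record and the `check_sites` name are illustrative assumptions, not taken from any real compiler.

```python
from typing import NamedTuple


class Hoisted(NamedTuple):
    orig_pos: int      # position the instruction occupied before code motion
    dest: str          # destination register
    srcs: tuple        # source registers


def check_sites(hoisted):
    """Return (original position, register) pairs that need a check_exception."""
    sites = []
    for i, ins in enumerate(hoisted):
        covered = False
        for later in hoisted[i + 1:]:
            if ins.dest in later.srcs:
                covered = True     # the speculative bit flows into `later`'s result
                break
            if later.dest == ins.dest:
                break              # overwritten unread: a pending exception would be lost
        if not covered:
            sites.append((ins.orig_pos, ins.dest))
    return sites


hoisted = [
    Hoisted(10, "r4", ("r5",)),        # r4 = load [r5]
    Hoisted(11, "r6", ("r4",)),        # r6 = r4 + 1
    Hoisted(12, "r7", ("r6", "r4")),   # r7 = r6 * r4
    Hoisted(13, "r8", ("r9",)),        # r8 = load [r9]
]
print(check_sites(hoisted))  # [(12, 'r7'), (13, 'r8')]
```

The chain r4 → r6 → r7 receives a single check at the last link's original position, matching the recursive application of the method, while the independent load into r8 gets its own check.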
This propagation is accomplished through the speculative bit, which carries the exception condition from the instruction generating the exception to the corresponding check_exception instruction (which signals the exception condition). An instruction can thus execute speculatively and generate, but not signal, an exception. The exception is signaled only if it is later determined (by a check_exception instruction) that the instruction would also have executed in the original program. Executing a check_exception instruction signals the exception condition if the speculative bit associated with any register used by the check_exception instruction is set. The program's behavior therefore remains unaltered by the compiler's speculative code motion: the instructions have already been executed, and the CPU now processes the exception generated by their execution.
The operation of a CPU supporting speculative execution in this manner is as follows. When a speculative instruction executes and the speculative bits of all its source registers are clear, execution proceeds normally unless the instruction itself generates an exception; in that case, the speculative bit of the instruction's destination register is set. If the speculative bits of one or more source registers are set, the exception condition propagates: the speculative bit of the instruction's destination register is set. So that the exception can be reported, the program counter value and any other required CPU state information from the time of the exception are recorded and propagated (i.e., made available for use when program execution reaches the instruction that signals the exception).
Execution of a non-speculative instruction proceeds as follows. If the speculative bits of all its source registers are clear, execution proceeds normally, and any exception the instruction generates is signaled immediately. If the speculative bits of one or more source registers are set, a speculative instruction has generated an exception, and that exception is now signaled using the recorded state information. If multiple source registers have their speculative bits set, the exception corresponding to the first of those operands is reported.
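The rules for speculative instructions, non-speculative instructions, and the check_exception sentinel can be modeled together in a small Python sketch. Each register carries a value, a speculative bit, and the recorded (program counter, reason) pair needed to report a deferred exception. Modeling exception conditions as Python `ArithmeticError`s (such as division by zero), along with the `Cpu`, `Reg`, and `SignaledException` names, are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


class SignaledException(Exception):
    """Raised when an exception is finally signaled by the modeled CPU."""

    def __init__(self, pc: int, reason: str):
        super().__init__(f"exception at pc={pc}: {reason}")
        self.pc = pc
        self.reason = reason


@dataclass
class Reg:
    value: int = 0
    spec: bool = False                 # the speculative bit
    origin: Optional[tuple] = None     # (pc, reason) recorded at generation time


class Cpu:
    def __init__(self, n_regs: int = 8):
        self.regs = [Reg() for _ in range(n_regs)]

    def execute(self, pc: int, dest: int, srcs: Sequence[int],
                fn: Callable[..., int], speculative: bool) -> None:
        """Execute regs[dest] = fn(*regs[srcs]) under the speculative-bit rules."""
        tagged = [s for s in srcs if self.regs[s].spec]
        if tagged:
            origin = self.regs[tagged[0]].origin        # first tagged operand wins
            if speculative:
                self.regs[dest] = Reg(0, True, origin)  # propagate, do not signal
                return
            raise SignaledException(*origin)            # deferred exception signaled now
        try:
            result = fn(*(self.regs[s].value for s in srcs))
        except ArithmeticError as exc:                  # modeled exception condition
            if speculative:
                self.regs[dest] = Reg(0, True, (pc, str(exc)))  # generate only
                return
            raise SignaledException(pc, str(exc)) from exc      # signal immediately
        self.regs[dest] = Reg(result)                   # normal completion clears the bit

    def check_exception(self, reg: int) -> None:
        """Sentinel placed at a speculated instruction's original position."""
        r = self.regs[reg]
        if r.spec:
            raise SignaledException(*r.origin)


cpu = Cpu()
cpu.regs[1].value, cpu.regs[2].value = 10, 0
# Hoisted above a branch: r3 = r1 // r2 (divides by zero), then r4 = r3 + 1.
cpu.execute(100, 3, [1, 2], lambda a, b: a // b, speculative=True)
cpu.execute(101, 4, [3], lambda a: a + 1, speculative=True)
# Nothing has been signaled yet; if the branch goes the other way, r4 is simply dead.
# If the branch goes the predicted way, the sentinel reports the original fault at pc=100.
try:
    cpu.check_exception(4)
except SignaledException as e:
    print(e)
```

Note that the reported program counter is that of the faulting division (100), not that of the sentinel or of the instruction that propagated the bit, which is why the state information must travel with the speculative bit.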
One approach to implementing a CPU capable of handling exceptions caused by speculated instructions is to widen the CPU's internal buses and register file(s) by one bit. Unfortunately, this adversely affects the CPU's architectural complexity and efficiency. Adding an extra bit gives each register an odd word-length (i.e., one that is not a whole number of bytes), necessitating either bit-packing (storing the widened words contiguously, so that they straddle byte boundaries) or wasted storage and bandwidth (rounding each word up to the next larger standard width, most of whose extra space goes unused). The effect propagates throughout the computer system's design for at least two reasons. First, registers are saved during exception processing, which requires an external CPU bus of the same width (at least as far as the memory cache). Second, context switches (during which the currently executing process may be swapped out to disk) require the ability to write the odd (or larger) word-length to permanent media and read it back. The error-correcting codes used by most systems today would also grow more complex because of the longer word-lengths. Furthermore, the CPU's area increases disproportionately because of the wider data and instruction buses.
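The storage cost of the one-bit widening can be worked out directly. The sketch below assumes a hypothetical 32-entry file of 64-bit registers, so each widened register holds 65 bits, and compares bit-packing with padding each entry to whole bytes or to the next power-of-two width. The function name and the parameter values are illustrative assumptions.

```python
def register_storage_bits(word_bits: int, n_regs: int) -> dict:
    """Total bits needed for n_regs registers once each gains a speculative bit."""
    tagged = word_bits + 1                        # data bits plus the speculative bit
    byte_width = -(-tagged // 8) * 8              # round up to whole bytes
    pow2_width = 1 << (tagged - 1).bit_length()   # round up to next power-of-two width
    return {
        "without_spec_bit": n_regs * word_bits,
        "bit_packed": n_regs * tagged,            # no padding; entries straddle bytes
        "byte_padded": n_regs * byte_width,
        "pow2_padded": n_regs * pow2_width,
    }


print(register_storage_bits(word_bits=64, n_regs=32))
```

Under these assumptions, bit-packing needs 2080 bits but leaves entries straddling byte boundaries, padding to 9-byte entries needs 2304 bits (12.5% more than the original 2048), and padding to 128-bit entries doubles the storage to 4096 bits.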
Therefore, a mechanism is desirable which allows full speculation by properly handling exceptions caused by speculated instructions with minimal impact on CPU efficiency, complexity and area.