Many modern processors employ a technique called pipelining to execute more software program instructions (instructions) per unit of time. In general, processor execution of an instruction involves fetching the instruction (e.g., from a memory system), decoding the instruction, obtaining needed operands, using the operands to perform an operation specified by the instruction, and saving a result. In a pipelined processor, the various steps of instruction execution are performed by independent units called pipeline stages. In the pipeline stages, corresponding steps of instruction execution are performed on different instructions independently, and intermediate results are passed to successive stages. By permitting the processor to overlap the executions of multiple instructions, pipelining allows the processor to execute more instructions per unit of time.
In practice, instructions are often interdependent, and these dependencies often result in “pipeline hazards.” Pipeline hazards result in stalls that prevent instructions from continually entering a pipeline at a maximum possible rate. The resulting delays in pipeline flow are commonly called “bubbles.” The detection and avoidance of hazards presents a formidable challenge to designers of pipeline processors, and hardware solutions can be considerably complex.
There are three general types of pipeline hazards: structural hazards, data hazards, and control hazards. A structural hazard occurs when instructions in a pipeline require the same hardware resource at the same time (e.g., access to a memory unit or a register file, use of a bus, etc.). In this situation, execution of one of the instructions must be delayed while the other instruction uses the resource.
A “data dependency” is said to exist between two instructions when one of the instructions requires a value produced by the other. A data hazard occurs in a pipeline when a first instruction in the pipeline requires a value produced by a second instruction in the pipeline, and the value is not yet available. In this situation, the pipeline is typically stalled until the operation specified by the second instruction is carried out and the result is produced.
In general, a “scalar” processor issues instructions for execution one at a time, and a “superscalar” processor is capable of issuing multiple instructions for execution at the same time. A pipelined scalar processor concurrently executes multiple instructions in different pipeline stages; the executions of the multiple instructions are overlapped as described above. A pipelined superscalar processor, on the other hand, concurrently executes multiple instructions in different pipeline stages, and is also capable of concurrently executing multiple instructions in the same pipeline stage. Pipeline hazards typically have greater negative impacts on performances of pipelined superscalar processors than on performances of pipelined scalar processors. Examples of pipelined superscalar processors include the popular Intel® Pentium® processors (Intel Corporation, Santa Clara, Calif.) and IBM® PowerPC® processors (IBM Corporation, White Plains, N.Y.).
Conditional branch/jump instructions are commonly used in software programs (i.e., code) to effectuate changes in control flow. A change in control flow is necessary to execute one or more instructions dependent on a condition. Typical conditional branch/jump instructions include “branch if equal,” “jump if not equal,” “branch if greater than,” etc.
A “control dependency” is said to exist between a non-branch/jump instruction and one or more preceding branch/jump instructions that determine whether the non-branch/jump instruction is executed. A control hazard occurs in a pipeline when a next instruction to be executed is unknown, typically as a result of a conditional branch/jump instruction. When a conditional branch/jump instruction occurs, the correct one of multiple possible execution paths cannot be known with certainty until the condition is evaluated. Any incorrect prediction typically results in the need to purge partially processed instructions along an incorrect path from a pipeline, and refill the pipeline with instructions along the correct path.
A software technique called “predication” provides an alternate method for conditionally executing instructions. Predication may be advantageously used to eliminate branch instructions from code, effectively converting control dependencies to data dependencies. If the resulting data dependencies are less constraining than the control dependencies that would otherwise exist, instruction execution performance of a pipelined processor may be substantially improved.
In predicated execution, the results of one or more instructions are qualified dependent upon a value of a preceding predicate. The predicate typically has a value of “true” (e.g., binary “1”) or “false” (e.g., binary “0”). If the qualifying predicate is true, the results of the one or more subsequent instructions are saved (i.e., used to update a state of the processor). On the other hand, if the qualifying predicate is false, the results of the one or more instructions are not saved (i.e., are discarded).
In some known processors, values of qualifying predicates are stored in dedicated predicate registers. In some of these processors, different predicate registers may be assigned (e.g., by a compiler) to instructions along each of multiple possible execution paths. Predicated execution may involve executing instructions along all possible execution paths of a conditional branch/jump instruction, and saving the results of only those instructions along the correct execution path. For example, assume a conditional branch/jump instruction has two possible execution paths. A first predicate register may be assigned to instructions along one of the two possible execution paths, and a second predicate register may be assigned to instructions along the second execution path. The processor attempts to execute instructions along both paths in parallel. When the processor determines the values of the predicate registers, results of instructions along the correct execution path are saved, and the results of instructions along the incorrect execution path are discarded.
The above method of predicated execution involves associating instructions with predicate registers (i.e., “tagging” instructions along the possible execution paths with an associated predicate register). This tagging is typically performed by a compiler, and requires space (e.g., fields) in instruction formats to specify associated predicate registers. This presents a problem in reduced instruction set computer (RISC) processors typified by fixed-length and densely-packed instruction formats.
Another example of conditional execution involves the TMS320C6x processor family (Texas Instruments Inc., Dallas, Tex.). In the ‘C6x processor family, all instructions are conditional. Multiple bits of a field in each instruction are allocated for specifying a condition. If no condition is specified, the instruction is executed. If an instruction specifies a condition, and the condition is true, the instruction is executed. On the other hand, if the specified condition is false, the instruction is not executed. This form of conditional execution also presents a problem in RISC processors in that multiple bits are allocated in fixed-length and densely-packed instruction formats.