Field
This disclosure relates generally to predicated execution and, more specifically, to techniques for predicated execution in an out-of-order processor.
Related Art
Today, branch instructions are considered a major impediment to exploiting instruction level parallelism (ILP), which is a measure of how many operations in a computer program (program) can be performed simultaneously. In general, compilers and hardware are required to make frequent accurate branch predictions in order to achieve adequate ILP. Branch misprediction typically results in performance degradation, due to wasted cycles that are introduced into an instruction stream. Branch misprediction in superscalar and very long instruction word (VLIW) processors degrades performance even more than branch misprediction in scalar processors in that each wasted cycle may reduce throughput by multiple instructions.
Predicated execution, which refers to conditional execution of an instruction based on a value of a boolean source operand (known as a predicate of the instruction), provides a technique to eliminate branches from an instruction stream. In a typical implementation, a compiler employing predicated execution uses an if-conversion algorithm to convert conditional branches into predicate defining instructions and instruction streams along alternate paths into predicated instructions. In a typical case, predicated instructions are fetched irrespective of their predicate value. An instruction with a true predicate is executed normally, while an instruction with a false predicate is nullified such that the nullified instruction is prevented from modifying a processor state. In general, predicated execution allows a compiler to provide ILP to the hardware along multiple execution paths, albeit at lower instruction fetch efficiency.
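By way of illustration only, the if-conversion described above may be sketched as follows. The sketch is a toy model, not an implementation of any particular compiler or instruction set; the names `PredicatedOp`, `run`, and `if_converted` and the register names `a`, `b`, `p`, and `r` are hypothetical. A conditional branch of the form "if a > b then r = a - b else r = b - a" is replaced by one predicate defining instruction and two predicated instructions, all of which are fetched, with the instruction whose predicate is false being nullified.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PredicatedOp:
    """Hypothetical predicated instruction (illustrative only)."""
    pred: Optional[str]            # guarding predicate register; None means unconditional
    negate: bool                   # True: execute when the predicate is False
    dest: str                      # destination register
    compute: Callable[[dict], object]  # computes the result from the register state


def run(ops, regs):
    """Fetch every op in order; nullify ops whose predicate evaluates false."""
    regs = dict(regs)
    for op in ops:
        if op.pred is not None and regs[op.pred] == op.negate:
            continue  # nullified: the op does not modify processor state
        regs[op.dest] = op.compute(regs)
    return regs


# If-converted form of: if a > b: r = a - b  else: r = b - a
if_converted = [
    PredicatedOp(None, False, "p", lambda r: r["a"] > r["b"]),  # predicate define
    PredicatedOp("p", False, "r", lambda r: r["a"] - r["b"]),   # (p)  r = a - b
    PredicatedOp("p", True, "r", lambda r: r["b"] - r["a"]),    # (!p) r = b - a
]

print(run(if_converted, {"a": 7, "b": 3})["r"])  # prints 4
print(run(if_converted, {"a": 2, "b": 9})["r"])  # prints 7
```

In the sketch, no branch appears in the instruction sequence; both arms are fetched on every run, which reflects the lower instruction fetch efficiency noted above.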
Out-of-order execution allows instructions to execute in any order that does not violate data dependencies. Out-of-order execution may or may not be employed in conjunction with pipelining and superscalar techniques. Out-of-order execution is employed in many high-performance processors to utilize processor cycles that would otherwise not be utilized. The primary focus of out-of-order execution is to allow a processor to avoid stalls that occur when data needed to perform an operation is not available. In an out-of-order processor, instructions are handled in data order, i.e., in the order in which data operands become available in a register of the processor. Out-of-order processors fill time slots in which data is not available for associated instructions with other instructions that are ready to execute. An out-of-order processor then re-orders results, before committing instructions executed out-of-order to architectural state, such that it appears that the instructions were processed in program order, i.e., the order of the instructions in an original program.
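The distinction between data order and program order may be illustrated with the following minimal sketch. It is a simplified model under stated assumptions, not a description of any actual processor: at most one instruction issues per cycle, every destination register is unique (so no renaming is needed), and the function name `simulate`, the instruction names, and the latencies are hypothetical.

```python
def simulate(program, ready_at):
    """Toy out-of-order issue with in-order commit (illustrative only).

    program:  list of (name, dest, srcs, latency) in program order
    ready_at: dict mapping initially available registers to the cycle
              at which their values become available
    Returns (issue_order, commits), where commits is a list of
    (name, commit_cycle) in program order.
    """
    ready_at = dict(ready_at)
    defined = set(ready_at) | {dest for _, dest, _, _ in program}
    for _, _, srcs, _ in program:
        for s in srcs:
            if s not in defined:
                raise ValueError(f"source {s!r} is never produced")

    pending = list(range(len(program)))
    issue_order, done_at, cycle = [], {}, 0
    while pending:
        for i in pending:  # scan oldest first, issue the first ready instruction
            name, dest, srcs, latency = program[i]
            if all(ready_at.get(s, float("inf")) <= cycle for s in srcs):
                ready_at[dest] = done_at[i] = cycle + latency
                issue_order.append(name)
                pending.remove(i)
                break
        cycle += 1

    # Commit in program order: an instruction retires only after it has
    # completed and every older instruction has retired.
    commits, prev = [], 0
    for i, (name, _, _, _) in enumerate(program):
        prev = max(prev, done_at[i])
        commits.append((name, prev))
    return issue_order, commits


program = [
    ("load", "r1", ["addr"], 3),  # long-latency producer
    ("add", "r2", ["r1"], 1),     # depends on the load
    ("mul", "r3", ["x"], 1),      # independent of the load
]
issue_order, commits = simulate(program, {"addr": 0, "x": 0})
print(issue_order)  # prints ['load', 'mul', 'add']
print(commits)      # prints [('load', 3), ('add', 4), ('mul', 4)]
```

In this sketch, the independent `mul` fills the slot in which `add` waits for the load result, yet `mul` is not committed until the older `add` has been committed, so the architectural state is updated in program order.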
To successfully reorder instructions, modern out-of-order processors employ a technique called renaming. Renaming involves assigning a unique physical location to each result register generated by an instruction. Implementations typically use reservation stations, register update units (RUUs), physical register files, etc. An out-of-order processor tracks the availability of a result operand of a producing instruction and specifies (to a dependent consumer instruction) a unique physical location from which to obtain the result operand. Predication may introduce multiple producer instructions to the same architectural register, each of which gets renamed to a unique physical location in an out-of-order processor. As a result, an out-of-order processor may not be able to uniquely identify (to a waiting consumer instruction) an exact location of a source operand. Consequently, in the absence of explicit support to handle such situations, an out-of-order processor has stalled until all in-flight producer instructions have completed, which results in inefficient performance. Processors that employ predicate prediction have been disclosed in a number of U.S. patents (see, for example, U.S. Pat. Nos. 7,085,919; 6,513,109; and 6,442,679).
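The multiple-producer problem described above may be illustrated with the following sketch. It is a deliberately conservative toy renamer, not the technique of any of the cited patents; the function name `rename`, the tag format (`p0`, `p1`, and `arch:<register>` for a value already held in architectural state), and the example instructions are hypothetical. The renamer does not reason about whether two predicates are complementary, so every in-flight predicated producer of a register remains a candidate source.

```python
from itertools import count


def rename(instrs):
    """Conservative toy renamer (illustrative only).

    instrs: list of (pred, dest, srcs) in program order; pred is None for
            an unconditional instruction.
    Returns a list of (dest_tag, {src: candidate_tags}) per instruction,
    where candidate_tags lists every physical location that may hold the
    most recent value of the source register.
    """
    fresh = count()
    candidates = {}  # architectural register -> candidate physical tags
    renamed = []
    for pred, dest, srcs in instrs:
        src_tags = {s: list(candidates.get(s, [f"arch:{s}"])) for s in srcs}
        tag = f"p{next(fresh)}"  # unique physical location for this result
        if pred is None:
            # An unconditional write always supersedes earlier producers.
            candidates[dest] = [tag]
        else:
            # A predicated write may be nullified, so earlier producers
            # remain possible sources alongside the new one.
            candidates[dest] = candidates.get(dest, [f"arch:{dest}"]) + [tag]
        renamed.append((tag, src_tags))
    return renamed


instrs = [
    ("p", "r1", ["a"]),     # (p)  r1 = a
    ("!p", "r1", ["b"]),    # (!p) r1 = b
    (None, "r2", ["r1"]),   # r2 = f(r1), the waiting consumer
]
consumer_tag, consumer_srcs = rename(instrs)[2]
print(consumer_srcs["r1"])  # prints ['arch:r1', 'p0', 'p1']
```

Because the consumer's source `r1` maps to more than one candidate location, a processor without explicit support cannot forward a single physical location to the consumer until the predicates of the in-flight producers are resolved.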