1. Field of the Invention
The present invention relates to the field of computer systems, and in particular, to systems and methods for processing predicated instructions.
2. Background Art
Advanced processors employ pipelining techniques to execute instructions at very high rates. A pipelined processor is organized into multiple stages of hardware, each of which performs one of the operations necessary to implement an instruction. Typically, each stage performs its operation in a single cycle of the processor's clock, and the instruction is completed step-wise, as it moves from stage to stage down the processor pipeline. Fetching a new instruction into the pipeline on each clock cycle keeps the pipeline full and allows the processor to complete an instruction on each clock cycle. Superscalar processors have multiple execution pipelines that allow multiple instructions to complete on each clock cycle.
An exemplary processor pipeline includes a front-end that prepares an instruction for execution and a back-end that completes execution of the instruction. The front-end fetches and decodes the instruction(s) in successive pipe stages. The back-end maps the instruction(s) to the processor's physical registers, retrieves operands from these registers, processes the operands according to the instruction(s), checks for execution problems (exceptions), and updates the processor's state with instruction results, in successive pipe stages.
Frequently, the execution resources available in pipelined, superscalar processors are not fully utilized because of the limited availability of instructions that may be executed in parallel. For example, dependencies between instructions limit the instruction-level parallelism (ILP) available in computer code. An instruction that depends on data generated by another instruction cannot be executed in parallel, i.e. simultaneously, with the other instruction. Much effort has been invested in identifying ways to expose and exploit ILP in computer code.
Software pipelining is one technique for exposing ILP in computer code that contains loops. A loop is a sequence of instructions ("loop body") that is executed repeatedly ("iterated") until a termination condition is met. As long as the termination condition is not met, the processor branches back to the beginning of the loop body. If the termination condition is met, the processor exits the loop and proceeds with the instructions that follow it. A loop is typically controlled through a branch instruction, which returns the processor to the beginning of the loop body or directs it to subsequent instructions according to the termination condition.
A loop is software pipelined by organizing the instructions of the loop body into stages of one or more instructions each. These stages form a software pipeline analogous to the processor's instruction execution pipeline. The software pipeline has a pipeline depth equal to the number of stages (the "stage count" or "SC") of the loop body. The instructions for a given loop iteration enter the software pipeline stage by stage, on successive initiation intervals (II), and new loop iterations begin on successive initiation intervals until all iterations of the loop have been started. Each loop iteration is thus processed in stages through the software pipeline in much the same way that an instruction is processed in stages through the processor's instruction execution pipeline. When the software pipeline is full, stages from SC sequential loop iterations are in process concurrently, and loop iterations begin and complete on every initiation interval. Various methods for software pipelining loops are discussed, for example, in B. R. Rau, M. S. Schlansker, P. P. Tirumalai, "Code Generation Schema for Modulo Scheduled Loops," IEEE MICRO Conference 1992 (Portland, Oreg.).
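The overlap of iterations described above can be illustrated with a minimal sketch. It assumes that iteration i enters stage s at initiation interval i + s, so that a loop of N iterations and SC stages occupies N + SC - 1 intervals. The function name pipeline_schedule and its parameters are hypothetical and are used here only for illustration.

```python
def pipeline_schedule(num_iterations, stage_count):
    """Return, for each initiation interval, the (iteration, stage) pairs in flight.

    Assumption for illustration: iteration i occupies stage s during
    initiation interval i + s, and a new iteration starts on every interval.
    """
    total_intervals = num_iterations + stage_count - 1
    schedule = []
    for interval in range(total_intervals):
        active = []
        for stage in range(stage_count):
            iteration = interval - stage  # which iteration is in this stage now
            if 0 <= iteration < num_iterations:
                active.append((iteration, stage))
        schedule.append(active)
    return schedule


# Example: 5 iterations of a loop body split into 3 stages.
for interval, active in enumerate(pipeline_schedule(5, 3)):
    print(interval, active)
```

In this sketch, every interval in which the list holds SC entries corresponds to a full software pipeline, with stages from SC sequential iterations in process concurrently.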
During a prolog phase, successive stages of the software pipeline are filled by "activating" the corresponding instructions on successive clock cycles. In a kernel phase, the software pipeline is full, so that one iteration of the loop is begun and one iteration is completed on each clock cycle. In an epilog phase, the software pipeline is emptied as the last iteration of the loop completes.
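The three phases can be sketched as a simple classification of each interval. This is only an illustration under stated assumptions: one new iteration starts per interval, the loop has at least as many iterations as stages, and the names loop_phase, num_iterations, and stage_count are hypothetical.

```python
def loop_phase(interval, num_iterations, stage_count):
    """Classify an interval of a software-pipelined loop as prolog, kernel, or epilog.

    Assumptions for illustration: one new iteration starts per interval and
    num_iterations >= stage_count, so the pipeline fills completely.
    """
    if num_iterations < stage_count:
        raise ValueError("sketch assumes num_iterations >= stage_count")
    total_intervals = num_iterations + stage_count - 1
    if not 0 <= interval < total_intervals:
        raise ValueError("interval outside the loop's execution")
    if interval < stage_count - 1:
        return "prolog"   # pipeline still filling: no iteration has completed yet
    if interval < num_iterations:
        return "kernel"   # pipeline full: one iteration starts, one completes
    return "epilog"       # no new iterations: remaining stages drain


# Example: 5 iterations, 3 stages.
print([loop_phase(t, 5, 3) for t in range(7)])
```

Under these assumptions the prolog and the epilog each last SC - 1 intervals, and the kernel lasts N - SC + 1 intervals.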
One technique for implementing software pipelined loops uses predication to gate the different stages of the software pipeline on or off. Predication allows instructions to be executed conditionally, by associating a predicate with one or more instructions, e.g. a software pipeline stage. The instruction(s) move down the processor's execution pipeline while the logic state of the predicate is evaluated. If the predicate is in a first logic state, the associated instruction(s) complete normally and update the processor's state. If the predicate is in a second logic state, the associated instruction(s) are treated as a Non-Operation (NOP), i.e. they are ignored. For software pipelined loops, a "stage predicate" gates the instructions of a stage on or off, according to the phase of the software-pipelined loop.
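The gating behavior of a predicate can be sketched as follows. This is a simplified software model, not a description of any particular processor: the class PredicatedInstruction, the function execute, and the register and predicate names (r1, p16, and so on) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class PredicatedInstruction:
    predicate: str                            # name of the guarding predicate (hypothetical)
    dest: str                                 # destination register name (hypothetical)
    compute: Callable[[Dict[str, int]], int]  # computes the result from register state


def execute(instr, predicates, regs):
    """Commit the result only if the guarding predicate is true.

    Returns True if processor state was updated, False if the
    instruction was treated as a NOP.
    """
    if predicates[instr.predicate]:
        regs[instr.dest] = instr.compute(regs)
        return True
    return False  # predicate false: instruction is ignored


# Example: two stages gated by two stage predicates.
regs = {"r1": 2, "r2": 3, "r3": 0, "r4": 0}
predicates = {"p16": True, "p17": False}
add = PredicatedInstruction("p16", "r3", lambda r: r["r1"] + r["r2"])
mul = PredicatedInstruction("p17", "r4", lambda r: r["r1"] * r["r2"])
execute(add, predicates, regs)  # gated on: r3 is updated
execute(mul, predicates, regs)  # gated off: r4 is left unchanged
```

In a software-pipelined loop, one such predicate per stage would be turned on during the prolog and off during the epilog, so that the same loop body serves all three phases.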
One complication created by software-pipelining loops is that predicate information may be required relatively early in a processor's execution pipeline to maintain an uninterrupted flow of instructions. For example, predicates may be used to determine whether an instruction hazard should be addressed or to route data to an instruction. Delaying these operations until the predicate is actually resolved in the back end of the processor pipeline can offset the advantages gained by software pipelining.
The present invention addresses these and other problems associated with handling predicated instructions.
The present invention provides mechanisms for predicting predicates and for managing operations associated with instructions gated by the predicates.
In accordance with the present invention, a predicate predictor maintains speculative loop status information. The predicate predictor uses the speculative loop status information and instruction information to predict when a predicate will be written by an associated instruction, and a value to be written for the predicate.
For one embodiment of the invention, the predicate is predicted to be written when a modulo-scheduled loop branch instruction is detected, and the value predicted for the predicate is determined from the speculative loop status information and branch prediction information, as necessary. The predicted predicate value controls hazard handling and data-routing operations that arise before the predicate is actually resolved. Results generated using the speculative information may be validated once the branch instruction is executed.
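One way to picture such a predictor is the following sketch. It is not the claimed mechanism; it assumes a counted-loop convention in which a speculative loop count (spec_lc) tracks iterations still to be started and a speculative epilog count (spec_ec) tracks stages still to be drained. The class name, field names, and the predicted_continue hint standing in for branch prediction information are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PredicatePredictor:
    spec_lc: int  # speculative loop count: iterations still to be started (assumed)
    spec_ec: int  # speculative epilog count: stages still to be drained (assumed)

    def predict(self, is_loop_branch: bool, is_counted: bool = True,
                predicted_continue: Optional[bool] = None) -> Optional[int]:
        """Return the predicted predicate value, or None if no write is predicted."""
        if not is_loop_branch:
            return None  # only a loop branch is predicted to write the predicate
        if is_counted:
            continuing = self.spec_lc > 0  # loop status alone is sufficient
        else:
            if predicted_continue is None:
                raise ValueError("branch prediction hint needed for this loop type")
            continuing = predicted_continue  # fall back on branch prediction
        if continuing:
            if is_counted:
                self.spec_lc -= 1
            return 1  # next stage predicate predicted on: a new iteration starts
        if self.spec_ec > 0:
            self.spec_ec -= 1
        return 0      # next stage predicate predicted off: pipeline is draining

    @staticmethod
    def validate(predicted: int, actual: int) -> bool:
        """Compare a prediction with the value resolved when the branch executes."""
        return predicted == actual


# Example: two iterations left to start, two stages left to drain.
predictor = PredicatePredictor(spec_lc=2, spec_ec=2)
predictions = [predictor.predict(is_loop_branch=True) for _ in range(4)]
```

In this sketch a predicted value is available as soon as the loop branch is detected, so hazard handling and data routing need not wait for the back end to resolve the predicate. A mismatch reported by validate would indicate that results produced with the speculative value must be corrected.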