Computer programs consist of a set of instructions intended to be executed on a computer system to perform some useful task. Typically, programs are designed to execute certain instructions conditionally, i.e., if one or more conditions are satisfied, the conditional instructions are executed; otherwise, they are not executed. In this context, "executed" means that an instruction performs a specified operation which will result in a modification of the state of the computer system and/or result in a particular sequence of events within the computer system. In traditional computer systems, conditional execution is implemented via the branch or jump instruction well known in the art. "Predicated execution" or "predication" (sometimes referred to as "conditional execution" or "guarded execution") is a technique whereby instructions can be executed conditionally without the need for a branch instruction.
Predicated execution is implemented by associating a "predicate" with an instruction where the predicate controls whether or not that instruction is executed. If the predicate evaluates to "true", the instruction is executed; if the predicate evaluates to "false", the instruction is not executed. The definition of "true" and "false" may vary with each embodiment. The function by which the predicate is determined to be true or false may also vary with each embodiment. For example, some embodiments may define the predicate to be a single bit where a value of one is true and a value of zero is false while alternate embodiments may define the predicate to be multiple bits with a specific function for interpreting these bits to be true or false.
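As a minimal sketch of the behavior described above, the following Python model guards each instruction with a predicate. The single-bit (boolean) predicates, the eight-entry predicate list, and the names compare_lt and run_predicated are assumptions made for illustration only and do not correspond to any particular embodiment.

```python
def compare_lt(a, b, predicates, p_true, p_false):
    """Model of a compare instruction that writes two complementary predicates."""
    predicates[p_true] = a < b
    predicates[p_false] = not (a < b)


def run_predicated(program, predicates, registers):
    """Run a straight-line program of (guard, dest, value) instructions.

    Every instruction is fetched in order; it updates machine state only
    when its guarding predicate is true. No branch is modeled.
    """
    for guard, dest, value in program:
        if predicates[guard]:          # predicate true: instruction is executed
            registers[dest] = value
        # predicate false: instruction has no effect on machine state
    return registers


# "if (3 < 5) x = 10; else x = 20;" expressed without a branch
predicates = [False] * 8               # assumed: eight single-bit predicates
compare_lt(3, 5, predicates, p_true=1, p_false=2)
regs = run_predicated([(1, "x", 10), (2, "x", 20)], predicates, {"x": 0})
# regs["x"] is now 10
```

Both guarded instructions appear in the instruction stream, but only the one whose predicate is true modifies the register.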
By conditionally executing instructions under the control of a predicate, predication eliminates branch instructions from the computer program. This is beneficial on wide and deep pipelines, where the flushes due to branch mispredictions cause several "bubbles" in the execution pipeline, giving rise to a large loss of instruction execution opportunities. Predication improves performance by eliminating branches, and thus any associated branch mispredictions. Since branch instructions typically cause breaks in the instruction fetch mechanism, predication also improves performance by increasing the number of instructions between branches, thus increasing the effective instruction fetch bandwidth.
Predicates are typically stored in a dedicated "predicate register set". The exact form of the predicate register set may vary with each embodiment. For example, some embodiments may define a plurality of registers each containing a single predicate, while alternate embodiments may define the predicates to be one or more bits in a "condition code" or "flags" register. The exact number of predicates may also vary with each embodiment. For example, one embodiment may define 64 predicates while another may define only 8 predicates.
There are typically two methods employed to access predicates: individual and "broadside". Predicates are typically written individually by compare instructions and read individually by any predicated instruction. Broadside access refers to reading or writing all predicates simultaneously in a single access. Predicates are typically read and written in broadside fashion for procedure entry and exit and for context switching (a "context switch" occurs when execution of the presently active "process" or "task" is stopped and another process is selected for execution). In broadside access, the contents of the predicate register set are typically saved/restored to/from another register in the processor or to/from a memory location in the computer system.
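The two access methods can be sketched with a small Python model in which the predicate register set is held as the bits of one integer. The class name PredicateFile, its method names, and the default of eight predicates are hypothetical choices for illustration, not taken from any embodiment.

```python
class PredicateFile:
    """Toy model of a predicate register set stored as a bit image."""

    def __init__(self, count=8):
        self.count = count             # assumed number of predicates
        self.bits = 0                  # bit i holds predicate i

    def _check(self, index):
        if not 0 <= index < self.count:
            raise IndexError("no such predicate")

    def write(self, index, value):
        """Individual write, as performed by a compare instruction."""
        self._check(index)
        if value:
            self.bits |= 1 << index
        else:
            self.bits &= ~(1 << index)

    def read(self, index):
        """Individual read, as performed by a predicated instruction."""
        self._check(index)
        return bool((self.bits >> index) & 1)

    def broadside_read(self):
        """Read all predicates in a single access (e.g. to save them)."""
        return self.bits

    def broadside_write(self, image):
        """Write all predicates in a single access (e.g. to restore them)."""
        self.bits = image & ((1 << self.count) - 1)


pf = PredicateFile()
pf.write(0, True)
pf.write(3, True)
saved = pf.broadside_read()            # image with bits 0 and 3 set
pf.broadside_write(0)                  # e.g. another task clears everything
pf.broadside_write(saved)              # broadside restore
restored_p3 = pf.read(3)               # True again after the restore
```

In this sketch the saved image plays the role of the other register or memory location to which the predicate register set is saved.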
Typically, the predicates in the predicate register set are equally accessible to all procedures (also known in the art as "functions" or "subroutines") in a computer program. This necessitates the specification of rules by which the sharing can occur so that one procedure does not overwrite the predicates of another procedure. For this purpose, registers are divided into two classes: "scratch" and "preserved". By software convention, the contents of scratch registers are lost at the point of a procedure call; the contents of preserved registers are maintained across a procedure call. Note that the division of registers into scratch and preserved classes is a convention used by software and is typically not enforced by the instruction set architecture. This division necessitates certain actions on the part of each procedure. For example, assume procedure A (the "caller") is calling procedure B (the "callee"). A call instruction to procedure B will therefore appear within procedure A, and a return instruction to procedure A will appear at the end of procedure B. If procedure A needs the contents of any scratch registers after the call to procedure B, then it must save their contents before the instruction that calls procedure B and must restore their contents after said instruction. If procedure B needs to use any preserved registers, then it must save their contents before using them and must restore their contents before returning to procedure A.
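The caller and callee obligations just described can be illustrated with a short Python sketch. The register names r0 and r4, their assignment to the scratch and preserved classes, and the values used are assumptions chosen purely for the example.

```python
# Assumed convention for this sketch: r0 is a scratch register,
# r4 is a preserved register. Nothing in the model enforces this;
# each procedure must follow the rules itself.

def procedure_b(regs):
    """Callee: saves and restores the preserved register it uses."""
    saved_r4 = regs["r4"]      # save preserved register before using it
    regs["r4"] = 111           # use the preserved register
    regs["r0"] = 222           # scratch register may be overwritten freely
    regs["r4"] = saved_r4      # restore before returning to the caller


def procedure_a(regs):
    """Caller: saves and restores the scratch register it still needs."""
    regs["r0"] = 1
    regs["r4"] = 2
    saved_r0 = regs["r0"]      # save scratch register before the call
    procedure_b(regs)          # the call to procedure B
    regs["r0"] = saved_r0      # restore scratch register after the call
    return regs["r0"], regs["r4"]


result = procedure_a({"r0": 0, "r4": 0})
# result is (1, 2): both values survive the call
```

If either save/restore pair were omitted, the corresponding value seen by procedure A after the call would be the one written by procedure B.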
In the situation where procedure B does use preserved predicate registers, when procedure B restores the preserved predicate registers all predicates will be overwritten since, as discussed above, predicates are accessed in broadside fashion for procedure entry and exit. If any instructions following this broadside restore need to read the predicate register set (e.g. any predicated instruction), it would be necessary to insert additional instructions to re-write the predicate register set after the broadside restore and before said predicated instructions. These additional instructions reduce the performance of the procedure by increasing the number of sequentially dependent instructions. In addition, due to the use of pipelining in modern processors, performance may be further degraded due to pipeline stalls caused by instructions that read the predicates waiting on a previous broadside predicate restore to complete.
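The overwrite problem can be made concrete with a small Python sketch that treats the predicate register set as an 8-bit image. The choice of p0 as a scratch predicate and p4 as a preserved predicate, and the function name broadside_restore_demo, are assumptions for illustration.

```python
def broadside_restore_demo(entry_image):
    """Model a callee that uses preserved predicate p4 and scratch predicate p0.

    Returns (value of p0 immediately after the broadside restore,
             final predicate image after p0 has been re-written).
    """
    preds = entry_image                 # predicate image on procedure entry
    saved = preds                       # broadside save of all predicates
    preds |= 1 << 4                     # body writes preserved predicate p4
    preds |= 1 << 0                     # a compare sets scratch predicate p0
    preds = saved                       # broadside restore: every bit overwritten
    p0_after_restore = bool(preds & 1)  # the computed p0 has been lost
    preds |= 1 << 0                     # additional instruction re-writes p0
    return p0_after_restore, preds


lost, final_image = broadside_restore_demo(0b00100000)
# lost is False: p0 was overwritten by the restore and had to be recomputed
```

The final re-write of p0 stands for the additional instructions that must follow the broadside restore, and it cannot begin until the restore has completed.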
Therefore, there is a need for a method and apparatus that overcome the disadvantages of the prior art by restoring selected predicate registers of a predicate register set in response to a single instruction.
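For illustration only, one possible behavior of such a selective restore is sketched below in Python. The use of a bit mask to select predicates, the function name selective_restore, and the 8-bit images are assumptions of this sketch and are not drawn from the text above.

```python
def selective_restore(current, saved, mask):
    """Restore only the predicates selected by mask.

    Bits set in mask are taken from the saved image; all other bits
    keep their current values.
    """
    return (current & ~mask) | (saved & mask)


PRESERVED_MASK = 0b11110000     # assumed: p4-p7 preserved, p0-p3 scratch

current = 0b00010001            # callee has set p4 (preserved) and p0 (scratch)
saved = 0b00100000              # image saved on procedure entry
restored = selective_restore(current, saved, PRESERVED_MASK)
# restored is 0b00100001: p4-p7 come from the saved image, p0 is untouched
```

Under these assumptions no additional instruction is needed to re-write p0 after the restore.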