Predication (also referred to as “guarding”) of instruction execution within a processor is typically used in the context of two distinct compiler techniques: if-conversion, a transformation converting control flow into data flow, in which predication is used to speculatively execute non-taken branches of a control flow graph, providing fewer branches and more instruction level parallelism (ILP) opportunities; and software pipelining, a loop transformation that creates a periodic schedule for overlapped execution of successive iterations (a pipeline), in which predication allows the same schedule to be used for the filling (“prologue”) or emptying (“epilogue”) of the pipeline in a technique referred to as kernel-only scheduling, providing smaller code. These techniques are used in compilers to generate code for programmable processors, as well as in high level synthesis (HLS) tools to generate non-programmable processors from high-level description languages such as very high speed integrated circuit (VHSIC) hardware description language (VHDL) or C.
Instruction set architectures (ISAs) support predicated execution either fully or partially. ISAs with full predicated execution support provide a way to prevent issued instructions from modifying the architectural state. This is achieved with a specific predicate register file, a specific set of instructions to write results of comparisons to these registers, and an additional predicate operand to most instructions that conditions the commitment of the result to the destination register. The instruction format provides room for three source operands, and this implementation of full predication is called instruction predication.
ISAs with partial predicated execution support emulate predication with ordinary non-predicated instructions and a way to conditionally copy one register to another. In most implementations, a conditional move instruction (e.g., “cmove”) is provided to conditionally copy one source operand to the destination depending on the value of a second source operand (the predicate or guard condition). Another implementation uses a selection instruction (e.g., “select”) with three source operands, copying one of two source operands to the destination based on the value of the third source operand (the predicate).
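By way of illustration only, the two partial predication primitives described above may be modeled in C as follows. The function names and signatures are hypothetical and do not correspond to any particular ISA. The conditional move is modeled with the prior destination value as an explicit input, which is an assumption made here to show that the destination keeps its old value when the predicate is false.

```c
#include <stdint.h>

/* Hypothetical model of a conditional move ("cmove").
 * Returns the value held by the destination register afterwards:
 * the source if the predicate is true, the old destination otherwise. */
static int32_t cmove(int32_t dst, int32_t src, int pred)
{
    return pred ? src : dst;
}

/* Hypothetical model of a selection instruction ("select").
 * All three inputs are explicit source operands; the destination
 * is written unconditionally with one of the two candidates. */
static int32_t select_op(int32_t if_true, int32_t if_false, int pred)
{
    return pred ? if_true : if_false;
}
```

In this sketch the conditional move needs only two explicit sources plus the destination, whereas the selection instruction needs room for three source operands.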
The two approaches to predicated execution are closely related, since full predication support effectively combines an implicit conditional move instruction with every predicated instruction. Full predication support offers the most benefits in terms of performance (number of cycles, code size, resource usage). However, instruction predication requires that predication be designed into the ISA from the ground up, essentially because room must be made for an additional source operand (the predicate) in the instruction format. Therefore existing ISAs without full predication support, which typically use a three operand instruction format (one destination and two sources), cannot be extended to support full predication.
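The relationship between the two approaches can be sketched in C for a single addition. This is an illustrative model only: add_if, add_then_cmove and cmove are hypothetical names, and the guard written as a C "if" merely stands in for a predicated instruction.

```c
#include <stdint.h>

/* Hypothetical model of a conditional move (partial predication). */
static int32_t cmove(int32_t dst, int32_t src, int pred)
{
    return pred ? src : dst;
}

/* Hypothetical fully predicated add: "dst = a + b if pred".
 * The result is committed to dst only when the predicate holds. */
static int32_t add_if(int32_t dst, int32_t a, int32_t b, int pred)
{
    if (pred) {
        dst = a + b;
    }
    return dst;
}

/* The same effect with partial predication: an ordinary add into a
 * temporary, followed by an explicit conditional move. */
static int32_t add_then_cmove(int32_t dst, int32_t a, int32_t b, int pred)
{
    int32_t tmp = a + b;          /* always executed */
    return cmove(dst, tmp, pred); /* committed only if pred is true */
}
```

Both functions return the same value for every input on which the addition does not overflow. The partial form costs one extra instruction and one temporary. Operations with side effects, such as memory accesses, need additional care because they cannot simply be executed unconditionally.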
On the other hand, in ISAs with full predication support the predicate operand field constitutes a non-negligible portion of the code size. For example, if a 32-bit ISA defines sixteen predicate registers, selecting one of them takes a 4-bit field in each predicated instruction, so the predicate operand field represents as much as 4/32 = 12.5% of the memory footprint of a program.
Unlike full predication support, partial predication support may be readily added to existing ISAs, since it requires adding as little as one instruction. The downside is that partial predication support is not as effective as full predication support, resulting in larger code and higher resource usage, as illustrated by the if-conversion code example in TABLE I:
TABLE I

Original Code   Predicated Low-Level Code   Predicated Low-Level Code
                With Full Support           With Partial Support
--------------  --------------------------  ---------------------------
z=...           z=...                       z=...
if (i<0) {      p1 = (i<0);                 p1 = (i<0);
  x=0;          x = 0 if p1;                x1 = 0;
  y=0;          y = 0 if p1;                y1 = 0;
} else {        p2 = (i>=0);                p2 = (i>=0);
  x=A[i];       x = *(A+i) if p2;           tmp1 = A+i;
  x=x*i         x = x*i if p2;              tmp1 = cmove(safe_addr,p1);
  y=y+x         y = y+x if p2;              x2 = *tmp1;
  z=y>>1;       z = y>>1 if p2;             x3 = x2*i;
  B[i]=y;       *(B+i) = y if p2;           y2 = y+x3;
}                                           z2 = y2>>1;
                                            tmp2 = B+i;
                                            tmp2 = cmove(safe_addr,p1);
                                            *tmp2 = y2;
                                            x = cmove(x1,p1);
                                            x = cmove(x3,p2);
                                            y = cmove(y1,p1);
                                            y = cmove(y2,p2);
                                            z = cmove(z2,p2);
As immediately observable from TABLE I, partial predication support requires significant code expansion because of the addition of conditional move instructions and the care which must be taken to avoid illegal memory accesses.
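The partial predication column of TABLE I can be exercised with the following C sketch, offered as an illustration under stated assumptions rather than as a definitive implementation. Memory is modeled as a single array "mem" and addresses as indices into it (a_base, b_base and safe_addr are hypothetical parameters), so that the speculative address computation stays well-defined in C. The names regs_t, run_branchy and run_partial are likewise hypothetical.

```c
#include <stdint.h>

typedef struct { int32_t x, y, z; } regs_t;

/* Hypothetical conditional move: old destination is kept if pred is false. */
static int32_t cmove(int32_t dst, int32_t src, int pred)
{
    return pred ? src : dst;
}

/* Original code of TABLE I, with control flow. */
static regs_t run_branchy(regs_t r, int32_t i, int32_t *mem,
                          int32_t a_base, int32_t b_base)
{
    if (i < 0) {
        r.x = 0;
        r.y = 0;
    } else {
        r.x = mem[a_base + i];
        r.x = r.x * i;
        r.y = r.y + r.x;
        r.z = r.y >> 1;
        mem[b_base + i] = r.y;
    }
    return r;
}

/* If-converted code using only unconditional operations and cmove.
 * safe_addr must index a scratch location that may be freely read
 * and overwritten (an assumption of this sketch). */
static regs_t run_partial(regs_t r, int32_t i, int32_t *mem,
                          int32_t a_base, int32_t b_base, int32_t safe_addr)
{
    int p1 = (i < 0);
    int32_t x1 = 0;
    int32_t y1 = 0;
    int p2 = (i >= 0);
    int32_t tmp1 = a_base + i;
    tmp1 = cmove(tmp1, safe_addr, p1);  /* redirect the load when i < 0 */
    int32_t x2 = mem[tmp1];
    int32_t x3 = x2 * i;
    int32_t y2 = r.y + x3;
    int32_t z2 = y2 >> 1;
    int32_t tmp2 = b_base + i;
    tmp2 = cmove(tmp2, safe_addr, p1);  /* redirect the store when i < 0 */
    mem[tmp2] = y2;
    r.x = cmove(r.x, x1, p1);
    r.x = cmove(r.x, x3, p2);
    r.y = cmove(r.y, y1, p1);
    r.y = cmove(r.y, y2, p2);
    r.z = cmove(r.z, z2, p2);
    return r;
}
```

For inputs that do not overflow, both functions leave x, y, z and the A and B regions of memory in the same state; only the scratch location may differ. The five trailing conditional moves and the two address redirections are the code expansion discussed above.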
Also apparent from TABLE I is that predication adds explicit dependence edges to the original program data flow graph. These edges go from the predicate definition (p1 and p2) to the use of the predicate in the predicated instructions (for full support) or in the inserted conditional move instructions (for partial support). These additional dependence edges can reduce the efficiency of the code generated by a compiler.
The negative impact of predication is particularly evident for the software pipelining of loops in the case where the target machine supports full predication but not rotating registers. In this situation, an iteration predicate is computed for each iteration, and each instruction of the loop body is guarded by the iteration predicate to enable the execution of the prologue and epilogue of the loop with the same code as the kernel. Therefore the iteration predicate is live across all of the stages of the iteration. Moreover, since no rotating registers are present, the kernel of the loop must be unrolled a number of times equal to the number of stages of the schedule, which is actually a worst case. Without the constraint imposed by this implementation of predication, the number of unrolls and the resulting code expansion can be reduced.
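Kernel-only execution with an iteration predicate can be sketched in C as follows. This is a simplified model under several assumptions: the loop computes out[i] = in[i]*3 + 1 in three hypothetical stages with one new iteration started per kernel pass, guards written as C "if" statements stand in for predicated instructions, and each intermediate value is consumed one pass after it is produced, so that neither rotating registers nor unrolling is needed. The function and variable names are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

/* Kernel-only software pipeline for out[i] = in[i] * 3 + 1.
 * The same kernel body fills the pipeline (prologue), runs in steady
 * state, and drains it (epilogue); the predicates decide which stages
 * actually commit results on each pass. */
static void scale_kernel_only(const int32_t *in, int32_t *out, size_t n)
{
    int p0;               /* predicate of the iteration entering stage 0 */
    int p1 = 0, p2 = 0;   /* the same predicate, carried to stages 1 and 2 */
    int32_t r0 = 0;       /* value passed from stage 0 to stage 1 */
    int32_t r1 = 0;       /* value passed from stage 1 to stage 2 */

    /* n iterations over 3 stages take n + 2 kernel passes. */
    for (size_t t = 0; t < n + 2; t++) {
        p0 = (t < n);     /* does a new iteration start on this pass? */

        /* Stages are written last-to-first so each value is read
         * before the next iteration overwrites it. */
        if (p2) out[t - 2] = r1 + 1;  /* stage 2 of iteration t-2 */
        if (p1) r1 = r0 * 3;          /* stage 1 of iteration t-1 */
        if (p0) r0 = in[t];           /* stage 0 of iteration t   */

        /* The iteration predicate stays live across every stage. */
        p2 = p1;
        p1 = p0;
    }
}
```

The predicate computed when an iteration starts has to be kept until that iteration leaves the last stage, which is the lifetime cost described above. In a schedule where values live longer than one kernel pass, the simple reordering used here no longer suffices and rotating registers or kernel unrolling would be needed.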
There is, therefore, a need in the art for an implementation of full predication realized by simple extension of an existing ISA having no built-in full predication support, to achieve the benefits of full predication within this class of ISAs. There is also a need to reduce the number of data dependencies introduced in the program data flow graph by predication so as to enable the compiler to generate more efficient code, or more efficient hardware.