Many modern microprocessors, such as Very Long Instruction Word (VLIW) processors, use pipelining to improve performance. Conditional execution has long been an important aspect of pipelined execution. In processors that support conditional execution, the instruction set allows all instructions, or a selected subset of them, to be marked as conditional. For example, the following statement assigns the value r4*r5 to the variable r3 only when r1 is greater than r2:

    if (r1>r2) r3=r4*r5;

This statement may be compiled into a conditional instruction supported by a hypothetical processor, as follows:

    cmpgt p0=r1,r2
    [p0] mpy r3=r4*r5

where [p0] mpy is a conditional instruction, that is, an instruction marked as conditional. The multiplication and assignment are executed only when the condition [p0] is true. Conditional execution has two major benefits: it reduces the pipeline stalls caused by branch instructions, and it allows dual-path execution of an if-then-else statement in a highly parallel hardware environment such as VLIW, which improves performance.
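The semantics of the two-instruction sequence above can be sketched in software. The following Python model is only an illustration: the function names cmpgt, predicated_mpy, and run are hypothetical, and the model describes the architectural result only, not the timing or the pipeline stage at which the condition is evaluated.

```python
def cmpgt(a, b):
    """Model of the hypothetical 'cmpgt p0=r1,r2': returns the predicate a > b."""
    return a > b


def predicated_mpy(pred, old_dest, a, b):
    """Model of '[p0] mpy r3=r4*r5'.

    When pred is true the product is committed to the destination;
    when pred is false the destination keeps its old value.
    """
    return a * b if pred else old_dest


def run(r1, r2, r3, r4, r5):
    """Execute the two-instruction sequence and return the final value of r3."""
    p0 = cmpgt(r1, r2)
    r3 = predicated_mpy(p0, r3, r4, r5)
    return r3
```

No branch appears in the sequence: the control dependence of the original if statement has become a data dependence on p0, which is why the branch-related pipeline stall disappears.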
Since the Cray-I, introduced in 1978, first allowed some instructions to be marked as conditional, numerous research reports have been published and many products have been brought to market.
U.S. Pat. No. 6,016,543 discloses a microprocessor for controlling the conditional execution of instructions. It uses hardware to detect data dependencies among instructions and inserts hardware interlocks to ensure that conditional instructions execute correctly. The disclosed technology is applicable to superscalar processors. FIG. 1 is a timing chart showing a pipeline interlock cancel process for multiplication data executed in the disclosed microprocessor having conditional execution instructions. As shown in FIG. 1, the first ldw is a conditional instruction, and the subsequent mul2h depends on the result of that ldw. During execution, the mul2h instruction is stalled at the decode stage until the condition of the ldw is resolved. When the condition is true, the mul2h instruction is executed, as shown in the upper part of FIG. 1. When the condition is false, the ldw instruction is cancelled and the mul2h instruction enters the execution stage directly, without waiting for the result of the ldw.
U.S. Pat. No. 6,513,109 discloses a method for implementing execution predicates in a computer processing system, which applies the speculative execution used for branch instructions to conditional execution. FIG. 2 is a diagram illustrating a system for performing out-of-order superscalar execution with predicate prediction according to the disclosed method. As shown in FIG. 2, when a conditional instruction is encountered, the predictor predicts the result of the condition. Based on the prediction, the corresponding instructions are issued to the functional units for execution. After execution, the results are stored in a future register file for use by subsequent instructions. Each instruction stays in the in-order retirement queue until the result of its condition is confirmed. The instructions confirm their results sequentially and write back to the architecture register file. If a prediction error is found during confirmation, all subsequent instructions in the retirement queue are cleared and execution restarts from the point where the prediction error occurred.
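The predict, execute, confirm, and flush cycle described above can be illustrated with a small software model. The sketch below is not the design of the cited patent; it is a simplified toy under several assumptions: the program is straight-line code, each instruction is a hypothetical tuple (predicate register or None, destination register, function computing the result), and the condition is confirmed only when the instruction reaches the head of the retirement queue. The names run_with_predicate_prediction, predict, and resolved are invented for the illustration.

```python
def run_with_predicate_prediction(program, regs, predict):
    """Toy model of predicate prediction with an in-order retirement queue.

    program: list of (pred_reg_or_None, dest_reg, fn), where fn(regs) -> value
    regs:    initial architecture register values (dict)
    predict: function mapping an instruction index to a predicted bool
    Returns (architecture register file, number of flushes).
    """
    arch = dict(regs)      # architecture register file (committed state)
    resolved = {}          # instruction index -> confirmed predicate value
    flushes = 0
    pc = 0
    while pc < len(program):
        future = dict(arch)   # future register file holds speculative results
        queue = []            # in-order retirement queue
        # Issue phase: execute speculatively according to the prediction.
        for i in range(pc, len(program)):
            pred, dest, fn = program[i]
            guess = True if pred is None else resolved.get(i, predict(i))
            result = fn(future) if guess else None
            if guess:
                future[dest] = result
            queue.append((i, guess, result))
        # Retirement phase: confirm each condition in program order.
        for i, guess, result in queue:
            pred, dest, _ = program[i]
            actual = True if pred is None else bool(arch[pred])
            if guess != actual:
                resolved[i] = actual   # remember the confirmed value
                flushes += 1
                pc = i                 # clear the queue, restart at the error
                break
            if actual:
                arch[dest] = result    # write back to the architecture file
            pc = i + 1
    return arch, flushes


# Hypothetical three-instruction program: compare, predicated multiply, dependent add.
example_program = [
    (None, "p0", lambda r: r["r1"] > r["r2"]),
    ("p0", "r3", lambda r: r["r4"] * r["r5"]),
    (None, "r6", lambda r: r["r3"] + 1),
]
```

With a predictor that always guesses true, a false condition costs one flush and re-execution of the dependent add, while a true condition retires without any flush.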
U.S. Pat. No. 6,374,346 discloses yet another method, applicable to a VLIW digital signal processor (DSP). In conventional processors, only some instructions can be made conditional, and the conditions come from a small number of flag registers set by special instructions. In contrast, the disclosed method allows any instruction in the instruction set to be made conditional, and the conditions come from the general-purpose registers, which can be set by any instruction, not only by comparison instructions. FIG. 3 shows a schematic view of the pipeline behavior of an embodiment of this design.
Because conditional execution has a great impact on pipeline behavior, the pipeline behavior must be taken into account when developing conditional execution mechanisms. There are three major pipeline behaviors when conditions are involved: the condition can be interpreted and executed at the decode stage, at the first execution stage, or at the last execution stage.
FIG. 4 shows a conventional pipeline behavior in which the conditions are interpreted and executed at the decode stage. In this approach, the condition of a conditional instruction is treated as an operand of the instruction and is read at the decode stage. Based on the condition, the instruction either enters the execution stage or is turned into a NOP and is not executed. This mode is also called the conditional issue mode. The design is direct and simple, and therefore low in cost. Many commercial processors, including ARM, SUN SPARC, and Intel Itanium, use this mode. Its advantage is that the NOP is determined early. When the functional units include an energy-saving design, such as clock gating when a NOP is encountered, energy consumption can be reduced. Its disadvantage is a longer execution latency. As shown in FIG. 4, even with a forwarding path, there is a one-cycle delay between the instruction that computes the condition and the instruction that uses it.
FIG. 3 shows a conventional pipeline behavior in which the conditions are interpreted at the first execution stage to reduce pipeline stalls. As shown in FIG. 3, the condition is interpreted at the first execution stage and is transmitted through the forwarding path. This design forms a smooth pipeline and requires no stall between the instruction that computes the condition and the instruction that uses it. This design is used in the TI 320C6xxxx series DSP processors.
FIG. 5 shows a conventional pipeline behavior in which the conditions are interpreted at the last execution stage. Delaying the interpretation of the condition to a later stage enables the condition interpretation and the data computation to be executed in parallel. As shown in FIG. 5, the conditional instruction interprets the condition through the forwarding path at the last execution stage and determines whether the execution results should be written back.
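The effect of the stage at which the condition is read can be illustrated with a simple cycle-count model. The sketch below rests on assumptions that are not taken from the figures: a hypothetical five-stage pipeline (F, D, E1, E2, W), a condition that is produced in the producer's E1 stage and becomes usable through the forwarding path in the following cycle, and a consumer issued one cycle after the producer. The names STAGES and stall_cycles are invented for the illustration.

```python
# Hypothetical pipeline: fetch, decode, two execution stages, write-back.
STAGES = ["F", "D", "E1", "E2", "W"]


def stall_cycles(read_stage, producer_issue=0, consumer_issue=1):
    """Stall cycles a conditional instruction needs before reading its condition.

    read_stage: stage at which the consumer reads the condition
                ("D", "E1", or "E2" in this model).
    Assumption: the producer computes the condition in E1 and forwarding
    makes it usable in the cycle after that.
    """
    ready_cycle = producer_issue + STAGES.index("E1") + 1
    needed_cycle = consumer_issue + STAGES.index(read_stage)
    return max(0, ready_cycle - needed_cycle)


# Stalls for back-to-back producer and consumer in each of the three modes.
stalls_by_mode = {
    "decode stage": stall_cycles("D"),
    "first execution stage": stall_cycles("E1"),
    "last execution stage": stall_cycles("E2"),
}
```

Under these assumptions the model reproduces the one-cycle delay of the conditional issue mode and the stall-free behavior when the condition is read at an execution stage. It does not model the energy side of the trade-off, namely that a later condition read lets more of the computation run before a false condition discards it.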
FIG. 6A and FIG. 6B show the timing of an if-then-else statement executed using a conventional branch/conditional issue mode and using an ASIC architecture, respectively. The if-then-else statement is as follows:

    if (u.v) x=a*b+c*d;
    else x=a*b-c*d;
    y=x*f;
As seen in FIG. 6A, in the conventional branch/conditional issue mode the condition must be determined before the data computation begins. With an ASIC, as seen in FIG. 6B, the condition can be determined in parallel with the data computation, and a multiplexer then selects between the computed results according to the outcome of the comparison. Conditional execution therefore has a great impact on the pipeline behavior and on the eventual performance of a VLIW processor. At the same time, the power consumed by the many discarded computations may pose an important constraint in VLIW architecture design. It is imperative to provide a design that is flexible enough both to save energy and to achieve near-ASIC performance.
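The difference between the two approaches of FIG. 6A and FIG. 6B can be sketched in software for the if-then-else statement given earlier. The Python functions below are illustrative only: the names branch_style, dual_path_style, and mux are hypothetical, the condition is passed in as an already evaluated boolean cond, and sequential Python code cannot express the parallelism itself, only the data flow.

```python
def branch_style(cond, a, b, c, d, f):
    """Condition first, then only the selected path is computed (as in FIG. 6A)."""
    if cond:
        x = a * b + c * d
    else:
        x = a * b - c * d
    return x * f


def mux(select, when_true, when_false):
    """Two-input multiplexer: passes one of two already computed values."""
    return when_true if select else when_false


def dual_path_style(cond, a, b, c, d, f):
    """Both paths are computed independently of cond; a mux selects (as in FIG. 6B)."""
    ab = a * b
    cd = c * d
    x_then = ab + cd   # result of the 'if' path
    x_else = ab - cd   # result of the 'else' path
    x = mux(cond, x_then, x_else)   # the unselected result is discarded
    return x * f
```

Both functions return the same value for every input. In the dual-path form, x_then and x_else do not depend on cond, so hardware can compute them while the condition is still being evaluated; the cost is that one of the two results is always computed and then thrown away, which is the energy concern raised above.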