Conditional execution, also referred to as predicated execution, gives the programmer the ability to specify, for a non-branch instruction, whether the instruction is to execute based upon previously generated machine state. This data-dependent conditional execution capability minimizes the need for conditional branches. Because branches incur a branch delay penalty on pipelined processors, avoiding them improves performance. In addition, many types of sequential control dependencies can be turned into parallel data dependencies. Consequently, it is desirable that a pipelined Single Instruction Multiple Data stream (SIMD) array processor support conditional execution in each processing element (PE), providing a level of data-dependent parallelism unavailable on a SIMD machine that supports only conditional branching. With parallel conditional execution, the performance gain can be significant since multiple conditional branches can be avoided.
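As an illustration only, the following Python sketch models how a control dependency (a branch choosing between two subtractions) can be turned into a data dependency on a per-PE condition flag. The function name `simd_predicated_abs_diff` and the modeling of PEs as list elements are assumptions made for this sketch and do not describe any particular hardware; the Python conditional expression merely stands in for a hardware write-enable.

```python
def simd_predicated_abs_diff(a_vals, b_vals):
    """Model |a - b| across an array of PEs without a conditional branch.

    Each list index stands for one PE (hypothetical model). All PEs see the
    same instruction stream; a per-PE flag decides whether each instruction
    commits its result in that PE.
    """
    # A compare instruction sets one condition flag per PE (machine state).
    flags = [a > b for a, b in zip(a_vals, b_vals)]

    # Destination register in every PE, initially zero.
    dest = [0] * len(a_vals)

    # Instruction 1: dest = a - b, committed only where the flag is true.
    dest = [a - b if f else d
            for a, b, f, d in zip(a_vals, b_vals, flags, dest)]

    # Instruction 2: dest = b - a, committed only where the flag is false.
    dest = [b - a if not f else d
            for a, b, f, d in zip(a_vals, b_vals, flags, dest)]

    return dest
```

Both instructions are issued to every PE; only the commit of each result depends on the locally generated flag, so PEs holding different data take different paths without any branch in the instruction stream.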
In creating the architecture of a parallel array indirect VLIW processor for a given range of operations, it is found that the format requirements for specifying the operations vary with the type of operation. For example, the parallel array operations can be grouped into three types: control and branch operations, load and store operations, and arithmetic operations. Each of these types has different encoding requirements for optimum implementation. Since the instruction format typically has a fixed number of bits, it is difficult to define a mechanism supporting a single specification for conditional execution across all instructions in a processor without restricting functional capabilities for at least some of the operations. Given that it is desirable to support conditional execution, even if the degree of support must vary depending upon the instruction type, the problem is how to define a unified but variable-specification conditional execution mechanism based upon the instruction type.
For conditional branching or conditional execution to be more efficient, it is desirable that the conditional operation be based on complex conditions formed by a Boolean combination of relations, such as [a>b OR c<d]. This may be accomplished by sequentially using multiple single-test conditional branches that together achieve the desired result. The problem with using multiple single-test conditional branches is that each required branch incurs the branch delay penalty and so reduces performance. This performance loss can be reduced with non-branching complex conditional execution.
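The contrast between the two approaches for the condition [a>b OR c<d] can be sketched as follows. This is a minimal illustration under assumed names (`select_with_branches`, `select_with_combined_flag`); the arithmetic select at the end is one common branch-free idiom and is not asserted to be the mechanism of any specific processor.

```python
def select_with_branches(a, b, c, d, x, y):
    """Return x if (a > b) OR (c < d), else y, using single-test branches."""
    if a > b:       # first single-test conditional branch
        return x
    if c < d:       # second single-test conditional branch
        return x
    return y


def select_with_combined_flag(a, b, c, d, x, y):
    """Same result, with the two relations combined into one flag."""
    f0 = a > b      # first compare sets a condition flag
    f1 = c < d      # second compare sets a condition flag
    f = f0 | f1     # Boolean OR of the flags, no branch involved
    # Arithmetic select: f is 1 (True) or 0 (False) in integer context.
    return f * x + (1 - f) * y
```

In the second form, each additional relation adds a compare and a flag combination rather than another branch, so the branch delay penalty is not multiplied by the number of tests.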
In machines with a SIMD architecture, it is desirable to generate independent conditional operations in the PEs as well as to transfer condition information between PEs to allow the gathering of conditional state information generated in the PEs. It is also desirable to provide conditional branching in the controller, or sequence processor (SP), of a SIMD array processor where the conditions are created in the array PEs. By allowing condition-state information to be moved between PEs, a condition-producing operation can take place in one PE and a conditional operation based upon that result can take place in another PE. By allowing conditional information to be moved between the PEs and the SP, a conditional operation can take place in the SP based upon PE conditions. How best to add such a capability to the architecture raises further issues.
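The two kinds of condition movement described above can be modeled roughly as follows. The ring connection between PEs, the `any`/`all` reduction offered to the SP, and all function names are assumptions chosen for this sketch; the text does not specify a topology or a reduction rule.

```python
def pe_compare(local_values, threshold):
    """Each PE compares its own value and produces one condition flag."""
    return [v > threshold for v in local_values]


def shift_flags_to_neighbor(flags):
    """Move each PE's flag to the next PE (hypothetical ring topology).

    PE i receives the flag produced in PE i-1; PE 0 receives the flag
    from the last PE.
    """
    return [flags[i - 1] for i in range(len(flags))]


def sp_branch_taken(flags, mode="any"):
    """Reduce the PE flags to the single condition the SP branches on.

    mode="any": branch if at least one PE flag is set.
    mode="all": branch only if every PE flag is set.
    """
    if mode == "any":
        return any(flags)
    if mode == "all":
        return all(flags)
    raise ValueError("unknown reduction mode: " + mode)
```

For example, flags produced by `pe_compare([1, 5, 3, 8], 4)` can be shifted so that each PE conditionally executes on its neighbor's result, or passed to `sp_branch_taken` so that the SP branches on a condition created in the array.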
In VLIW machines, a plurality of execution units exist that may execute in parallel, with each execution unit possibly producing condition information or state information for each sub-instruction of the multi-instruction VLIW. To make a data-dependent conditional execution decision, it is necessary to reduce the total amount of machine state to the desired test condition. It is also desirable to have a mechanism to select condition results from one of the multiple execution units to control the execution of one or more of the other execution units. An example of this type of situation is a compare instruction followed by a conditionally dependent shift instruction, where the compare is performed in a different execution unit than the shift. Consequently, the problems to be solved are how to reduce the amount of condition information to a specified test condition and how to provide a mechanism for interdependent conditional execution between the multiple execution units that operate in synchronism in a VLIW machine.
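The compare-then-shift example can be sketched as below. The sketch assumes each execution unit produces the conventional N, Z, C, and V flags for a 32-bit compare, and that a named test such as "GT" reduces those flags to a single bit; the flag set, the test names, and the unit names "ALU" and "MAU" are illustrative assumptions rather than details taken from the text.

```python
MASK32 = 0xFFFFFFFF


def unit_compare(a, b):
    """Model a 32-bit compare (a - b) and return its condition flags."""
    a &= MASK32
    b &= MASK32
    diff = (a - b) & MASK32
    return {
        "N": diff >> 31,                          # result sign bit
        "Z": int(diff == 0),                      # result is zero
        "C": int(a >= b),                         # no borrow (assumed convention)
        "V": ((a ^ b) & (a ^ diff)) >> 31,        # signed overflow on subtract
    }


def reduce_to_condition(flags, test):
    """Reduce a unit's flag set to the single specified test condition."""
    if test == "EQ":
        return flags["Z"] == 1
    if test == "GT":                              # signed greater than
        return flags["Z"] == 0 and flags["N"] == flags["V"]
    if test == "LT":                              # signed less than
        return flags["N"] != flags["V"]
    raise ValueError("unknown test: " + test)


def conditional_shift(unit_flags, source_unit, test, value, amount):
    """Shift in one unit, controlled by a condition from another unit."""
    cond = reduce_to_condition(unit_flags[source_unit], test)
    return (value << amount) & MASK32 if cond else value
```

Here `unit_flags` is a mapping from a unit name to the flags it last produced, so the shift can name which unit's condition it depends on.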
Sub-word execution refers to the multiple individual operations that simultaneously take place on pieces of data smaller than a word or double word within a single execution unit. The aggregate of the multiple sub-word operations is referred to as packed data operations, where, for example, quad 16-bit operations or octal 8-bit operations occur in parallel on packed 64-bit data types. When performing sub-word execution in a machine that supports conditional execution, it is desirable to achieve a sub-word level of conditional execution granularity when executing the instruction. The question is how to support such a capability in the architecture.
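Sub-word conditional granularity can be illustrated with a quad 16-bit add on packed 64-bit data in which each lane has its own condition flag. This is a sketch under assumed names (`pack`, `unpack`, `packed_add_conditional`) and an assumed lane ordering (lane 0 in the least significant bits); it models the behavior only and does not describe a particular implementation.

```python
LANES = 4            # quad operation
BITS = 16            # 16-bit sub-words
LANE_MASK = 0xFFFF


def unpack(word):
    """Split a packed 64-bit value into four 16-bit lanes (lane 0 = low bits)."""
    return [(word >> (BITS * i)) & LANE_MASK for i in range(LANES)]


def pack(lanes):
    """Combine four 16-bit lanes into one packed 64-bit value."""
    word = 0
    for i, value in enumerate(lanes):
        word |= (value & LANE_MASK) << (BITS * i)
    return word


def packed_add_conditional(dest, a, b, lane_flags):
    """Quad 16-bit add where each lane commits only if its flag is set.

    Lanes whose flag is clear keep the previous destination value. Each
    lane wraps modulo 2**16, so no carry crosses into a neighboring lane.
    """
    result = [
        (x + y) & LANE_MASK if flag else old
        for old, x, y, flag in zip(unpack(dest), unpack(a), unpack(b), lane_flags)
    ]
    return pack(result)
```

A single instruction thus updates some sub-words of the destination and leaves others unchanged, which is the granularity the paragraph above describes.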