The present invention relates to a VLIW instruction including a plurality of compare instructions, and a technique for executing the VLIW instruction.
Processors (VLIW processors) that use VLIW (Very Long Instruction Word) instructions are capable of executing a plurality of instructions in one cycle, and are therefore used in various fields.
In image processing, for example, a complex conditional judgment is required for processing, such as edge detection, in which values of neighboring pixels are compared to determine the value of a pixel of interest. Expression (1) shows a processing example of an edge detection filter.
res = (val>c && (val>b ∥ val==b && (sc==0 ∥ sc==2))) ? 1 : 0;  (1)
In the term preceding the mark “?” in Expression (1), each alphabetic name represents an argument and each numeral represents an immediate value. In addition, “&&” and “∥” represent “AND” and “OR”, respectively, and each equal sign and inequality sign represents comparison processing. Letting X denote the term preceding the mark “?” on the right-hand side of Expression (1), Expression (1) represents processing in which “1” is output as the value “res” when X is true and “0” is output as the value “res” when X is false.
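For illustration only, Expression (1) can be written as the following C function. This is a minimal sketch: the function name edge_filter and the use of int for every operand are assumptions made here, not details taken from the figures. Because “AND” binds more tightly than “OR” in C, the equality test is grouped with the test on sc, and the parentheses below make that grouping explicit.

```c
/* Sketch of Expression (1). The name edge_filter and the int operand
 * types are assumptions for illustration. */
int edge_filter(int val, int b, int c, int sc)
{
    /* X is the term preceding "?" in Expression (1). */
    int x = (val > c) &&
            ((val > b) || ((val == b) && (sc == 0 || sc == 2)));

    /* res is 1 when X is true and 0 when X is false. */
    return x ? 1 : 0;
}
```

For example, a call with val=5, b=5, c=4, and sc=2 satisfies val>c, val==b, and sc==2, so the function returns 1; changing sc to 1 makes it return 0.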
FIG. 18 illustrates an example of a program for use in implementing the processing shown in Expression (1) in a VLIW processor by using branch instructions.
As shown in the program of FIG. 19, for example, the processing shown in Expression (2) below can also be implemented in the VLIW processor by using branch instructions.
res = ((d01==0) && (d11!=0)) && (((d02!=0) && (d12==0)) ∥ ((d00!=0) && (d10==0)))  (2)
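As a hedged illustration of what a branch-based implementation of Expression (2) involves, the following C sketch gives a direct form and an equivalent form in which each comparison is followed by a conditional exit. The function names and the int operand types are assumptions, and the second function does not reproduce the program of FIG. 19; it only shows the general compare-then-branch structure.

```c
/* Direct form of Expression (2). Names and int types are assumptions. */
int expr2_direct(int d00, int d01, int d02, int d10, int d11, int d12)
{
    return ((d01 == 0) && (d11 != 0)) &&
           (((d02 != 0) && (d12 == 0)) || ((d00 != 0) && (d10 == 0)));
}

/* Equivalent form with one comparison per conditional exit, mimicking a
 * sequence of compare and branch pairs. */
int expr2_branching(int d00, int d01, int d02, int d10, int d11, int d12)
{
    if (d01 != 0) return 0;   /* first AND term fails */
    if (d11 == 0) return 0;   /* second AND term fails */
    if (d02 != 0) {
        if (d12 == 0) return 1;   /* first OR alternative holds */
    }
    if (d00 != 0) {
        if (d10 == 0) return 1;   /* second OR alternative holds */
    }
    return 0;                 /* neither OR alternative holds */
}
```

Each conditional exit in the second function corresponds to a point at which a processor would incur a branch penalty if the branch were taken.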
For example, as shown in the second to fourth lines of the program of FIG. 18 and the second to third lines of the program of FIG. 19, one branch instruction (brf) and one compare instruction (cmp) are executed in parallel. This enhances the processing efficiency as compared with processors that can execute only one instruction in one cycle.
In general, however, branch instructions take a long time to execute. For example, if the branch penalty is two cycles, nine cycles are required to execute the program illustrated in FIG. 18, and 12 cycles are required to execute the program illustrated in FIG. 19.
Techniques for the VLIW processor have been proposed from various perspectives.
For example, Japanese Unexamined Patent Application Publication No. 10-27102 discloses a technique for eliminating conditional branching by using predicate registers.
The VLIW processor to which this technique is applied includes a plurality of operation units, each provided in correspondence with one or more of a plurality of operation instruction fields included in a single VLIW instruction. Each operation unit includes an operation circuit that performs an operation indicated by the corresponding one or more operation instruction fields; a register (predicate register) that stores a value for determining whether or not to execute the operation of the operation circuit; and storage means for writing, into the registers within all the operation units, all values obtained by evaluating the operation result of a predetermined instruction, in response to the predetermined instruction. The operation circuit within each operation unit determines whether or not to execute the operation instruction designated in the predicate register described above, according to the value written in the predicate register.
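The general idea of predicated execution can be modeled in software as follows. This is a toy sketch under stated assumptions, not the circuit of the cited publication: the type OperationUnit, the constant NUM_UNITS, and the functions write_predicates, predicated_move, and select_without_branch are all hypothetical names introduced here. A comparison result is written into the predicate register of every unit, and a guarded operation then either commits its result or leaves the destination unchanged.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_UNITS 4   /* hypothetical number of operation units */

/* Toy model of one operation unit holding a predicate register. */
typedef struct {
    bool predicate;
} OperationUnit;

/* Models the storage means: the evaluated result of a predetermined
 * instruction is written into the predicate register of every unit. */
void write_predicates(OperationUnit units[], size_t count, bool result)
{
    for (size_t i = 0; i < count; i++) {
        units[i].predicate = result;
    }
}

/* Models a guarded move: the unit commits src only when its predicate
 * is set; otherwise the old destination value is kept. */
int predicated_move(const OperationUnit *unit, int old_dest, int src)
{
    return unit->predicate ? src : old_dest;
}

/* Computes res = (val > c) ? 1 : 0 as a compare followed by a guarded
 * move, instead of a compare followed by a conditional branch. */
int select_without_branch(int val, int c)
{
    OperationUnit units[NUM_UNITS];
    int res = 0;

    write_predicates(units, NUM_UNITS, val > c);
    res = predicated_move(&units[0], res, 1);
    return res;
}
```

In this model the instruction stream is the same whether the condition holds or not, which is how predication avoids the branch penalty; the conditional expression inside predicated_move stands in for hardware that suppresses the write-back.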
Japanese Unexamined Patent Application Publication No. 07-302199 discloses a technique in which general-purpose sum-of-products circuits are provided in parallel in a VLIW processor and a complex conditional judgment (complex test) is carried out in one cycle to thereby achieve conditional branching.
Japanese Unexamined Patent Application Publication No. 2008-146544 discloses a technique for combining a plurality of condition codes, which are obtained through operations in a plurality of cycles, into a single condition code set.
Published Japanese Translation of PCT International Publication for Patent Application, No. 2003-520360 discloses a technique for obtaining results of a Boolean combination of state information generated from a current compare instruction and a compare instruction in a previous cycle.