1. Field of the Invention
The present invention relates generally to a compiler device for a processor having conditional executable instructions, and particularly to a compiler device that performs software pipelining utilizing conditional executable instructions.
2. Related Background Art
Recently, a growing number of processors are capable of parallel processing so as to achieve higher-speed processing, and some include, in their instruction specification, conditional executable instructions, which are executed only when the conditions they refer to are met, as one of the techniques for improving the effect of the parallel processing. Utilizing such instructions is advantageous since the branch instructions per se, as well as the penalties incurred when branch instructions are executed, can be eliminated.
For instance, when a C language program shown in FIG. 28A is compiled without employing any conditional executable instructions, an assembler code as shown in FIG. 28B is obtained. The instruction 101 determines whether the condition (R5≦0) is met, and if the condition is met, the instruction 102 causes a branch to be taken. Assuming a latency of one cycle for each instruction, in the case where the else part (to which the instruction 102 branches) is executed, the instructions 101, 102, and 105 are executed in sequence, and therefore three cycles are needed. In the case where the then part is executed, three cycles are needed likewise, since the instructions 101 and 102 are followed by the instructions 103 and 104, which are executed in parallel. Furthermore, if penalties occur due to branch prediction failures, more cycles are needed.
On the other hand, in the case where the foregoing program is compiled employing conditional executable instructions, an assembler code as shown in FIG. 28C is obtained. An instruction 106 causes the result of the condition determination to be written to the flags C0 and C1. In the present case, if R5>0, C0=1 and C1=0, whereas if R5≦0, C0=0 and C1=1. Each of the instructions 107 and 108 is executed only when the flag it refers to is 1, and is not executed otherwise. For instance, the instruction 107, which is an “add” instruction, is executed exclusively when the flag C0 is 1.
In the foregoing example, as described above, 1 is written to only one of C0 and C1 by the execution of the instruction 106, and hence, the instructions 107 and 108 can be processed in parallel. Therefore, irrespective of whether the then part or the else part is executed, only two cycles are required, and since no branch instruction exists, no branch prediction error penalty is incurred. Accordingly, the assembler code obtained by compiling with conditional executable instructions is superior in both performance and code size.
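The contrast between the two compiled forms can be sketched in C as follows. This is only an illustrative model, not the code of FIG. 28A to FIG. 28C, which is not reproduced here: the function names, the operands a and b, and the use of a subtraction for the else part are assumptions made for the sketch. Only the flag semantics (C0=1 when R5>0, C1=1 when R5≦0) and the guarded "add" are taken from the description above.

```c
/* Branch form: the condition selects one of two paths. */
static int with_branch(int r5, int a, int b)
{
    int r;
    if (r5 > 0) {
        r = a + b;      /* then part */
    } else {
        r = a - b;      /* else part (hypothetical operation) */
    }
    return r;
}

/* Conditional-execution form, modelled in C.
 * One compare writes two mutually exclusive flags; each following
 * statement stands for an instruction guarded by one flag. On the
 * target processor the two guarded instructions carry no branch and
 * can be issued in the same cycle. */
static int with_flags(int r5, int a, int b)
{
    int c0 = (r5 > 0);  /* C0 = 1 when R5 > 0 */
    int c1 = !c0;       /* C1 = 1 when R5 <= 0 */
    int r = 0;

    if (c0) r = a + b;  /* models "(C0) add": takes effect only if C0 == 1 */
    if (c1) r = a - b;  /* models a C1-guarded instruction (hypothetical) */
    return r;
}
```

Both functions return the same value for every input; the difference lies only in how the selection is expressed, by control flow in the first form and by guard flags in the second.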
Furthermore, software pipelining is available as one of the loop optimization techniques of a compiler device, and it also benefits from employing conditional executable instructions.
In the case where a loop has a branch, the software pipelining may be performed in a state in which the branch exists, by employing a technique such as the hierarchical contraction (see “Konpaira no kousei to saitekika” (“Compilers: Structure and Optimization”), Asakura-shoten, p. 374), but in many cases, conditional executable instructions preferably are used, so as to make an algorithm simpler and more effective.
However, when a program is compiled employing conditional executable instructions, if the then part and the else part are not balanced in terms of the number of execute cycles, the performance may deteriorate as compared with the case where the conditional executable instructions are not employed.
For instance, when the C language program shown in FIG. 29, in which the then part and the else part differ significantly from each other in the number of operations to be executed, is compiled employing conditional executable instructions, an assembler code as shown in FIG. 30 is obtained. This assembler code is not balanced, either, since C0 is referred to by four conditional executable instructions, while C1 is referred to by only one conditional executable instruction.
Since the respective flags that the instructions 202 and 203 shown in FIG. 30 refer to are mutually exclusive, only one of the two instructions is executed, and hence, the instructions 202 and 203 can be processed in parallel. The execution of the then part requires five execute cycles, and the execution of the else part also requires the same five execute cycles, even though only the instructions 201 and 203 are actually executed.
If the C language program shown in FIG. 29 is compiled, not by employing the conditional executable instructions, but by employing branch instructions, an assembler code shown in FIG. 31 is obtained. Here, assuming that no branch penalty is generated, the execution of the else part can be completed in only three cycles, namely, those for the execution of the instructions 207, 208, and 214.
In other words, in the case where conditional executable instructions are used, the performance is constrained by the part requiring the greater number of execute cycles. Therefore, in the case where the program is unbalanced between the then part and the else part in terms of the number of executed instructions, the use of conditional executable instructions impairs the execution performance when the part having fewer instructions to be executed is carried out.
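The cycle counts quoted above can be reproduced with a small cost model. The following is a sketch under stated assumptions, not a description of any actual compiler: every instruction is assumed to take one cycle, no branch penalty is assumed, and each part is described only by the number of cycles its own instructions occupy. The function names are hypothetical.

```c
/* Hypothetical cost model: one cycle per instruction, no branch penalty. */

static int max_int(int a, int b)
{
    return a > b ? a : b;
}

/* Conditional-execution form: one compare sets the flags, after which
 * the guarded instructions of both parts are issued side by side.
 * The longer part therefore fixes the total, whichever flag is set. */
static int predicated_cycles(int then_cycles, int else_cycles)
{
    return 1 + max_int(then_cycles, else_cycles);
}

/* Branch form: compare, conditional branch, then only the part that
 * is actually taken. */
static int branch_cycles(int taken_cycles)
{
    return 2 + taken_cycles;
}
```

With four cycles for the part guarded by C0 and one cycle for the part guarded by C1, `predicated_cycles(4, 1)` yields 5 for either outcome, whereas `branch_cycles(1)` yields 3 for the else part, in agreement with the counts given for FIG. 30 and FIG. 31. For the balanced example of FIG. 28B and FIG. 28C, the same model yields 3 and 2 cycles, respectively.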
Likewise, in the case where software pipelining is carried out employing conditional executable instructions, a start interval, which indicates the number of cycles between the starts of successive iterations, is constrained by the part requiring the greater number of execute cycles. Therefore, in the case where a program is unbalanced between the then part and the else part in terms of the number of instructions to be executed, a loop that is not subjected to software pipelining and does not employ conditional executable instructions exhibits better execution performance than a software-pipelined loop employing conditional executable instructions when the part having fewer instructions to be executed is carried out.
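The effect on a loop can be estimated in the same manner. The following sketch assumes, purely for illustration, that the start interval equals the cycle count of the longer part, that one iteration of the pipelined loop takes one compare cycle plus the longer part, and that loop control overhead and branch penalties are zero. None of these figures are taken from the drawings, and the function names are hypothetical.

```c
/* Hypothetical loop cost model; see the stated assumptions above. */

/* Software-pipelined loop using conditional executable instructions:
 * a new iteration starts every start_interval cycles, and the last
 * iteration still has to run to completion. */
static long pipelined_total(long n, int then_cycles, int else_cycles)
{
    int longer = then_cycles > else_cycles ? then_cycles : else_cycles;
    int start_interval = longer;      /* assumed: fixed by the longer part */
    int iteration_length = 1 + longer; /* compare + guarded instructions */

    if (n <= 0) {
        return 0;
    }
    return (n - 1) * start_interval + iteration_length;
}

/* Loop compiled with branch instructions and without software
 * pipelining: each iteration costs compare + branch + the taken part. */
static long branching_total(long n_then, long n_else,
                            int then_cycles, int else_cycles)
{
    return n_then * (2 + then_cycles) + n_else * (2 + else_cycles);
}
```

Under these assumptions, for ten iterations with a four-cycle then part and a one-cycle else part, the pipelined loop takes 41 cycles regardless of which part is selected. The branching loop takes 30 cycles when every iteration selects the else part and 60 cycles when every iteration selects the then part, which illustrates why the unbalanced case favors the branching loop when the shorter part dominates.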