The present invention relates to processors, compilers and compilation methods, and in particular to technology for improving performance by using computing units efficiently in parallel processing.
In recent years, the increasing functionality and speed of products incorporating microprocessors have created a need for microprocessors (referred to simply as “processors” in the following) with high processing performance. In general, in order to increase instruction throughput, the pipeline approach is adopted, in which one instruction is broken down into several processing units (here referred to as “stages”), and a plurality of instructions are processed in parallel by executing each stage with separate pieces of hardware. In addition to temporally parallel processing as with the pipeline approach, higher performance is achieved by the VLIW (very long instruction word) approach or the superscalar approach, in which spatially parallel processing is performed at the instruction level.
One major factor obstructing performance increases in processors is the overhead for branching processes. The more stages there are in the pipeline, the larger the instruction-supply penalty caused by this overhead. Furthermore, in parallel processing of instructions, the higher the degree of parallelism, the more frequently branch instructions are encountered and the more pronounced the overhead becomes.
As a conventional technology for countering this overhead, there is a conditional execution approach, according to which information indicating execution conditions is added to the instructions, and the operations indicated by the instructions are executed only when those conditions are satisfied. With this approach, condition flags corresponding to the execution conditions added to the instructions are referenced at execution time, and if the conditions are not fulfilled, then the execution result of the instruction is invalidated, that is, it is executed as a no-operation instruction.
For example, when the process flow including the conditional branch shown in FIG. 10 is written in a format in which information indicating an execution condition is added to the instructions, the program shown in FIG. 11 results. In FIG. 11, C0 and C1 represent the conditions that are added to the instructions; if the value of the corresponding condition flag is true, the instruction is executed, whereas if it is false, the instruction is executed as a no-operation instruction. In this example, first the comparison result of instruction 1 (comparison instruction) is stored in C0. At the same time, C1 is set to the condition opposite to that of C0. Consequently, the operation of either instruction 2 or instruction 3 is actually executed, whereas the other one is executed as a no-operation instruction. As a result, a branching process is unnecessary, and the overhead of the branching process is avoided.
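The conditional execution described above can be illustrated with a minimal sketch. It is not a reproduction of FIG. 10 or FIG. 11: the register names (`r0`, `r1`, `r2`), the equality comparison, and the add/subtract operations chosen for instructions 2 and 3 are assumptions made only for illustration. The sketch shows instruction 1 setting C0 and its opposite C1, after which both guarded instructions are issued but only one takes effect.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Regs = Dict[str, int]
Flags = Dict[str, bool]


@dataclass
class PredicatedInstr:
    name: str
    cond: str                      # condition flag guarding this instruction
    op: Callable[[Regs], None]     # effect on the registers when the flag is true


def compare(regs: Regs, flags: Flags, a: str, b: str) -> None:
    """Instruction 1: store the comparison result in C0 and its opposite in C1."""
    flags["C0"] = regs[a] == regs[b]   # equality is an assumed comparison
    flags["C1"] = not flags["C0"]


def execute(instr: PredicatedInstr, regs: Regs, flags: Flags) -> bool:
    """Apply the operation only if its flag is true; otherwise act as a no-op."""
    if flags[instr.cond]:
        instr.op(regs)
        return True
    return False


def add_one(regs: Regs) -> None:       # hypothetical body of instruction 2
    regs["r2"] = regs["r0"] + 1


def sub_one(regs: Regs) -> None:       # hypothetical body of instruction 3
    regs["r2"] = regs["r0"] - 1


def run(a: int, b: int) -> Tuple[int, List[str]]:
    regs: Regs = {"r0": a, "r1": b, "r2": 0}
    flags: Flags = {"C0": False, "C1": False}
    compare(regs, flags, "r0", "r1")
    program = [
        PredicatedInstr("instruction 2", "C0", add_one),
        PredicatedInstr("instruction 3", "C1", sub_one),
    ]
    # Both instructions are issued; no branch is taken in either case.
    executed = [i.name for i in program if execute(i, regs, flags)]
    return regs["r2"], executed
```

Calling `run(5, 5)` makes C0 true, so only instruction 2 takes effect, while `run(5, 3)` makes C1 true, so only instruction 3 takes effect.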
In the above-described conventional conditional execution approach, if the condition is not satisfied, the corresponding instruction is performed as a no-operation instruction, and the operation is effectively not executed. Consequently, even though the two instructions are notated in parallel and occupy two computing units, only one computing unit is effectively utilized. As a result, there is the problem that the effective performance is lower than one would expect from the degree of parallelism with which the program is notated.
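The utilization problem can be made concrete with a small calculation. The following is a sketch under the assumption that two computing units each receive one instruction guarded by complementary conditions; the function name `effective_utilization` is hypothetical and is introduced only for this illustration.

```python
from typing import Sequence


def effective_utilization(condition_flags: Sequence[bool]) -> float:
    """Fraction of issued instruction slots whose operation actually takes effect.

    Each entry is the condition flag of one instruction issued in parallel;
    a false flag means that slot is spent on a no-operation instruction.
    """
    issued = len(condition_flags)
    useful = sum(1 for flag in condition_flags if flag)
    return useful / issued


# Two instructions issued in parallel under opposite conditions C0 and C1.
for c0 in (True, False):
    c1 = not c0
    print(c0, c1, effective_utilization([c0, c1]))
```

Whichever way the comparison turns out, exactly one of the two complementary flags is true, so the fraction is one half even though the program is notated with a degree of parallelism of two.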