At present, a pipeline processing technique is used in many computer processors. In this technique, processing of an instruction is divided into a plurality of stages such as “Fetch,” “Decode,” “Execute,” and “Memory access” so that processing in these different stages is executed in parallel. Namely, while a certain instruction is being processed in a certain stage (for example, in the Fetch stage), another instruction is processed in another stage (for example, in the Decode stage) in parallel.
Ideally, it is preferable that more instructions be executed in the pipeline processing so that processing is constantly executed in all the stages. However, various reasons could create idle stages and decrease the utilization rate of the pipeline processing. For example, the utilization rate is decreased when a program includes a branch instruction representing conditional branching. When a branch instruction is executed, depending on the execution result thereof, either the instruction of the next address is selected without executing a jump (not-taken) or the instruction of a distant address is selected by executing a jump (taken). The next instruction to be executed after a branch instruction is not determined until the branch instruction is processed through the “Execute” stage. If feeding the next instruction into the pipeline processing is halted until the execution result of the branch instruction is obtained, at least one idle stage is created.
To reduce such idle stages, some processors adopt a branch prediction technique. In branch prediction, a branch prediction circuit arranged as a hardware module in a processor collects history information that indicates the execution results of past branch instructions. When the processor next executes any one of these branch instructions, the branch prediction circuit predicts a branch direction for this branch instruction (taken or not-taken) on the basis of the history information. The processor selects the next instruction and feeds it into the pipeline processing on the basis of the prediction made by the branch prediction circuit, without waiting for the execution result of the branch instruction currently being processed in the pipeline processing (speculative execution).
If the branch prediction succeeds, the processor simply proceeds with the pipeline processing. Thus, creation of an idle stage is prevented. However, if the branch prediction fails, the processor needs to delete, from the pipeline processing, the instruction fed on the basis of the prediction and the subsequent instructions, and needs to feed the appropriate instruction instead. Namely, there is a penalty for a prediction error. Consequently, even if the branch prediction technique is adopted, when a program including many branch instructions is executed, the utilization rate of the pipeline processing could be decreased, and the execution efficiency of the program could be decreased accordingly.
Under such circumstances, techniques have been proposed for obtaining a program having fewer branch instructions through conversion in an optimizing compiler. For example, assume that a source code includes an if-else statement defining that, if c is true, a value t is assigned to a variable v and, if c is false, a value f is assigned to the variable v. A compiler according to one proposed technique converts the if-else statement into the following assignment statement: v=(t and c) or (f and not c).
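The conversion described above can be illustrated with a minimal sketch in C. The function names select_branch and select_mask are hypothetical, and the sketch assumes that the condition c is first widened into a bit string of all ones (true) or all zeros (false), a step the cited technique may perform differently.

```c
#include <stdint.h>

/* Original form: an if-else statement, which a straightforward
 * translation would implement with a conditional branch. */
uint32_t select_branch(int c, uint32_t t, uint32_t f)
{
    uint32_t v;
    if (c) {
        v = t;
    } else {
        v = f;
    }
    return v;
}

/* Converted form: v = (t and c) or (f and not c).
 * Assumption: c is widened to 0xFFFFFFFF when true and 0 when false. */
uint32_t select_mask(int c, uint32_t t, uint32_t f)
{
    uint32_t m = (uint32_t)0 - (uint32_t)(c != 0);
    return (t & m) | (f & ~m);
}
```

Both functions return t when c is true and f when c is false; the second form contains no if-else statement.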
In addition, assume that a source code includes an if-else statement defining that result=5*data or result=7*data is executed on the basis of a mask bit string “mask.” A compiler according to another proposed technique converts the if-else statement into the following assignment statement: result=(5*data and mask) or (7*data andc mask), where andc denotes AND with the complement of its second operand. Namely, the compiler generates a program defining that: both the instructions in the if clause and the instructions in the else clause are executed; AND of the execution result of the if clause and the mask bit string and ANDC of the execution result of the else clause and the mask bit string are each calculated; and the results of both calculations are combined by OR.
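The mask-based conversion can likewise be sketched in C. The function name scale_masked and the 32-bit operand width are assumptions made for illustration; ANDC is written here as an AND with the bitwise complement of the mask.

```c
#include <stdint.h>

/* Converted form: result = (5*data and mask) or (7*data andc mask).
 * Both clauses are evaluated unconditionally; the mask bit string then
 * selects, bit by bit, which result is kept. */
uint32_t scale_masked(uint32_t mask, uint32_t data)
{
    uint32_t if_result   = 5u * data;   /* result of the if clause   */
    uint32_t else_result = 7u * data;   /* result of the else clause */
    return (if_result & mask) | (else_result & ~mask);
}
```

With a mask of all ones the function returns 5*data, and with a mask of all zeros it returns 7*data. A mixed mask takes each bit from the corresponding clause result, as is useful when one register holds several data elements.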
See, for example, the following documents:
Japanese Laid-open Patent Publication No. 2003-202991
Japanese Laid-open Patent Publication No. 2010-186467
In one type of branch structure, a program includes consecutive branch instructions, each of which indicates the same jump destination and causes a processor to determine a branch direction on the basis of a result of a comparison operation between integers. For example, there is a program defining that certain processing is executed if a variable c matches any one of constants c1, c2, etc. In computers, characters are represented by integers as character codes. Therefore, a processor executes a program defining that certain processing is executed if the value of a character variable s matches any one of constants s1, s2, etc. in the same way, namely as comparison operations between integers.
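This branch structure can be made concrete with a short sketch in C. The constants C1 to C3, the characters compared, and the function names are hypothetical; goto is used only to make the shared jump destination visible.

```c
/* Hypothetical constants standing in for c1, c2, etc. */
enum { C1 = 3, C2 = 7, C3 = 11 };

/* Consecutive conditional branches, each comparing integers and each
 * jumping to the same destination ("hit") when the comparison matches. */
int matches_any(int c)
{
    if (c == C1) goto hit;
    if (c == C2) goto hit;
    if (c == C3) goto hit;
    return 0;   /* no constant matched */
hit:
    return 1;   /* the certain processing would be executed here */
}

/* The same structure for a character variable: each comparison is
 * carried out on the integer values of the characters. */
int matches_any_char(char s)
{
    if (s == 'a') goto hit;
    if (s == 'e') goto hit;
    if (s == 'i') goto hit;
    return 0;
hit:
    return 1;
}
```

Written in the more usual style, the first function corresponds to a condition such as c == C1 || c == C2 || c == C3.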
It is possible to cause an optimizing compiler to convert such a branch instruction group into an instruction group having fewer branch instructions. However, such a conversion could significantly increase the number of instructions. Thus, conventional optimizing compilers do not convert such a branch instruction group as described above into an instruction group having fewer branch instructions. In addition, whether the conversion yields an instruction group having a higher execution efficiency depends on the architecture of the processor (target processor) that executes the program.
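The trade-off can be seen in one possible converted form, sketched below in C. This is an illustrative assumption and not a conversion taken from the cited documents; the constants and the function name are hypothetical.

```c
/* Hypothetical constants standing in for c1, c2, etc. */
enum { K1 = 3, K2 = 7, K3 = 11 };

/* Every comparison is evaluated unconditionally and the 0/1 results are
 * combined by bitwise OR, so the expression itself has no short-circuit
 * evaluation. Each comparison typically needs a compare instruction plus
 * an instruction that materializes its result, and the ORs are added on
 * top, so the instruction count grows with the number of constants. */
int matches_any_branchless(int c)
{
    return (c == K1) | (c == K2) | (c == K3);
}
```

A branching version can stop at the first match, whereas this form always performs all comparisons. Which of the two runs faster therefore depends on the target processor, for example on its misprediction penalty and on which compare and set instructions it provides.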