Some processors, such as CPUs (Central Processing Units), process a single instruction called an “SIMD (Single Instruction, Multiple Data) instruction” to execute the same type of calculation in parallel on different data. Processors that execute SIMD instructions include registers called “SIMD registers”, which store a combination of different data items to be processed in parallel. For example, suppose a processor receives an SIMD instruction ts1+ts2 when an SIMD register ts1 stores data A(1) and A(2) and an SIMD register ts2 stores data B(1) and B(2); the processor then executes two additions, A(1)+B(1) and A(2)+B(2), in parallel.
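The two-lane behavior described above can be sketched in plain C. The `simd_reg` type and `simd_add` function below are illustrative stand-ins for an actual SIMD register and instruction, not part of any particular instruction set.

```c
/* Illustrative two-lane "SIMD register": holds two data items at once. */
typedef struct { int lane[2]; } simd_reg;

/* Models one SIMD add instruction: each lane is added independently,
 * corresponding to executing A(1)+B(1) and A(2)+B(2) in parallel. */
simd_reg simd_add(simd_reg ts1, simd_reg ts2) {
    simd_reg result;
    for (int i = 0; i < 2; i++)
        result.lane[i] = ts1.lane[i] + ts2.lane[i];
    return result;
}
/* Example: adding lanes {10, 20} and {3, 4} yields lanes {13, 24}. */
```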
In one method for generating code including an SIMD instruction, an SIMD instruction is generated by extracting, from among a plurality of non-SIMD instructions, two or more instructions of the same calculation type that are executable in parallel, and combining the extracted instructions. For example, some compiler apparatuses, which transform a source code described in a high-level language into a machine-readable object code, combine two or more instructions into an SIMD instruction for the purpose of optimization. The number of combinable instructions (i.e., the SIMD width) varies depending on the architecture of the processor.
According to one proposal, a compiler apparatus executes the following processing to transform a code into an SIMD instruction (i.e., SIMD transformation). The compiler apparatus estimates an execution time for each candidate combination of instructions. For example, when a first combination and a second combination are extracted as candidates, the compiler apparatus estimates an execution time for each of the first and second combinations. The compiler apparatus then selects, out of the first and second combinations, the one whose estimated execution time is shorter, and transforms the code into an SIMD instruction accordingly. See, for example, Japanese Laid-open Patent Publication No. 2013-80407.
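The selection step of the proposal can be sketched as follows. The `candidate` structure, the cycle counts, and the function name are hypothetical, chosen for illustration; the cited publication does not prescribe this representation.

```c
/* Hypothetical candidate combination of instructions, annotated with
 * an estimated execution time (here, in cycles). */
typedef struct {
    int id;               /* identifies the combination */
    int estimated_cycles; /* estimated execution time */
} candidate;

/* Select, out of two candidate combinations, the one whose estimated
 * execution time is shorter; that combination is then used for the
 * SIMD transformation. */
candidate select_combination(candidate first, candidate second) {
    return (first.estimated_cycles <= second.estimated_cycles)
               ? first
               : second;
}
```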
In the meantime, some codes handled by a compiler apparatus include a loop. A loop repeatedly executes the same arithmetic expression including a loop variable while changing the value of the loop variable (for example, incrementing it by one).
In such execution, the calculation of an m-th iteration of a loop and the calculation of an n-th iteration of the same loop are, in some cases, executable in parallel by means of an SIMD instruction. For example, suppose arrays A and B and a loop variable J are used to describe an arithmetic expression A(J)=A(J)+B(J) in a loop. Here, the calculation of the J-th iteration, A(J)=A(J)+B(J), and the calculation of the (J+1)-th iteration, A(J+1)=A(J+1)+B(J+1), are independent of each other. Accordingly, the compiler apparatus can transform the code into an SIMD instruction so as to calculate A(J) and A(J+1) in parallel. In this case, the number of instructions executed in the loop is reduced by about half.
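A minimal sketch of this transformation in plain C models the two-wide SIMD execution as 2-way loop unrolling (a real SIMD code would use vector registers instead; the array length is assumed even here):

```c
/* Scalar loop: one addition A(J) = A(J) + B(J) per iteration. */
void add_scalar(int *a, const int *b, int n) {
    for (int j = 0; j < n; j++)
        a[j] = a[j] + b[j];
}

/* SIMD-transformed loop, modeled as 2-way unrolling: each pass computes
 * A(J) and A(J+1), which are independent, so the number of loop
 * iterations is roughly halved (n is assumed even). */
void add_simd2(int *a, const int *b, int n) {
    for (int j = 0; j < n; j += 2) {
        a[j]     = a[j]     + b[j];
        a[j + 1] = a[j + 1] + b[j + 1];
    }
}
```

Because the two lanes touch disjoint elements, both versions produce identical results.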
On the other hand, the calculation of an m-th iteration of a loop and the calculation of an n-th iteration of the same loop are, in some cases, inexecutable in parallel. For example, when an arithmetic expression A(J)=A(J−1)+B(J) is described in a loop, the calculation of the (J+1)-th iteration, A(J+1)=A(J)+B(J+1), refers to the result of the J-th iteration, A(J)=A(J−1)+B(J). Accordingly, if A(J) and A(J+1) were calculated in parallel, the result might differ from the result obtained when A(J) and A(J+1) are calculated sequentially. Hence, in conventional compiler apparatuses, if the calculation of an m-th loop iteration and the calculation of an n-th loop iteration have a dependency, an SIMD instruction is not used to optimize the process.
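The dependency can be made concrete with a plain-C sketch: evaluating A(J)=A(J−1)+B(J) sequentially versus in naive two-wide "parallel" lanes (where both lanes read the array as it was before the pass) gives different results. The function names are illustrative, and tail iterations are ignored for brevity.

```c
/* Sequential evaluation of A(J) = A(J-1) + B(J): each iteration
 * consumes the result of the previous one (a loop-carried dependency). */
void scan_sequential(int *a, const int *b, int n) {
    for (int j = 1; j < n; j++)
        a[j] = a[j - 1] + b[j];
}

/* Naive two-wide "parallel" evaluation: both lanes read a[] as it was
 * before the pass, so the second lane uses a stale a[j] instead of the
 * value just computed by the first lane. (Tail handling omitted.) */
void scan_naive_parallel(int *a, const int *b, int n) {
    for (int j = 1; j + 1 < n; j += 2) {
        int lane0 = a[j - 1] + b[j]; /* A(J)   = A(J-1) + B(J)        */
        int lane1 = a[j] + b[j + 1]; /* A(J+1) reads the stale A(J)   */
        a[j]     = lane0;
        a[j + 1] = lane1;
    }
}
```

Running both on the same input shows the divergence: with a = {1, 0, 0, 0, 0, 0} and b = {0, 1, 1, 1, 1, 1}, the sequential version produces a running sum while the naive parallel version does not, which is why a conventional compiler apparatus declines to apply the SIMD transformation here.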