1. Field
An aspect of the present invention relates to a processor and an information processing apparatus.
2. Description of the Related Art
Compilers sometimes use a technique called software pipelining to improve the performance with which processors execute programs. FIG. 1 shows the concept of software pipelining. Suppose, for example, that a program source describes process A, whose instructions have dependence relationships among them. If the hardware executes the process exactly as written in the program, the instructions of process A are executed sequentially, which may reduce performance. In this case, when compiling the program source, the compiler reorders the instruction lines so that process B, which has no dependence relationship with process A and would otherwise be executed after it, is inserted between the instruction lines of process A. The hardware can then execute processes A and B in parallel. This optimization is called software pipelining.
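The reordering described above can be illustrated with a minimal Python sketch. It assumes a hypothetical loop in which each iteration has two dependent stages (a multiply, then an add on its result); the pipelined version issues stage 1 of the next iteration between the stages of the current one, so that work plays the role of the independent process B. Python itself runs both versions sequentially, so the sketch shows only the instruction ordering a compiler would produce, not actual hardware parallelism.

```python
def sequential(xs):
    """Each iteration runs its dependent stages back to back."""
    out = []
    for x in xs:
        t = x * 2          # stage 1
        out.append(t + 1)  # stage 2 depends on stage 1, so it must wait
    return out


def pipelined(xs):
    """Software-pipelined form: prologue, steady-state kernel, epilogue."""
    out = []
    if not xs:
        return out
    t = xs[0] * 2                 # prologue: stage 1 of iteration 0
    for i in range(1, len(xs)):
        t_next = xs[i] * 2        # stage 1 of iteration i (independent work)
        out.append(t + 1)         # stage 2 of iteration i - 1
        t = t_next
    out.append(t + 1)             # epilogue: stage 2 of the last iteration
    return out


print(pipelined([1, 2, 3]))  # [3, 5, 7]
```

Both functions compute the same results; only the order in which the stages are issued differs.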
When a program includes a conditional statement such as an “if” statement, there are roughly two ways to decide, in accordance with a condition, whether an instruction is executed. One is to use a conditional branch instruction, which, as shown in FIG. 2, switches whether or not an instruction line is executed depending on whether the condition is true or false. However, as shown in FIG. 2, an instruction line whose execution depends on a condition in this way cannot be a target of software pipelining.
The other way is to use a conditional register move instruction, which moves data between registers depending on whether a condition is true or false, or a conditional store instruction, which reads data from a register and writes it to memory depending on whether a condition is true or false. Both can serve as tools for software pipelining. These instructions use the condition only to decide whether a result is updated; the instruction lines themselves are the same regardless of the condition. Accordingly, these instructions make it possible to apply software pipelining even to a portion of a program that includes a conditional statement (FIGS. 3 and 4).
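The difference between the two approaches can be sketched as follows. The function names are hypothetical, and `cmov` only models the semantics of a conditional register move (the Python conditional expression inside it is not how hardware implements it). In `abs_branch`, which statements run depends on the input, whereas `abs_cmov` executes the same sequence of operations for every input and only the value selected at the end differs, which is the property that allows such code to be software-pipelined.

```python
def cmov(cond, src, dst):
    """Model of a conditional register move: dst receives src only if cond is true."""
    return src if cond else dst


def abs_branch(x):
    # Conditional branch: the negation is executed only on some paths.
    if x < 0:
        x = -x
    return x


def abs_cmov(x):
    # Branch-free form: every step runs regardless of the condition.
    neg = -x                   # always computed
    cond = x < 0               # comparison sets the condition
    return cmov(cond, neg, x)  # select the result according to the condition
```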
Some instruction sets define conditional register move instructions but no conditional store instruction. When a single program is executed on a single core or in a single thread of a processor that uses such an instruction set, the following processing is performed in place of a conditional store instruction. First, the memory data is loaded into a register. Next, a conditional register move instruction determines, according to the condition, whether that register receives the new data or keeps the loaded data, and the contents of the register are then stored in memory by an ordinary store instruction. If the program determines that the memory should not be updated, the data that was loaded into the register is simply written back to the memory unchanged. FIG. 5 shows a flowchart for using a conditional move instruction in place of a conditional store instruction.
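The FIG. 5 sequence can be compared with a native conditional store using a small model. `Memory`, `conditional_store`, and `emulated_conditional_store` are hypothetical names introduced for illustration; the model counts store operations to make visible that the emulation always writes to memory, even when the condition is false and the value written back is unchanged.

```python
class Memory:
    """Word-addressed memory that counts how many stores it receives."""

    def __init__(self, words):
        self.words = list(words)
        self.stores = 0

    def load(self, addr):
        return self.words[addr]

    def store(self, addr, value):
        self.words[addr] = value
        self.stores += 1


def conditional_store(mem, addr, value, cond):
    """Native conditional store: memory is written only when cond is true."""
    if cond:
        mem.store(addr, value)


def emulated_conditional_store(mem, addr, value, cond):
    """FIG. 5-style emulation: load, conditional register move, plain store."""
    reg = mem.load(addr)           # 1. load the current memory data into a register
    reg = value if cond else reg   # 2. conditional register move
    mem.store(addr, reg)           # 3. unconditional store, even when cond is false
```

In a single thread both functions leave memory with the same contents; they differ only in whether a store takes place, which matters once other cores share the same memory.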
One method of speeding up processing is to run a single program in parallel on a plurality of cores (thread parallelism). In this case, the cores sometimes share data in memory. When a plurality of cores is used for thread parallelism, a conditional store instruction cannot be replaced with a conditional register move instruction in the way described above.
The reason for this will be explained by referring to FIG. 6.
In FIG. 6, it is assumed that the condition is false for core 0 and true for core 1, so that only core 1 should update the memory. Suppose that both cores load the same data from memory into their registers, core 1 then stores new data in the memory, and core 0 afterwards writes back the data it loaded earlier. The memory then holds the old data again, and the update made by core 1 is lost. This causes an error in processing.
The problem shown in FIG. 6 arises because, once a core has loaded data into its own register, that copy is not shared with the other cores, so an update that another core makes to the memory is not reflected in it. In other words, thread parallelism that increases speed while sharing memory data cannot be performed. This problem can be solved by using conditional store instructions.
With a conditional store instruction, the data is not loaded into a register beforehand. Core 1, whose condition is true, stores the new data by using the conditional store instruction, while core 0, whose condition is false, stores nothing. The new data is therefore correctly reflected in the memory.
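The lost update in FIG. 6 and its avoidance with a conditional store can be reproduced with a small interleaving model. Each core is modeled as a Python generator whose `yield` points are places where the other core may run; `core`, `run`, and the schedule format are hypothetical. As in FIG. 6, core 0 has a false condition and core 1 a true one.

```python
def core(mem, cond, new_value, native):
    """One core updating shared word mem[0]; each yield is a scheduling point."""
    if native:
        # Conditional store: no prior load, memory touched only if cond is true.
        if cond:
            mem[0] = new_value
        return
    reg = mem[0]                      # load shared data into a private register
    yield
    reg = new_value if cond else reg  # conditional register move
    yield
    mem[0] = reg                      # unconditional store


def run(schedule, native, initial=0, new_value=1):
    """Advance core 0 / core 1 one step at a time in the order given by schedule."""
    mem = [initial]
    cores = [core(mem, False, new_value, native),  # core 0: condition false
             core(mem, True, new_value, native)]   # core 1: condition true
    for c in schedule:
        next(cores[c], None)   # returns None once that core has finished
    for c in cores:            # let any unfinished core complete
        for _ in c:
            pass
    return mem[0]


fig6 = [0, 1, 1, 1, 0, 0]       # both load, core 1 stores, then core 0 stores
print(run(fig6, native=False))  # 0: core 1's update is lost
print(run(fig6, native=True))   # 1: the new data survives
```

With the emulated sequence the result depends on the interleaving, whereas the conditional store gives the same result for any schedule.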
Some instruction sets define, as an instruction corresponding to a conditional store instruction, a masked store that uses registers dedicated to masking in addition to the floating-point registers. With such an instruction set, the thread parallelism described above can be achieved by using the masked store. However, the hardware must then also detect dependence relationships on the mask registers, which increases the hardware size. Because not all programs require masks, this increase in hardware size lowers the cost performance.
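A masked store can be modeled as a store that consults a separate mask register for each element. In the sketch below, `masked_store` and the list-based registers are hypothetical; the point is that lanes whose mask bit is clear leave memory untouched, so, as with a conditional store, no stale value is written back. Because the store reads the mask register, a real processor must also track which earlier instruction produced that mask, which is the extra dependence-checking hardware referred to above.

```python
def masked_store(mem, base, values, mask):
    """Store values[k] to mem[base + k] only for lanes whose mask bit is set."""
    if len(values) != len(mask):
        raise ValueError("values and mask must have the same number of lanes")
    for k, (v, m) in enumerate(zip(values, mask)):
        if m:
            mem[base + k] = v  # masked-off lanes are not written at all


mem = [9, 9, 9, 9]
masked_store(mem, 0, [1, 2, 3, 4], [True, False, True, False])
print(mem)  # [1, 9, 3, 9]
```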
In the conditional instructions of an instruction set such as SPARC-V9 (SPARC is a registered trademark), ICC or XCC is used as the condition code for integers, and fcc0, fcc1, fcc2, and fcc3 are used as condition codes for floating points. For example, a conditional instruction executed under a floating-point condition can use only four independent conditions (i.e., fcc0, fcc1, fcc2, and fcc3), so conditional processing that requires five or more conditions at the same time cannot be performed. This limit on the number of conditions that can be held also prevents a compiler from realizing parallelism. FIG. 8 shows an example in which the maximum number of condition codes is two.
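The effect of a fixed number of condition codes can be illustrated by treating the condition codes as a tiny register file that a compiler must allocate. The sketch below is a hypothetical model, not an actual SPARC toolchain: each conditional operation needs one condition code from the point where the condition is set until its last use, and allocation fails as soon as more than four such live ranges overlap, as would happen if software pipelining kept the conditions of five iterations in flight at once.

```python
FCC_REGS = ("fcc0", "fcc1", "fcc2", "fcc3")


def assign_condition_codes(live_ranges, regs=FCC_REGS):
    """Assign a condition-code register to each half-open live range [start, end).

    Ranges are processed in order of start; a register is reused once the range
    holding it has ended. Raises RuntimeError if more ranges overlap than there
    are registers.
    """
    assignment = [None] * len(live_ranges)
    free = list(regs)
    active = []  # (end, register) pairs for ranges still holding a register
    for i in sorted(range(len(live_ranges)), key=lambda j: live_ranges[j][0]):
        start, end = live_ranges[i]
        # Release registers whose range finished at or before this start.
        still_active = []
        for e, reg in active:
            if e <= start:
                free.append(reg)
            else:
                still_active.append((e, reg))
        active = still_active
        if not free:
            raise RuntimeError(f"more than {len(regs)} conditions live at time {start}")
        reg = free.pop(0)
        assignment[i] = reg
        active.append((end, reg))
    return assignment


# Five conditions used one after another: registers are reused, so this succeeds.
print(assign_condition_codes([(0, 2), (2, 4), (4, 6), (6, 8), (8, 10)]))
# Five overlapping conditions (e.g. five pipelined iterations) cannot be held.
try:
    assign_condition_codes([(0, 10), (1, 10), (2, 10), (3, 10), (4, 10)])
except RuntimeError as err:
    print(err)
```

Passing `regs=("fcc0", "fcc1")` models the two-condition-code case of FIG. 8.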
In recent years, SIMD (Single Instruction Multiple Data) extensions, in which a single instruction processes the data in a plurality of registers in parallel as the vector elements of its input, have been adopted to improve the performance of scalar processors. When SIMD versions of conditional instructions are defined, each element of a conditional SIMD instruction has to be executed in accordance with its own condition. Consequently, the condition codes or mask registers have to be extended so that a separate condition is available for each SIMD element.
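The per-element conditions that a conditional SIMD instruction requires can be modeled with lists standing in for vector registers. `simd_compare_gt` produces one condition per element, the role that an extended condition code or mask register would play, and `simd_select` uses those conditions to choose each element independently. Both names are hypothetical, and the list representation is only a model of a vector register.

```python
def simd_compare_gt(a, b):
    """Element-wise a > b, producing one condition per SIMD element."""
    if len(a) != len(b):
        raise ValueError("operands must have the same number of elements")
    return [x > y for x, y in zip(a, b)]


def simd_select(conds, a, b):
    """Conditional SIMD move: element k takes a[k] if conds[k] is true, else b[k]."""
    if not len(conds) == len(a) == len(b):
        raise ValueError("operands must have the same number of elements")
    return [x if c else y for c, x, y in zip(conds, a, b)]


a = [1.0, 5.0, 3.0, 7.0]
b = [4.0, 4.0, 4.0, 4.0]
conds = simd_compare_gt(a, b)    # [False, True, False, True]
print(simd_select(conds, a, b))  # [4.0, 5.0, 4.0, 7.0]
```

With four elements, a single comparison already yields four independent conditions, so one scalar condition code per instruction cannot represent the result.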
As described above, with conventional conditional instructions for processors, the number of conditional processes is limited by the number of condition codes, which is a problem.