1. Field of the Invention
The present invention relates to a renaming apparatus for setting a correspondence between physical register numbers and logical register numbers, and to a processor that is required to operate at a high frequency, such as a microprocessor or a digital signal processor (DSP).
2. Related Background Art
When the clock frequency of a processor is raised to obtain higher performance, the number of pipeline stages inevitably increases, and so does the number of pipeline bubbles (pipeline stalls). Software pipelining is generally used to remove these bubbles.
FIG. 18 is a diagram showing the pipeline stages of a certain processor, and FIG. 19 is a diagram showing an example of a program executed by the processor of FIG. 18.
It is assumed that, when the processor of FIG. 18 executes the program of FIG. 19, there is no data bypass, so data written into a register in a stage W can be read only in a stage D that comes after that write.
The processor issues one instruction at a time, has an interlock function, and executes instructions in a pipeline. FIG. 20 shows the procedure by which this processor executes the program of FIG. 19. Instruction 1 writes data into register r1 in stage Wa (fifth cycle), and instruction 2 writes data into register r2 in stage Wb (sixth cycle). Instruction 3 reads registers r1 and r2 in stage Dc, so the processor is interlocked (stalled) for three cycles. These stalls are pipeline bubbles caused by data dependence and are denoted by “*” in FIG. 20. In the example of FIG. 20, there are six pipeline bubbles per loop.
Similarly, the data written into register r3 in stage Wd of instruction 3 is read in stage De of instruction 4, which again stalls the processor. Instruction 5 is simply a branch instruction; since it does not depend on any other instruction, it proceeds in the very next cycle without stalling.
For simplicity, FIG. 20 also assumes that the loop branch instruction has no penalty (no branch delay slot) and that the address of a load/store instruction can be incremented within that same instruction.
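The stall count of FIG. 20 can be reproduced with a small single-issue, in-order interlock model. This is only a sketch under stated assumptions: stage W is taken to follow stage D by three cycles (as in an F, D, E, M, W pipeline, consistent with instruction 1 writing in the fifth cycle), a register becomes readable in the D stage of the cycle after its W stage, and the loop body (two loads into r1 and r2, an add into r3, a store of r3, and a branch) is hypothetical, chosen only to match the dependences described above. The names `Instr`, `schedule`, and `LOOP_BODY` are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: stage W follows stage D by 3 cycles (e.g. F, D, E, M, W),
# and with no bypass a value is readable in D only in the cycle after W.
D_TO_W = 3


@dataclass(frozen=True)
class Instr:
    name: str
    dst: Optional[str]  # register written in W (None if nothing is written)
    srcs: tuple = ()    # registers read in D


def schedule(program, first_d=2):
    """Single-issue, in-order interlock model: return (name, D cycle, stall cycles)."""
    readable_from = {}  # register -> first cycle in which D may read it
    rows = []
    next_d = first_d    # D cycle if not interlocked (instruction 1 is fetched in cycle 1)
    for ins in program:
        d = max([next_d] + [readable_from.get(r, 0) for r in ins.srcs])
        rows.append((ins.name, d, d - next_d))
        if ins.dst is not None:
            readable_from[ins.dst] = d + D_TO_W + 1
        next_d = d + 1
    return rows


# Hypothetical loop body matching the dependences described for FIG. 19.
LOOP_BODY = [
    Instr("1: load r1", "r1"),
    Instr("2: load r2", "r2"),
    Instr("3: add r3", "r3", ("r1", "r2")),
    Instr("4: store r3", None, ("r3",)),
    Instr("5: branch", None),
]

rows = schedule(LOOP_BODY)
for name, d, stall in rows:
    print(f"{name:12} D={d:2} W={d + D_TO_W:2} stall={stall}")
print("bubbles per loop:", sum(stall for _, _, stall in rows))
```

The model ignores structural hazards and branch penalties, matching the simplifications stated for FIG. 20; a different dependence pattern only requires editing `LOOP_BODY`.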
To eliminate the pipeline bubbles shown in FIG. 20, the loop iterations are executed with overlap. FIG. 21 shows one example of such overlapped execution. In the example of FIG. 21, only one pipeline bubble occurs over three loops, whereas the execution of FIG. 20 generates six bubbles per loop, that is, 18 bubbles over three loops. The number of bubbles is therefore drastically reduced, to 1/18.
To improve instruction throughput in this way, the programmer constructs the software pipeline manually. An example of such a program is shown in FIG. 22.
In the example of FIG. 22, registers r1, r2, r7, a1, a2, and a7 are used in loop 1; registers r3, r4, r8, a3, a4, and a8 are used in loop 2; and registers r5, r6, r9, a5, a6, and a9 are used in loop 3.
Because each overlapped loop in FIG. 22 must use different registers, the loop body has to be written out three times, so the number of instructions in the program triples. Moreover, the programmer has to assign the registers so that no register is used by more than one of the overlapped loops.
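The cost of hand-written software pipelining can be illustrated by generating the three renamed copies mechanically and checking the no-sharing rule. In this sketch the loop-body template, its mnemonics, and the "(...)+" post-increment syntax are hypothetical, and which listed register plays which role (for example, r7 as the add result and a7 as the store address) is an assumption; only the register sets themselves come from the description of FIG. 22.

```python
# Hypothetical loop-body template; {x}, {y}, {z} are data registers and
# {px}, {py}, {pz} address registers ("(...)+" post-increment is illustrative).
TEMPLATE = [
    "load  {x}, ({px})+",
    "load  {y}, ({py})+",
    "add   {z}, {x}, {y}",
    "store {z}, ({pz})+",
]

# Register sets per overlapped loop, as listed for FIG. 22
# (the role assigned to each register is an assumption).
REGISTER_SETS = [
    {"x": "r1", "y": "r2", "z": "r7", "px": "a1", "py": "a2", "pz": "a7"},
    {"x": "r3", "y": "r4", "z": "r8", "px": "a3", "py": "a4", "pz": "a8"},
    {"x": "r5", "y": "r6", "z": "r9", "px": "a5", "py": "a6", "pz": "a9"},
]


def unroll(template, register_sets):
    """Emit one renamed copy of the loop body per register set."""
    return [line.format(**regs) for regs in register_sets for line in template]


def shared_registers(register_sets):
    """Return registers used by more than one copy; must be empty for a legal allocation."""
    seen, shared = set(), set()
    for regs in register_sets:
        for reg in set(regs.values()):
            if reg in seen:
                shared.add(reg)
            seen.add(reg)
    return shared


program = unroll(TEMPLATE, REGISTER_SETS)
print("\n".join(program))
print(len(program), "body instructions vs", len(TEMPLATE), "in the original loop")
print("conflicts:", shared_registers(REGISTER_SETS) or "none")
```

A single mistake in the allocation, such as reusing r1 in the second copy, is exactly the kind of conflict the programmer must avoid by hand, and `shared_registers` would report it.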
Two conventional techniques have been proposed to resolve this complexity. The first is a processor having an out-of-order issue function and an automatic register renaming function. Such a processor achieves the ideal operation shown in FIG. 21 while executing the program of FIG. 19 exactly as written. However, the hardware required to realize the out-of-order issue function becomes enormous.
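For comparison, the core of an automatic register renaming function can be sketched as a logical-to-physical map plus a free list of physical registers. This is a generic textbook-style sketch, not the renaming apparatus of the present invention; the class `RenameTable`, the FIFO free-list policy, and the three-instruction loop body are assumptions for illustration, and the out-of-order issue logic is not modeled.

```python
from collections import deque


class RenameTable:
    """Logical -> physical register map with a FIFO free list (generic sketch)."""

    def __init__(self, logical_regs, num_physical):
        if num_physical < len(logical_regs):
            raise ValueError("need at least one physical register per logical register")
        # Initially logical register i (in list order) maps to physical register i.
        self.map = {name: i for i, name in enumerate(logical_regs)}
        self.free = deque(range(len(logical_regs), num_physical))

    def rename(self, dst, srcs):
        """Rename one instruction; return (new physical dst, physical srcs, previous dst)."""
        # Sources are looked up before the destination is remapped, so an
        # instruction such as r1 = r1 + r2 still reads the old r1.
        phys_srcs = [self.map[s] for s in srcs]
        if not self.free:
            raise RuntimeError("free list empty: a real design would stall issue here")
        prev = self.map[dst]
        self.map[dst] = self.free.popleft()
        # A real design returns `prev` to the free list only when this
        # instruction retires; that bookkeeping is left out of the sketch.
        return self.map[dst], phys_srcs, prev

    def release(self, phys):
        """Return a physical register to the free list (e.g. at retirement)."""
        self.free.append(phys)


# Hypothetical loop body: two loads and an add (stores and the branch write no register).
BODY = [("r1", []), ("r2", []), ("r3", ["r1", "r2"])]

table = RenameTable(["r1", "r2", "r3"], num_physical=12)
trace = []
for iteration in range(3):
    for dst, srcs in BODY:
        new, phys_srcs, _ = table.rename(dst, srcs)
        trace.append((iteration, dst, new, phys_srcs))

for row in trace:
    print(row)
```

With nine free physical registers, the three iterations receive disjoint destination registers even though the program names only r1, r2, and r3, which is what allows such a processor to overlap iterations without the programmer renaming anything.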
The second is the register rotation function implemented in the Itanium processor of Intel K.K. and in the computers of Cydrome. The example of FIG. 22 is realized with register rotation as shown in FIG. 23.
The instruction branch_regrot executes the loop's jump and rotates the registers at the same time. In each rotation, the registers are replaced three at a time. The program is shown in FIG. 24, and its pipeline operation is shown in FIG. 25.
FIG. 26 shows the register correspondence at the end of the first loop, and FIG. 27 shows the correspondence at the end of the second loop.
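A register rotation can be modeled as a logical-to-physical mapping that shifts by a fixed amount each time the rotating branch executes. The sketch below assumes nine rotating registers and a shift of three per branch_regrot, loosely following the three-registers-per-loop allocation of FIG. 22; it does not reproduce the correspondences of FIGS. 26 and 27, the direction of rotation is an arbitrary choice, and `rotated_mapping` is a hypothetical helper.

```python
def rotated_mapping(num_regs, step, rotations):
    """Physical index of each logical index after `rotations` rotating branches."""
    return [(logical + step * rotations) % num_regs for logical in range(num_regs)]


# Assumptions for illustration: nine rotating registers (three per overlapped
# loop, as in FIG. 22) and a shift of three registers per branch_regrot.
NUM_ROTATING = 9
STEP = 3

for k in range(4):
    print(f"after {k} loop(s):", rotated_mapping(NUM_ROTATING, STEP, k))
```

Under these assumptions the mapping returns to the identity after three rotations, so a loop body written once with fixed register names cycles through three physical register sets, which is how rotation lets one copy of the body stand in for the three copies of FIG. 22.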
In summary, the conventional automatic renaming function requires complicated hardware, while the conventional register rotation function lacks flexibility: it cannot be applied to any register correspondence other than a simple rotation.
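The flexibility limit can be made concrete under a simple modular-shift model of rotation (an assumption for illustration, not a description of any particular product): rotation reaches only correspondences of the form l → (l + s) mod n, so of the n! possible correspondences among n registers only n are reachable. The hypothetical helper below tests whether a given correspondence is a simple rotation.

```python
from itertools import permutations
from math import factorial


def is_simple_rotation(mapping):
    """True if mapping[i] == (i + s) % n for one shift s, i.e. reachable by rotation alone."""
    n = len(mapping)
    if sorted(mapping) != list(range(n)):
        raise ValueError("mapping must be a permutation of 0..n-1")
    if n == 0:
        return True
    shift = mapping[0]  # the shift implied by where logical register 0 lands
    return all(mapping[i] == (i + shift) % n for i in range(n))


print(is_simple_rotation([3, 4, 5, 6, 7, 8, 0, 1, 2]))  # shift by 3: True
print(is_simple_rotation([3, 4, 5, 0, 1, 2, 6, 7, 8]))  # swap two groups: False

n = 4
reachable = sum(is_simple_rotation(p) for p in permutations(range(n)))
print(f"{reachable} of {factorial(n)} mappings of {n} registers are rotations")
```

Swapping two register groups while leaving a third in place is a perfectly valid correspondence, yet no amount of rotation produces it under this model.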