The present invention relates to a processor in which a register number specified in an instruction is translated into the number of a physical register to be actually used in the execution of the instruction and the register number translation is changed dynamically at run time; and more particularly, the invention relates to a technology of register number translation, whereby a processor carries out a register renaming operation at a high speed.
If translation of a register number specified in an instruction into the number of a physical register to be actually used in the execution of the instruction by a processor is changed dynamically at run time, the following two effects are obtained.
First, it is possible to have more physical registers than the number of registers which can be specified by an instruction. Recently, with progress in device technology, it has become easy to provide a processor with a register file that is a collection of many physical registers. However, it is not so easy to raise the number of registers that can be specified by an instruction to keep up with the increase in physical register count. This is because, in most processors, the length of the instruction is limited, so the number of bits used to specify a register number in the instruction is limited as well. In addition, unless upward compatibility with instructions that use a small number of bits for specifying registers is sustained, upward binary compatibility of software cannot be sustained either. Thus, if a register number specified by an instruction can be translated into the number of a physical register to be actually used in the execution of the instruction, with the resulting physical register number varying from time to time in accordance with a predetermined rule, it is not necessary to limit the number of physical registers to the number of registers which can be specified by an instruction, even if the number of specifiable registers is unchanged.
Second, it is possible to use different physical registers in a sequence of the same instructions. Two examples of the second effect are described as follows.
The first example of the effect is seen in loop iteration processing. For example, each time a loop iteration is completed, the translation is changed so that a register number specified in an instruction is translated into the number of a physical register which may be different from that used in the immediately preceding loop iteration. Since different physical registers are used in the loop iterations, the use of a physical register in a loop iteration does not block execution of other loop iterations. As a result, loop iterations can be executed concurrently. The concurrent execution of loop iterations has a great effect on the superscalar technology developed in recent years to implement parallel execution of instructions.
The second example of the effect is seen in operations to save and restore the contents of registers in a subroutine call. When a subroutine is called, the contents of registers are normally saved by storing them in a memory. When the processing of the subroutine is completed, the contents stored in the memory are read out and restored to the registers. By allocating, at the time the subroutine is called, different physical registers to the register numbers specified by instructions in the subroutine, the subroutine will use physical registers which are different from those used by the calling routine, so that it is possible to eliminate operations to save and restore the contents of the registers used in the calling routine. At the end of the processing of the called subroutine, the allocation of the numbers of physical registers to register numbers specified in instructions is restored to the allocation in effect prior to the subroutine call.
By allowing the allocation of the numbers of physical registers to register numbers specified in instructions to be changed dynamically at run time as described above, the effects explained above can be obtained. FIG. 9 is a diagram showing a typical conventional processor implementing this type of dynamic translation. In the processor shown in FIG. 9, the instruction code of a typical instruction is transferred from an instruction fetch unit 10 to an instruction decoder 11. The instruction decoder 11 decodes the instruction code to generate a control signal 900 for controlling an instruction execution unit 13. As a result of the decoding, the instruction decoder 11 also generates the number 901 of a register to be read out. The number 901 of the register to be read out is specified in the instruction. The number 901 of the register to be read out is added to a window pointer 90 by a read register number adder 910 to produce a number 902 of a physical register from which data is to be actually read out. By the same token, the number 903 of a write register is also specified in the instruction. The number 903 of the write register is added to the window pointer 90 by a write register number adder 911 to produce a number 904 of a physical register into which data is to be actually stored. The instruction execution unit 13 reads out the contents of the physical register indicated by the number 902. The physical register indicated by the number 902 is typically included in a register file 15. The instruction execution unit 13 executes the instruction in accordance with the control signal 900, using the contents of the physical register indicated by the number 902 in the execution of the instruction. A result of the execution of the instruction is stored in the register indicated by the number 904. The physical register indicated by the number 904 is typically included in the register file 15 as well. 
As described above, the register numbers 901 and 903 specified in the instruction are translated respectively into the numbers 902 and 904 of registers to be actually used in the execution of the instruction. In order to change the results of translation, a special slide instruction is executed to generate a window change signal 905 for changing the value of the window pointer 90.
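The translation performed by the adders 910 and 911 and the effect of a slide instruction can be sketched as follows. This is only an illustration under stated assumptions: the register file size `NUM_PHYSICAL`, the wrap-around (modulo) addition, the slide width of 8, and all class and method names are hypothetical and are not given in the description of FIG. 9.

```python
NUM_PHYSICAL = 32  # assumed size of the register file 15


class WindowedTranslator:
    """Window-pointer translation in the style of FIG. 9 (illustrative only)."""

    def __init__(self):
        self.window_pointer = 0  # corresponds to window pointer 90

    def translate(self, reg_num):
        # Adders 910/911: specified register number + window pointer.
        # Wrap-around is an assumption; the text only says "added".
        return (reg_num + self.window_pointer) % NUM_PHYSICAL

    def slide(self, width):
        # Effect of the window change signal 905 produced by a slide instruction.
        self.window_pointer = (self.window_pointer + width) % NUM_PHYSICAL


t = WindowedTranslator()
before = t.translate(3)  # register number 3 with window pointer 0
t.slide(8)               # assumed slide width of 8
after = t.translate(3)   # the same register number now selects another register
print(before, after)
```

In this sketch the same specified register number 3 selects physical register 3 before the slide and physical register 11 after it, which is the property exploited by the loop iteration and subroutine call examples above.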
Processor technologies developed in recent years include a superscalar technology and, particularly, an out of order execution technique which exhibit great effects. The superscalar technology allows a plurality of instructions to be executed concurrently as described earlier. With the out of order execution technique, the original execution order of instructions in a program is changed by rearranging the instructions. In the case of the out of order execution technique, in particular, the allocation of physical registers to be actually used is changed dynamically at run time so as to prevent the execution of an instruction from being blocked by the current use of the physical registers in other instructions. Referred to as a register renaming technique, the method to dynamically change the allocation of physical registers to be actually used in execution of instructions at run time increases the processing efficiency of the processor.
However, it is difficult to apply the register renaming technique as it is to a processor like the one in FIG. 9 wherein a register number specified in an instruction is translated into another number of a physical register to be actually used in the execution of the instruction. This is because, in an operation to read out data from a register or write data into a register according to the register renaming technique, the number of a renamed register allocated at run time is used, making this technique inapplicable to a configuration like the one shown in FIG. 9. As for a superscalar computer not adopting the out of order execution technique, a technology to increase the processing efficiency by providing three window pointers is effective, as is disclosed in Japanese Published Unexamined Patent Application Nos. Hei 5-20010 and Hei 9-325888. However, this technology cannot be applied to processors adopting the out of order execution technique or the register renaming technique.
As an extension of the out of order execution technique, there is adopted a speculative execution technique. With this technique, an instruction following a conditional branch instruction is executed before the judgment as to whether or not the branch is taken has been completed, and an instruction following a preceding instruction is executed even though the preceding instruction may be interrupted. In the case of a branch instruction, a branch destination is predicted and an instruction at the branch destination is executed speculatively. In the case of an instruction which may be interrupted, the immediately succeeding instruction is executed speculatively on the assumption that the instruction is not in fact interrupted. In the case of an instruction executed speculatively in accordance with the speculative execution technique, a result of the execution is generally stored in a rename register. The result is discarded in case the branch operation has been predicted incorrectly or an interrupt does occur contrary to the assumption.
Changing the value of the window pointer in a processor having a configuration like the one shown in FIG. 9 in order to implement the speculative execution technique is a big problem. For example, in the configuration shown in FIG. 9, the value of the window pointer 90 must be changed by execution of a special slide instruction in order to change the translation of a register number specified in an instruction into the number of a physical register to be actually used in the execution of the instruction. If the translation is changed by speculative execution of the special slide instruction, however, the value of the window pointer 90 will remain modified even if the speculative execution has to be canceled because a branch operation has been predicted incorrectly or an interrupt has occurred.
It is an object of the present invention to provide a processor adopting a superscalar technology with a capability of dynamically changing the translation of a register number specified in an instruction into the number of a physical register to be actually used in the execution of the instruction at run time with a high degree of efficiency even in an operation to rename a register.
It is another object of the present invention to provide the processor adopting the superscalar technology with a capability of dynamically changing the translation of a register number specified in an instruction into the number of a physical register to be actually used in the execution of the instruction by speculative execution of a slide instruction.
It is still another object of the present invention to provide the processor adopting the superscalar technology with a capability of nullifying a result of speculative execution of a slide instruction to dynamically change the translation of a register number specified in an instruction into the number of a physical register to be actually used in the execution of the instruction in case the speculative execution has to be canceled.
In order to solve the problems described above, the processor provided by the present invention comprises:
a register number translation unit for translating a register number specified in an instruction into the number of a physical register; and
a register rename unit for further replacing the number of a physical register with a rename register number temporarily,
wherein, after the register number translation unit translates a register number specified in an instruction into the number of a physical register, the register rename unit replaces the number of the physical register with a rename register number.
With the above configuration, a register number specified in an instruction is first translated into the number of a physical register. Thus, by merely providing a register rename unit for replacing the number of a physical register with the number of a rename register, register renaming can be implemented easily even if a register number specified in an instruction and the number of a physical register are different from each other.
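The two-stage arrangement described above, translation first and renaming second, can be sketched as follows. This is a minimal illustration, not the disclosed circuit: the window-pointer style `translate` function, the free-list allocation policy, the sizes, and all names are assumptions introduced only for the example.

```python
NUM_PHYSICAL = 32  # assumed register file size


def translate(reg_num, window_pointer):
    """Stage 1: register number translation unit (assumed window-pointer adder)."""
    return (reg_num + window_pointer) % NUM_PHYSICAL


class RenameUnit:
    """Stage 2: temporarily replaces a physical register number with a rename register."""

    def __init__(self, num_rename):
        self.free = list(range(num_rename))  # hypothetical free list
        self.pending = {}  # physical number -> rename register holding the newest value

    def rename_dest(self, phys):
        # Allocate a rename register for a write to physical register `phys`.
        # A real design would stall when none is free; this sketch would raise IndexError.
        r = self.free.pop(0)
        self.pending[phys] = r
        return r

    def rename_source(self, phys):
        # Read from the rename register if a write is still pending, else from the file.
        if phys in self.pending:
            return ("rename", self.pending[phys])
        return ("physical", phys)


unit = RenameUnit(num_rename=4)
wp = 8
dest = unit.rename_dest(translate(3, wp))          # write to register number 3
src_same = unit.rename_source(translate(3, wp))    # later read of register number 3
wp = 16                                            # translation changed by a slide
src_other = unit.rename_source(translate(3, wp))   # same number, other physical register
print(dest, src_same, src_other)
```

Because the rename table in this sketch is keyed by physical register number rather than by the number specified in the instruction, a change of the translation does not disturb renamings that are still pending.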
In addition, the register number translation unit is included in an instruction decoder for decoding an instruction and determining what is to be executed for the instruction prior to execution of the instruction. Since a register number specified in an instruction is translated into the number of a physical register before execution of the instruction in this configuration, there is neither a need to suspend execution of an instruction nor a penalty caused by translation of a register number.
Moreover, there is provided a slide instruction for changing the translation of a register number specified in an instruction into the number of a physical register in the register number translation unit. Furthermore, there is provided an immediate register number translation update means which is used for immediately changing translation of a register number specified in an instruction into the number of a physical register in case a result of decoding the instruction output by the instruction decoder indicates that the instruction is a slide instruction.
The processor provided by the present invention is characterized in that,
the instruction decoder decodes a plurality of instructions concurrently;
the instruction decoder is provided with:
a first register number translation unit for outputting a result of translation obtained prior to execution of a slide instruction for each of the decoded instructions; and
a second register number translation unit for outputting a result of translation obtained after execution of a slide instruction for each of the decoded instructions; and
the immediate register number translation update means has a transformation result switching means which is used for switching a result of register number translation from an output of the first register number translation unit for an instruction preceding a slide instruction to an output of the second register number translation unit for an instruction succeeding the slide instruction in case a result of decoding an instruction output by the instruction decoder indicates that the instruction is a slide instruction.
In the configuration described above, when a result of decoding an instruction output by the instruction decoder indicates that the instruction is a slide instruction, the slide instruction can be executed immediately to change translation of a register number specified in an instruction succeeding the slide instruction into the number of a physical register without the need to re-decode instructions being decoded concurrently with the slide instruction. It should be noted that, also in this configuration, two or more slide instructions may be detected among a plurality of instructions decoded concurrently. In this case, instructions following the second slide instruction are re-decoded and, in the meantime, settings of numbers of physical registers obtained as a result of translations by the first and second register number translation units are updated.
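The switching between the outputs of the first and second register number translation units can be sketched as follows. This is a simplified model under several assumptions that the text does not state: translation is modeled as addition of a window pointer, the slide width is fixed at `SLIDE_WIDTH`, each non-slide instruction names a single register, and the function and tuple formats are hypothetical.

```python
NUM_PHYSICAL = 32  # assumed register file size
SLIDE_WIDTH = 8    # assumed fixed slide width


def decode_group(group, window_pointer):
    """Decode instructions concurrently.

    `group` is a list of ("slide",) or ("op", reg_num) tuples.
    Returns (translated numbers, new window pointer, count of instructions consumed).
    """
    pre = window_pointer                                   # first unit: before the slide
    post = (window_pointer + SLIDE_WIDTH) % NUM_PHYSICAL   # second unit: after the slide
    use_post = False
    out = []
    for i, insn in enumerate(group):
        if insn[0] == "slide":
            if use_post:
                # Second slide in the same group: apply it, then stop so that the
                # instructions following it are re-decoded with updated settings.
                return out, (post + SLIDE_WIDTH) % NUM_PHYSICAL, i + 1
            use_post = True  # switch to the second unit's output from here on
            continue
        reg = insn[1]
        first_unit = (reg + pre) % NUM_PHYSICAL
        second_unit = (reg + post) % NUM_PHYSICAL
        out.append(second_unit if use_post else first_unit)
    return out, (post if use_post else pre), len(group)


group = [("op", 1), ("slide",), ("op", 1), ("slide",), ("op", 1)]
first_pass = decode_group(group, 0)
second_pass = decode_group(group[first_pass[2]:], first_pass[1])
print(first_pass, second_pass)
```

In this sketch both candidate translations are available for every instruction in the group, so a single slide instruction only selects between them and costs no re-decoding; only the instructions after a second slide instruction are deferred to a later pass.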
On the other hand, it is possible to provide an alternative configuration including only one register number translation unit in place of the configuration employing a pair of register number translation units as described above. In the alternative configuration, when a slide instruction is detected, the slide instruction is executed to update translation of a register number specified in an instruction into the number of a physical register. The alternative configuration includes a means for detecting an instruction which is decoded concurrently with the detected slide instruction, appears in the original program instruction sequence after the slide instruction and uses a register. Instructions following the detected instruction using a register are re-decoded and register numbers specified in the re-decoded instructions are translated into numbers of physical registers. This configuration is capable of easily keeping up with a need to set the range of updating called a slide width, that is, the magnitude of a change for updating translation of a register number specified in an instruction into the number of a physical register by using the slide instruction itself with a high degree of freedom.
The processor provided by the present invention is also characterized in that,
the instruction decoder speculatively decodes an instruction before execution of the instruction is confirmed and, if the speculatively decoded instruction is a slide instruction, the immediate register number translation update means speculatively updates translation of a register number specified in an instruction into the number of a physical register; and
there is provided a register number translation update canceling means which is used for canceling updating of translation of a register number specified in an instruction into the number of a physical register done speculatively by the immediate register number translation update means in case the slide instruction is canceled.
The register number translation update canceling means comprises:
a register number translation recording means for recording a translation of a register number specified in an instruction into the number of a physical register obtained prior to speculative updating carried out by a slide instruction; and
a register number translation restoring means which is used for restoring a translation of a register number specified in an instruction into the number of a physical register to a translation recorded by the register number translation recording means in case the slide instruction is canceled.
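The recording and restoring means described above can be sketched as follows. This is an illustration only: the translation is modeled as a single window pointer, the recorded values are kept on a stack, and the class, its methods, and the sizes are assumptions that do not come from the description.

```python
NUM_PHYSICAL = 32  # assumed register file size


class SpeculativeWindow:
    """Window pointer with recording and restoring for speculative slides."""

    def __init__(self):
        self.window_pointer = 0
        self.recorded = []  # recording means: values held before each speculative slide

    def speculative_slide(self, width):
        # Record the translation in effect before the speculative update.
        self.recorded.append(self.window_pointer)
        self.window_pointer = (self.window_pointer + width) % NUM_PHYSICAL

    def confirm_oldest(self):
        # The oldest speculative slide is confirmed; its record is no longer needed.
        self.recorded.pop(0)

    def cancel_newest(self):
        # Restoring means: undo the most recent speculative slide.
        self.window_pointer = self.recorded.pop()


w = SpeculativeWindow()
w.speculative_slide(8)
w.speculative_slide(8)
after_two_slides = w.window_pointer
w.cancel_newest()
after_one_cancel = w.window_pointer
w.cancel_newest()
after_two_cancels = w.window_pointer
print(after_two_slides, after_one_cancel, after_two_cancels)
```

Canceling in reverse order of execution returns the pointer first to 8 and then to 0 in this sketch, so a mispredicted branch or an interrupt leaves no trace of the canceled slide instructions.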
Other characteristics of the present invention will become apparent from the following description of embodiments of the present invention.