A variety of techniques are available to boost the processing speed of computing devices such as central processing units (CPUs), also called processors. These techniques include, for example, pipelining, superscalar architecture, and out-of-order execution of instructions. Superscalar architecture enables concurrent execution of multiple computational operations. Out-of-order execution runs the instructions of a program not necessarily in the order in which they are specified in the program, but in the order in which their input data becomes available. In connection with out-of-order processing, a technique known as register renaming is used so that instructions can nevertheless be completed in the programmed order.
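The register renaming mentioned above can be illustrated with a minimal sketch. It assumes a simple model in which a rename table maps architectural register names to physical (rename) registers drawn from a free list. The class `RenameTable` and its methods are hypothetical names for illustration and do not describe any particular CPU.

```python
from collections import deque


class RenameTable:
    """Hypothetical register-renaming model: architectural register names are
    mapped to physical (rename) registers taken from a free list."""

    def __init__(self, arch_regs, num_physical):
        self.free = deque(range(num_physical))
        # Initially, each architectural register gets its own physical register.
        self.map = {r: self.free.popleft() for r in arch_regs}
        self.phys = [0] * num_physical  # contents of the physical register file

    def rename_dest(self, arch_reg):
        """Allocate a fresh physical register for a new write to arch_reg.
        Returns (new, old); old can be released once the writer commits."""
        new = self.free.popleft()
        old = self.map[arch_reg]
        self.map[arch_reg] = new
        return new, old

    def lookup(self, arch_reg):
        """Later instructions read whichever physical register is mapped now."""
        return self.map[arch_reg]

    def commit(self, old):
        """On in-order commit, the previous mapping is returned to the free list."""
        self.free.append(old)


rt = RenameTable(["r1", "r2"], num_physical=8)
new, old = rt.rename_dest("r1")   # r1 now maps to a fresh physical register
rt.phys[new] = 42                 # the out-of-order result lands there
value = rt.phys[rt.lookup("r1")]  # a later reader is redirected to it
rt.commit(old)
```

Because each write to `r1` receives its own physical register, an earlier value of `r1` stays intact until the writing instruction commits, which is what allows completion in program order.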
Parallel processing may also be exploited to speed up a CPU. One such approach is the single instruction multiple data (SIMD) architecture, which permits a single instruction to perform the same operation on multiple data items in parallel. One known device of this type has first and second data computation circuits. According to the result of decoding a single instruction code (operation code), operand data values are read from the same specified address in first and second memory areas and supplied to the first and second data computation circuits, respectively.
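The two-circuit device described above can be modeled roughly as follows. This is only a sketch under assumptions: the opcode table `OPCODES`, the function `execute`, and the use of single-operand operations are hypothetical, since the actual instruction set of the device is not specified here.

```python
import operator

# Hypothetical opcode table; the real device's operations are not specified.
OPCODES = {
    "NEG": operator.neg,
    "INC": lambda v: v + 1,
}


def execute(opcode, addr, area1, area2):
    """Decode one operation code, then let both computation circuits apply the
    same operation to operands read from the same address in their own areas."""
    op = OPCODES[opcode]      # decoded once for both circuits
    result1 = op(area1[addr])  # first data computation circuit
    result2 = op(area2[addr])  # second data computation circuit
    return result1, result2


print(execute("INC", 1, [10, 20], [30, 40]))  # (21, 41)
```

In hardware the two circuits operate simultaneously; the sequential Python calls only model which operand reaches which circuit.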
SIMD processing is implemented with multiple hardware resources, such as registers and arithmetic logic units, that store and process two or more data values simultaneously. For example, non-SIMD instructions that handle a single 64-bit data value at a time store their results in 64-bit registers. In contrast, two-way SIMD instructions handle two data values at a time and produce two 64-bit results, so their destination SIMD registers have to be 128 bits wide.
Such SIMD features, when implemented in a CPU, enable the CPU to handle more data at a time and thus increase its processing speed. In the example above, SIMD processing handles 128 bits of data with a single instruction, whereas non-SIMD processing needs two instructions to do the same.
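The difference between one two-way SIMD instruction and two non-SIMD instructions can be sketched by treating a 128-bit register as a Python integer that packs two 64-bit lanes. The helper names (`pack`, `unpack`, `scalar_add`, `simd_add`) are hypothetical, and addition with wraparound modulo 2**64 is assumed as the example operation.

```python
MASK64 = (1 << 64) - 1


def pack(hi, lo):
    """Pack two 64-bit lanes into one 128-bit register value."""
    return ((hi & MASK64) << 64) | (lo & MASK64)


def unpack(v):
    """Split a 128-bit register value into its (upper, lower) 64-bit lanes."""
    return (v >> 64) & MASK64, v & MASK64


def scalar_add(a, b):
    """Non-SIMD instruction: one 64-bit add, wrapping modulo 2**64."""
    return (a + b) & MASK64


def simd_add(va, vb):
    """Two-way SIMD instruction: add both lanes of two 128-bit operands.
    Hardware uses two ALUs in parallel; here the lanes are computed in turn.
    No carry crosses from the lower lane into the upper lane."""
    a_hi, a_lo = unpack(va)
    b_hi, b_lo = unpack(vb)
    return pack(scalar_add(a_hi, b_hi), scalar_add(a_lo, b_lo))


# One SIMD instruction ...
simd_result = simd_add(pack(1, MASK64), pack(2, 1))
# ... versus two non-SIMD instructions producing the same two results.
scalar_results = (scalar_add(1, 2), scalar_add(MASK64, 1))
```

Both paths yield the lanes (3, 0); the SIMD path needs one instruction and one 128-bit destination register, the non-SIMD path two instructions and two 64-bit destinations.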
For example, the following literature provides background information in this technical field:
Japanese Patent No. 3452771
Japanese Laid-open Patent Publication No. 10-228376
Japanese Laid-open Patent Publication No. 11-175339
From the viewpoint of software compatibility and hardware resource usage, however, it is impractical to code all processing operations of a CPU as SIMD programs. Rather, the CPU also has to execute non-SIMD processing operations, i.e., data processing that does not require SIMD features. While an SIMD-capable CPU can execute non-SIMD processing as well, the efficiency of its internal register usage may be degraded in some cases, as discussed below.
The aforementioned CPU with two-way SIMD capabilities employs 128-bit SIMD registers, twice the width of normal registers. This SIMD CPU may also perform non-SIMD processing by using half the width of each SIMD register, leaving the other half unused.
SIMD processing may be implemented together with the register renaming technique mentioned earlier. Suppose, for example, that an SIMD register is renamed during the execution of an instruction, and that a subsequent instruction then requests the value of that SIMD register. In this case, the requested value has to be read not from the specified SIMD register but from the rename register that actually holds the value. To make this possible, the rename registers also have to be as wide as the SIMD registers, regardless of whether they serve SIMD or non-SIMD processing. Consequently, non-SIMD processing uses only half the bit width of the rename registers as well, leaving the other half unused.
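The situation described above can be sketched with a small model in which every rename register is a 128-bit entry holding an upper and a lower 64-bit half. The class `SimdRenamer` and its methods are hypothetical; the sketch only shows that a later read is redirected to the rename entry, and that a non-SIMD write leaves the upper half of its entry unused (represented as `None`).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# A 128-bit rename entry as (upper 64-bit half, lower 64-bit half).
Entry = Tuple[Optional[int], Optional[int]]


@dataclass
class SimdRenamer:
    arch_map: Dict[str, int] = field(default_factory=dict)  # SIMD reg -> entry index
    entries: List[Entry] = field(default_factory=list)       # 128-bit rename registers

    def write(self, reg: str, lower: int, upper: Optional[int] = None) -> None:
        """Rename `reg` to a fresh 128-bit entry. A non-SIMD write supplies only
        `lower`, so the upper half of the entry holds no data (None)."""
        self.entries.append((upper, lower))
        self.arch_map[reg] = len(self.entries) - 1

    def read(self, reg: str) -> Entry:
        """A later instruction reads the rename entry that actually holds the
        value, not the architectural SIMD register it named."""
        return self.entries[self.arch_map[reg]]

    def unused_halves(self) -> int:
        """Count allocated entries whose upper 64 bits are wasted."""
        return sum(1 for upper, _ in self.entries if upper is None)


r = SimdRenamer()
r.write("v0", lower=5, upper=6)  # SIMD result fills the whole entry
r.write("v1", lower=7)           # non-SIMD result fills only the lower half
r.write("v0", lower=9)           # v0 renamed again, by a non-SIMD instruction
```

After these writes, reading `v0` returns `(None, 9)` from the newest entry, and two of the three 128-bit entries carry only 64 bits of data, which is the register-usage inefficiency the text points out.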