A CPU (Central Processing Unit) is required to process at ever higher speeds, and various technologies have been used to improve its processing. These include pipeline processing, a superscalar system which performs parallel processing, and an out-of-order execution system which gives priority to instructions whose input data is ready, rather than executing instructions in the sequence assigned by the program.
The out-of-order execution system improves CPU performance by executing a subsequent instruction first when the data required by a preceding instruction is not yet available but the data required by the subsequent instruction is (see, e.g., Patent Document 1).
For example, suppose that, in the sequence written in the program, instruction 1 involves a memory access and the subsequent instruction 2 does not. Instruction 2 is then executed in parallel with the memory access of instruction 1, and the processing of instruction 1 is executed after instruction 2 has been executed.
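The out-of-order behavior described above can be illustrated with a minimal Python sketch. It assumes a hypothetical single-issue machine without register renaming: each cycle, the oldest not-yet-issued instruction whose source registers are available is issued, and its result becomes available after a fixed latency. The names `Instr` and `out_of_order_issue`, the register names and the latencies are illustrative only.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    srcs: tuple[str, ...]  # source registers
    dest: str              # destination register
    latency: int           # cycles from issue until the result is available


def out_of_order_issue(program, ready_at=None):
    """Issue one instruction per cycle, choosing the oldest (in program order)
    instruction whose source registers are ready. Registers missing from
    ready_at are treated as available from cycle 0.
    Returns (name, issue_cycle, result_cycle) tuples in issue order."""
    ready_at = dict(ready_at or {})
    pending = list(program)
    schedule = []
    cycle = 0
    while pending:
        for instr in pending:  # scan in program order
            if all(ready_at.get(r, 0) <= cycle for r in instr.srcs):
                result_cycle = cycle + instr.latency
                ready_at[instr.dest] = result_cycle
                schedule.append((instr.name, cycle, result_cycle))
                pending.remove(instr)
                break  # single issue per cycle
        cycle += 1
    return schedule


program = [
    Instr("load", ("r0",), "r1", 4),         # memory access: long latency
    Instr("use_r1", ("r1", "r1"), "r2", 1),  # depends on the load result
    Instr("indep", ("r4", "r5"), "r3", 1),   # independent of the load
]
print(out_of_order_issue(program))
# [('load', 0, 4), ('indep', 1, 2), ('use_r1', 4, 5)]
```

The independent instruction is issued at cycle 1, while the memory access is still in progress, instead of waiting behind the dependent instruction as it would under strictly in-order issue.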
A multi-thread system, which improves CPU processing by allowing a plurality of programs, rather than a single program, to run, has also been proposed (see, e.g., Patent Document 2).
In this multi-thread system, the CPU is provided with a plurality of sets of programmable resources, so that, from the software point of view, it behaves as if a plurality of CPUs were operating. A plurality of programs can therefore be executed.
FIG. 16 is a block diagram of a conventional CPU. The CPU has a main storage 111, an instruction cache 112, an instruction decoder 113, a reservation station 114, a computing execution unit 115 and an architecture register 116. The reservation station 114 reads the operand data required for execution from the architecture register 116, and controls both the computation performed by the computing execution unit 115 and the generation of main storage operand addresses.
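The role of the reservation station 114 can be pictured with the following minimal sketch. It assumes a hypothetical three-opcode instruction set in which arithmetic operations are routed to the computing execution unit and loads are routed to main storage operand address generation; `RSEntry`, `dispatch` and the address formula (base + index + displacement) are illustrative assumptions, not details taken from FIG. 16.

```python
from dataclasses import dataclass


@dataclass
class RSEntry:
    op: str        # hypothetical opcodes: "add", "sub" or "load"
    src1: int      # architecture register numbers of the two operands
    src2: int
    disp: int = 0  # displacement, used only for address generation


def dispatch(entry, arch_reg):
    """Read both operands from the architecture register, then route the
    operation either to the execution unit or to address generation."""
    a = arch_reg[entry.src1]
    b = arch_reg[entry.src2]
    if entry.op == "add":
        return ("execute", a + b)
    if entry.op == "sub":
        return ("execute", a - b)
    if entry.op == "load":
        # main storage operand address = base + index + displacement (assumed)
        return ("address", a + b + entry.disp)
    raise ValueError(f"unknown op {entry.op!r}")


arch_reg = [0, 10, 20, 30]
print(dispatch(RSEntry("add", 1, 2), arch_reg))           # ('execute', 30)
print(dispatch(RSEntry("load", 1, 3, disp=4), arch_reg))  # ('address', 44)
```

In every case the operand values come from the architecture register, which is why its read speed matters to the whole pipeline.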
The architecture register 116 temporarily stores operand data and, as part of the CPU, must provide a read/write speed matching the computing speed. It is therefore constructed not from memory but from a large set of registers, for example register files, which can be implemented at high density with relatively few transistors.
One example of this multi-thread system is the VMT (Vertical Multi-Threading) system. In this system only one program runs at a time, but the running program is switched when a long wait for data occurs or when a predetermined time interval elapses. Although a VMT system requires one set of programmable resources per program, the additional circuitry needed for each program is small, so the system is easy to implement.
In the case of FIG. 16, the register file 116 contains an architecture register for each thread. As programs are switched, one architecture register is set to “active” and the other to “sleep”, and operand data is read from the architecture register corresponding to the running program.
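The VMT behavior described above (one running thread, an architecture register per thread, and switching on a long data wait or after a fixed interval) can be sketched in Python as follows. The class name `VMTCore`, the round-robin choice of the next thread and the cycle-counting interval are assumptions made for illustration.

```python
class VMTCore:
    """One architecture register set per thread; only the active thread runs."""

    def __init__(self, num_threads, num_regs, interval):
        self.regs = [[0] * num_regs for _ in range(num_threads)]
        self.state = ["sleep"] * num_threads
        self.active = 0
        self.state[self.active] = "active"
        self.interval = interval        # switch after this many cycles
        self.cycles_since_switch = 0

    def read_operand(self, reg):
        # Operands always come from the running thread's architecture register.
        return self.regs[self.active][reg]

    def tick(self, long_wait=False):
        """Advance one cycle; switch threads on a long data wait or timeout."""
        self.cycles_since_switch += 1
        if long_wait or self.cycles_since_switch >= self.interval:
            self._switch()

    def _switch(self):
        self.state[self.active] = "sleep"
        self.active = (self.active + 1) % len(self.regs)  # round robin (assumed)
        self.state[self.active] = "active"
        self.cycles_since_switch = 0


core = VMTCore(num_threads=2, num_regs=4, interval=3)
core.regs[0][1], core.regs[1][1] = 5, 7
print(core.read_operand(1))  # 5: thread 0 is running
core.tick(long_wait=True)    # a long wait for data forces a switch
print(core.read_operand(1), core.state)  # 7 ['sleep', 'active']
```

Because only one register set is read at any time, the read path only needs to select within the active thread's registers plus a thread selector that changes at switch time.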
Another example of a multi-thread system is the simultaneous multi-thread (SMT) system, which allows a plurality of programs to run simultaneously. Compared with running a single program, running a plurality of programs simultaneously makes circuit control more difficult and increases the required resources, but it allows the circuits to be used more efficiently.
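In contrast to the VMT case, an SMT core can issue instructions from several threads in the same cycle. The sketch below assumes a hypothetical issue width and a round-robin policy that takes at most one instruction per thread per cycle; actual SMT issue policies vary. The function name `smt_issue` is illustrative, and the thread number attached to each issued instruction stands for the selection of that thread's architecture register.

```python
from collections import deque


def smt_issue(threads, width):
    """threads: one deque of instructions per thread (consumed in place).
    Each cycle, up to `width` instructions issue, at most one per thread,
    scanning threads in a rotating (round-robin) order.
    Returns one list of (thread_id, instruction) pairs per cycle."""
    n = len(threads)
    start = 0
    cycles = []
    while any(threads):
        issued = []
        for k in range(n):
            t = (start + k) % n
            if threads[t] and len(issued) < width:
                issued.append((t, threads[t].popleft()))
        cycles.append(issued)
        start = (start + 1) % n  # rotate priority between threads
    return cycles


print(smt_issue([deque(["a0", "a1"]), deque(["b0"])], width=2))
# [[(0, 'a0'), (1, 'b0')], [(0, 'a1')]]
```

In the first cycle, instructions from thread 0 and thread 1 issue together, so operands for both threads must be read from their respective architecture registers in the same cycle.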
In this simultaneous multi-thread system as well, the architecture registers for the plurality of threads are constructed by register files, and the operand data of each thread is read from the architecture register corresponding to that thread.
Patent Document 1: Japanese Patent Application Laid-Open No. 2007-87108
Patent Document 2: Published Japanese Translation of PCT Application No. 2006-502504 (WO 2004/034209)
As described above, in the simultaneous multi-thread system the architecture registers for a plurality of threads are constructed by register files, and a plurality of programs run simultaneously. Compared with a single-thread system, the circuitry needed to select among the architecture registers in order to read the operand data required for execution therefore increases. The amount of wiring may also increase when operand data of different threads is read simultaneously.
As a result, it is more difficult than in the single-thread case to raise the read frequency of the register file. This makes it difficult to increase the computing speed of the CPU and so improve its performance, even when the out-of-order execution system and the simultaneous multi-thread system are used.
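The growth in selection circuitry described above can be made concrete with a rough count. Assuming, for illustration only, a flat register file in which each read port may read any architecture register of any thread, each port must select one of (threads × registers per thread) entries. The function `read_select_inputs` and the numbers below are hypothetical and ignore real circuit details such as banking or port sharing.

```python
def read_select_inputs(threads, regs, ports):
    """Total number of register entries that the read-selection logic must
    choose among, summed over all read ports, for a flat register file
    (a simplifying assumption)."""
    entries_per_port = threads * regs
    return ports * entries_per_port


single = read_select_inputs(threads=1, regs=32, ports=2)
smt = read_select_inputs(threads=2, regs=32, ports=4)
print(single, smt, smt // single)  # 64 256 4
```

Under these assumptions, doubling the number of threads and also doubling the read ports (so that two threads can read operands in the same cycle) quadruples the selection inputs, which suggests why wiring and selection delay grow and the register file read frequency becomes harder to raise.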