1. Field of the Invention
The present invention relates to register control for switching between threads in a multithread processor.
2. Description of the Related Art
In recent years, representative computer architectures have included, in addition to a CISC (Complex Instruction Set Computer) architecture designed to carry out complex processing with one instruction, a RISC (Reduced Instruction Set Computer) architecture designed to simplify the processing carried out by one instruction, a VLIW (Very Long Instruction Word) architecture designed to collect, through software, a plurality of simultaneously processible instructions into one long instruction, and other architectures.
In addition, the processing methods used in a central processing unit (CPU) of a computer for realizing these architectures are roughly classified into two types: an in-order execution type and an out-of-order execution type.
FIG. 8 is an illustration for explaining an in-order execution type processing method, while FIG. 9 is an illustration for explaining an out-of-order execution type processing method. As shown in FIG. 8, the in-order execution type is a method of conducting instruction processing in the sequence specified by a program. As shown in FIG. 9, the out-of-order execution type is a method of examining the dependence (dependent relationship) between instructions so that an instruction having no dependence on a preceding instruction is processed without following the program sequence.
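The difference between the two execution types can be illustrated with a minimal sketch. It is not taken from the figures. It assumes a hypothetical machine that issues at most one instruction per cycle, considers only read-after-write dependence, and assumes that each register is written at most once (as if registers were renamed). The names `Instr`, `simulate`, and `PROGRAM`, the register names, and the latencies are all illustrative assumptions.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    name: str
    dest: str               # register written by the instruction
    srcs: Tuple[str, ...]   # registers read by the instruction
    latency: int            # cycles until dest becomes available


def simulate(program, out_of_order):
    """Issue at most one instruction per cycle.

    Returns (issue order, cycle at which the last result is available).
    """
    ready_at = {}            # register -> cycle at which its value is available
    pending = list(program)  # instructions not yet issued, in program order
    order, cycle, finish = [], 0, 0
    while pending:
        # In-order: only the oldest pending instruction may issue.
        # Out-of-order: any pending instruction may issue, oldest first.
        window = pending if out_of_order else pending[:1]
        chosen = None
        for idx, ins in enumerate(window):
            # Registers still to be written by older, unissued instructions.
            older_dests = {p.dest for p in pending[:idx]}
            if all(r not in older_dests and ready_at.get(r, 0) <= cycle
                   for r in ins.srcs):
                chosen = idx
                break
        if chosen is not None:
            ins = pending.pop(chosen)
            ready_at[ins.dest] = cycle + ins.latency
            finish = max(finish, cycle + ins.latency)
            order.append(ins.name)
        cycle += 1               # a cycle with no issue is a stall
    return order, finish


# Hypothetical three-instruction program: "add" depends on "load",
# while "mul" depends on neither.
PROGRAM = [
    Instr("load", "r1", (), 3),
    Instr("add", "r2", ("r1",), 1),
    Instr("mul", "r3", ("r8", "r9"), 1),
]

in_order_result = simulate(PROGRAM, out_of_order=False)
out_of_order_result = simulate(PROGRAM, out_of_order=True)
```

Under these assumptions, the in-order run stalls behind the dependent `add` until the load result is available, whereas the out-of-order run issues the independent `mul` during that wait and finishes one cycle earlier.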
Furthermore, in recent years, in addition to single thread processing for carrying out one program (thread) in one processor, attention has been paid to a multithread processor system designed to physically carry out a plurality of threads in parallel in one processor.
FIGS. 10A and 10B are illustrations for explaining a multithread processor system. FIG. 10A is an illustration for explaining single thread processing, while FIG. 10B is an illustration for explaining multithread processing. FIG. 10B shows an example of multithread processing in which two programs A and B are processed in parallel in one CPU.
In general, in addition to registers visible to software and a status register (CPU status register), a CPU has resources for carrying out addition, subtraction, multiplication, division, load processing for reading out memory data into a register, and store processing for writing register data into a memory. The multithread processor is designed to multiplex the registers visible to software in one CPU so that a plurality of programs share an instruction execution resource for addition/subtraction or the like while being executed as separate programs (for example, see Japanese Patent Laid-Open No. 2003-241961).
As a method of realizing the above-mentioned multithread processing, in addition to a fine grained multithreading method or simultaneous multithreading (SMT) method (see FIG. 11), which carries out a plurality of threads simultaneously, there has been known a coarse grained multithreading method or vertical multithreading (VMT) method (see FIG. 12), which does not carry out a plurality of threads simultaneously but instead switches to a different thread and executes it when an event such as a cache miss occurs.
FIG. 11 is an illustration for explaining the SMT method, while FIG. 12 is an illustration for explaining the VMT method.
The VMT method is intended to hide the long processing time of an instruction that causes a cache miss. When a cache miss is detected, switching is made to a different thread, and an execution unit or control unit (both are not shown) carries out that thread with respect to processing other than the memory access while a cache control unit (not shown) conducts the processing of bringing data from a memory into a cache. Moreover, in this VMT method, with respect to threads in which a cache miss seldom occurs, the switching to a different thread is made when a given period of time elapses (time-sharing system).
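The two switching conditions of the VMT method described above (a cache miss and the elapse of a given period of time) can be sketched as follows. This is a simplified model under stated assumptions, not an implementation from the cited art: each thread is represented as a list of `"op"` or `"miss"` entries, one entry is processed per cycle, the next thread is chosen in round-robin order, and the function name `run_vmt` and the parameters `miss_latency` and `time_slice` are hypothetical.

```python
def run_vmt(threads, miss_latency=3, time_slice=4):
    """Return, for each cycle, the name of the thread that ran, or "idle".

    threads: dict mapping a thread name to a list of "op" / "miss" entries.
    """
    names = list(threads)
    pc = {t: 0 for t in names}             # next entry to process, per thread
    blocked_until = {t: 0 for t in names}  # cycle at which missed data arrives
    current, slice_used, cycle, trace = 0, 0, 0, []

    def runnable(t):
        return pc[t] < len(threads[t]) and blocked_until[t] <= cycle

    while any(pc[t] < len(threads[t]) for t in names):
        name = names[current]
        # Switch when the current thread waits on a cache miss (or has ended),
        # or when its time slice has elapsed.
        if not runnable(name) or slice_used >= time_slice:
            for step in range(1, len(names)):
                cand = (current + step) % len(names)
                if runnable(names[cand]):
                    current, slice_used = cand, 0
                    break
            else:
                slice_used = 0  # no other thread can run; stay on this one
            name = names[current]
        if runnable(name):
            op = threads[name][pc[name]]
            pc[name] += 1
            slice_used += 1
            trace.append(name)
            if op == "miss":
                # The cache is filled in the background; the thread may not
                # run again until the data has arrived.
                blocked_until[name] = cycle + 1 + miss_latency
        else:
            trace.append("idle")
        cycle += 1
    return trace


# Hypothetical workload: thread A misses early, thread B never misses.
trace = run_vmt({
    "A": ["op", "miss", "op", "op"],
    "B": ["op", "op", "op", "op", "op", "op"],
})
```

In this sketch, thread B takes over as soon as thread A misses, and thread B, which never misses, is switched out only when its time slice is used up. With a single thread, the miss latency instead appears as idle cycles.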
However, for realizing the multithread processing, there is a need for a plurality of threads to share a decoder, an arithmetic unit and other resources in a CPU, and there is a need to multiplex the registers visible to software, that is, to provide a separate set of such registers for each thread. This increases the number of registers to be handled, which enlarges the area occupied by the registers and delays the register reading processing. Moreover, a selecting circuit must additionally be used for handling the increased number of registers, which complicates the circuit arrangement and thereby likewise delays the register reading processing.
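The register multiplexing described above can be pictured with a small software model. It only illustrates how the register count grows with the number of threads and how a thread number must select among the register sets on every read. It does not model circuit area or delay, and the names `MultiplexedRegisterFile` and `shared_add` are hypothetical.

```python
class MultiplexedRegisterFile:
    """One set of software-visible registers per thread."""

    def __init__(self, num_threads, regs_per_thread):
        self.banks = [[0] * regs_per_thread for _ in range(num_threads)]

    @property
    def total_registers(self):
        # The number of registers grows in proportion to the thread count.
        return sum(len(bank) for bank in self.banks)

    def read(self, thread, reg):
        # The thread number plays the role of the additional selecting step.
        return self.banks[thread][reg]

    def write(self, thread, reg, value):
        self.banks[thread][reg] = value


def shared_add(rf, thread, dest, src_a, src_b):
    """A single adder shared by all threads; only the register set differs."""
    rf.write(thread, dest, rf.read(thread, src_a) + rf.read(thread, src_b))


rf = MultiplexedRegisterFile(num_threads=2, regs_per_thread=4)
rf.write(0, 0, 1)
rf.write(0, 1, 2)
rf.write(1, 0, 10)
rf.write(1, 1, 20)
shared_add(rf, thread=0, dest=2, src_a=0, src_b=1)
shared_add(rf, thread=1, dest=2, src_a=0, src_b=1)
```

Each thread sees its own values in the same register numbers, while the total number of registers to be handled is the per-thread count multiplied by the number of threads.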