(1) Field of the Invention
The present invention relates to thread switching control in a multithread processor.
(2) Description of the Related Art
In recent years, representative computer architectures have included, in addition to the CISC (Complex Instruction Set Computer) architecture, in which complex processing is carried out by a single instruction, the RISC (Reduced Instruction Set Computer) architecture, in which the processing performed by each instruction is simplified, the VLIW (Very Long Instruction Word) architecture, in which software collects a plurality of simultaneously processible instructions into one long instruction, and other architectures.
In addition, the processing methods used in the central processing unit (CPU) of a computer to realize these architectures are roughly classified into two types: the in-order execution type and the out-of-order execution type.
FIG. 23 is an illustration for explaining an in-order execution type processing method, while FIG. 24 is an illustration for explaining an out-of-order execution type processing method. As shown in FIG. 23, the in-order execution type processes instructions in the sequence in which they appear in the program. As shown in FIG. 24, the out-of-order execution type examines the dependence (dependent relationship) between instructions so that an instruction having no dependence on a preceding instruction is processed without following the program sequence.
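The difference between the two execution types can be illustrated with a minimal sketch. The model below is an assumption made only for illustration and is not taken from the figures: one instruction is issued per cycle, each instruction has a fixed latency, and the names `Instr`, `issue_order` and `PROGRAM` are hypothetical.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    name: str
    deps: Tuple[str, ...]  # names of earlier instructions whose results are needed
    latency: int           # cycles until the result becomes available


def issue_order(program, out_of_order):
    """Return instruction names in the order they are issued (one per cycle).

    Assumes every dependence names an earlier instruction in `program`.
    """
    done_at = {}            # instruction name -> cycle at which its result is ready
    pending = list(program)
    order = []
    cycle = 0
    while pending:
        # In-order: only the oldest pending instruction may issue.
        # Out-of-order: the oldest pending instruction whose inputs are ready may issue.
        candidates = list(pending) if out_of_order else pending[:1]
        for ins in candidates:
            if all(d in done_at and done_at[d] <= cycle for d in ins.deps):
                done_at[ins.name] = cycle + ins.latency
                order.append(ins.name)
                pending.remove(ins)
                break
        cycle += 1
    return order


# Hypothetical three-instruction program: "add" needs the result of "load",
# while "mul" has no dependence on either of them.
PROGRAM = [
    Instr("load", (), 3),
    Instr("add", ("load",), 1),
    Instr("mul", (), 1),
]

print(issue_order(PROGRAM, out_of_order=False))  # ['load', 'add', 'mul']
print(issue_order(PROGRAM, out_of_order=True))   # ['load', 'mul', 'add']
```

In this sketch the independent instruction `mul` overtakes `add` only in the out-of-order case, because `add` must wait for the result of `load`.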
Furthermore, in recent years, in addition to single thread processing for carrying out one program (thread) in one processor, attention has been paid to a multithread processor system designed to carry out a plurality of threads in parallel in one physical processor.
FIGS. 25A and 25B are illustrations for explaining a multithread processor system. FIG. 25A is an illustration for explaining single thread processing, while FIG. 25B is an illustration for explaining multithread processing. FIG. 25B shows an example of multithread processing in which two programs A and B are processed in parallel in one CPU.
In general, in addition to registers visible to software and a status register (CPU status register), a CPU has resources for carrying out addition, subtraction, multiplication and division, load processing for reading memory data into a register, and store processing for writing register data to a memory. The multithread processor multiplexes the registers visible to software in one CPU so that a plurality of programs share instruction execution resources, such as those for addition and subtraction, while separate programs are executed.
As methods of realizing the above-mentioned multithread processing, in addition to a fine grained multithreading method or simultaneous multithreading (SMT) method (see FIG. 26), which carries out a plurality of threads simultaneously, there has been known a coarse grained multithreading method or vertical multithreading (VMT) method (see FIG. 27), which does not carry out a plurality of threads simultaneously but switches to and executes a different thread when an event such as a cache miss occurs (for example, see Japanese Patent Laid-Open No. 2002-163121).
FIG. 26 is an illustration for explaining the SMT method, while FIG. 27 is an illustration for explaining the VMT method.
The VMT method is intended to hide the long processing time of an instruction that incurs a cache miss. When a cache miss is detected, the switching to a different thread is made, and an execution unit or control unit (neither is shown) carries out the processing of that thread other than memory accesses while a cache control unit (not shown) brings the data from a memory into a cache. Moreover, in the VMT method, for a thread in which cache misses rarely occur, the switching to a different thread is made when a given period of time elapses.
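The two switching conditions described above, a cache miss and the elapse of a given period of time, can be sketched as follows. This is a simplified model under stated assumptions, not the conventional hardware: each thread is a list of flags indicating whether an instruction misses the cache, the period is counted in instructions, threads are visited in round-robin order, and the missing data is assumed to have arrived by the time the thread runs again. The names `run_vmt` and `THREADS` are hypothetical.

```python
def run_vmt(threads, time_slice):
    """Simulate coarse grained (VMT-style) switching between threads.

    threads: dict mapping a thread name to a list of booleans, where True
             means the instruction at that position incurs a cache miss.
    time_slice: number of instructions after which a switch is forced.
    Returns a list of (thread name, reason the thread stopped running).
    """
    names = list(threads)
    pc = {n: 0 for n in names}   # next instruction index per thread
    trace = []
    current = 0
    while any(pc[n] < len(threads[n]) for n in names):
        name = names[current]
        executed = 0
        reason = "finished"
        while pc[name] < len(threads[name]):
            missed = threads[name][pc[name]]
            pc[name] += 1
            executed += 1
            if missed:
                reason = "cache miss"   # switch as soon as a miss is detected
                break
            if executed == time_slice:
                reason = "timer"        # switch after the given period elapses
                break
        if executed:
            trace.append((name, reason))
        current = (current + 1) % len(names)   # assumed round-robin selection
    return trace


# Hypothetical workload: thread X misses on its second instruction,
# thread Y never misses and is switched out by the timer.
THREADS = {"X": [False, True, False], "Y": [False, False, False]}
TRACE = run_vmt(THREADS, time_slice=2)
print(TRACE)
# [('X', 'cache miss'), ('Y', 'timer'), ('X', 'finished'), ('Y', 'finished')]
```

Thread Y, which never misses, would otherwise monopolize the processor in this model; the timer condition is what returns control to thread X.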
FIG. 28 is an illustration for explaining the processing at the occurrence of a cache miss in the in-order mode, FIG. 29 is an illustration for explaining the processing at the occurrence of a cache miss in the out-of-order mode, and FIG. 30 is an illustration for explaining a conventional thread switching method in the out-of-order mode.
So far, the VMT method has been used only on the aforesaid in-order processors. In a processor that conducts in-order execution, cache miss events occur in the program sequence, and the cache miss data is returned from a memory in the program sequence (see FIG. 28). On the other hand, in a processor that conducts out-of-order execution, memory accesses are not necessarily made in the instruction sequence of the program, and as shown in FIG. 29, cache miss events do not always occur in the program sequence.
For example, as shown in FIG. 30, suppose that two instructions A and B, each of which incurs a cache miss, exist on a thread X in the order of the instruction A and then the instruction B. When the instruction B can be executed prior to the instruction A, the cache miss of the instruction B is detected prior to the cache miss of the instruction A. In this example, if the cache miss of the instruction B is detected and the switching is made from the thread X to another thread Y before the cache miss of the instruction A occurs, the cache miss of the instruction A occurs after the execution of the thread X is resumed.
Incidentally, in an in-order execution type processor, since the execution of the instruction B starts after the start of the execution of the instruction A, the cache misses occur in the order of the instruction A and then the instruction B.
However, the conventional VMT type multithread processing has a problem in that, since the thread switching is made whenever a cache miss occurs, the frequency of the thread switching increases, which leads to inefficient processing. Thus, there is a need to perform the thread switching efficiently in order to increase the processing speed.
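The scenario of FIG. 30 can be traced with a short sketch. It assumes, purely for illustration, that a switch from the thread X to the thread Y is made immediately on every detected cache miss and that the thread X later resumes with the next instruction in its execution order; the function name `thread_x_events` is hypothetical.

```python
def thread_x_events(execution_order, missing):
    """List the events seen on thread X when every cache miss forces a switch.

    execution_order: instruction names in the order they actually execute.
    missing: set of instruction names that incur a cache miss.
    """
    events = []
    for ins in execution_order:
        if ins in missing:
            events.append(f"miss {ins}: switch X -> Y")
            events.append("resume X")
        else:
            events.append(f"execute {ins}")
    return events


MISSING = {"A", "B"}

# In-order execution: A executes before B, so the misses follow program order.
IN_ORDER = thread_x_events(["A", "B"], MISSING)

# Out-of-order execution: B is executed before A, so the miss of B is detected
# first and the miss of A occurs only after thread X has been resumed.
OUT_OF_ORDER = thread_x_events(["B", "A"], MISSING)

print(IN_ORDER)
print(OUT_OF_ORDER)
```

In both traces of this model the number of thread switches equals the number of cache misses; only the order in which the misses are detected differs.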