(1) Field of the Invention
The present invention relates to thread switching control in a processor system.
(2) Description of the Related Art
In recent years, representative computer architectures have included, in addition to the CISC (Complex Instruction Set Computer) architecture, which carries out complex processing with a single instruction, the RISC (Reduced Instruction Set Computer) architecture, which simplifies the processing performed by each instruction, the VLIW (Very Long Instruction Word) architecture, in which software collects a plurality of simultaneously processible instructions into one long instruction, and other architectures.
In addition, the processing methods used in a central processing unit (CPU) of a computer to realize these architectures are roughly classified into two types: an in-order execution type and an out-of-order execution type.
FIG. 13 is an illustration for explaining an in-order execution type processing method, while FIG. 14 is an illustration for explaining an out-of-order execution type processing method. As shown in FIG. 13, the in-order execution type is a method of processing instructions in the sequence given by the program, and as shown in FIG. 14, the out-of-order execution type is a method of examining the dependence (dependent relationship) between instructions so that an instruction having no dependence on preceding instructions is processed without following the program sequence.
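The difference between the two execution types can be illustrated with a minimal sketch. It is not taken from FIG. 13 or FIG. 14; the `Instr` type, the register names and the latencies are hypothetical, the in-order model is fully serial, and the out-of-order model assumes unlimited execution units and ignores write-after-read and write-after-write hazards.

```python
from typing import NamedTuple, Tuple

class Instr(NamedTuple):
    name: str
    dest: str               # register written by the instruction
    srcs: Tuple[str, ...]   # registers read by the instruction
    latency: int            # cycles needed to produce the result

def in_order_finish_time(program):
    """Simplified in-order model: each instruction waits for the previous one."""
    return sum(ins.latency for ins in program)

def out_of_order_finish_time(program):
    """Simplified out-of-order model: start once all source registers are ready."""
    ready = {}   # register -> cycle at which its value becomes available
    finish = 0
    for ins in program:
        start = max((ready.get(r, 0) for r in ins.srcs), default=0)
        done = start + ins.latency
        ready[ins.dest] = done
        finish = max(finish, done)
    return finish

program = [
    Instr("load", "r1", (), 10),           # long-latency load
    Instr("add", "r2", ("r1", "r1"), 1),   # depends on the load
    Instr("mul", "r3", ("r4", "r5"), 3),   # independent of both
]
print(in_order_finish_time(program))       # 14
print(out_of_order_finish_time(program))   # 11
```

In this sketch the independent `mul` overlaps with the load in the out-of-order model, whereas the in-order model makes it wait behind the load and the dependent `add`.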
Furthermore, in recent years, in addition to single thread processing for carrying out one program (thread) in one processor, attention has been paid to a multithread processor system designed to physically carry out a plurality of threads in parallel in one processor.
FIGS. 15A and 15B are illustrations for explaining a multithread processor system. FIG. 15A is an illustration for explaining single thread processing, while FIG. 15B is an illustration for explaining multithread processing. FIG. 15B shows an example of multithread processing in which two programs A and B are processed in parallel in one CPU.
In general, in addition to registers visible to software and a status register (CPU status register), a CPU has resources for carrying out addition, subtraction, multiplication, division, load processing for reading memory data into a register, and store processing for writing register data to a memory. The multithread processor multiplexes the registers visible to software in one CPU so that a plurality of programs share instruction execution resources, such as those for addition and subtraction, while each program is executed separately.
As methods of realizing the above-mentioned multithread processing, there have been known a fine grained multithreading method or simultaneous multithreading (SMT) method (see FIG. 16), which carries out a plurality of threads simultaneously, and a coarse grained multithreading method or vertical multithreading (VMT) method (see FIG. 17), which does not carry out a plurality of threads simultaneously but switches to a different thread and executes it upon the occurrence of an event such as a cache miss (for example, see Japanese Patent Application Laid-Open No. 2002-163121).
FIG. 16 is an illustration for explaining the SMT method, while FIG. 17 is an illustration for explaining the VMT method.
The VMT method is intended to hide the long latency of processing an instruction that causes a cache miss. When a cache miss is detected, the processor switches to a different thread, and an execution unit or control unit (neither is shown) carries out that thread's processing other than memory accesses while a cache control unit (not shown) brings the data from a memory to a cache. Moreover, in this VMT method, for threads in which a cache miss rarely occurs, the switching to a different thread is made when a given period of time elapses.
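The two switching triggers of the VMT method described above (a cache miss, or the elapse of a given period) can be sketched as follows. This is a toy model under stated assumptions, not the control logic of any actual processor: the function name `vmt_trace`, the event labels "op" and "miss", and the round-robin choice of the next thread are all hypothetical, and the latency of the cache miss itself is not modeled.

```python
from itertools import cycle

def vmt_trace(patterns, time_slice, total_cycles):
    """Return the id of the thread that runs in each cycle.

    patterns: one repeating list of events per thread, "op" or "miss".
    A switch occurs on a "miss" event or after time_slice cycles.
    """
    streams = [cycle(p) for p in patterns]
    current, ran, trace = 0, 0, []
    for _ in range(total_cycles):
        event = next(streams[current])
        trace.append(current)
        ran += 1
        if event == "miss" or ran >= time_slice:
            current = (current + 1) % len(streams)   # round-robin (assumed)
            ran = 0
    return trace

# Thread 0 misses on every third instruction; thread 1 never misses.
trace = vmt_trace([["op", "op", "miss"], ["op"]], time_slice=4, total_cycles=10)
print(trace)   # [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
```

Thread 0 gives up the processor at its cache miss, while thread 1, which never misses, is switched out only when its period of four cycles elapses.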
Meanwhile, a program to be executed by a multiprocessor can include, for the purpose of synchronizing the processing among the processors, a code called a spin-loop, which continuously monitors data (shared data, monitor data) in a specified area of a memory until a different processor changes the monitor data to an expected value.
FIG. 18 is an illustration for explaining the spin-loop. In the example shown in FIG. 18, in a multiprocessor including two processors CPU 0 and CPU 1, the CPU 0 executes the spin-loop in order to synchronize with the CPU 1. In this spin-loop condition, although the hardware processes instructions at all times, the program is, as shown in FIG. 18, in a wait (synchronization wait) condition in which its processing does not advance.
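A spin-loop of the kind described above can be sketched in software as follows. This is only an illustration: two Python threads stand in for the CPU 0 and the CPU 1, and the names `run_spin_demo`, `monitor`, `waiter` and `setter` are hypothetical.

```python
import threading
import time

def run_spin_demo(delay_s=0.01):
    """One thread spins on monitor data until the other writes the expected value."""
    monitor = {"value": 0}   # shared (monitor) data in a specified memory area
    result = {}

    def waiter():            # plays the role of the CPU 0
        spins = 0
        while monitor["value"] != 1:   # spin-loop: re-read until the expected value
            spins += 1                 # instructions execute, the program does not advance
        result["spins"] = spins

    def setter():            # plays the role of the CPU 1
        time.sleep(delay_s)            # stands in for work done before synchronization
        monitor["value"] = 1           # change the monitor data to the expected value

    t0 = threading.Thread(target=waiter)
    t1 = threading.Thread(target=setter)
    t0.start()
    t1.start()
    t0.join()
    t1.join()
    return monitor["value"], result["spins"]

value, spins = run_spin_demo()
print(value, spins)
```

The spin count varies from run to run; it represents instructions that were processed while the waiting program made no progress.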
FIG. 19 is an illustration of a spin-loop condition in a multithread processor. In the example shown in FIG. 19, in a multithread processor designed to carry out two threads 0 and 1 in parallel, the thread 0 executes the spin-loop in order to synchronize with the thread 1.
In the spin-loop, until the data on a memory is changed, a processor which does not carry out the multithread processing does nothing except continuously monitor the monitor data. On the other hand, a multithread processor is required to carry out the processing of the other threads as well.
In the monitoring of data on a memory (monitor data), a cache miss does not usually occur. Therefore, in the multithread processor, once the memory monitoring starts, thread switching does not take place until a given period of time elapses, and processing that does not advance (meaningless processing) continues in the meantime. When a great deal of processor time is spent on such meaningless processing, the performance of the processor degrades and the completion of the synchronization between the threads is delayed. That is, the wait condition in the spin-loop interferes with the processing of the other threads.
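The cost of the wait condition described above can be estimated with a small cycle-count model. The model is an assumption made for illustration only: the function name `simulate_spin_wait`, the two-thread setup, and the cycle figures are hypothetical. Thread 0 spins without ever causing a cache miss, so it is switched out only when its period elapses, and thread 1 must complete a fixed amount of work before the synchronization can finish.

```python
def simulate_spin_wait(work_cycles, time_slice):
    """Return (total_cycles, wasted_cycles) until thread 1 finishes its work."""
    done = total = wasted = 0
    current, ran = 0, 0          # thread 0 (the spinning thread) runs first
    while done < work_cycles:
        total += 1
        if current == 0:
            wasted += 1          # spin-loop iteration: cache hit, no progress
        else:
            done += 1            # useful work by thread 1
        ran += 1
        if ran >= time_slice:    # the only switching trigger: the period elapses
            current, ran = 1 - current, 0
    return total, wasted

total, wasted = simulate_spin_wait(work_cycles=100, time_slice=50)
print(total, wasted)   # 200 100
```

Under these assumptions, half of the processor's cycles go to the spin-loop, and the work that the synchronization is waiting for takes twice as long to complete.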
So far, as a way of enhancing the processing efficiency of the multithread processor, there has been known a method that relies on new program code, such as tuning. In this method, in order to notify the processor that a thread (program) is in a wait condition and that priority should be given to the execution of the other thread, an instruction (program code) for, for example, lowering the priority of the thread presently in execution is added to the instruction set, and this instruction is newly inserted into the portion of the thread that is in a wait condition.
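The effect of inserting such an instruction into the waiting portion of a thread can be sketched with a simple cycle-count model. Everything here is an assumption for illustration: the function name `simulate_with_hint` and the parameter `check_cycles` are hypothetical, and the priority-lowering instruction is modeled simply as an immediate switch to the other thread after each failed check of the monitor data, which may differ from how an actual processor treats thread priority.

```python
def simulate_with_hint(work_cycles, time_slice, check_cycles=1):
    """Return (total_cycles, wasted_cycles) until thread 1 finishes its work.

    Thread 0 spins, but hands over the processor after check_cycles cycles
    (the modeled effect of the inserted instruction). Thread 1 does useful
    work and is switched out only when time_slice cycles elapse.
    """
    done = total = wasted = 0
    current, ran = 0, 0
    while done < work_cycles:
        total += 1
        ran += 1
        if current == 0:
            wasted += 1                # one check of the monitor data
            if ran >= check_cycles:    # inserted instruction: yield to thread 1
                current, ran = 1, 0
        else:
            done += 1
            if ran >= time_slice:      # period elapsed: back to thread 0
                current, ran = 0, 0
    return total, wasted

total_hint, wasted_hint = simulate_with_hint(work_cycles=100, time_slice=50)
print(total_hint, wasted_hint)   # 102 2
```

In this model the waiting thread consumes only a few cycles, but the benefit depends on the inserted instruction being present in the program code of the waiting thread.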
However, although this conventional thread switching control in a processor is effective for a newly developed program or a program that can be recompiled, it is difficult to apply to a program that cannot be recompiled or modified, for example, when the program source has been lost.