1. Field of the Invention
The present invention relates to a parallel processor system that divides a single program into a plurality of threads and executes the program in parallel on a plurality of processors, and more particularly to a method of ending a thread in the individual processors.
2. Description of the Related Art
As a method of processing a single program in parallel by a parallel processor system, there is a multi-thread executing method of dividing a program into instruction flows called threads and executing the threads in parallel on a plurality of processors. Articles describing this method include Japanese Patent Publication Laid-Open (Kokai) No. Heisei 10-27108 (hereinafter referred to as article 1), “Suggestion of On-Chip Multiprocessor-Oriented Multi-Stream Control Architecture MUSCAT” (pp. 229–236, papers of Joint Symposium on Parallel Processing JSPP97, Information Processing Society of Japan, May 1997) (hereinafter referred to as article 2), Japanese Patent Publication Laid-Open (Kokai) No. Heisei 10-78880 (hereinafter referred to as article 3), “Processor Architecture SKY by Using Instruction Level Parallel among Threads of Nonnumeric Calculation Program” (pp. 87–94, papers of Joint Symposium on Parallel Processing JSPP98, Information Processing Society of Japan, June 1998) (hereinafter referred to as article 4), and “Multiscalar Processor” (G. S. Sohi, S. E. Breach and T. N. Vijaykumar, The 22nd International Symposium on Computer Architecture, pp. 414–425, IEEE Computer Society Press 1995) (hereinafter referred to as article 5). The conventional techniques described in these articles are described below.
In a general multi-thread executing method, generating a new thread on another processor is called “forking a thread”. The thread that performs the fork operation is called the parent thread, the newly generated thread is called the child thread, the position at which a thread is forked is called the fork point, and the head position of the child thread is called the fork destination address or the starting point of the child thread. In the articles 1 to 4, a fork instruction is inserted at the fork point in order to instruct the thread fork. The fork destination address is specified in the fork instruction; executing the fork instruction generates, in another processor, a child thread starting from the fork destination address, and the execution of the child thread is started. Further, an instruction called a term instruction is provided for ending the processing of a thread, and each processor ends the processing of its thread by executing the term instruction. In the article 4, this term instruction is called an END instruction.
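The roles of the fork instruction and the term instruction described above can be sketched with a toy interpreter. This is only an illustrative model, not the instruction set of any of the cited articles: the opcode names, the labels `B` and `C`, the instruction layout, and the choice of the adjacent processor as the fork target are all assumptions. The sketch also runs the threads one after another and only records which processor would execute which instruction.

```python
# Toy instruction stream: threads A, B, and C laid out consecutively.
# Opcode names, labels, and layout are illustrative assumptions.
PROGRAM = [
    ("fork", "B"),      # 0: fork point in thread A; fork destination is B
    ("op", "a1"),       # 1
    ("op", "a2"),       # 2
    ("term", None),     # 3: term instruction just before B's starting point
    ("fork", "C"),      # 4: starting point of thread B; forks thread C
    ("op", "b1"),       # 5
    ("term", None),     # 6: term instruction just before C's starting point
    ("op", "c1"),       # 7: starting point of thread C
    ("op", "c2"),       # 8
    ("syscall", None),  # 9: instruction following the last thread
]
LABELS = {"B": 4, "C": 7}  # fork destination addresses


def run(program, labels, num_pes=3):
    """Return, per processor index, the ordinary instructions it executed."""
    executed = {pe: [] for pe in range(num_pes)}
    pending = [(0, 0)]  # (processor index, starting address) of threads to run
    while pending:
        pe, pc = pending.pop(0)
        while pc < len(program):
            opcode, arg = program[pc]
            if opcode == "fork":
                # Generate the child thread on another (here: adjacent) processor.
                pending.append(((pe + 1) % num_pes, labels[arg]))
            elif opcode in ("term", "syscall"):
                break  # this processor stops processing its thread
            else:
                executed[pe].append(arg)
            pc += 1
    return executed


result = run(PROGRAM, LABELS)
```

With this stream, processor 0 executes only the instructions of thread A, processor 1 those of thread B, and processor 2 those of thread C.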
FIG. 8 shows an outline of the processing of the multi-thread executing method. FIG. 8(a) shows a single program divided into three threads A, B, and C. When a single processor processes the program, one processor PE sequentially processes the threads A, B, and C, as illustrated in FIG. 8(b). By contrast, in the multi-thread executing method of the articles 1 to 5, as illustrated in FIG. 8(c), one processor PE1 executes the thread A and, while doing so, generates the thread B in another processor PE2 according to the fork instruction embedded in the thread A; the processor PE2 then executes the thread B. The processor PE2 generates the thread C in the processor PE3 according to the fork instruction embedded in the thread B. The processors PE1 and PE2 end the processing of their threads according to the term instructions embedded just before the starting points of the threads B and C, respectively. When the processor PE3 has executed the last instruction of the thread C, it executes the next instruction (generally, a system call instruction). By thus executing the threads simultaneously on a plurality of processors, the performance can be improved compared with serial processing.
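The benefit of overlapping the threads can be illustrated with a rough cycle count. The following is a simplified sketch under stated assumptions: each instruction takes one cycle, a child thread starts at the cycle in which its parent reaches the fork point, there is no fork overhead and no data dependence between threads, and a free processor is always available. The function names and the example thread lengths are hypothetical.

```python
def serial_cycles(threads):
    """Cycles when one processor runs all threads back to back."""
    return sum(length for length, _ in threads)


def fork_chain_cycles(threads):
    """Cycles when each thread forks the next one at its fork point.

    threads: list of (length, fork_offset) in program order, where
    fork_offset is the cycle within the thread at which it forks its
    child, or None for the last thread.
    """
    start = 0   # cycle at which the current thread begins
    finish = 0  # latest completion cycle seen so far
    for length, fork_offset in threads:
        finish = max(finish, start + length)
        if fork_offset is not None:
            start += fork_offset  # child begins when the parent forks
    return finish


# Hypothetical lengths: three threads of 10 instructions each,
# with A forking B at cycle 2 and B forking C at cycle 3.
example = [(10, 2), (10, 3), (10, None)]
serial = serial_cycles(example)
parallel = fork_chain_cycles(example)
```

In this example the serial execution takes 30 cycles, while the forked execution completes after 15 cycles, because thread B starts at cycle 2 and thread C at cycle 5.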
As another conventional multi-thread executing method, as illustrated in FIG. 8(d), there is a method in which the processor PE1 executing the thread A performs a fork a plurality of times, generating the thread B in the processor PE2 and the thread C in the processor PE3. In contrast to the model of FIG. 8(d), the multi-thread executing method illustrated in FIG. 8(c), in which a thread is restricted to generating at most one effective child thread during its existence, is called the Fork-Once Parallel Execution model. The Fork-Once Parallel Execution model greatly simplifies thread management, so that a thread controller can be realized in hardware on a realistic hardware scale. Further, since each processor generates a child thread in only one other processor, multi-thread execution is possible on a parallel processor system in which adjacent processors are connected in a single direction in a ring. The present invention assumes this Fork-Once Parallel Execution model.
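The fork-once restriction on a unidirectional ring can be sketched as a small bookkeeping class. This is a minimal sketch, not the thread controller of any cited article: the class and method names are hypothetical, and the rule that a fork is not effective while the adjacent processor is still busy is an assumption made only for this illustration.

```python
class RingForkOnce:
    """Bookkeeping for a fork-once model on a one-directional ring of processors."""

    def __init__(self, num_pes):
        self.num_pes = num_pes
        self.busy = [False] * num_pes        # processor is running a thread
        self.has_forked = [False] * num_pes  # its current thread already forked

    def start_thread(self, pe):
        self.busy[pe] = True
        self.has_forked[pe] = False

    def fork(self, pe):
        """Try to fork from processor pe; return the child's processor or None."""
        child = (pe + 1) % self.num_pes  # the only possible fork target
        if self.has_forked[pe] or self.busy[child]:
            return None  # second fork, or neighbor busy (assumed rule)
        self.has_forked[pe] = True
        self.start_thread(child)
        return child

    def term(self, pe):
        """End the thread on processor pe, freeing the processor."""
        self.busy[pe] = False


ring = RingForkOnce(3)
ring.start_thread(0)
first = ring.fork(0)   # thread on processor 0 forks onto processor 1
second = ring.fork(0)  # a second fork by the same thread is not effective
```

Because each processor only ever forks onto its single neighbor, the controller needs one flag per processor rather than a general table of parent and child threads.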
When a child thread is forked, the child thread must inherit register values from the parent thread. There are generally two methods of register-value inheritance. In one method, adopted in the parallel processor systems of the articles 1 to 3, only the content of the register file of the parent thread at the fork time is inherited, and register values updated after the fork are not inherited. In the other method, adopted in the parallel processor systems of the articles 4 and 5, register values updated after the fork are also inherited. The former is called a fork-time register-values transfer method, and the latter is called an after-fork register-values transfer method.
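The difference between the two inheritance methods can be sketched with dictionaries standing in for register files. This is a deliberately simplified illustration: the names are hypothetical, and the sketch omits the synchronization that an actual after-fork transfer would need so that the child does not read a register before the parent's update arrives.

```python
def fork_time_transfer(parent_regs):
    """Fork-time method: the child receives a snapshot taken at the fork."""
    return dict(parent_regs)


class AfterForkTransfer:
    """After-fork method: parent writes made after the fork also reach the child."""

    def __init__(self, parent_regs):
        self.parent = parent_regs
        self.child = dict(parent_regs)  # initial copy at the fork time

    def parent_write(self, reg, value):
        self.parent[reg] = value
        self.child[reg] = value  # forwarded to the child (simplified)


# Fork-time transfer: a later parent update is not seen by the child.
parent_a = {"r1": 1, "r2": 2}
child_a = fork_time_transfer(parent_a)
parent_a["r1"] = 100

# After-fork transfer: the same update is forwarded to the child.
parent_b = {"r1": 1, "r2": 2}
link = AfterForkTransfer(parent_b)
link.parent_write("r1", 100)
```

After these steps `child_a["r1"]` is still 1, while `link.child["r1"]` is 100.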
The MUSCAT described in the article 2 provides many dedicated instructions, such as an instruction for synchronization between threads, for flexibly controlling the parallel operation of threads.
In the above-mentioned conventional parallel processor system, a term instruction must always be described just before the starting point of a child thread in order to end a thread in the individual processors. Since one term instruction is required for each thread, the proportion of term instructions among all instructions becomes greater as the threads become finer-grained, that is, as each thread includes fewer instructions. Since the term instruction is stored in the instruction memory and is fetched in the same way as other instructions, there is a problem that the hardware amount of the instruction memory and the number of instruction fetches increase, which deteriorates the performance.
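The growth of this overhead with finer thread granularity follows from simple arithmetic: a thread consisting of n ordinary instructions plus one term instruction devotes 1/(n + 1) of its instructions to the term instruction. The following sketch computes that fraction; the function name and the example thread sizes are hypothetical.

```python
from fractions import Fraction


def term_instruction_ratio(other_instructions):
    """Fraction of a thread's instructions taken up by its one term instruction."""
    return Fraction(1, other_instructions + 1)


coarse = term_instruction_ratio(99)  # thread with 99 other instructions
fine = term_instruction_ratio(4)     # thread with 4 other instructions
```

Here `coarse` is 1/100 and `fine` is 1/5, so the finer-grained thread spends twenty times as large a share of its instruction memory and instruction fetches on the term instruction.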