The present invention relates to a multi-processor system executing a plurality of threads simultaneously and an execution method therefor.
Many multi-thread execution methods have been proposed for exploiting a parallelism as a program execution method, in which a program is divided into several instruction streams called “threads”.
According to the aforementioned multi-thread execution method, a thread is generated by “a fork operation”. The thread which executes the fork operation is referred to as “a parent thread”, and a newly generated thread is referred to as “a child thread”. The thread is terminated after it completes a defined operation. That is, lots of threads are forked and terminated in a repeated manner in a program running process.
A thread is assigned to a thread processor. In a system with several physical thread processors, several threads can be executed at the same time, and thus the execution performance is enhanced relative to sequential processing. Moreover, by assigning several threads to each thread processor, it is possible to force a currently running thread into a standby state and start another thread, in cases of a synchronization failure, a resource conflict, or a cash miss-hit occurs. Thus, possible delays caused by those events can be apparently hidden behind, and efficiency of resource uses can be enhanced.
The following typical references describe a method of multi-thread execution:                1. R. S. Nikhil et al: “A Multithreaded Massively Parallel Architecture” (Proceedings of the 19th Annual International Symposium on Computer Architecture, pp. 156-167, May 1992); and        
2. D. E. Culler et al: “Fine-grain Parallelism with Minimal Hardware Support: A Compiler-controlled Threaded Abstract Machine” (Proceeding of Fourth International Conference on Architectural Support for Programming languages and Operating System, pp. 164-175, Apr. 1991).
However, when the frequency of the operation of thread generation/termination and change/restoration into/from a standby-state is higher than that of the other processes, the efficiency of entire system depends on that in thread scheduling of thread generation/termination, change/restoration into/from a standby-state and so on. Therefore, it is important for efficient parallel processing to decrease the overhead relevant to the thread scheduling. In a program with less parallelism, processing amount for one thread may be small. In this case, thread execution cost and thread scheduling cost become severe.
On the other hand, thread executing methods, in which a fine grain of thread is dealt a small processing amount, and an apparatus thereof have been proposed. For example, one of the parallel processors using fine grains of thread has been introduced in ‘Gurinder S. Sohi et al: “Multiscalar Processor” (The 22nd International Symposium on Computer Architecture, IEEE Computer Society Press, 1995, pp. 414-425)’. In the multiscalar processor given in the reference, one program is divided into tasks each as a set of elementary blocks, and a processor, which has a capability of parallel-processing, processes them.
FIG. 23 gives a configuration example of the multiscalar processor. The multiscalar processor has a sequencer 73, a processing unit 74, an arbitration network 75 and data banks 76. Several processing units 74 are included in the system. Each of the processing units 74 has a command cache 77, an execution unit 78 and a register file 79. Several data banks 76 exist, according to each of processing units 74. Each of data banks 76 is made up of an ARB (Address Resolution Buffer) 80, a data cache 81 and so on. The parallel operation control of several tasks is dynamically or statically made by sequencer 73 using task descriptors, which include control flow information, and each processing unit 74 is assigned to a task. When sequencer 73 assigns tasks to processing units 74, it examines the task descriptors so that next task will be assigned.
Further, a similar method of thread execution is given in the following reference: ‘Pradeep K. Dubey et al: “Single Speculative Multi-threading (SPSM) Architecture: Compiler-assisted Fine-Grained Multi-threading” (Parallel Architecture and Compilation Techniques, IFIP 1995).
FIG. 24 shows a general idea on thread execution in an SPSM architecture. With reference to FIG. 24, a thread generation command 84 and a thread queuing command 85 are both embedded in a single program 82. When thread generation command 84 is executed, a future thread 83, which will execute in a forestalled manner, is generated. Future thread 83 executes until thread queuing command 85, and queues until an original process execution catches up with the future thread execution, and then merges the result from future thread execution with that from the original process execution. Future thread 83, which executes in parallel to the original process execution, is generated for the speed-up of the program execution.
However, in the aforementioned thread execution method, when a fork command is executed or a thread execution is started, and when no thread processor in a standby-state exists, a thread can not be generated, and proper program execution may not be encountered. Moreover, even though thread generation is stopped until a thread processor becomes ready, new thread generations continue to be made from running threads. Thus, there are shortcomings such as an increase in the number of threads. Furthermore, information on a corresponding relation among threads needs to be stored, and thread management must be performed in one place in a parallel system. Thus, there are shortcomings in that the number of thread processors is limited by constraint in the thread manager, and it is difficult to develop the thread manager by hardware because the thread management method itself is complex.