The present invention relates to a tightly coupled multi-processor system, and more particularly to a system for distributing a plurality of processes, which are dynamically generated from a program, among a plurality of processors so as to execute the program efficiently under a multi-job environment.
As one approach to reducing an execution time of a program, there is known a method of dividing the program into a plurality of processes and executing them in parallel by a plurality of processors.
The following method is generally used for dividing a program into processes and executing them in parallel. Namely, a sequentially executed part of the program is executed as one process. When the processing reaches a parallel executable part of the program, a plurality of child processes are generated from that part for parallel execution. After the execution of all the child processes has been completed, the next sequentially executed part is executed. A parallel executable part of a generated child process may be further divided into grandchild processes. In this case, nesting of parallel processing is allowed.
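The division into sequential parts, child processes, and nested grandchild processes described above can be sketched as follows. This is only an illustrative sketch: it uses Python threads in place of processes, and the names run_parallel, program, child, and grandchild are hypothetical and do not appear in the cited art.

```python
import threading

def run_parallel(children):
    """Fork one thread per child callable, then wait for all of them (hypothetical helper)."""
    threads = [threading.Thread(target=c) for c in children]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # the caller resumes only after every child has completed

def program():
    log = []
    lock = threading.Lock()

    def record(name):
        with lock:
            log.append(name)

    def grandchild(i, j):
        record(f"grandchild {i}.{j}")

    def child(i):
        # A parallel executable part inside a child: nested parallel processing.
        run_parallel([lambda j=j: grandchild(i, j) for j in range(2)])
        record(f"child {i}")

    record("sequential part A")                             # executed as one process
    run_parallel([lambda i=i: child(i) for i in range(2)])  # parallel executable part
    record("sequential part B")                             # next sequential part
    return log
```

The order of the child and grandchild entries in the returned log may vary from run to run, but the two sequential parts always appear first and last, and each child entry follows the entries of its own grandchildren.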
For reducing the execution time of the paralleled program, it is necessary to distribute the generated processes efficiently among the processors. To this end, there is known a system in which the child processes, which are dynamically generated as the program is executed, are registered in a queue. In such a system, a processor having no job checks the queue, and if any child process is registered in the queue, the processor fetches and executes it.
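A queue-based distribution of this kind might look like the following sketch, in which each thread stands in for one processor. The function name run_with_queue, the squaring workload, and the 10 ms polling timeout are assumptions made purely for illustration.

```python
import queue
import threading

def run_with_queue(num_processors, num_children):
    process_queue = queue.Queue()   # child processes are registered here
    results = []
    results_lock = threading.Lock()
    shutdown = threading.Event()

    def processor(pid):
        # A processor having no job checks the queue and fetches a child if one exists.
        while not shutdown.is_set():
            try:
                child = process_queue.get(timeout=0.01)
            except queue.Empty:
                continue
            value = child()
            with results_lock:
                results.append((pid, value))
            process_queue.task_done()

    workers = [threading.Thread(target=processor, args=(p,))
               for p in range(num_processors)]
    for w in workers:
        w.start()

    # The parent generates child processes dynamically and registers them.
    for i in range(num_children):
        process_queue.put(lambda i=i: i * i)

    process_queue.join()   # wait until every registered child has been executed
    shutdown.set()
    for w in workers:
        w.join()
    return results
```

Which processor executes which child is not fixed in advance; it depends on which processor happens to be idle when the child is registered.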
Conventional distribution and communication of processes via an operating system cannot realize high-speed operation of a paralleled program. In order to distribute the processes at high speed, there has been proposed a method wherein the registration of a child process into a queue, the check of the queue, and the fetch of the child process are all performed in a problem mode.
Furthermore, in order to communicate between the processes at high speed, there is known an active wait method wherein occurrence of an event is waited for in a spin loop in which a change in a flag is checked in the problem mode (refer to "Active Wait", U.S. Pat. No. 4,631,674).
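The general idea of waiting in a spin loop on a flag can be sketched as follows. This is not the method of the cited patent; it is a minimal illustration in which the waiting thread repeatedly reads a shared flag instead of making a blocking call, and the names EventFlag, active_wait, and demo are hypothetical.

```python
import threading

class EventFlag:
    """A shared flag; hypothetical stand-in for a flag word in shared memory."""
    def __init__(self):
        self.posted = False

def active_wait(flag):
    """Spin until the flag changes, without any blocking call.

    Returns the number of times the flag was checked.
    """
    checks = 1
    while not flag.posted:
        checks += 1
    return checks

def demo():
    flag = EventFlag()
    result = {}

    def waiter():
        result["checks"] = active_wait(flag)

    def poster():
        result["child_value"] = sum(range(1000))  # some work done by another process
        flag.posted = True                        # the event occurs: the flag changes

    w = threading.Thread(target=waiter)
    p = threading.Thread(target=poster)
    w.start()
    p.start()
    w.join()
    p.join()
    return result
```

The waiter learns of the event as soon as it next reads the flag, at the cost of occupying its processor for the whole waiting period.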
Events to be waited for in the spin loop include generation of a child process and completion of all generated child processes (refer to "Microtasking on IBM Multiprocessors", IBM J. RES. DEVELOP. Vol. 30, No. 6, November, 1986).
A conventional technique will be illustratively described with reference to FIG. 15. At step 1001, a parent process is temporarily interrupted to wait for completion of all child processes. At step 1002, it is checked whether or not there is any new child process. If there is a new child process, it is executed at step 1003. If not, it is checked at step 1004 whether or not all child processes have been executed. If all the child processes have been completely executed, the parent process is restarted at step 1005. If not, step 1004 is executed again, such that the spin loop continues until all the child processes are completed.
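The flow of steps 1001 to 1005 can be sketched as below. The sketch assumes that control returns to step 1002 after a child is executed at step 1003, which the description does not state explicitly. The function name conventional_wait and the max_spins bound are hypothetical; the bound exists only so that the example terminates when the children never complete.

```python
from collections import deque

def conventional_wait(process_queue, all_children_done, max_spins=1000):
    """Follow steps 1001-1005 of the described flow and return a trace of steps taken."""
    trace = ["1001 suspend parent"]

    # Steps 1002 and 1003: while a new child is registered, fetch and execute it.
    while process_queue:                  # step 1002
        child = process_queue.popleft()
        child()                           # step 1003
        trace.append("1003 run child")

    # Step 1004: spin until all children are done. The queue is not checked again.
    spins = 0
    while not all_children_done():
        spins += 1
        if spins >= max_spins:            # illustration-only bound, not part of the flow
            trace.append("1004 still spinning")
            return trace

    trace.append("1005 restart parent")   # step 1005
    return trace
```

For example, with two children already registered in a deque and a completion test that counts finished children, the trace ends with the parent being restarted. If the completion test never becomes true, the function stays at step 1004 until the bound is reached.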
The above-described conventional technique poses no problem if there is no nesting of parallel processes. However, when nesting of the parallel processes is allowed, there arises a problem of inefficient use of processor resources in the following case. Specifically, once a processor 0 executing a parent process has proceeded to the spin loop waiting for completion of child processes, the processor 0 cannot execute a new grandchild process even if a processor 1 thereafter generates the grandchild process and registers it in a process queue, because the spin loop at step 1004 does not check the queue again. In this case, although the grandchild process is executed later by the processor 1 and the program ends normally, there is still the problem that the processor 0 is not efficiently used and the processing time increases.
Furthermore, with the above-described conventional technique, CPU resources are occupied by a single job in the spin loop. In this case, if a paralleled program is executed under a single job environment, there is no problem. However, if it is executed under a multi-job environment, there arises a problem of a low throughput of the system. A paralleled program debugged under a multi-job environment may be executed frequently under a single job environment. It is therefore necessary to solve this problem while making a load module usable in both the single and multi-job environments.
In order to achieve a multi-job environment, a multi-tasking method is conventionally used. In this method, an OS gives one or more tasks to each job. Processes constituting the job corresponding to a task are executed under control of the task. The OS causes a processor to execute the tasks given to each job in a time sharing manner. The OS determines assignment of the right to use a processor to each task while considering the use factors of CPU resources and input/output resources in each task. Therefore, in the case where a task 1 executing a child process loses the right to use a processor, a task 0 having the right to use a processor cannot complete the program, because the parent process waits in vain at the spin loop for completion of the child process. This is called a deadlock state. To solve this, it is necessary for the OS to detect the deadlock state and give the task 1 the right to use a processor.
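The deadlock state described above can be reproduced in a small deterministic simulation. In this sketch, generators stand in for the two tasks, and a schedule function stands in for the OS decision as to which task may use the single processor in each time slice. All names (parent_task, child_task, run, schedule) and the max_slices bound are hypothetical.

```python
def parent_task(state):
    # Task 0: the parent process spins until the child has completed.
    while not state["child_done"]:
        yield "spin"
    yield "parent resumes"

def child_task(state):
    # Task 1: the child process needs two time slices to complete.
    yield "child working"
    state["child_done"] = True
    yield "child finished"

def run(schedule, max_slices):
    """Run the tasks on one processor; schedule(n) names the task owning slice n."""
    state = {"child_done": False}
    tasks = {0: parent_task(state), 1: child_task(state)}
    trace = []
    for slice_no in range(max_slices):
        tid = schedule(slice_no)
        try:
            trace.append((tid, next(tasks[tid])))
        except StopIteration:
            break  # the scheduled task has nothing left to do
    return trace

# Task 1 has lost the processor: task 0 spins in every slice and never resumes.
starved = run(lambda n: 0, max_slices=50)

# The OS also grants the processor to task 1: the child completes and the parent resumes.
fair = run(lambda n: n % 2, max_slices=50)
```

In the first run every slice is spent spinning, whereas in the second run the trace ends with the parent resuming after the child has finished.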