1. Field of the Invention
The present invention relates to a parallel processor, parallel processing method, and storing medium for storing the routine of the method in a computer readable format.
2. Description of the Related Art
A single processor running Unix® or another operating system (OS) must be able to manage the progress of a plurality of programs simultaneously existing in a local memory when executing programs in a multi-tasking environment. For this function, use is made of the concept of a “process” as opposed to a “program”. A “process” is a program in execution in its own memory space (user memory space), set in the local memory, which that program can access independently. Execution of a program means running of a process, while termination of the program means deletion of the process. A process is also capable of running and deleting other processes and of communicating with other processes.
Since there is one central processing unit (CPU) in a single processor, a maximum of one process can be run at any one time. Therefore, in a single processor, the user memory space is simultaneously assigned to a plurality of independent programs and the plurality of programs are alternately executed in a time sharing mode to alternately run a plurality of processes and thereby realize a multi-tasking environment.
At this time, when one process is in a running state, the other processes are in a waiting state.
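By way of illustration only, the time sharing described above may be sketched as follows. The function name run_time_shared, the quantum parameter, and the representation of each program as a count of time slices are assumptions made for this sketch and do not appear in the related art itself.

```python
from collections import deque

def run_time_shared(programs, quantum=1):
    """Alternate among processes on one CPU and return the order of slices run.

    `programs` maps a process name to the number of time slices it needs
    (a hypothetical stand-in for real work).
    """
    ready = deque(programs.items())   # waiting processes, in arrival order
    trace = []
    while ready:
        name, remaining = ready.popleft()    # at most one process is running
        trace.append(name)
        remaining -= quantum
        if remaining > 0:
            ready.append((name, remaining))  # back to the waiting state
        # otherwise the program has terminated and the process is deleted
    return trace

trace = run_time_shared({"A": 2, "B": 1, "C": 2})
print(trace)
```

At every step exactly one entry is taken from the queue, which corresponds to one process being in the running state while all others wait.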
In the above multi-tasking environment, a plurality of processes pass messages among each other as described below.
Namely, in a single processor, as explained above, since there is a maximum of one process in a running state at any one time, when one process sending a message is in a running state, the process to receive the message is in a waiting state. Therefore, the running state process sending the message calls up a process management task in a kernel of the OS and writes the message to a table in a memory which stores the state that the receiving process was in immediately before it shifted to the waiting state (i.e., normally a table storing the context of threads). Then, when the process to receive the message next shifts to the running state, it learns that the message was received by referring to the table and performs processing in accordance therewith. On the other hand, for example, when a process which proceeds to the next processing conditional on receiving a message refers to the table upon shifting to the running state and judges that no message was received, that process re-enters the waiting state. That process shifts to the running state only after confirming the receipt of a message.
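The table-based message passing described above may be pictured with the following minimal sketch. The class name Kernel, its methods send and dispatch, and the dictionary layout of the table are hypothetical and are chosen only for illustration; an actual process management task stores considerably more context.

```python
class Kernel:
    """Toy process management task: one context-table entry per process."""

    def __init__(self, names):
        # Each entry holds the saved state plus a slot for pending messages.
        self.table = {n: {"state": "waiting", "inbox": []} for n in names}

    def send(self, receiver, message):
        # Called by the running sender; the receiver is waiting at this point.
        self.table[receiver]["inbox"].append(message)

    def dispatch(self, name):
        """Try to shift `name` to the running state; return its message, if any."""
        entry = self.table[name]
        if not entry["inbox"]:
            entry["state"] = "waiting"   # nothing received: keep waiting
            return None
        entry["state"] = "running"
        return entry["inbox"].pop(0)

kernel = Kernel(["P1", "P2"])
first_try = kernel.dispatch("P2")    # no message yet, P2 stays waiting
kernel.send("P2", "hello")           # P1 writes into P2's table entry
second_try = kernel.dispatch("P2")   # P2 now finds the message and runs
```

The sender never addresses the receiver directly; both sides communicate only through the table entry, which is why the receiver need not be running when the message is sent.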
On the other hand, for example, in a multiprocessor which is comprised of a plurality of CPUs connected via a common bus and executes a plurality of mutually independent programs in parallel, usually a maximum of one process is in a running state at one CPU at any one time, but a plurality of processes can simultaneously be in the running state at different CPUs.
Communication between processes is achieved for example by a sending side process passing a message over the common bus and an arbiter monitoring the common bus notifying that message to the receiving side process based on instruction codes indicated in the user program (i.e. an application program). Therefore, to pass a message between processes, it is necessary that both the message sending side process and receiving side process be in the running state.
In this way, in a multiprocessor, usually messages are not passed using the process management task as in the above explained single processor. That is, there is no process management task in a multiprocessor.
In a multiprocessor, however, when it is necessary to synchronize a plurality of processes operating in parallel, the synchronization is realized by using the above message passing.
Below, a method of synchronizing processes in a multiprocessor of the related art will be explained.
First, the configuration of a general multiprocessor will be explained.
FIG. 5 is a view of the configuration of a general multiprocessor.
As shown in FIG. 5, a multiprocessor 1 is configured by connecting, for example, four processor elements 111 to 114 via a common bus 17. The common bus 17 is connected to a common memory 15 and an arbiter 16.
Here, the processor element 111 comprises, for example as shown in FIG. 6, a processor core 31 and a local memory 32, stores a user program read from the common memory 15 via the common bus 17 in the local memory 32, and successively supplies instruction codes of the user program stored in the local memory 32 to the processor core 31 for execution. The processor elements 112 to 114 have the same configuration, for example, as the processor element 111.
The arbiter 16 monitors execution states (such as the load of the processing) of the processor elements 111 to 114 and assigns software resources stored in the common memory 15 to the processor elements 111 to 114, that is, the hardware resources. Specifically, the arbiter 16 reads the user programs stored in the common memory 15 into the local memories 32 shown in FIG. 6 of the processor elements 111 to 114.
The arbiter 16, for example as shown in FIG. 7, reads a main program Prg_A and subprograms Prg_B, Prg_C, Prg_D, and Prg_E as user programs into the local memories 32 of the processor elements 111 to 114 indicated by the arrows in FIG. 7 at the same time or at different times.
Next, a method of synchronizing among programs or processes of the related art in the multiprocessor 1 shown in FIG. 5 will be explained. First, the main program Prg_A stored in the common memory 15 is read into the local memory 32 of the processor element 111 by the arbiter 16, then, as shown in FIG. 8, instruction codes written in the main program Prg_A are successively executed in the processor element 111.
Next, when the instruction code “gen(Prg_B)” is executed in the processor element 111, a message indicating that is notified to the arbiter 16 via the common bus 17. Then, the subprogram Prg_B stored in the common memory 15 is read into the local memory 32 of the processor element 112, by the arbiter 16 based on the execution states of the processor elements 111 to 114, and instruction codes written in the subprogram Prg_B are successively executed in the processor element 112.
Next, when an instruction code “gen(Prg_C)” is executed in the processor element 111, a message indicating that is notified to the arbiter 16 via the common bus 17. Then, the subprogram Prg_C stored in the common memory 15 is read into the local memory 32 of the processor element 113 by the arbiter 16 based on the execution states of the processor elements 111 to 114, and instruction codes written in the subprogram Prg_C are successively executed in the processor element 113.
Next, when an instruction code “gen(Prg_D)” is executed in the processor element 111, a message indicating that is notified to the arbiter 16 via the common bus 17. The subprogram Prg_D stored in the common memory 15 is then read into the local memory 32 of the processor element 114 by the arbiter 16 based on the execution states of the processor elements 111 to 114, and instruction codes written in the subprogram Prg_D are successively executed in the processor element 114.
Next, when an instruction code “wait(Prg_D)” is executed in the processor element 111, the processing of the processor element 111 enters a synchronization waiting state.
Next, when the last instruction code “end” of the subprogram Prg_D is executed in the processor element 114, a message indicating the completion of the subprogram Prg_D is notified to the processor element 111 via, for example, the arbiter 16. As a result, the processor element 111 releases the synchronization waiting state and executes the next instruction code.
Next, when an instruction code “wait(Prg_C)” is executed in the processor element 111, the processing of the processor element 111 enters a synchronization waiting state.
Next, when the last instruction code “end” of the subprogram Prg_C is executed in the processor element 113, a message indicating the completion of the subprogram Prg_C is notified to the processor element 111 via, for example, the arbiter 16. As a result, the processor element 111 releases the synchronization waiting state and executes the next instruction code.
Next, when an instruction code “gen(Prg_E)” is executed in the processor element 111, a message indicating that execution is notified to the arbiter 16 via the common bus 17. Then, the subprogram Prg_E stored in the common memory 15 is read into the local memory 32 of, for example, the processor element 114 by the arbiter 16 based on the execution states of the processor elements 111 to 114, and instruction codes written in the subprogram Prg_E are successively executed in the processor element 114.
Next, when the instruction code “gen(Prg_D)” is executed again in the processor element 111, a message indicating that execution is notified to the arbiter 16 via the common bus 17. Then, the subprogram Prg_D stored in the common memory 15 is read into the local memory 32 of the processor element 113 by the arbiter 16 based on the execution states of the processor elements 111 to 114, and instruction codes written in the subprogram Prg_D are successively executed in the processor element 113.
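For illustration only, the sequence of FIG. 8 described above may be sketched with threads standing in for processor elements. The Python functions gen and wait below merely mimic the instruction codes of the same names; the processor element for each subprogram is passed in explicitly as a stand-in for the decision of the arbiter 16, and the helper record and the log format are assumptions of this sketch.

```python
import threading

log = []
log_lock = threading.Lock()
running = {}   # program name -> thread currently executing it

def record(event):
    with log_lock:
        log.append(event)

def gen(name, element, body=lambda: None):
    """Start subprogram `name` on processor element `element`.

    The element is chosen by the caller, standing in for the arbiter.
    """
    def run():
        body()
        record(("end", name, element))   # the final "end" instruction code
    thread = threading.Thread(target=run)
    running[name] = thread
    record(("gen", name, element))
    thread.start()

def wait(name):
    """Block the caller until `name` has executed its "end"."""
    running[name].join()
    record(("wait released", name))

# Main program Prg_A, following the order described for FIG. 8.
gen("Prg_B", 112)
gen("Prg_C", 113)
gen("Prg_D", 114)
wait("Prg_D")
wait("Prg_C")
gen("Prg_E", 114)
gen("Prg_D", 113)
for t in list(running.values()):
    t.join()   # let the remaining subprograms finish before reading the log
```

In this sketch each wait returns only after the corresponding thread has recorded its "end" entry, which mirrors the release of the synchronization waiting state described above.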
Summarizing the problems to be solved by the invention, as explained above, in the multiprocessor 1 of the related art, the synchronization between programs (processes) executed in different processor elements is a simple one: a synchronization waiting state caused by execution of an instruction code “wait” in one processor element is released only upon execution of an instruction code “end”, indicating completion of execution of a program, in another processor element.
Namely, a synchronization waiting state of a processor element based on one program cannot be released until the completion of execution of a program in another processor element. Accordingly, there is a disadvantage that a variety of forms of synchronization among different programs executed at different processor elements such as synchronization among instruction codes written in the middle of programs cannot be realized.
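The disadvantage described above may be made concrete with the following minimal sketch, in which a waiting program needs only an intermediate result of a subprogram but can resume only after the subprogram ends. The step labels and the use of a thread join as a stand-in for the instruction code "wait" are assumptions made for illustration.

```python
import threading

events = []

def subprogram():
    # The first step produces what the waiter actually needs.
    events.append("step 1: intermediate result ready")
    events.append("step 2: unrelated work")
    events.append("step 3: unrelated work")
    events.append("end")

worker = threading.Thread(target=subprogram)
worker.start()
worker.join()                    # stands in for wait(...): released only by "end"
events.append("waiter resumes")

resume_at = events.index("waiter resumes")
needed_at = events.index("step 1: intermediate result ready")
stalled_entries = resume_at - needed_at - 1
print(stalled_entries, "entries ran between the needed result and the release")
```

With end-only synchronization, every step after the needed result still runs before the waiter is released, even though none of those steps is relevant to it.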
Also, in the multiprocessor 1 of the related art described above, the arbiter 16 cannot, for example, determine during execution of the main program Prg_A shown in FIG. 8 by the processor element 111 which subprogram will be called up in the future by the main program Prg_A.
Therefore, as shown in FIG. 8, there is a possibility that the subprogram Prg_D will end up being assigned to different processor elements 114 and 113 by the arbiter 16 between a first execution and a second execution of the instruction code “gen(Prg_D)” in the processor element 111. In such a case, although the subprogram Prg_D is executed again after a relatively short interval, it is necessary to read the subprogram Prg_D again from the common memory 15 into the processor element 113 at the time of the second execution, which results in a longer waiting time of the processor element 113.
Such a situation occurs frequently, especially when the memory capacity of the local memory 32 shown in FIG. 6 and the size of the program to be read are of the same order, and causes a drastic decline in the performance of the multiprocessor 1.
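The cost of such reassignment may be estimated with the following sketch, which counts reads from the common memory 15 for a given sequence of assignments. The function count_loads, the capacity of one program per local memory, and the oldest-first eviction are assumptions of this sketch; the second schedule is a purely hypothetical assignment that would require the arbiter to know future calls, and is shown only for comparison.

```python
def count_loads(schedule, capacity=1):
    """Count reads from common memory for a list of (program, element) pairs.

    Each element's local memory is modeled as holding `capacity` programs,
    evicting the oldest one when full (an assumed policy).
    """
    local = {}   # element -> list of resident programs, oldest first
    loads = 0
    for program, element in schedule:
        resident = local.setdefault(element, [])
        if program in resident:
            continue                 # already in local memory: no transfer
        loads += 1                   # read from common memory over the bus
        if len(resident) >= capacity:
            resident.pop(0)
        resident.append(program)
    return loads

# Assignments as described for FIG. 8.
fig8 = [("Prg_A", 111), ("Prg_B", 112), ("Prg_C", 113), ("Prg_D", 114),
        ("Prg_E", 114), ("Prg_D", 113)]
# Hypothetical assignment that keeps Prg_D resident on element 114.
with_foresight = [("Prg_A", 111), ("Prg_B", 112), ("Prg_C", 113),
                  ("Prg_D", 114), ("Prg_E", 113), ("Prg_D", 114)]

loads_fig8 = count_loads(fig8)
loads_foresight = count_loads(with_foresight)
print(loads_fig8, loads_foresight)
```

Under these assumptions the assignments of FIG. 8 require one more read from the common memory than the hypothetical schedule, because the second execution of Prg_D finds no resident copy.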