1. Field of the Invention
The present invention relates to an interprocessor synchronization system for parallel computers which connects a plurality of independently operated processor elements (hereinafter referred to as PE, PEs) through a communication network and enables them to operate synchronously in parallel; and, more particularly, to a detecting system in a parallel computer for detecting a synchronization, a status or a stable state of the PEs.
2. Related Art
With the development of semiconductor technology, it has become possible to build high-performance microprocessors and large-capacity memories in small configurations at low cost, so that a parallel computer can readily be produced from a number of such microprocessors and memories. To process a single job efficiently with a plurality of PEs in parallel, the job can be divided into steps, each allotted to a respective PE prior to execution. For the parallel execution of a job, the processing sequence of these steps must be carefully considered. That is, it must be ensured that every PE finishes processing one step before proceeding to the next step. To accomplish this, an efficient inter-processor synchronous operation is required in order to realize a high-speed parallel computer.
A first related art for operating a parallel computer is a memory sharing method.
The above described inter-processor synchronous operation can be performed with PEs having a shared memory, a part of which is exclusively used for reading and writing.
The second related art is a synchronization register method where a synchronization register is provided for each PE, the logical product of the outputs of synchronization registers of all PEs is detected, and then the result is returned to all PEs, thereby detecting the synchronous state of all PEs.
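The logical product described above can be illustrated by the following minimal sketch. It is not taken from any particular machine; the function name `all_synchronized`, the four-PE configuration, and the use of a 1-bit integer per register are assumptions made only for illustration.

```python
from functools import reduce
from operator import and_


def all_synchronized(sync_registers):
    """Return the logical product (AND) of every PE's 1-bit synchronization register."""
    return reduce(and_, sync_registers, 1)


# Hypothetical 4-PE machine: each PE sets its register to 1 on reaching the sync point.
registers = [0, 0, 0, 0]
for pe in (0, 1, 2):
    registers[pe] = 1
partial = all_synchronized(registers)   # PE 3 has not yet arrived

registers[3] = 1
complete = all_synchronized(registers)  # every PE has now arrived
```

In this sketch the result that would be returned to all PEs becomes 1 only after the last PE has set its register.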
The third related art is a state detecting method. In this method, as shown in FIG. 1, in each PE of a parallel processing system, a process 1 is executed, and then messages are sent and received between PEs as a process 2. When the messages have been completely processed, the PEs proceed to the next step. At this time, if all PEs are waiting for a message (that is, no PEs are in execution) and no messages exist in the network (this state is referred to as a stable state of all PEs), a host processor recognizes this state and broadcasts a command to all PEs, in response to which each PE starts processing the corresponding command. Thus, each PE can be informed of the state and proceed to the next processing step.
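The stable-state condition of the state detecting method may be sketched as follows. The class `PE`, the functions `is_stable` and `host_step`, the message count argument, and the command string `"next-step"` are hypothetical names introduced for illustration only; how a host processor actually observes the PEs and the network is not specified here.

```python
from dataclasses import dataclass


@dataclass
class PE:
    waiting_for_message: bool  # True when the PE is idle and waiting for a message


def is_stable(pes, messages_in_network):
    """Stable state: every PE is waiting and no message exists in the network."""
    return all(pe.waiting_for_message for pe in pes) and messages_in_network == 0


def host_step(pes, messages_in_network):
    """Hypothetical host action: broadcast one command per PE only in the stable state."""
    if is_stable(pes, messages_in_network):
        return ["next-step"] * len(pes)
    return []
```

Both conditions must hold together: a single PE still in execution, or a single message still in the network, suppresses the broadcast.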
There is a problem with the memory sharing method in that, when the number of PEs is substantially increased, the shared memory is accessed too frequently and accesses from different PEs conflict with one another.
In the synchronization register method, no conflicts arise in accessing a shared memory, since the synchronization register of each PE can be accessed independently. In the synchronization mechanism of this method, no problems occur when all PEs are operating normally in the process 1, as shown in FIG. 2A. In this case, all PEs can proceed to the process 2 once the synchronization registers detect that synchronization has been established, that is, once all PEs have completed the process 1. When an abnormal condition arises in the process 1 (for example, a bug in a program, a division by zero, an overflow, etc.), one of two countermeasures is taken in the related art technology. One of them is, as shown in FIG. 2B, to ignore an error of a PE in the process 1, detect the synchronous state with the synchronization registers of the other PEs, and proceed to the process 2. The other countermeasure is, as shown in FIG. 2C, to interrupt the process 1 in all PEs when an error of any PE is detected in the process 1. In the former, there is a problem that an error, if one occurs, is reported so late that the process 1 is executed in vain, because all PEs other than the PE in error in the process 1 proceed to the process 2 without being informed of the error. Further, an incorrect result may be obtained because the error is ignored. In the latter case, on the other hand, all PEs other than the PE in error in the process 1 enter the synchronous waiting state, since they are not informed of the error, and the synchronization process for these PEs is never completed, since the synchronous signal is not sent to them from the PE in error, thus preventing them from proceeding to the process 2.
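The two failure modes described above may be illustrated by the following sketch, which is one possible reading of FIGS. 2B and 2C and not a description of any actual circuit. The function names, the boolean lists `done` and `in_error`, and the four-PE configuration are assumptions made for illustration.

```python
def sync_ignoring_errors(done, in_error):
    """FIG. 2B-style reading: AND only the registers of the PEs not in error."""
    return all(d for d, e in zip(done, in_error) if not e)


def sync_over_all(done):
    """AND over every register; a PE in error never sets its own register."""
    return all(done)


# Hypothetical 4-PE run in which PE 1 fails during the process 1.
done = [True, False, True, True]
in_error = [False, True, False, False]

proceeds_uninformed = sync_ignoring_errors(done, in_error)  # others move on to the process 2
others_wait_forever = not sync_over_all(done)               # synchronization never completes
```

In the first case the remaining PEs proceed without knowing of the error; in the second the logical product never becomes true, so the remaining PEs stay in the synchronous waiting state.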
In the system of the related art technology, a synchronization request or a status request cannot be reliably detected while any messages remain in the network. When no message exists in the respective PEs but a message remains in a communication path between PEs, and a synchronization request or a status request is issued, the process advances to the next step. A problem then arises in that, when the data remaining in the network arrives at a particular PE, that PE processes in the next step data that should have been processed in the previous step. That is, even when all PEs are in the state in which synchronization is established, one PE may recognize it earlier than the other PEs and proceed to the next step.
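The hazard of a message remaining in the communication path may be illustrated by the following sketch. The function `naive_sync`, the queue layout, and the `step` tag carried by the message are hypothetical and serve only to make the ordering of events visible.

```python
from collections import deque


def naive_sync(pe_queues):
    """Flawed check: inspects only the PE-local queues, not the network."""
    return all(len(q) == 0 for q in pe_queues)


pe_queues = [deque(), deque()]
# One message belonging to step 0 is still in the communication path.
in_flight = deque([{"dest": 1, "step": 0, "data": "x"}])

current_step = 0
if naive_sync(pe_queues):   # no PE holds a message, so synchronization is declared
    current_step += 1       # every PE advances to the next step

msg = in_flight.popleft()   # the delayed message now arrives at PE 1
pe_queues[msg["dest"]].append(msg)
stale = msg["step"] < current_step  # data of the previous step handled in the next step
```

Because the check ignores the network, the delayed message is delivered only after the step counter has advanced, which is the situation described above.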