Hardware synchronization mechanisms have been used to accelerate synchronization among processors in parallel programs. Reference may be had to, for example, Japanese Laid-open Patent Publication No. 2005-316679. Before a synchronization operation, a hardware synchronization mechanism must be set up to reflect the location of the processors that participate in synchronization. FIG. 16 is a diagram for explaining a conventional hardware synchronization mechanism. In the following, an apparatus that accelerates synchronization among processors in a node is explained.
As depicted in FIG. 16, the apparatus that accelerates synchronization among processors in a node includes four processors 50 (A to D), a mask-information retaining unit 60, a state retaining unit 70, and a processor-to-processor synchronization control unit 80. The mask-information retaining unit 60 is a 4-bit register, one bit per processor, that retains information specifying which processors 50 participate in synchronization. The state retaining unit 70 is a 4-bit register, one bit per processor, that retains information specifying which processors 50 have made a synchronization request. Upon receiving a synchronization request from any processor 50, the processor-to-processor synchronization control unit 80 updates the information in the state retaining unit 70. The processor-to-processor synchronization control unit 80 also compares the state retaining unit 70 with the mask-information retaining unit 60 to detect completion of synchronization, and then notifies the participating processors 50 of the completion.
Next, the case is explained where a process on the processor A and a process on the processor C participate in synchronization.
(1) Setting a Mask and Initializing a Synchronization State
The information in the mask-information retaining unit 60 is initialized to “0101”, reflecting the location of the processes on the processors A and C; in this example the rightmost bit corresponds to the processor A. On the other hand, the information in the state retaining unit 70 is initialized to “0000”.
(2) Processor A Issues a Synchronization Request
The processor A issues a synchronization request to the processor-to-processor synchronization control unit 80. The processor-to-processor synchronization control unit 80 updates the state retaining unit 70 from “0000” to “0001” by changing the value of the status bit corresponding to the processor A. Then, the processor-to-processor synchronization control unit 80 compares the updated information “0001” in the state retaining unit 70 with the information “0101” in the mask-information retaining unit 60 and determines that synchronization has not yet been completed.
(3) Processor C Issues a Synchronization Request
The processor C issues a synchronization request to the processor-to-processor synchronization control unit 80. The processor-to-processor synchronization control unit 80 updates the state retaining unit 70 from “0001” to “0101” by changing the value of the status bit corresponding to the processor C. The processor-to-processor synchronization control unit 80 then compares the updated information “0101” in the state retaining unit 70 with the information “0101” in the mask-information retaining unit 60 and determines that synchronization has been completed.
(4) Processors are Notified of Completion
According to the mask-information retaining unit 60, the processor-to-processor synchronization control unit 80 notifies the processor A and the processor C, which participate in synchronization, of completion of synchronization.
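Purely as an illustration, the sequence (1) to (4) may be modeled in software as follows. This is a minimal sketch and not the conventional hardware itself: the names SyncControlUnit, request, bit, and fmt are hypothetical, the comparison is assumed to be a simple equality test between the two 4-bit values, and the processor A is assumed to map to the rightmost bit, as the values “0001” and “0101” suggest.

```python
PROCESSORS = "ABCD"  # assumption: processor A maps to the rightmost bit


def bit(proc):
    """Return the single-bit value assigned to a processor name."""
    return 1 << PROCESSORS.index(proc)


def fmt(value):
    """Render a 4-bit register value as a string such as '0101'."""
    return format(value, "04b")


class SyncControlUnit:
    """Hypothetical software model of units 60, 70, and 80."""

    def __init__(self, participants):
        # (1) Set the mask and initialize the synchronization state.
        self.mask = 0
        for proc in participants:
            self.mask |= bit(proc)
        self.state = 0

    def request(self, proc):
        """Handle a synchronization request from one processor.

        Returns the list of processors to notify when the state
        matches the mask, or an empty list otherwise.
        """
        self.state |= bit(proc)  # set the status bit of the requester
        if self.state == self.mask:  # assumed equality comparison
            return [p for p in PROCESSORS if self.mask & bit(p)]
        return []


unit = SyncControlUnit("AC")
print(fmt(unit.mask), fmt(unit.state))       # 0101 0000
print(unit.request("A"), fmt(unit.state))    # [] 0001
print(unit.request("C"), fmt(unit.state))    # ['A', 'C'] 0101
```

The model does not clear the state after notification, because the description above does not state how the state retaining unit 70 is reset.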
As described above, a determination as to whether synchronization of the processes that participate in synchronization has been completed is made by comparing the information in the mask-information retaining unit 60 with the information in the state retaining unit 70. Therefore, to determine the completion correctly, the process location information (mask), which indicates on which processors the processes that participate in synchronization are located, must be initialized to a value reflecting that location before a synchronization request is issued.
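The dependence on a correctly initialized mask can be illustrated with a small sketch. It assumes, as a simplification, that completion is detected when the state value equals the mask value; the function names to_bits and completion_point are hypothetical and are used only for this illustration.

```python
PROCESSORS = "ABCD"  # assumption: processor A maps to the rightmost bit


def to_bits(procs):
    """Build a 4-bit value with one bit set per named processor."""
    value = 0
    for proc in procs:
        value |= 1 << PROCESSORS.index(proc)
    return value


def completion_point(mask_procs, request_order):
    """Return how many requests had arrived when completion was
    detected, or None if completion was never detected."""
    mask = to_bits(mask_procs)
    state = 0
    for count, proc in enumerate(request_order, start=1):
        state |= 1 << PROCESSORS.index(proc)
        if state == mask:  # assumed equality comparison
            return count
    return None


# Processes on A and C actually participate in each case.
print(completion_point("AC", "AC"))   # 2: correct mask, detected after both
print(completion_point("A", "AC"))    # 1: mask omits C, detected too early
print(completion_point("ACD", "AC"))  # None: mask includes D, never detected
```

Under these assumptions, a mask that omits a participant reports completion prematurely, and a mask that includes a non-participant never reports completion.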
Meanwhile, in recent years, systems that execute parallel programs have become large, including several hundred to several thousand nodes. For this reason, when, for example, the process location information (mask) about the processes that participate in synchronization is generated at one of the nodes that execute a parallel program, the cost of transmitting and receiving information among the nodes is disadvantageously enormous.
In parallel programming models such as the Message Passing Interface (MPI), a parallel program is allowed to synchronize an arbitrary subset of the processes in a job and to dynamically change the set of processes to be synchronized within a job. However, with the conventional hardware synchronization mechanism described above, such operations are difficult to handle in practice because the process location information needs to be set in the mask-information retaining unit 60 before a synchronization request is issued.