1. Statement of the Technical Field
The present invention relates to the field of parallel computing and more particularly to the use of a barrier wait synchronization in a parallel computing application.
2. Description of the Related Art
In the field of parallel computing, a barrier synchronization point refers to a common position in execution at which multiple independently acting processes arrive at different times. Each of the processes waits at the common position until all of the other participating processes have arrived. Once all of the processes have arrived at the common position, the processes can be released to continue separate execution without regard to the state of each other of the processes.
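By way of illustration only, the arrive, wait and release behavior described above can be sketched using the `threading.Barrier` class of the Python standard library. The sketch assumes threads standing in for processes, and the names `run_barrier_demo`, `worker` and `events` are hypothetical names chosen for the example rather than part of any described system.

```python
import threading
import time


def run_barrier_demo(num_workers=4):
    """Run workers that arrive at a barrier at different times.

    Returns the ordered event log as a list of (kind, worker_id) tuples.
    """
    barrier = threading.Barrier(num_workers)
    events = []                      # shared, ordered event log
    log_lock = threading.Lock()      # keeps log order consistent with real time

    def worker(worker_id):
        # Staggered sleep so that workers reach the barrier at different times.
        time.sleep(0.01 * worker_id)
        with log_lock:
            events.append(("arrived", worker_id))
        # Block here until all num_workers threads have called wait().
        barrier.wait()
        with log_lock:
            events.append(("released", worker_id))

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events


if __name__ == "__main__":
    for kind, worker_id in run_barrier_demo():
        print(kind, worker_id)
```

In this sketch every "arrived" entry is logged before any "released" entry, since no thread can pass the barrier until the last thread has reached it.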
Barrier synchronization is a programming technique typically used to separate different “phases” of an application program. Given the ability of barrier synchronization techniques to coordinate the independent execution of different processes, barrier synchronization remains one of the most important mechanisms known in the art of parallel programming. In fact, not only does the literature of the art support such a notion, but the use of barrier synchronization techniques has also been reflected in the well-known shared memory parallel programming standard, OpenMP, and its different language extensions.
Barrier synchronization has been implemented according to several well-known methods. In a first typical barrier synchronization method, referred to as “fetch-and-add”, a fetch-and-add hardware instruction can be employed to decrement an established counter. Following the decrement operation, the waiting processes can be scheduled. In a second typical barrier synchronization method, different memory words can be used for different synchronization states without using special hardware instructions.
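The first, counter-based method can be sketched as follows. This is a minimal, single-use sketch under stated assumptions: Python exposes no hardware fetch-and-add instruction, so the hypothetical class `FetchAndAddCounter` emulates one with a lock, and threads stand in for processes. The names `CounterBarrier` and `run_counter_barrier` are likewise illustrative only.

```python
import threading


class FetchAndAddCounter:
    """Software stand-in for a hardware fetch-and-add (assumption: emulated with a lock)."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def fetch_and_add(self, delta):
        # Atomically add delta and return the value held before the addition.
        with self._lock:
            old = self._value
            self._value += delta
            return old


class CounterBarrier:
    """Single-use barrier built on one shared counter that is decremented on arrival."""

    def __init__(self, parties):
        self._remaining = FetchAndAddCounter(parties)
        self._all_arrived = threading.Event()

    def wait(self):
        # Each arriving participant decrements the shared counter.
        previous = self._remaining.fetch_and_add(-1)
        if previous == 1:
            # Last arrival: release every waiting participant.
            self._all_arrived.set()
            return True
        self._all_arrived.wait()
        return False


def run_counter_barrier(parties):
    """Return, per participant, whether it was the last to arrive."""
    barrier = CounterBarrier(parties)
    results = [None] * parties

    def worker(i):
        results[i] = barrier.wait()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(parties)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because every participant updates the same counter, all arrivals contend for a single shared variable in this sketch.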
In yet a third methodology, the fetch-and-add concept can be replaced with a distributed counter, with elements of the counter positioned locally to each executing process. A series of local sensors, each of which is positioned locally to a corresponding executing process, can each monitor the counters of all other processes. Each locally positioned sensor can indicate to the corresponding process when it is appropriate to leave the barrier and to continue processing.
Notably, by using distributed counters rather than a machine-specific fetch-and-add operation, the overhead of managing conflicts for accessing a single shared variable can be avoided. Notwithstanding, coordinating the operation of multiple local sensors has proven to be resource-expensive in terms of the interconnect network traffic required.
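The distributed counter methodology can be sketched, in simplified form, as shown below. The sketch assumes threads standing in for processes and a shared list standing in for counter elements held locally to each participant. The names `DistributedCounterBarrier`, `_sensor` and `run_phases` are hypothetical, and the polling loop is only one possible way of realizing a local sensor.

```python
import threading
import time


class DistributedCounterBarrier:
    """Reusable barrier in which each participant writes only its own counter."""

    def __init__(self, parties):
        # Slot i is written only by participant i, so no single word is contended.
        self._counters = [0] * parties

    def _sensor(self, target):
        # Local sensor: reads every participant's counter and reports whether
        # all of them have reached the target arrival count.
        return all(count >= target for count in self._counters)

    def wait(self, participant_id):
        target = self._counters[participant_id] + 1
        self._counters[participant_id] = target    # announce own arrival
        polls = 0
        while not self._sensor(target):
            polls += 1
            time.sleep(0)                          # yield while polling
        return polls


def run_phases(parties, phases):
    """Run several barrier-separated phases; return the log of (phase, participant_id)."""
    barrier = DistributedCounterBarrier(parties)
    log = []
    log_lock = threading.Lock()

    def worker(i):
        for phase in range(phases):
            with log_lock:
                log.append((phase, i))             # work belonging to this phase
            barrier.wait(i)                        # end of phase

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(parties)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Each call to `_sensor` reads the counters of all participants, so the number of reads grows with the number of participants on every poll. In a distributed setting such reads would correspond to the network traffic noted above.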