Parallel computing allows a computational problem to be decomposed into multiple tasks. These multiple tasks are then carried out by a plurality of processes or threads which may operate concurrently. Parallel computing may allow a computational problem to be solved in a shorter amount of time by utilizing the computational resources of a plurality of processors. Parallel computing may also allow large computational problems to be solved that may not be practical to solve using conventional computing systems and methods. With currently available parallel computing systems, it is possible to harness the computational resources of hundreds or thousands of computer processors to run hundreds or thousands of concurrent processes.
Typically, there are interdependencies between at least some of the concurrent processes. To avoid a condition where one process races too far ahead of another interdependent process (which may cause an indeterminate computational result), it is often necessary to incorporate a process synchronization mechanism, such as a barrier synchronization point. A barrier synchronization point is one of the most widely used synchronization operations in parallel applications. In the most common form of barrier, each participant, which can be a process, a thread, or a task (a set of processes or threads), makes a barrier call to register that it has reached a particular point of the program with which it is associated. The participant then blocks in the barrier call until all participants have made the call. In other words, no participant can pass the barrier while any participant has not yet called the barrier.
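A complete barrier of the kind described above can be sketched with threads standing in for participants. This is a minimal illustration under stated assumptions, not the method of any particular system: the class `CompleteBarrier` and the helper `run_demo` are hypothetical names. The generation counter lets the barrier be reused across rounds and guards against spurious wakeups. Python's standard `threading.Barrier` provides equivalent semantics.

```python
import threading


class CompleteBarrier:
    """Hypothetical counting barrier: no participant returns from wait()
    until every participant has called wait()."""

    def __init__(self, parties: int) -> None:
        self.parties = parties
        self.count = 0          # participants checked in during the current round
        self.generation = 0     # incremented each time the barrier opens
        self.cond = threading.Condition()

    def wait(self) -> None:
        with self.cond:
            gen = self.generation
            self.count += 1
            if self.count == self.parties:
                # Last arrival: reset for the next round and release everyone.
                self.count = 0
                self.generation += 1
                self.cond.notify_all()
            else:
                # Block until the last participant opens this generation.
                while gen == self.generation:
                    self.cond.wait()


def run_demo(parties: int = 4) -> list[int]:
    """Each participant records how many participants had arrived when it left."""
    barrier = CompleteBarrier(parties)
    arrived: list[int] = []
    seen: list[int] = []
    lock = threading.Lock()

    def participant(pid: int) -> None:
        with lock:
            arrived.append(pid)      # register arrival before the barrier call
        barrier.wait()
        with lock:
            seen.append(len(arrived))  # past the barrier, all must have arrived

    threads = [threading.Thread(target=participant, args=(i,)) for i in range(parties)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen


if __name__ == "__main__":
    print(run_demo(4))  # prints [4, 4, 4, 4]
```

Every participant observes the full arrival count after leaving the barrier, which is exactly the guarantee that none can pass while another has not yet called it.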
One problem with many barrier synchronization methods is that they require a complete barrier even when a partial barrier is sufficient. A complete barrier requires that each and every participant check into the barrier to complete the synchronization and that every participant leave the barrier afterwards, while a partial barrier does not require this. In a partial barrier, the synchronization is among a first subset of participants, and a second subset of participants exits the barrier in response to the synchronization among the first subset. The first subset and the second subset have at least one participant in common. One such scenario is a hierarchical synchronization among participants running on multiple nodes, with multiple participants running on at least one of the nodes. In this hierarchical synchronization, participants running on the same node first synchronize among themselves, often through shared memory. Then one or more of the participants are designated as representatives of all participants on the node and take part in the inter-nodal synchronization.
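The hierarchical scenario can be sketched in the same way. This is an illustrative simulation only, not the approach claimed here: threads stand in for participants, each "node" is a group of threads in one process, a per-node `threading.Barrier` stands in for the shared-memory synchronization, and the choice of local id 0 as the single representative is a hypothetical assumption. The non-representatives do not check in to the inter-nodal step; they only wait on a one-way release flag (`threading.Event`) set by their representative, which is the partial-barrier pattern described above.

```python
import threading


def hierarchical_sync(nodes: int, per_node: int) -> list[int]:
    """Simulate one hierarchical synchronization; returns, for each participant,
    how many participants had arrived when it left. Node layout is hypothetical."""
    # Stand-in for the intra-node (shared-memory) synchronization on each node.
    local_barriers = [threading.Barrier(per_node) for _ in range(nodes)]
    # Only one representative per node takes part in the inter-nodal step.
    inter_node = threading.Barrier(nodes)
    # One-way notification flag per node, set by that node's representative.
    released = [threading.Event() for _ in range(nodes)]

    arrived: list[tuple[int, int]] = []
    seen: list[int] = []
    lock = threading.Lock()

    def participant(node: int, local_id: int) -> None:
        with lock:
            arrived.append((node, local_id))
        local_barriers[node].wait()      # participants on the same node synchronize
        if local_id == 0:                # hypothetical: local id 0 is the representative
            inter_node.wait()            # representatives synchronize across nodes
            released[node].set()         # notify this node's non-representatives
        else:
            released[node].wait()        # exit only in response to the release flag
        with lock:
            seen.append(len(arrived))

    threads = [
        threading.Thread(target=participant, args=(n, i))
        for n in range(nodes)
        for i in range(per_node)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen


if __name__ == "__main__":
    print(hierarchical_sync(2, 3))  # prints [6, 6, 6, 6, 6, 6]
```

The result still matches a complete barrier: a representative passes the inter-nodal step only after every node's local check-in has finished, and each non-representative leaves only after its representative has passed that step.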
If more than one representative per node is used for better inter-nodal performance, these representatives also need to synchronize among themselves. Once the inter-nodal synchronization is complete, the representatives need to notify the non-representatives of the completion of the entire synchronization among all participants. All of these synchronizations among subsets of participants can be done using complete barriers. One problem with this barrier synchronization method, and with other methods that use complete barriers where partial barriers would suffice, is that a complete barrier costs more than a partial barrier. These methods are therefore inefficient, and their high shared-memory barrier costs reduce the benefit of using multiple representatives per node.
Therefore a need exists to overcome the problems with the prior art as discussed above.