The present invention relates to synchronization of concurrently running processes in a data processing system.
Parallel computing allows a computational problem to be decomposed into multiple tasks. These multiple tasks are then carried out by a plurality of processes which may operate concurrently. Parallel computing may allow a computational problem to be solved in a shorter amount of time by utilizing the computational resources of a plurality of processors. Parallel computing may also allow large computational problems to be solved that may not be practical to solve using conventional computing systems and methods. With currently available parallel computing systems, it is possible to harness the computational resources of hundreds or thousands of computer processors to run hundreds or thousands of concurrent processes.
Typically, there are interdependencies between at least some of the concurrent processes. In order to avoid a condition where one process races too far ahead of another interdependent process (which may cause an indeterminate computational result), it is often necessary to incorporate a process synchronization mechanism, such as a barrier synchronization point. Multiple threads or processes may then come to the barrier synchronization point, and wait until all of the other concurrent processes have arrived. Once synchronized in this manner, the processes may then continue with their execution separately. Barrier synchronization is therefore considered to be one of the most important mechanisms in parallel processing. This is reflected in published shared memory parallel programming standards, such as OpenMP™, in which combined parallel work-sharing constructs have implicit barrier synchronization (although these implicit barriers may be turned off if necessary by a nowait clause).
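By way of illustration only, the barrier behavior described above may be sketched as follows. This is a minimal sketch under stated assumptions and not a design taken from any cited reference: the names SimpleBarrier, arrive_and_wait and run_two_phase are hypothetical, and a mutex with a condition variable is assumed as the waiting mechanism. Each thread writes its own result in a first phase, waits at the barrier, and only then reads a result produced by another thread.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical reusable barrier built on a mutex and a condition variable.
class SimpleBarrier {
public:
    explicit SimpleBarrier(std::size_t count)
        : threshold_(count), waiting_(0), generation_(0) {
        assert(count > 0);
    }

    // Blocks the caller until `count` threads have reached this point.
    void arrive_and_wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        const std::size_t my_generation = generation_;
        if (++waiting_ == threshold_) {
            // Last arrival: reset the count for reuse and release the others.
            waiting_ = 0;
            ++generation_;
            cv_.notify_all();
        } else {
            // Earlier arrivals sleep until the generation number changes.
            cv_.wait(lock, [&] { return generation_ != my_generation; });
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    const std::size_t threshold_;
    std::size_t waiting_;
    std::size_t generation_;
};

// Phase 1: each thread writes its own slot of `produced`.
// Phase 2: each thread reads its neighbour's slot, which is only safe
// once every thread has finished phase 1, hence the barrier in between.
std::vector<int> run_two_phase(std::size_t num_threads) {
    std::vector<int> produced(num_threads, 0);
    std::vector<int> consumed(num_threads, 0);
    SimpleBarrier barrier(num_threads);
    std::vector<std::thread> workers;
    for (std::size_t id = 0; id < num_threads; ++id) {
        workers.emplace_back([&, id] {
            produced[id] = static_cast<int>(id) * 10;
            barrier.arrive_and_wait();
            consumed[id] = produced[(id + 1) % num_threads];
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return consumed;
}
```

Without the call to arrive_and_wait, a thread could read its neighbour's slot before that slot had been written, which is the indeterminate result referred to above.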
Different designs for barrier synchronization have been proposed. For example, an IBM Technical Disclosure Bulletin entitled “Barrier Synchronization Using Fetch-and-Add and Broadcast”, 34(8):33-34, 1992, describes using a fetch-and-add operation to decrement an established counter, with waiting threads monitoring the counter to determine when they may proceed. As another example, U.S. Pat. No. 6,330,619 issued to Kreuzberg describes the use of different memory words for different synchronization states without using special hardware instructions.
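The general idea of a counter-based barrier of the first kind may be sketched as follows. This is an assumption-laden illustration and does not reproduce the algorithm of the cited bulletin; in particular, the broadcast aspect is not modeled, the barrier is single-use, and the names CounterBarrier, arrive_and_wait and totals_after_barrier are hypothetical. The atomic fetch_sub of the C++ standard library stands in for a fetch-and-add of minus one.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical single-use barrier: a shared counter starts at the number of
// participants and is decremented atomically by each arriving thread.
class CounterBarrier {
public:
    explicit CounterBarrier(int participants) : remaining_(participants) {
        assert(participants > 0);
    }

    void arrive_and_wait() {
        // Atomic decrement (fetch-and-add of -1) announces this arrival.
        remaining_.fetch_sub(1, std::memory_order_acq_rel);
        // Monitor the counter until every participant has arrived.
        while (remaining_.load(std::memory_order_acquire) != 0) {
            std::this_thread::yield();
        }
    }

private:
    std::atomic<int> remaining_;
};

// Each thread publishes a partial value, waits at the barrier, and then
// sums all partial values; every thread should compute the same total.
std::vector<int> totals_after_barrier(int num_threads) {
    const std::size_t n = static_cast<std::size_t>(num_threads);
    std::vector<int> partial(n, 0);
    std::vector<int> totals(n, 0);
    CounterBarrier barrier(num_threads);
    std::vector<std::thread> workers;
    for (std::size_t id = 0; id < n; ++id) {
        workers.emplace_back([&, id] {
            partial[id] = static_cast<int>(id) + 1;
            barrier.arrive_and_wait();
            int sum = 0;
            for (int value : partial) {
                sum += value;
            }
            totals[id] = sum;
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return totals;
}
```

In this sketch all waiting threads poll the same counter. Making the barrier reusable would require additional state, such as a phase or sense flag, which is omitted here.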
While these methods provide possible barrier synchronization solutions, it is desirable to develop a system and method for barrier synchronization having increased performance.