A computer system may include two or more interconnected processors that execute multiple program threads concurrently. Such a multiprocessor computing system allows a computing task to be completed more quickly by dividing the task into smaller tasks which are performed concurrently by the various processors.
A multiprocessor system may include a number of processors formed on separate computer chips which are then assembled into a multiprocessor system. Such systems are also referred to as parallel computing systems. Increasingly, computer chips are being manufactured that include multiple processors (or “cores”) on a single computer chip. Such single-chip multiprocessors tend to occupy less space and to better facilitate communication between the processors than traditional multiprocessor systems.
In either case, synchronization is typically required among the threads executed by a multiprocessor system. For example, interdependencies among the threads may require that one or more threads produce a result before one or more other threads make use of the result. Thus, it may be necessary to utilize a synchronization barrier, in which each thread to be synchronized is allowed to execute until it reaches the barrier, is stalled at the barrier until the other threads reach the barrier, and is then allowed to resume execution.
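The barrier behavior described above may be illustrated by the following minimal sketch, which uses the barrier primitive of the Python standard library rather than any particular mechanism discussed herein. The thread count, the array names, and the two-phase workload are assumptions chosen only for illustration: each thread produces a partial result, stalls at the barrier, and then makes use of the results produced by all of the threads.

```python
import threading

NUM_THREADS = 4                      # hypothetical number of threads to synchronize
partial = [0] * NUM_THREADS          # results produced before the barrier
totals = [0] * NUM_THREADS           # results computed after the barrier
barrier = threading.Barrier(NUM_THREADS)


def worker(i):
    # Phase 1: produce a result that the other threads depend on.
    partial[i] = (i + 1) * 10
    # Stall here until every participating thread has arrived.
    barrier.wait()
    # Phase 2: every partial result is now guaranteed to have been written.
    totals[i] = sum(partial)


def run():
    threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return list(totals)
```

Without the call to `barrier.wait()`, a thread could compute its total before another thread had written its partial result, which is the interdependency that the barrier is intended to resolve.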
Conventional synchronization techniques may require that a counter be shared by all of the threads to be synchronized. The counter is incremented by each thread that arrives at the barrier. Each thread then repeatedly tests whether the counter has reached a given value, such as the number of threads to be synchronized, to determine whether the other threads have also reached the barrier. Because every thread contends for access to the counter, the counter may become a hot spot. To lessen contention for the counter, the threads may test the counter less often; however, this may increase the time required for each thread to learn that the barrier has been reached.
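The shared-counter technique may be sketched as follows. This is an illustrative, single-use barrier under stated assumptions, not a definitive implementation: the class name `CounterBarrier`, the `poll_interval` parameter, and the use of a lock in place of a hardware atomic increment are all hypothetical choices made for the example.

```python
import threading
import time


class CounterBarrier:
    """Single-use barrier built on one shared counter (illustrative only)."""

    def __init__(self, num_threads, poll_interval=0.0):
        self.num_threads = num_threads
        self.poll_interval = poll_interval   # assumed knob: time between tests
        self.count = 0                       # the shared counter (the hot spot)
        self.lock = threading.Lock()         # stands in for an atomic increment

    def wait(self):
        # Arrival: each thread increments the shared counter exactly once.
        with self.lock:
            self.count += 1
        # Polling: repeatedly test the counter until all threads have arrived.
        polls = 0
        while self.count < self.num_threads:
            polls += 1
            time.sleep(self.poll_interval)   # longer interval: less contention, slower wake-up
        return polls


def run_demo(num_threads=4, poll_interval=0.001):
    barrier = CounterBarrier(num_threads, poll_interval)
    seen = [0] * num_threads

    def worker(i):
        time.sleep(0.002 * i)                # stagger arrivals at the barrier
        barrier.wait()
        seen[i] = barrier.count              # counter value observed after release

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

Increasing `poll_interval` reduces the number of times each thread tests the counter, but a waiting thread may then remain stalled for up to one additional interval after the last thread arrives, which reflects the trade-off noted above.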
Alternatively, a synchronization mechanism can be implemented by specialized hardware. However, this can complicate the design and manufacture of a multiprocessor system that includes such specialized hardware and can limit the flexibility of the system.