Synchronization barriers (barriers) are a primitive used in parallel computing that allows the programmer to guarantee that all threads have finished one phase of their work before any thread begins the next phase. A synchronization barrier implements a function with the property that no thread returns from the call before all threads have entered it. To illustrate, consider an example where the programmer has several threads running the same code, all processing Work1( ), and does not want any thread to begin Work2( ) until all threads have finished Work1( ). With a barrier, the code would look like this:
Work1( );
Barrier( );
Work2( );
Here, as threads arrive at the Barrier( ) call, they pause (either spinning or blocking) until all threads have arrived, at which point all threads are released to begin Work2( ). Note that synchronization barriers must be reusable: it must be possible for each thread to call Barrier( ) again on the same structure after completing Work2( ).
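The arrive, wait, and release behavior described above, including reuse of the same structure across phases, can be sketched as follows. This is a minimal illustration only, not the implementation this document goes on to describe: it assumes a blocking design built on a standard C++ mutex and condition variable, it does not address spinning or deletion of the barrier, and the names SimpleBarrier, Wait, and RunPhases are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical reusable barrier (illustrative names, blocking only).
class SimpleBarrier {
public:
    explicit SimpleBarrier(std::size_t count)
        : threshold_(count), waiting_(0), generation_(0) {
        assert(count > 0);
    }

    // Returns only after `count` threads have entered the current phase.
    void Wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        const std::size_t my_generation = generation_;
        if (++waiting_ == threshold_) {
            // Last arrival: reset the count so the barrier can be reused,
            // advance the generation, and release every waiting thread.
            waiting_ = 0;
            ++generation_;
            cv_.notify_all();
        } else {
            // Block until the generation changes; the predicate guards
            // against spurious wakeups and against early re-entry.
            cv_.wait(lock, [&] { return generation_ != my_generation; });
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    const std::size_t threshold_;
    std::size_t waiting_;
    std::size_t generation_;
};

// Hypothetical driver: runs Work1 / Barrier / Work2 for several rounds and
// reports whether every thread saw all Work1 calls of its round completed.
bool RunPhases(std::size_t num_threads, std::size_t rounds) {
    SimpleBarrier barrier(num_threads);
    std::atomic<std::size_t> work1_done{0};
    std::atomic<bool> ok{true};
    std::vector<std::thread> threads;
    for (std::size_t t = 0; t < num_threads; ++t) {
        threads.emplace_back([&] {
            for (std::size_t r = 0; r < rounds; ++r) {
                work1_done.fetch_add(1);   // stands in for Work1( )
                barrier.Wait();            // Barrier( )
                // Stands in for Work2( ): by now every thread must have
                // finished Work1 for rounds 0..r.
                if (work1_done.load() < num_threads * (r + 1)) {
                    ok.store(false);
                }
            }
        });
    }
    for (auto& th : threads) th.join();
    return ok.load();
}
```

The generation counter is what makes the same object safe to call again: a thread that races ahead into the next phase increments a fresh count and waits on a new generation value, so it cannot be confused with a waiter from the previous phase.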
These primitives are frequently used in scientific and mathematical computing and other highly parallel workloads. In systems that lack synchronization barrier primitives, developers who require this functionality are forced to implement their own barriers. This leads to duplicated work and sometimes incorrect code. Further, this is an area where value can be added by building a synchronization barrier that is not only fast but also better supports real-world usage by efficiently handling the deletion of the barrier.
In this regard, there is a need for a fast and robust primitive that can replace existing barriers with a faster implementation and that meets the requirements of a high-performance synchronization barrier, including support for blocking and deletion.