For current computing devices and applications, efficient multithreaded performance is becoming increasingly important. OpenMP is a popular application programming interface (API) for shared-memory parallel programming. OpenMP specifies a synchronization barrier feature, which may be used to coordinate multiple threads executing in a thread team. In general, all threads of the thread team must reach the barrier before execution of the program may proceed. OpenMP also specifies a tasking system, in which threads may create and execute tasks. All tasks must be completed before the threads may exit a synchronization barrier. Thus, tasks are often executed while threads are waiting in synchronization barriers.
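As an illustration of how tasks and barriers interact, the following is a minimal sketch in C using standard OpenMP directives. The function name `sum_squares_with_tasks` and the workload are hypothetical and chosen only for illustration. One thread of the team creates the tasks, and the implicit barrier at the end of the `single` construct is where the other threads wait and, in typical implementations, pick up and execute those tasks.

```c
#include <stdio.h>

/* Hypothetical helper: sums i*i for i in [0, n), one OpenMP task per term.
 * If compiled without OpenMP support, the pragmas are ignored and the
 * loop simply runs serially with the same result. */
long sum_squares_with_tasks(int n)
{
    long total = 0;

    #pragma omp parallel shared(total)
    {
        /* Only one thread of the team creates the tasks. */
        #pragma omp single
        {
            for (int i = 0; i < n; i++) {
                #pragma omp task firstprivate(i) shared(total)
                {
                    long term = (long)i * i;
                    /* Tasks may run concurrently, so the update is atomic. */
                    #pragma omp atomic
                    total += term;
                }
            }
        }
        /* Implicit barrier at the end of "single": no thread leaves it
         * until every task created above has completed. */
    }

    return total;
}
```

Calling `sum_squares_with_tasks(10)` returns 285 regardless of how many threads are in the team, because the barrier guarantees that every task has finished before `total` is read.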
Many OpenMP implementations use “work-stealing,” in which a thread may “steal” tasks from another thread; that is, a thread may claim a task that is queued for another thread and run that task to completion. To remain compatible with tasking requirements, OpenMP synchronization barriers are typically implemented as tree barriers or linear barriers. However, tree barriers have a longer critical path than non-tree barriers such as dissemination barriers.
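The difference in critical path can be made concrete with a small sketch in C. It assumes a binary tree barrier with separate gather and release phases, and a textbook dissemination barrier in which, in round k, thread i signals thread (i + 2^k) mod N; actual implementations may use other fan-outs or release schemes. All function names are hypothetical and are not part of any OpenMP runtime.

```c
#include <stdio.h>

/* Smallest r such that 2^r >= n (returns 0 for n <= 1).
 * Intended for small thread counts; overflow is not handled. */
unsigned ceil_log2(unsigned n)
{
    unsigned rounds = 0;
    unsigned reach = 1;
    while (reach < n) {
        reach *= 2;
        rounds++;
    }
    return rounds;
}

/* Assumed binary tree barrier: arrivals propagate up ceil(log2 N)
 * levels, then the release propagates back down the same levels. */
unsigned tree_barrier_steps(unsigned n)
{
    return 2 * ceil_log2(n);
}

/* Dissemination barrier: ceil(log2 N) rounds and no separate release
 * phase, since every thread has heard from all others after the
 * final round. */
unsigned dissemination_barrier_rounds(unsigned n)
{
    return ceil_log2(n);
}

/* Thread that thread i signals in round k of a dissemination barrier. */
unsigned dissemination_partner(unsigned i, unsigned k, unsigned n)
{
    return (i + (1u << k)) % n;
}
```

Under these assumptions, a team of 8 threads needs 6 sequential steps in the tree barrier but only 3 rounds in the dissemination barrier, which is the sense in which the tree barrier's critical path is longer.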