Many modern computing systems can perform parallel computation, executing multiple computations simultaneously. Such systems may divide a large computational task into multiple smaller calculations that are solved simultaneously on multiple processing cores. Typically, a parallel application contains both serial and parallel regions. Within a parallel region, multiple threads execute simultaneously, and a global barrier follows the region: no thread may proceed past the barrier until all threads have reached it.
When the parallel threads complete at different times before reaching such a global barrier, the threads that finish early must wait for the remaining threads before further processing can continue. The difference in completion times may be due to any of a multitude of factors, including inter-core manufacturing process variation, micro-architectural and scheduling effects, load imbalance inherent to the application, or interference from other applications or the operating system that affects threads unequally.
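The cost of this waiting can be illustrated with a small model. The following is a minimal sketch, not a measurement of any real system: the function name `barrier_idle_times` and the completion times are hypothetical, chosen only to show that every thread idles for the gap between its own completion time and that of the slowest thread.

```python
def barrier_idle_times(finish_times):
    """Return how long each thread waits at a global barrier.

    finish_times[i] is the (hypothetical) time at which thread i finishes
    its share of the parallel region. The barrier releases only when the
    slowest thread arrives, so each thread idles for the difference.
    """
    if not finish_times:
        return []
    release_time = max(finish_times)  # the barrier opens when the last thread arrives
    return [release_time - t for t in finish_times]


# Hypothetical completion times, in milliseconds, for four threads.
finish_ms = [80, 100, 65, 95]
idle_ms = barrier_idle_times(finish_ms)
print(idle_ms)       # [20, 0, 35, 5]
print(sum(idle_ms))  # 60
```

In this illustrative case the slowest thread never waits, while the other three together spend 60 thread-milliseconds idle at the barrier.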