Conventional parallel processing architectures support the execution of multiple threads. Particular operations performed during the execution of a program on a conventional parallel processing architecture may require synchronization of the multiple threads. Barrier instructions (or fence instructions) are used to synchronize the multiple threads during execution of such a program. A scheduling unit within the parallel processing architecture recognizes the barrier instructions and ensures that all of the threads reach a particular barrier instruction before any of the threads executes an instruction subsequent to that particular barrier instruction. In some cases, however, not all of the threads need to be synchronized at the barrier instruction, and execution of the threads that do not require synchronization is unnecessarily delayed.
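The conventional barrier behavior described above can be illustrated with a minimal software sketch. The sketch uses the Python standard library class threading.Barrier as a stand-in for a hardware barrier instruction and is not a model of any particular scheduling unit. The names run_with_barrier, worker, and NUM_THREADS are hypothetical and chosen only for illustration.

```python
import threading

NUM_THREADS = 4  # hypothetical thread count, chosen for illustration


def run_with_barrier(num_threads):
    """Run num_threads workers that all synchronize at a single barrier.

    Returns the ordered log of ("before" | "after", thread_id) events.
    """
    barrier = threading.Barrier(num_threads)
    log = []
    log_lock = threading.Lock()

    def worker(thread_id):
        # Work performed before the barrier.
        with log_lock:
            log.append(("before", thread_id))
        # No thread passes this point until every thread has reached it.
        barrier.wait()
        # Work performed after the barrier.
        with log_lock:
            log.append(("after", thread_id))

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


events = run_with_barrier(NUM_THREADS)
phases = [phase for phase, _ in events]
print(phases)
```

In the resulting log, every "before" event precedes every "after" event, regardless of how the threads are interleaved. A thread whose subsequent work does not depend on the other threads is nevertheless held at the barrier until the slowest thread arrives.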
Thus, there is a need for addressing the issue of scheduling the execution of threads to process barrier instructions and/or other issues associated with the prior art.