Conventional parallel processing architectures support the execution of multiple threads. Certain operations performed during execution of a program on such an architecture may require the multiple threads to be synchronized. Barrier instructions (or fence instructions) are used to synchronize the threads during execution of such a program. A scheduling unit within the parallel processing architecture recognizes the barrier instructions and ensures that all of the threads reach a particular barrier instruction before any of the threads executes an instruction subsequent to that barrier instruction. The multi-threaded processing unit that executes the threads is configured to synchronize the threads at the particular barrier instruction, and may be configured to execute the synchronized threads either in parallel or serially. In some cases, the synchronized threads cannot all be executed in parallel, such as when the barrier is used to delineate an ordered critical code section, in which the threads must execute the section one at a time in a defined order. However, serial execution of the threads reduces performance.
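By way of illustration only, the barrier behavior described above can be sketched in software. The following is a minimal analogy using Python's standard threading module, not the hardware scheduling unit itself. The names run_with_barrier, worker, partial, neighbor, and order, as well as the thread count of four, are hypothetical and chosen for the example. The sketch shows two things: no thread reads another thread's result until every thread has reached the barrier, and an ordered critical section that follows the barrier forces the threads to run one at a time.

```python
import threading

NUM_THREADS = 4  # hypothetical thread count, chosen for illustration


def run_with_barrier(num_threads=NUM_THREADS):
    """Run num_threads workers that synchronize at a single barrier."""
    barrier = threading.Barrier(num_threads)
    partial = [0] * num_threads    # values written before the barrier
    neighbor = [0] * num_threads   # values read after the barrier
    order = []                     # order in which threads ran the ordered section
    turn = threading.Condition()
    next_id = [0]                  # id of the thread allowed to enter next

    def worker(tid):
        # Before the barrier: each thread produces its own value.
        partial[tid] = (tid + 1) * 10

        # No thread continues until all threads have arrived here.
        barrier.wait()

        # After the barrier: reading another thread's value is safe,
        # because every pre-barrier write has completed.
        neighbor[tid] = partial[(tid + 1) % num_threads]

        # Ordered critical section: threads execute serially, in id order.
        with turn:
            turn.wait_for(lambda: next_id[0] == tid)
            order.append(tid)
            next_id[0] += 1
            turn.notify_all()

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return partial, neighbor, order
```

In this sketch the work before and immediately after the barrier may proceed concurrently, whereas the ordered section admits only one thread at a time, which is the serialization cost noted above.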
Thus, there is a need to address the processing of barrier instructions and/or other issues associated with the prior art.