The execution of threads (e.g., pthreads in Apple® and Microsoft Windows® systems, and CPU threads) in a multi-threaded processor assumes a basic guarantee of forward progress; i.e., if one thread becomes blocked, other threads continue to make progress unless those other threads depend on resources owned by the blocked thread. This guarantee is necessary to support patterns that are extremely common in procedural parallel programming, such as locks.
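By way of illustration only, the following C++ sketch shows the kind of lock-based pattern that relies on the forward progress guarantee. The function name `locked_increment` and its parameters are hypothetical and are not taken from any particular system; the sketch assumes independently scheduled CPU threads, so the thread holding the lock always continues to run and eventually releases it.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: each of num_threads workers increments a shared
// counter `iterations` times while holding a mutex. A worker that fails to
// take the lock blocks, but the holder keeps making progress and eventually
// releases it, so every worker finishes.
long locked_increment(unsigned num_threads, long iterations) {
    assert(iterations >= 0);
    long counter = 0;
    std::mutex m;
    std::vector<std::thread> workers;

    for (unsigned i = 0; i < num_threads; ++i) {
        workers.emplace_back([&counter, &m, iterations] {
            for (long k = 0; k < iterations; ++k) {
                // Blocks until the lock is free; released at end of scope.
                std::lock_guard<std::mutex> guard(m);
                ++counter;
            }
        });
    }

    // Wait for all workers; this returns only because each one progresses.
    for (auto& w : workers) {
        w.join();
    }
    return counter;
}
```

For example, calling `locked_increment(4, 10000)` returns 40000, since every increment is performed under the lock.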
The forward progress guarantee is trivially implemented by multi-core/multiple-instruction multiple-data (MIMD) processor organizations because each thread is executed independently by the hardware. On the other hand, single-instruction multiple-data (SIMD) threads, such as threads executed by a graphics processing unit (GPU), are typically not independent at a low level. Threads at the same program counter (PC) may be scheduled concurrently on the SIMD lanes. However, if threads take different paths through the program, the threads will execute at different PCs and cannot be scheduled concurrently. Some existing schemes serialize the execution of threads that take different paths through the program. Since some SIMD lanes will be idle when threads are executing at different PCs, existing schemes schedule threads in a specific order in an attempt to reduce the idle time. However, these specific scheduling orders do not necessarily provide a forward progress guarantee because scheduling priority is given to reducing idle time. When a serialized thread becomes blocked on user-level synchronization (e.g., a lock), a number of other threads also become blocked as they wait for the blocked thread to reach a common PC. In some cases, deadlock may occur and execution of the program cannot be completed. Thus, there is a need for addressing these issues and/or other issues associated with the prior art.
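The deadlock scenario described above can be sketched with a toy simulation, offered here only as an illustration under stated assumptions. The three-instruction program, the function `runs_to_completion`, and both scheduling policies are hypothetical: `LargestGroupFirst` stands in for an order that favors keeping SIMD lanes busy, and `RoundRobin` is merely one simple fair order used for contrast, not a description of any particular hardware scheduler.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

enum class Policy { LargestGroupFirst, RoundRobin };

// Hypothetical program executed by every thread:
//   PC 0: try to take the lock; on success go to PC 1, otherwise retry PC 0
//   PC 1: critical section, then release the lock and go to PC 2
//   PC 2: finished
constexpr int kDone = 2;

// Returns true if all threads reach PC 2 within max_steps scheduling steps.
// Each step runs one PC; only threads currently at that PC execute.
bool runs_to_completion(std::size_t num_threads, Policy policy, int max_steps) {
    assert(num_threads > 0);
    std::vector<int> pc(num_threads, 0);
    bool lock_held = false;
    std::size_t cursor = 0;  // used only by RoundRobin

    for (int step = 0; step < max_steps; ++step) {
        // Group the unfinished threads by their current PC.
        std::map<int, std::size_t> group_size;
        for (int p : pc) {
            if (p != kDone) ++group_size[p];
        }
        if (group_size.empty()) return true;  // every thread finished

        int chosen = 0;
        if (policy == Policy::LargestGroupFirst) {
            // Favor lane occupancy: run the PC shared by the most threads
            // (ties go to the lowest PC).
            std::size_t best = 0;
            for (auto it = group_size.begin(); it != group_size.end(); ++it) {
                if (it->second > best) { best = it->second; chosen = it->first; }
            }
        } else {
            // Fair order: run the PC of the next unfinished thread in turn.
            while (pc[cursor] == kDone) cursor = (cursor + 1) % num_threads;
            chosen = pc[cursor];
            cursor = (cursor + 1) % num_threads;
        }

        // Execute the chosen instruction on every lane whose thread is there.
        for (std::size_t t = 0; t < num_threads; ++t) {
            if (pc[t] != chosen) continue;
            if (chosen == 0) {
                if (!lock_held) { lock_held = true; pc[t] = 1; }  // else spin
            } else {
                lock_held = false;  // leave the critical section
                pc[t] = kDone;
            }
        }
    }

    for (int p : pc) {
        if (p != kDone) return false;  // step budget exhausted: no progress
    }
    return true;
}
```

With four threads, `runs_to_completion(4, Policy::LargestGroupFirst, 1000)` returns false: after the first thread takes the lock, the three spinning threads always form the largest group, so the lock holder is never scheduled and never releases the lock. Under the same assumptions, `runs_to_completion(4, Policy::RoundRobin, 1000)` returns true, because the lock holder is eventually given a turn.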