The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
Graphics Processing Units (GPUs) are increasingly being utilized for both (1) graphics processing and (2) general-purpose computing, the latter being a field known as GPGPU (general-purpose computing on graphics processing units). A constraint associated with GPUs is the total amount of resources, such as memory and registers, that are available for use by threads, or by groups of threads executing the same shader or kernel (e.g., a warp, a wavefront). As an example of a kernel (e.g., a compute kernel), a portion of code can be included in a loop, and a multitude of threads can execute that same portion of code in parallel until the loop completes. Accordingly, code that is parallelizable can be sped up through use of such kernels. Additionally, the portion of code can include a barrier instruction, which indicates that code beyond the barrier instruction is not to be executed by any thread in a group of threads (e.g., the multitude of threads) until all threads in the group have reached the barrier instruction. Accordingly, the threads in the group can be synchronized, and operations subsequent to the barrier instruction, which may depend on information computed prior to the barrier instruction, can be assured of access to valid information.
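By way of illustration only, the barrier behavior described above can be sketched with ordinary CPU threads. The sketch below is a minimal analogy written in Python and is not GPU code. The names `run_group`, `kernel`, `shared`, and `results` are hypothetical and chosen for this example. Python's `threading.Barrier` stands in for a GPU barrier instruction, and a plain list stands in for memory shared by the group of threads.

```python
import threading


def run_group(num_threads):
    """Run one hypothetical group of threads that all execute the same code."""
    shared = [0] * num_threads    # stands in for memory shared by the group
    results = [0] * num_threads   # one output slot per thread
    barrier = threading.Barrier(num_threads)

    def kernel(tid):
        # Before the barrier: each thread publishes its own value.
        shared[tid] = tid * tid

        # Barrier: no thread proceeds until every thread in the group arrives,
        # so all writes to `shared` above have completed.
        barrier.wait()

        # After the barrier: reading a value written by another thread is safe.
        neighbor = (tid + 1) % num_threads
        results[tid] = shared[tid] + shared[neighbor]

    threads = [threading.Thread(target=kernel, args=(tid,))
               for tid in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_group(4))  # [1, 5, 13, 9]
```

Without the call to `barrier.wait()`, a thread could read its neighbor's slot before the neighbor had written it, which is the kind of access to invalid information that a barrier instruction is intended to prevent.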