Modern graphics processors include an array of cores, referred to as execution units (EUs), that process instructions. A set of instructions comprises a kernel. Kernels are dispatched to the graphics processing unit (GPU) in the form of multiple threads. The GPU processes the threads of the kernel (e.g., executes the instructions corresponding to the kernel) using the EUs. Often, GPUs process the threads in parallel using multiple EUs at once.
Many kernels, particularly kernels corresponding to encoded display data, contain dependencies between threads in the kernel. Said differently, some of the threads in the kernel cannot begin execution until the threads on which they depend have been executed. As such, only a subset of the total number of threads in a kernel can be executed by a GPU in parallel at any given time.
Conventionally, a GPU executes a kernel by dispatching those threads without any dependencies first and those with dependencies last. This is sometimes referred to as wavefront dispatching. However, as will be appreciated, kernels that have a substantial amount of spatial thread dependency will often experience reduced parallelism when dispatched according to wavefront dispatch methodologies. It is with respect to the above that the present disclosure is provided.
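By way of illustration only, the reduced parallelism noted above can be sketched in a short Python simulation of wavefront dispatching. The sketch is not taken from any particular GPU or driver. The function name `wavefront_schedule`, the 4x3 thread grid, and the dependency pattern (each thread depending on its left, top, and top-right neighbors) are assumptions chosen for the example. In each step, every thread whose dependencies have already been executed is dispatched together as one wave.

```python
# Hypothetical dependency pattern (an assumption for illustration): each
# thread at (x, y) depends on its left, top, and top-right neighbors.
ASSUMED_DEPS = [(-1, 0), (0, -1), (1, -1)]


def wavefront_schedule(width, height, dep_offsets=ASSUMED_DEPS):
    """Group the threads of a width x height grid into dispatch waves.

    A thread is placed in a wave only after every in-grid thread it
    depends on has been placed in an earlier wave.
    """
    remaining = {(x, y) for y in range(height) for x in range(width)}
    done = set()
    waves = []

    def is_ready(thread):
        x, y = thread
        for dx, dy in dep_offsets:
            dep = (x + dx, y + dy)
            in_grid = 0 <= dep[0] < width and 0 <= dep[1] < height
            if in_grid and dep not in done:
                return False
        return True

    while remaining:
        ready = sorted(t for t in remaining if is_ready(t))
        if not ready:
            raise ValueError("dependency pattern contains a cycle")
        waves.append(ready)
        done.update(ready)
        remaining.difference_update(ready)
    return waves


if __name__ == "__main__":
    for index, wave in enumerate(wavefront_schedule(4, 3)):
        print(f"wave {index}: {len(wave)} thread(s) {wave}")
```

Under these assumptions, the twelve threads of the grid are dispatched in eight waves and no wave contains more than two threads, so any EUs beyond two would sit idle throughout the kernel regardless of how many are available.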