Graphics processing units (GPUs) are often used to execute program code that includes single instruction multiple data (SIMD) instructions, which perform the same operation on multiple data points simultaneously. GPUs can also execute program code written for a single program multiple data (SPMD) programming model, in which SIMD code is mapped to multiple kernel instances (e.g., work items) that execute simultaneously within a given hardware thread. Because multiple kernel instances can be associated with a single hardware thread, the number of kernel instances per thread may be referred to as the SIMD-width of the kernel. Each SPMD kernel instance can appear to execute serially and independently within its own SIMD lane; in actuality, each thread executes a SIMD-width number of kernel instances concurrently.
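To make the two views of an SPMD kernel concrete, the following Python sketch models the same kernel once as a single independent instance and once as a hardware thread that runs a SIMD-width number of instances together. It is a software model only, assuming a SIMD-width of 8 and a trivial kernel body; `SIMD_WIDTH`, `kernel_instance`, and `run_thread` are hypothetical names introduced for illustration and do not correspond to any GPU API.

```python
from typing import List

SIMD_WIDTH = 8  # hypothetical SIMD-width: kernel instances per hardware thread


def kernel_instance(work_item_id: int, data: List[int]) -> int:
    """SPMD view: the kernel as written, for one kernel instance (work item)."""
    x = data[work_item_id]
    return x * 2 + 1


def run_thread(thread_id: int, data: List[int], simd_width: int = SIMD_WIDTH) -> List[int]:
    """SIMD view: one hardware thread runs simd_width kernel instances, one per lane.

    Each statement below stands in for one SIMD instruction applied to every lane at once.
    """
    ids = [thread_id * simd_width + lane for lane in range(simd_width)]  # one work item per lane
    x = [data[i] for i in ids]  # load, all lanes
    y = [v * 2 for v in x]      # multiply, all lanes
    return [v + 1 for v in y]   # add, all lanes


data = list(range(16))
# Two hardware threads cover 16 work items; the lane results match running each instance on its own.
results = run_thread(0, data) + run_thread(1, data)
assert results == [kernel_instance(i, data) for i in range(16)]
```

Each instance "sees" only its own work-item id, which is why it can appear to run serially and independently, while the thread actually advances all of its lanes through each instruction together.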
For a given SIMD-width, if all kernel instances within a thread are executing the same instruction, the SIMD lanes can be maximally utilized. However, if one or more of the kernel instances takes a divergent branch, the thread may have to execute the two paths of the branch separately, one after the other, with only the lanes on the current path active. This is known as serialization, and it leaves the inactive lanes idle while each path executes.
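Serialization can be illustrated with a small model of one hardware thread executing an `if`/`else` under a per-lane execution mask. This Python sketch assumes a hypothetical kernel body (`y = x // 2` for even `x`, else `y = 3 * x + 1`); the function name `run_branch` is illustrative, and the inner loops merely emulate what would be a single masked SIMD instruction in hardware.

```python
from typing import List, Tuple


def run_branch(x: List[int]) -> Tuple[List[int], int]:
    """Model one thread executing, across all of its SIMD lanes:
         if x % 2 == 0: y = x // 2   else: y = 3 * x + 1
    Returns (per-lane results, number of branch paths the thread had to issue).
    """
    cond = [v % 2 == 0 for v in x]  # per-lane branch condition (the execution mask)
    y = [0] * len(x)
    paths = 0
    # "Then" path: issued only if at least one lane takes it; other lanes are masked off.
    if any(cond):
        paths += 1
        for lane, active in enumerate(cond):
            if active:
                y[lane] = x[lane] // 2
    # "Else" path: issued afterwards, serially, under the complementary mask.
    if not all(cond):
        paths += 1
        for lane, active in enumerate(cond):
            if not active:
                y[lane] = 3 * x[lane] + 1
    return y, paths


uniform = run_branch([2, 4, 6, 8])    # all lanes agree: one path issued
divergent = run_branch([1, 2, 3, 4])  # lanes disagree: both paths issued, one after the other
```

In the uniform case every lane does useful work on the single path that is issued. In the divergent case the thread issues both paths and each lane is idle during one of them, so lane utilization for this branch drops to half.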