Current graphics processing units (GPUs) issue and execute groups of threads called “wavefronts.” GPU architectures issue wavefronts of a constant, fixed size that depends on the microarchitecture of the GPU hardware. In some implementations, a wavefront is a group of 64 threads, which are issued in groups of 16 threads through a 16-thread-wide single instruction, multiple data (SIMD) unit over four cycles. In many cases, all 64 threads are executing.
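As an illustration of the arithmetic above, the following minimal sketch computes the number of issue cycles for a 64-thread wavefront on a 16-lane SIMD unit, and the fraction of thread slots doing useful work when only some threads are active. The function names `issue_cycles` and `lane_utilization` are hypothetical and chosen only for this example. The sketch assumes that a fixed-size wavefront occupies the same number of issue cycles regardless of how many of its threads are active.

```python
import math

WAVEFRONT_SIZE = 64  # threads per wavefront (example size from the text)
SIMD_WIDTH = 16      # lanes in the SIMD unit (example width from the text)


def issue_cycles(wavefront_size=WAVEFRONT_SIZE, simd_width=SIMD_WIDTH):
    """Cycles needed to issue one wavefront through the SIMD unit."""
    return math.ceil(wavefront_size / simd_width)


def lane_utilization(active_threads, wavefront_size=WAVEFRONT_SIZE):
    """Fraction of thread slots in a wavefront that are active.

    Assumption: the wavefront is issued at its full fixed size even
    when some of its threads are inactive.
    """
    if not 0 <= active_threads <= wavefront_size:
        raise ValueError("active_threads must be within the wavefront size")
    return active_threads / wavefront_size


if __name__ == "__main__":
    print(issue_cycles())          # 64 threads / 16 lanes
    print(lane_utilization(64))    # full wavefront
    print(lane_utilization(16))    # only a quarter of the threads active
```

Under these assumptions, a wavefront with only 16 active threads still takes four issue cycles but uses one quarter of the available thread slots.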
To maximize the throughput of the GPU, it is beneficial to execute full wavefronts, meaning that all threads of a wavefront are active. With branching instructions, not all threads of a wavefront necessarily follow the same branch direction (i.e., taken or not taken). In such circumstances, threads from different wavefronts may be “repacked” so that all of the threads of each resulting wavefront follow the same branch direction.
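The repacking idea can be sketched in software as follows. This is a simplified model for illustration only, not a description of any particular hardware mechanism: the function name `repack`, the representation of each wavefront as a list of per-thread branch outcomes, and the `(wavefront index, lane index)` thread identifiers are all assumptions made for this example.

```python
def repack(wavefronts, wavefront_size=64):
    """Regroup threads so each new wavefront follows one branch direction.

    wavefronts: list of lists of booleans, one boolean per thread,
                True if that thread takes the branch (hypothetical model).
    Returns (taken_wavefronts, not_taken_wavefronts), where each new
    wavefront is a list of (original_wavefront, lane) thread identifiers.
    """
    taken, not_taken = [], []
    for wf_id, outcomes in enumerate(wavefronts):
        for lane, is_taken in enumerate(outcomes):
            # Sort each thread into the pool for its branch direction.
            (taken if is_taken else not_taken).append((wf_id, lane))

    def chunk(threads):
        # Split a pool into wavefronts of at most wavefront_size threads.
        return [threads[i:i + wavefront_size]
                for i in range(0, len(threads), wavefront_size)]

    return chunk(taken), chunk(not_taken)


if __name__ == "__main__":
    # Two 4-thread wavefronts, each half taken and half not taken.
    divergent = [[True, False, True, False],
                 [False, True, False, True]]
    taken_wfs, not_taken_wfs = repack(divergent, wavefront_size=4)
    print(taken_wfs)
    print(not_taken_wfs)
```

In this small example, two half-divergent wavefronts of four threads each are regrouped into one full wavefront whose threads all take the branch and one full wavefront whose threads all do not.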