Graphics Processing Units (GPUs) are becoming increasingly complex. In particular, compared with the traditional data-parallel applications that GPUs are suited to, newer general-purpose GPU (GPGPU) applications have less regular memory access patterns and execute more branch instructions, which steer the flow of computation based on specific conditions.
GPGPUs serialize the execution of threads that diverge at a branch instruction. While one group of threads executes one path of the branch, the group on the divergent path waits to execute until the first group converges with it at a safe point. Similarly, during memory accesses, the execution lanes of a GPU may follow divergent execution paths depending on when each lane's data becomes available from memory.
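The serialization described above can be illustrated with a small simulation. The following is a minimal sketch, not a model of any particular GPU: it assumes one warp whose lanes each evaluate a condition, runs the "then" path with only the true lanes active, then runs the "else" path with only the remaining lanes active, and finally records one cycle at the reconvergence point with all lanes active. The function names `run_divergent_branch` and `utilization`, the per-path cycle counts, and the 8-lane width in the usage example are hypothetical choices for illustration.

```python
from typing import Callable, List, Tuple

Trace = List[Tuple[str, List[bool]]]


def run_divergent_branch(values: List[int],
                         cond: Callable[[int], bool],
                         then_cycles: int,
                         else_cycles: int) -> Trace:
    """Serialize an if/else across one warp.

    Returns one (path_name, active_mask) entry per simulated cycle.
    """
    taken = [cond(v) for v in values]
    not_taken = [not t for t in taken]
    trace: Trace = []
    # First path: only lanes whose condition is true are active;
    # the other lanes sit idle and wait.
    if any(taken):
        trace += [("then", taken)] * then_cycles
    # Second path: the waiting lanes run while the first group is idle.
    if any(not_taken):
        trace += [("else", not_taken)] * else_cycles
    # Reconvergence point: every lane is active again.
    trace.append(("reconverged", [True] * len(values)))
    return trace


def utilization(trace: Trace) -> float:
    """Fraction of lane-cycles in which a lane did useful work."""
    active = sum(sum(mask) for _, mask in trace)
    total = sum(len(mask) for _, mask in trace)
    return active / total


if __name__ == "__main__":
    # Hypothetical 8-lane warp where even-valued lanes take the branch.
    trace = run_divergent_branch(list(range(8)), lambda v: v % 2 == 0,
                                 then_cycles=3, else_cycles=2)
    for path, mask in trace:
        print(path, "".join("1" if m else "." for m in mask))
    print(f"lane utilization: {utilization(trace):.2f}")
```

With half the lanes on each path, every cycle before reconvergence leaves half of the warp idle, so overall utilization falls well below 1.0 even though each path is short.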
In these lane-divergence situations, the execution lanes cannot all be utilized at the same time. Nevertheless, the unused lanes remain active and continue to consume power, such as clock and leakage power, while the threads on other lanes execute.
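To make the cost of idle-but-active lanes concrete, the sketch below counts idle lane-cycles and multiplies them by a per-lane power figure split into clock and leakage components. It is a simplified accounting under stated assumptions, not a measured power model: it assumes each idle lane draws a constant clock power and leakage power for every cycle it is idle, and the function name `idle_lane_energy` and all numeric values in the usage example are hypothetical placeholders rather than data for any real device.

```python
from typing import Sequence


def idle_lane_energy(active_lanes_per_cycle: Sequence[int],
                     warp_width: int,
                     clock_power_w: float,
                     leakage_power_w: float,
                     cycle_time_s: float) -> float:
    """Energy in joules consumed by lanes that are powered but idle.

    Assumes (hypothetically) that each idle lane draws a constant
    clock power plus leakage power for every cycle in which it is idle.
    """
    idle_lane_cycles = 0
    for active in active_lanes_per_cycle:
        if not 0 <= active <= warp_width:
            raise ValueError("active lane count out of range")
        idle_lane_cycles += warp_width - active
    per_lane_power_w = clock_power_w + leakage_power_w
    return idle_lane_cycles * per_lane_power_w * cycle_time_s


if __name__ == "__main__":
    # Hypothetical 32-lane warp: fully active, then three divergent
    # cycles with half the lanes idle, then fully active again.
    energy = idle_lane_energy([32, 16, 16, 16, 32], warp_width=32,
                              clock_power_w=0.0006,
                              leakage_power_w=0.0004,
                              cycle_time_s=1e-9)
    print(f"idle-lane energy: {energy:.3e} J")
```

In this example, 48 idle lane-cycles accumulate during the three divergent cycles, and all of the resulting energy is spent without doing useful work.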