The term “Single Instruction Multiple Thread” (SIMT) refers to the simultaneous execution of the same processing code in many threads, with different input data in each thread. SIMT techniques have been used in array processors, which are specifically designed to perform the same operation repeatedly on many inputs. For example, modern Graphics Processing Unit (GPU) array processors include hundreds or thousands of Arithmetic Logic Units (ALUs), each capable of computing a function on an input vector. By feeding different input vectors to different ALUs, a given function can be computed over many inputs in one processing cycle. As GPUs have grown more powerful, computer scientists have come to use GPUs, which traditionally handled computation only for computer graphics, to perform computation in applications traditionally handled by a CPU. This technique is known as “general-purpose computing on graphics processing units” (GPGPU). However, during a given processing cycle, many of the available ALUs may not be utilized.
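The execution model described above can be illustrated with a minimal sketch in Python. It is a toy model only, not a description of any particular GPU: the names `simt_cycle`, `sum_of_squares`, and `num_alus` are hypothetical, and the loop merely stands in for work that real hardware would perform simultaneously. The sketch applies one function to several input vectors in a single modeled cycle and reports the fraction of ALUs that received work, assuming an ALU is idle whenever no input vector is assigned to it.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def simt_cycle(
    kernel: Callable[[Vector], float],
    inputs: List[Vector],
    num_alus: int,
) -> Tuple[List[float], float]:
    """Model one processing cycle of a hypothetical SIMT array processor.

    Every ALU that is given an input vector runs the same kernel on it.
    Returns the per-ALU outputs and the fraction of ALUs that were used.
    """
    if num_alus <= 0:
        raise ValueError("num_alus must be positive")
    if len(inputs) > num_alus:
        raise ValueError("more input vectors than ALUs in a single cycle")

    # Same instruction stream, different data per lane. The list
    # comprehension is sequential here; hardware would run lanes in parallel.
    outputs = [kernel(vec) for vec in inputs]

    # ALUs without an input vector stay idle for this cycle.
    utilization = len(inputs) / num_alus
    return outputs, utilization


def sum_of_squares(vec: Vector) -> float:
    """Example kernel (an assumption for illustration): sum of squared elements."""
    return sum(x * x for x in vec)


# Three input vectors on a modeled array of eight ALUs.
outputs, utilization = simt_cycle(
    sum_of_squares, [(1, 2), (3, 4), (5, 6)], num_alus=8
)
print(outputs, utilization)
```

In this modeled cycle only three of the eight ALUs receive an input vector, so the reported utilization is 0.375 and the remaining five ALUs do no work.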