Many processors can perform an operation on multiple operands concurrently. Typically, these operations are performed in a single instruction, multiple data (SIMD) arithmetic logic unit (ALU) of the processor. By definition, a SIMD ALU has a single control flow driving multiple data paths, which must therefore execute in lock-step. High performance and execution efficiency result when SIMD control flow remains synchronized across all data paths and memory requests are aligned so that the wide memory bandwidth is well utilized.
In some instances during execution, SIMD control flow cannot remain synchronized. This SIMD control flow divergence can occur, for example, when executing an “if/else” conditional block in which some of the data paths are to execute the “if” portion and others the “else” portion. This situation is known as a branch divergence hazard. A common solution to this hazard transforms the control flow problem into a data flow problem by sequentially executing every control flow path on all data paths. Here, the “if” portion of the block and the “else” portion are each issued in turn to all data paths, and the data paths that do not belong to the current portion are predicated (turned off) during that pass. Nested control flow can further compound the divergence problem, because each additional level of divergent branching adds further serialized passes, and can result in significant loss of performance (compute throughput). This loss of compute throughput due to diminished SIMD efficiency is called the SIMD divergence problem.
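The predication approach described above can be illustrated with a minimal software sketch. It is an emulation for illustration only, not a description of any particular hardware. The function name `simd_if_else`, the lane width of 8, and the example branch bodies (halve even values, compute 3x + 1 for odd values) are all hypothetical. The sketch also assumes that a pass is skipped when no data path needs it, which is a common optimization but is not stated in the text above.

```python
def simd_if_else(values, width=8):
    """Emulate one SIMD unit running `if x is even: x // 2 else: 3 * x + 1`.

    Returns (results, passes, efficiency), where efficiency is the fraction
    of lane slots that did useful work over all issued passes.
    """
    assert width > 0 and len(values) == width

    # A single compare produces one predicate bit per data path (lane).
    mask = [x % 2 == 0 for x in values]
    results = list(values)
    passes = 0
    active_lane_slots = 0

    # Pass 1: the "if" portion is issued to every lane, but lanes whose
    # predicate is false are turned off and keep their old value.
    if any(mask):  # assumption: the pass is skipped if no lane needs it
        for lane in range(width):
            if mask[lane]:
                results[lane] = values[lane] // 2
        passes += 1
        active_lane_slots += sum(mask)

    # Pass 2: the "else" portion runs with the inverted predicate.
    inverted = [not m for m in mask]
    if any(inverted):
        for lane in range(width):
            if inverted[lane]:
                results[lane] = 3 * values[lane] + 1
        passes += 1
        active_lane_slots += sum(inverted)

    efficiency = active_lane_slots / (passes * width)
    return results, passes, efficiency


if __name__ == "__main__":
    # Divergent input: half the lanes take each branch, so two passes are needed.
    print(simd_if_else([1, 2, 3, 4, 5, 6, 7, 8]))
    # Uniform input: every lane takes the "if" branch, so one pass suffices.
    print(simd_if_else([2, 4, 6, 8, 10, 12, 14, 16]))
```

In this sketch every lane does useful work in exactly one pass, so efficiency is the reciprocal of the number of passes issued. A divergent two-way branch therefore halves the useful throughput, and nesting a second divergent branch inside either portion would add further passes in the same way.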