Single Program Multiple Data (SPMD) refers to a parallel computing mechanism in which a program or task is split across a plurality of processors, each of which is configured to operate on different data. SPMD applies a scalar, sequential program (“SPMD kernel” or “SPMD code”) simultaneously to multiple data streams. Examples of SPMD include, but are not limited to: OpenMP® (Open Multi-Processing), Fork-join, Pthread (POSIX (Portable Operating System Interface) Thread), Map-reduce, CUDA® (Compute Unified Device Architecture), OpenCL® (Open Computing Language), etc. An SPMD programming model includes running a plurality of software threads or software processes, each of which maintains its own program counter (PC) and its own state stored in its own registers. When the SPMD kernel is run as multiple instruction streams, any control-flow operation in the SPMD code, when applied to multiple data streams, may produce multiple different local PCs, which is called control-flow divergence. Control-flow divergence is thus a runtime behavior of SPMD code in which the PCs of the multiple instruction streams differ among themselves.
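The following is a minimal sketch of control-flow divergence, not an implementation of any of the SPMD systems listed above. It assumes a toy kernel with hypothetical instruction names (branch_if_neg, add, negate, halt) and models each instruction stream as a pair of a local PC and one register. Every stream starts at the same PC, and the PCs differ after the first branch because the data differ.

```python
# Toy SPMD model: every stream runs the same kernel but keeps its own PC and register.
# The instruction names and the kernel itself are hypothetical, for illustration only.
KERNEL = [
    ("branch_if_neg", 3),  # pc 0: jump to pc 3 when the register is negative
    ("add", 10),           # pc 1: register += 10
    ("halt",),             # pc 2
    ("negate",),           # pc 3: register = -register
    ("halt",),             # pc 4
]


def step(pc, reg):
    """Run one instruction for one stream and return its new (pc, reg)."""
    op = KERNEL[pc]
    if op[0] == "branch_if_neg":
        return (op[1] if reg < 0 else pc + 1), reg
    if op[0] == "add":
        return pc + 1, reg + op[1]
    if op[0] == "negate":
        return pc + 1, -reg
    return pc, reg  # halt: the stream stays where it is


def run_spmd(data, steps):
    """Advance all streams for `steps` steps, recording every local PC after each step."""
    streams = [(0, x) for x in data]  # one (pc, reg) pair per data element
    pc_trace = []
    for _ in range(steps):
        streams = [step(pc, reg) for pc, reg in streams]
        pc_trace.append([pc for pc, _ in streams])
    return pc_trace, [reg for _, reg in streams]


pc_trace, results = run_spmd([5, -2, 7, -9], steps=2)
# Divergence: after the branch, the local PCs are no longer all equal.
diverged = len(set(pc_trace[0])) > 1
```

In this sketch, pc_trace[0] holds the local PCs immediately after the branch, so checking whether it contains more than one distinct value is one simple way to detect divergence.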
Single Instruction Multiple Data (SIMD) refers to a parallel computing mechanism in which a plurality of processors are configured to perform the same operations on different data. Examples of a SIMD machine include, but are not limited to: an AltiVec machine (i.e., a machine running AltiVec®, an instruction set designed for a SIMD machine), a VMX server (i.e., a server running Vector Multimedia Extension (VMX), IBM's name for the AltiVec instruction set), an SSE machine (i.e., a machine running Streaming SIMD Extensions (SSE), an instruction set designed for a SIMD machine), an AVX machine (i.e., a machine running the Advanced Vector Extensions (AVX) instruction set), etc. A SIMD machine has a single PC (program counter). Each instruction stream (i.e., each processor) in a SIMD machine is called a lane. Running of instructions on the lanes of a SIMD machine is controlled by a predication mask. The predication mask indicates, for each lane, whether the lane is active for the PC being run. When a lane is active, the instruction at the current PC is run on that lane; otherwise, it is not. The predication mask of a SIMD machine can be updated as the result of other machine instructions, such as compare, register move, or branch.
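The predication mechanism described above can be sketched as follows. This is a toy model under stated assumptions, not the behavior of AltiVec, SSE, or AVX: the instruction names (cmp_lt, invert_mask, add, mul) are hypothetical, each lane holds a single register, and a compare instruction is the only way the mask is derived from data. The sketch shows one shared PC stepping through the program while the mask decides which lanes each instruction affects.

```python
# Toy SIMD model: one shared PC, one register per lane, and a predication mask.
# The instruction names are hypothetical and do not belong to any real instruction set.
def run_simd(program, lanes):
    regs = list(lanes)
    mask = [True] * len(regs)  # every lane starts active
    pc = 0                     # the single program counter shared by all lanes
    while pc < len(program):
        op, arg = program[pc]
        if op == "cmp_lt":         # compare: rewrites the mask from lane data
            mask = [r < arg for r in regs]
        elif op == "invert_mask":  # flips which lanes are active
            mask = [not m for m in mask]
        elif op == "add":          # takes effect only on active lanes
            regs = [r + arg if m else r for r, m in zip(regs, mask)]
        elif op == "mul":          # takes effect only on active lanes
            regs = [r * arg if m else r for r, m in zip(regs, mask)]
        pc += 1
    return regs, mask


program = [
    ("cmp_lt", 0),          # activate the lanes holding negative values
    ("mul", -1),            # "then" side: negate the active lanes
    ("invert_mask", None),  # switch to the remaining lanes
    ("add", 10),            # "else" side: add 10 to those lanes
]
regs, mask = run_simd(program, [5, -2, 7, -9])
```

Both sides of the conditional are issued in sequence under the single PC, and the mask prevents each side from affecting lanes that would not have taken it.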