In multiprocessing, processors or logical processors may employ multithreading logic to execute a plurality of threads of executable instructions concurrently or in parallel. One of the most common forms of parallel programming is known as Single Program Multiple Data (SPMD). SPMD is a technique employed to achieve parallelism, in which tasks are split up and run simultaneously on multiple processors (or logical processors) with different inputs in order to obtain results faster. Multiple autonomous processors (or logical processors) may simultaneously execute the same program at independent execution points.
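By way of illustration only, the SPMD pattern described above may be sketched as follows. The names `spmd_body`, `run_spmd`, `rank`, and `size` are hypothetical and chosen for this example. Worker threads stand in for processors (or logical processors), and a block partition of the input is assumed. Every worker executes the same function and differs only in the rank that selects its portion of the data.

```python
from concurrent.futures import ThreadPoolExecutor


def spmd_body(rank, size, data):
    """The single program run by every worker; only `rank` differs."""
    # Assumed block partition: worker `rank` owns data[lo:hi].
    lo = rank * len(data) // size
    hi = (rank + 1) * len(data) // size
    # Each worker proceeds through its own slice independently of the others.
    return sum(x * x for x in data[lo:hi])


def run_spmd(data, size=4):
    """Launch `size` instances of the same program and combine their results."""
    with ThreadPoolExecutor(max_workers=size) as pool:
        partials = list(pool.map(lambda r: spmd_body(r, size, data), range(size)))
    # Combining the partial results is the only point where workers meet.
    return sum(partials)
```

For example, `run_spmd(list(range(10)), size=4)` may split ten inputs across four workers and return the sum of their squares. In a real system the workers might be separate processes or nodes rather than threads; threads are used here only to keep the sketch self-contained.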
SPMD differs from Single Instruction Multiple Data (SIMD) in that, rather than the instruction-by-instruction lockstep that SIMD imposes on different data, SPMD can be used to call multiple instances of a function, or to execute multiple iterations of a loop, in parallel on multiple processors (or logical processors). These two forms of parallel programming are not mutually exclusive. For example, an SPMD program may also employ SIMD instructions.
In fact, current computers may allow many parallel modes to be exploited at the same time for maximum combined effect. A distributed-memory program may run on a collection of nodes. Each node may be a shared-memory computer that executes in parallel on multiple processors (or logical processors). Within each processor, SIMD vector instructions may be combined with superscalar instruction execution (usually handled transparently by the CPU), pipelining, and multiple parallel functional units for maximum single-CPU speed.
As these various forms of parallelism are employed together, the processing time required to execute an individual SPMD task may be reduced. At the same time, the processing time required to perform real-time synchronization, e.g., splitting up the tasks, allocating the tasks to multiple processors (or logical processors), and communicating through shared memory, becomes a proportionally more significant overhead. This overhead may limit the performance gains otherwise expected from simultaneously exploiting so many forms of parallelism.
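The effect may be illustrated with a deliberately simple model. The model is an assumption made for this example, not a measured result: each task is taken to cost a fixed synchronization time plus an execution time, and other forms of parallelism are taken to shrink only the execution time. The function names and the `simd_factor` parameter are hypothetical.

```python
def overhead_fraction(task_time, sync_time):
    """Share of total per-task time spent on synchronization (assumed model)."""
    return sync_time / (task_time + sync_time)


def effective_speedup(task_time, sync_time, simd_factor):
    """Speedup when only the execution time is divided by `simd_factor`.

    Assumes the synchronization time is unaffected by the added parallelism.
    """
    before = task_time + sync_time
    after = task_time / simd_factor + sync_time
    return before / after
```

Under these assumptions, a task with 9 units of execution time and 1 unit of synchronization time spends one tenth of its time synchronizing. If further parallelism cuts the execution time to 1 unit, synchronization accounts for half of the total, and `effective_speedup` remains well below the factor by which the execution time was reduced.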
To date, solutions that address these challenges, potential performance-limiting issues, and real-time complexities have not been adequately explored.