In data parallel computing, parallelism comes from distributing large data sets across multiple computing operators or nodes that run simultaneously. In contrast, task parallel computing distributes the execution of multiple threads across such operators or nodes. Data parallel operations are typically performed on hardware designed specifically for them, so a data parallel program is one written for particular data parallel hardware. Traditionally, data parallel programming has required highly sophisticated programmers who understand the non-intuitive nature of data parallel concepts and are intimately familiar with the specific data parallel hardware being programmed.
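The distinction between the two models can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`scale_chunk`, `data_parallel_scale`, `task_parallel_stats`) and the default worker count are hypothetical, and a thread pool merely stands in for the operators or nodes; it is not data parallel hardware.

```python
from concurrent.futures import ThreadPoolExecutor


def scale_chunk(chunk, factor):
    """The single operation applied to every element of one chunk."""
    return [x * factor for x in chunk]


def data_parallel_scale(data, factor, workers=4):
    """Data parallelism: split the data, run the SAME operation on each piece."""
    if not data:
        return []
    size = -(-len(data) // workers)  # ceiling division: elements per chunk
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves chunk order, so the results concatenate correctly.
        parts = pool.map(scale_chunk, chunks, [factor] * len(chunks))
        return [x for part in parts for x in part]


def task_parallel_stats(data):
    """Task parallelism: run DIFFERENT operations concurrently on the same data."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        total = pool.submit(sum, data)
        lowest = pool.submit(min, data)
        highest = pool.submit(max, data)
        return total.result(), lowest.result(), highest.result()
```

In the first function the unit of distribution is the data; in the second it is the work itself. Neither function says anything about the hardware, which is exactly what conventional data parallel programs cannot avoid.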
Outside the realm of supercomputing, a common use of data parallel programming is graphics processing, because such processing is regular and data intensive, and because specialized graphics hardware is available. In particular, a graphics processing unit (GPU) is a specialized many-core processor designed to offload complex graphics rendering from the main central processing unit (CPU) of a computer. A many-core processor is one in which the number of cores is large enough that traditional multi-processor techniques are no longer efficient; this threshold is somewhere in the range of several tens of cores. While many-core hardware is not necessarily the same as data parallel hardware, data parallel hardware can usually be considered many-core hardware.
Other existing data parallel hardware includes single instruction, multiple data (SIMD) units, such as the Streaming SIMD Extensions (SSE) units in x86/x64 processors available from major contemporary processor manufacturers.
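As a rough model of what a SIMD unit does, the following Python sketch adds two arrays four elements at a time, mirroring the way a 128-bit SSE register holds four 32-bit floats. It imitates the semantics only: `LANES`, `simd_add4`, and `vector_add` are illustrative names, not part of any real instruction set or library, and real SIMD code would rely on compiler intrinsics or auto-vectorization.

```python
LANES = 4  # a 128-bit SSE register holds four 32-bit floats


def simd_add4(a, b):
    """Model one SIMD add: a single operation over all four lanes at once."""
    assert len(a) == LANES and len(b) == LANES
    return [a[i] + b[i] for i in range(LANES)]


def vector_add(xs, ys):
    """Add two equal-length sequences in 4-wide steps plus a scalar tail."""
    if len(xs) != len(ys):
        raise ValueError("inputs must have the same length")
    out = []
    full = len(xs) - len(xs) % LANES  # elements covered by whole 4-wide steps
    for i in range(0, full, LANES):
        out.extend(simd_add4(xs[i:i + LANES], ys[i:i + LANES]))
    # Leftover elements that do not fill a register are handled one by one.
    for i in range(full, len(xs)):
        out.append(xs[i] + ys[i])
    return out
```

The lane width is baked into the loop structure. A unit with a different register width would need a different `LANES` value and a different tail, which hints at why code written for one kind of data parallel hardware does not carry over unchanged to another.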
Typical computers have historically been based on a traditional single-core general-purpose CPU that was neither specifically designed for nor capable of data parallelism. As a result, traditional software and applications for such CPUs do not use data parallel programming techniques. However, traditional single-core general-purpose CPUs are being replaced by many-core general-purpose CPUs.
While a many-core CPU is capable of data parallel functionality, little has been done to take advantage of it. Because traditional single-core CPUs are not data parallel capable, most programmers are not familiar with data parallel techniques. Even if a programmer were interested, a great hurdle remains: the programmer must fully understand the data parallel concepts and become sufficiently familiar with the many-core hardware to implement those concepts.
A programmer who clears those hurdles must then recreate that programming for each particular many-core hardware arrangement on which the program is to run. That is, because conventional data parallel programming is hardware specific, a solution that works on one kind of data parallel hardware will not necessarily work on another. Since the programmer writes data parallel solutions for specific hardware, the programmer faces a compatibility problem across differing hardware.
Presently, no widely adopted, effective, and general-purpose solution exists that enables a typical programmer to perform data parallel programming. A typical programmer is one who does not fully understand the data parallel concepts and is not intimately familiar with each incompatible data parallel hardware scenario. Furthermore, no effective present solution allows a programmer (typical or otherwise) to focus on the high-level logic of the application being programmed rather than on the specific implementation details of the target hardware.