Software programs have been written to run sequentially since the early days of software development. Over time, computers have steadily become more powerful, with more processing power and memory to handle advanced operations. More recently, this trend has shifted away from ever-increasing single-processor clock rates and toward an increase in the number of processors available in a single computer, i.e., away from sequential execution and toward parallel execution. Software developers want to take advantage of improvements in computer processing power so that their programs run faster as new hardware is adopted. With parallel hardware, however, this requires a different approach: developers must arrange for one or more tasks of a particular software program to be executed in parallel (sometimes called “concurrently”), so that the same logical operation can use many processors at once and deliver better performance as more processors are added to the computers on which the software runs.
Data parallelism, in which operations are expressed as aggregate computations over large collections of data, is one class of techniques by which a sequential program may be parallelized. A data-parallel operation partitions its input collection into logically disjoint subcollections so that independent tasks can process the separate subcollections in isolation, all as part of one larger logical operation. Partitioning data can be costly because it implies inter-task communication, and merging the results back into a single stream can be costly for the same reason.
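To make the partition, process, and merge steps concrete, here is a minimal Python sketch. It assumes the aggregate operation is a sum of squares. The names `partition`, `parallel_map_reduce`, and `sum_of_squares` are hypothetical and chosen for illustration only. The sketch uses threads from the standard `concurrent.futures` module to keep it self-contained. In CPython, the global interpreter lock keeps CPU-bound threads from executing Python bytecode simultaneously, so `ProcessPoolExecutor` would be the usual choice when real speedup is the goal.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def partition(data: Sequence[T], num_parts: int) -> List[Sequence[T]]:
    """Split data into num_parts contiguous, disjoint slices that cover every element once."""
    n = len(data)
    base, extra = divmod(n, num_parts)
    parts = []
    start = 0
    for i in range(num_parts):
        # The first `extra` slices each receive one additional element.
        end = start + base + (1 if i < extra else 0)
        parts.append(data[start:end])
        start = end
    return parts


def parallel_map_reduce(
    data: Sequence[T],
    per_part: Callable[[Sequence[T]], R],
    merge: Callable[[List[R]], R],
    num_parts: int = 4,
) -> R:
    """Hypothetical helper: process each disjoint partition in its own task, then merge."""
    parts = partition(data, num_parts)
    with ThreadPoolExecutor(max_workers=num_parts) as pool:
        # Each task reads only its own slice, so no locking is needed during processing.
        partials = list(pool.map(per_part, parts))
    # Merge step: combine the per-partition results into a single answer.
    return merge(partials)


def sum_of_squares(chunk: Sequence[int]) -> int:
    """Example per-partition operation (an assumption for this sketch)."""
    return sum(x * x for x in chunk)


if __name__ == "__main__":
    values = list(range(1, 11))
    total = parallel_map_reduce(values, sum_of_squares, sum, num_parts=3)
    print(total)  # 385
```

In this sketch the merge is cheap because each task returns a single integer. For operations whose partial results are themselves large collections, the merge step becomes the kind of costly recombination described above.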