An advantage to users of many multi- and many-core computing platforms lies in the ability to efficiently express parallelism. Task parallelism, for example, is a form of parallelism that distributes execution processes (threads) across different parallel computing nodes. For task parallelism, each computing node may execute a different set of instructions. In contrast, data parallelism focuses on distributing the data, rather than tasks, across different parallel computing nodes. Data parallelism is achieved when each computing node performs the same task on different pieces of distributed data. In some situations, different threads control the different data-parallel operations, but they execute the same software code instructions (on different data).
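The distinction between the two forms of parallelism can be illustrated with a minimal sketch. The function names (`total`, `largest`, `square_all`), the sample data, and the two-worker thread pool are all hypothetical choices made for illustration, and are not taken from any particular platform. In the first part, different threads run different instructions (task parallelism); in the second, the threads run the same instructions on different pieces of the data (data parallelism).

```python
from concurrent.futures import ThreadPoolExecutor


def total(values):
    # One kind of task: sum the values.
    return sum(values)


def largest(values):
    # A different kind of task: find the maximum.
    return max(values)


def square_all(chunk):
    # The same operation, applied to whichever chunk is supplied.
    return [x * x for x in chunk]


data = list(range(8))  # hypothetical sample data

with ThreadPoolExecutor(max_workers=2) as pool:
    # Task parallelism: each submitted job executes a different function.
    f_sum = pool.submit(total, data)
    f_max = pool.submit(largest, data)
    task_results = (f_sum.result(), f_max.result())

    # Data parallelism: each job executes the same function on a
    # different piece of the data.
    halves = [data[:4], data[4:]]
    data_results = list(pool.map(square_all, halves))
```

The sketch shows only how the work is expressed. It makes no claim about how any given platform schedules that work or about the speedup obtained.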
Several classes of programs, such as graphics, physics, and financial workloads, are often not easily task-parallelizable. Still, many such programs can greatly benefit from data-parallel approaches, wherein concurrently executing tasks (also called execution threads or worker threads) each perform the same actions on a different subset of the original data. For processing systems that are not designed with specialized hardware to schedule data-parallel tasks, the scheduling of data-parallel tasks is performed in software.
For a runtime that supports scheduling of data-parallel tasks in software, multiple threads need to concurrently evaluate sub-tasks (also referred to herein as “data-parallel” tasks or “work items”), where each of the multiple threads performs the same action on a subset of the original data. One challenge for such software approaches is to efficiently schedule the sub-tasks, that is, to efficiently select and schedule particular sub-tasks to run on particular threads. In other words, a key challenge is the efficient distribution of data-parallel work to each of the underlying worker threads.
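To make the distribution problem concrete, the following is a minimal sketch of one simple software policy, static block partitioning, in which the work items are split up front into contiguous, near-equal ranges, one per worker thread. This policy is chosen here only for illustration and is not asserted to be the approach of any particular runtime. The names `partition` and `run_data_parallel`, and the `kernel` argument, are hypothetical.

```python
import threading


def partition(num_items, num_workers):
    """Split range(num_items) into num_workers contiguous, near-equal ranges."""
    base, extra = divmod(num_items, num_workers)
    ranges = []
    start = 0
    for w in range(num_workers):
        # The first `extra` workers each take one additional item.
        size = base + (1 if w < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def run_data_parallel(kernel, data, num_workers):
    """Apply `kernel` to every element of `data` using `num_workers` threads."""
    out = [None] * len(data)

    def worker(lo, hi):
        # Every worker runs the same code on its own slice of the data.
        for i in range(lo, hi):
            out[i] = kernel(data[i])

    threads = [
        threading.Thread(target=worker, args=r)
        for r in partition(len(data), num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out


squares = run_data_parallel(lambda x: x * x, list(range(10)), 3)
```

A static split of this kind is cheap to compute but can leave some threads idle when work items take unequal time, which is one reason the choice of scheduling policy matters. Note also that in standard CPython the global interpreter lock limits the speedup that threads can deliver for CPU-bound work, so the sketch demonstrates the assignment of work items to threads rather than performance.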