On computer systems with multiple processors, parallel processing may reduce the time required to perform a task by distributing the workload associated with that task among the processors. In programs that utilize loop-based or recursive parallelism to increase performance in this manner, there is a conflict between the need to decrease the overhead associated with decomposing work and distributing it across multiple processors, and the desire to increase potential parallelism for improved load balancing and utilization of the processors. This conflict is often resolved by selecting a grain size, which is a lower bound on the amount of work that could benefit from parallelism. The grain size thus serves as a cut-off point that limits the number of sub-divisions that will be applied to a task.
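By way of illustration only, the role of a grain size as a cut-off for sub-division may be sketched as follows. The names `split_range` and `parallel_sum`, the use of a thread pool, and the halving strategy are assumptions made for this example and are not taken from any particular embodiment.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(lo, hi, grain):
    """Recursively halve [lo, hi) until each piece holds at most `grain` items."""
    if grain < 1:
        raise ValueError("grain must be at least 1")
    if hi - lo <= grain:
        # Cut-off reached: this piece is not sub-divided any further.
        return [(lo, hi)]
    mid = lo + (hi - lo) // 2
    return split_range(lo, mid, grain) + split_range(mid, hi, grain)


def parallel_sum(values, grain, workers=4):
    """Sum `values` by handing each leaf chunk to a worker (hypothetical helper)."""
    chunks = split_range(0, len(values), grain)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda c: sum(values[c[0]:c[1]]), chunks)
        return sum(partials), len(chunks)


if __name__ == "__main__":
    data = list(range(1000))
    for grain in (1000, 100, 1):
        total, n_chunks = parallel_sum(data, grain)
        print(f"grain={grain:4d} chunks={n_chunks:4d} total={total}")
```

In this sketch a larger grain size yields fewer chunks, and hence less distribution overhead but less potential parallelism, while a smaller grain size yields more chunks and the opposite trade-off.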
There are a number of limitations to this approach, however. It may be difficult to choose an appropriate grain size, and the optimal level of granularity may vary between tasks and even within a given task. If the workload is unevenly distributed within a task, a small grain size that may be efficient for sub-tasks with significant work can cause excessive overhead for other sub-tasks with less work. Conversely, larger grain sizes suitable for sub-tasks with lighter workloads can leave the sub-tasks with greater workloads overloaded, resulting in load imbalance and underutilization of processing resources.
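The difficulty of choosing a single grain size for an unevenly distributed workload may be illustrated with a simple cost model. The per-item costs, the helper name `chunk_work`, and the use of fixed-size chunks in place of recursive sub-division are all assumptions made solely for this example.

```python
def chunk_work(costs, grain):
    """Return the total cost of each fixed-size chunk of `grain` items."""
    if grain < 1:
        raise ValueError("grain must be at least 1")
    return [sum(costs[i:i + grain]) for i in range(0, len(costs), grain)]


if __name__ == "__main__":
    # Hypothetical workload: 900 cheap items followed by 100 expensive items.
    costs = [1] * 900 + [100] * 100

    for grain in (100, 10):
        work = chunk_work(costs, grain)
        print(f"grain={grain:3d} chunks={len(work):3d} "
              f"lightest={min(work)} heaviest={max(work)}")
```

Under these assumed costs, a grain size of 100 concentrates most of the total work in a single chunk, whereas a grain size of 10 spreads the expensive items more evenly but also produces many chunks whose work is small relative to the cost of distributing them.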
Although the following Detailed Description will proceed with reference being made to illustrative embodiments, many alternatives, modifications, and variations thereof will be apparent to those skilled in the art.