Task parallelism is a form of parallelization in which computational code is divided into tasks that are distributed across multiple processors. A computational task, serving as the basic schedulable unit in a parallel computing environment, embodies a computational procedure (hereafter referred to as a “kernel”) with or without certain inputs and outputs. A task-based parallel programming runtime allows programmers to express algorithms in the form of tasks, and uses a scheduler to distribute tasks across multiple processors and to perform maintenance functions, such as synchronization and load balancing. As task-based runtime systems mature and offer more features, task abstractions become increasingly complicated, imposing significant overhead on task creation, management, and destruction. For example, when setting up a task, a task-based runtime system incurs overhead in determining whether the task belongs to a heterogeneous device execution path, in referencing and un-referencing the task to track its lifecycle, and in requesting exclusive ownership from the scheduler.
Because the overhead of creating, dispatching, and managing tasks can be comparable to the actual computation of a lightweight kernel, a traditional task-based runtime system adds significant overhead to such kernels. Both performance and energy efficiency are impaired by this unavoidable task-management overhead. A full-fledged task-based runtime system is better suited to heavyweight kernels with complex dependencies and synchronization requirements, because these restrictions cause parallelization to occur at lower frequencies, so the per-task overhead is incurred less often relative to the computation performed.
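As a rough illustration of the overhead described above, the following minimal Python sketch models a task as a kernel plus optional inputs and an output, and a scheduler that performs the three kinds of bookkeeping mentioned (device-path check, referencing and un-referencing, and exclusive ownership). The names `Task`, `Scheduler`, `device`, and `bookkeeping_steps` are hypothetical and chosen only for illustration; they do not correspond to any particular runtime system, and the step counter is a stand-in for real management cost rather than a measurement.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class Task:
    """Hypothetical task: a kernel with optional inputs and an output slot."""
    kernel: Callable[..., Any]
    inputs: Tuple[Any, ...] = ()
    output: Any = None
    device: str = "cpu"         # hypothetical execution-path tag
    refcount: int = 0           # lifecycle tracking
    bookkeeping_steps: int = 0  # counts management work, not kernel work


class Scheduler:
    """Hypothetical scheduler that distributes tasks over worker threads."""

    def __init__(self, workers: int = 4) -> None:
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._ownership = threading.Lock()

    def _run(self, task: Task) -> Task:
        # 1. Determine whether the task takes a heterogeneous-device path.
        on_accelerator = task.device != "cpu"
        task.bookkeeping_steps += 1
        if on_accelerator:
            raise NotImplementedError("only the CPU path is sketched here")

        # 2. Reference the task so its lifecycle can be tracked.
        task.refcount += 1
        task.bookkeeping_steps += 1

        # 3. Request exclusive ownership from the scheduler.
        with self._ownership:
            task.bookkeeping_steps += 1

        # The actual computation: for a lightweight kernel this is tiny.
        task.output = task.kernel(*task.inputs)

        # 4. Un-reference the task once the kernel has finished.
        task.refcount -= 1
        task.bookkeeping_steps += 1
        return task

    def run_all(self, tasks: List[Task]) -> List[Task]:
        # map() preserves submission order in the returned results.
        return list(self._pool.map(self._run, tasks))

    def shutdown(self) -> None:
        self._pool.shutdown()


# Eight lightweight kernels, each performing a single addition.
scheduler = Scheduler(workers=2)
tasks = [Task(kernel=lambda a, b: a + b, inputs=(i, 1)) for i in range(8)]
done = scheduler.run_all(tasks)
scheduler.shutdown()

outputs = [t.output for t in done]
total_bookkeeping = sum(t.bookkeeping_steps for t in done)
```

In this sketch each kernel performs one addition while the scheduler performs four management steps around it, which is the imbalance the passage attributes to lightweight kernels. A heavyweight kernel would perform far more work per task against the same four steps.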