Heterogeneous systems make use of central processing units (CPUs) and accelerators, such as graphics processing units (GPUs) and field programmable gate arrays (FPGAs). Heterogeneous systems may also make use of co-processors, such as the Xeon™ Phi manufactured by Intel Corp. (Santa Clara, Calif.). Such heterogeneous systems are increasingly popular in the field of high-performance computing. Accelerators may provide significant additional computing power and/or increased functionality to CPUs. However, developing software capable of leveraging the full power of such accelerators can be challenging for programmers. The challenges include creating highly efficient code as well as specifying how tasks are partitioned between the CPU and one or more accelerators to optimize the use of system resources and maximize system performance.
Further complicating matters, the optimal partitioning of tasks among the CPU and one or more accelerators may depend on the properties of the task and, often, on the properties of the input data used by the task. In addition, data transmission between the CPU and the accelerators is usually significant in both quantity and time. Consideration must therefore also be given to reducing system latency by establishing workload partitions that maintain common data on a common device.
Although the following Detailed Description will proceed with reference being made to illustrative embodiments, many alternatives, modifications and variations thereof will be apparent to those skilled in the art.