1. Field of the Invention
This invention relates to computing systems, and more particularly, to automatically scheduling the execution of work units among multiple heterogeneous processor cores.
2. Description of the Relevant Art
The parallelization of tasks is used to increase the throughput of computer systems. To this end, compilers or the software programmer may extract parallelized tasks from program code to execute in parallel on the system hardware. With a single-core architecture, a single core may include deep pipelines and multiple execution contexts configured to perform multi-threading. To further increase parallel execution on the hardware, a multi-core architecture may include multiple processor cores. This type of architecture may be referred to as a homogeneous multi-core architecture and may provide higher instruction throughput than a single-core architecture. However, particular instructions for a computationally intensive task may consume a disproportionate share of a shared resource, which may in turn delay the deallocation of the shared resource. Examples of such specific tasks may include cryptography, video graphics rendering, and garbage collection.
To overcome the performance limitations of conventional general-purpose cores, a computer system may offload specific tasks to special-purpose hardware. This hardware may include a single instruction multiple data (SIMD) parallel architecture, a field-programmable gate array (FPGA), and/or other specialized types of processing cores. When an architecture includes multiple cores of different types it may be referred to as a heterogeneous multi-core architecture.
Presently, an operating system (OS) scheduler or a user-level scheduler, which may also be referred to as a “scheduler”, may schedule workloads running on a computer system with a heterogeneous multi-core architecture using a variety of schemes, such as a round-robin scheme. Additionally, a scheduler may schedule these workloads based on availability of the cores. Alternatively, a programmer may schedule the workloads in combination with the runtime system. In such a case, the programmer may utilize a software platform to perform the scheduling. For example, the OpenCL® (Open Computing Language) framework supports programming across heterogeneous computing environments and includes a low-level application programming interface (API) for heterogeneous computing. The OpenCL framework (generally referred to herein as “OpenCL”) includes a C-like language interface that may be used to define execution queues, wherein each queue is associated with an OpenCL device. An OpenCL device may be a CPU, a GPU, or other unit with at least one processor core within the heterogeneous multi-core architecture. In the OpenCL framework, a function call may be referred to as an OpenCL compute kernel, or simply a “compute kernel”. A software programmer may schedule the compute kernels in the execution queues. A compute kernel may be matched with one or more records of data to produce one or more work units of computation. Each work unit may have a unique identifier (ID).
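By way of illustration only, the relationship described above between compute kernels, data records, work units, and execution queues may be sketched as follows. The sketch is a plain Python model and does not use the OpenCL API. The names Device, make_work_units, and schedule_round_robin, the kernel name "scale", and the sample records are all hypothetical and are assumed solely for this example. The record index is assumed to serve as the unique ID of each work unit.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class Device:
    """Hypothetical stand-in for an OpenCL device and its execution queue."""
    name: str
    queue: list = field(default_factory=list)


def make_work_units(kernel_name, records):
    """Match one compute kernel with each data record.

    Each resulting work unit is a (unit_id, kernel_name, record) tuple;
    the record index is assumed to serve as the unique ID.
    """
    return [(unit_id, kernel_name, record)
            for unit_id, record in enumerate(records)]


def schedule_round_robin(work_units, devices):
    """Place work units in the device queues in turn.

    The device type and current load are ignored, as in a simple
    round-robin scheme.
    """
    for unit, device in zip(work_units, cycle(devices)):
        device.queue.append(unit)


devices = [Device("cpu"), Device("gpu")]
units = make_work_units("scale", [10, 20, 30, 40, 50])
schedule_round_robin(units, devices)

for device in devices:
    print(device.name, [unit_id for unit_id, _, _ in device.queue])
```

In this sketch the work units alternate between the two queues regardless of which device may be better suited to the kernel, which is the kind of mismatch between a scheduling scheme and system resources that a simple scheme may produce.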
The scheduling model described above may restrict portability and performance when there is a mismatch between the scheduling schemes and system resources. The programmer may trade portability for efficiency while attempting to provide an application that spans varied system configurations.