Many computing devices include both a central processing unit (CPU) for general-purpose processing, such as running applications, and a graphics processing unit (GPU) that is devoted primarily to graphics processing.
A heterogeneous computing environment includes different types of processing or computing devices within the same system or network. Thus, a typical platform with both a CPU and a GPU is an example of a heterogeneous computing environment. Another example of a heterogeneous computing environment would be a CPU connected via a network connection to a virtual cluster of computers referred to as a compute cloud.
Cloud computing allows a user to utilize applications or services running on a remotely located computer rather than on the user's local computer. For example, data may be processed in the cloud by forwarding the data from a client computer to one or more server computers, where the data is processed before the results are returned to the client computer. In this way, the client computer offloads processing tasks to computers in the cloud. Cloud computing can provide significant processing resources and can greatly increase the speed of processing tasks, especially when those tasks are intelligently routed to the cloud.
Computers and other such data processing devices have at least one control processor that is generally a CPU. Such computers and processing devices may also use GPUs for specialized types of processing. For example, GPUs are designed to be particularly suited for graphics processing operations. GPUs generally comprise multiple processing elements that are ideally suited for executing the same instruction in parallel on different data streams, such as in data-parallel processing. A GPU can comprise, for example, a graphics processor unit, a graphics processor, a graphics processing core, a graphics processing device, or the like. In general, a CPU functions as the host or controlling processor and transfers specialized functions such as graphics processing to other processors such as GPUs.
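By way of illustration only, the data-parallel pattern described above can be sketched in Python. The names `saxpy_kernel` and `run_data_parallel` are hypothetical and do not appear in any framework discussed here. A thread pool stands in for the multiple processing elements of a GPU; it shows the structure of the computation (one identical operation applied independently to each data element), not the performance of real GPU hardware.

```python
from concurrent.futures import ThreadPoolExecutor


def saxpy_kernel(i, a, x, y):
    """Work of one processing element: the same operation applied to element i."""
    return a * x[i] + y[i]


def run_data_parallel(a, x, y, workers=4):
    """Apply the same kernel to every index; no index depends on another."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so result[i] corresponds to index i.
        return list(pool.map(lambda i: saxpy_kernel(i, a, x, y), range(len(x))))


result = run_data_parallel(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
print(result)
```

Because each element is computed independently, the work can be divided among any number of processing elements without changing the result.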
With the availability of multi-core CPUs, where each CPU has multiple processing cores, substantial processing capabilities that can also be used for specialized functions are available in CPUs. One or more of the computation cores of multi-core CPUs and GPUs can be part of the same die or can be on different dies. Recently, programming systems have been introduced for General Purpose GPU (GPGPU) style computing to execute non-graphics applications on GPUs. The GPGPU style of computing advocates using the CPU primarily to execute control code and offloading performance-critical data-parallel code to the GPU. The GPU is primarily used as an accelerator. However, some GPGPU programming systems allow the use of both CPU cores and GPU cores as accelerator targets.
Several frameworks have been developed for heterogeneous computing platforms that have CPUs and GPUs. These frameworks include BrookGPU by Stanford University, OpenCL™ by an industry consortium named Khronos Group, and CUDA™ by NVIDIA.
The OpenCL™ framework offers a C-like development environment in which users can create applications for GPUs. OpenCL™ enables the user, for example, to specify instructions for offloading some computations, such as data-parallel computations, to a GPU. OpenCL™ also provides a compiler and a runtime environment in which code can be compiled and executed within a heterogeneous computing system.
NVIDIA's CUDA™ (Compute Unified Device Architecture) technology provides a C language environment that enables programmers and developers to write software applications to solve complex computational problems such as video and audio encoding, modeling for oil and gas exploration, and medical imaging. The applications are configured for parallel execution by a multi-core GPU and typically rely on specific features of the multi-core GPU.
Frameworks such as CUDA™, at present, require the programmer to determine what parts of the application(s) are executed on the CPUs and the GPUs of the heterogeneous system. Determining this split, however, is not trivial as GPUs and CPUs spanning a broad spectrum of performance characteristics are available on the market and can be mixed and matched in a given system. In addition, the available resources on the system at runtime may vary depending on other applications executing on the same system. Therefore, application programmers are faced with implementing elaborate, complex, dynamic schemes for allocating multiple kernels to processors within their applications or settling for sub-optimal performance.
There exists a significant need in the art for a method that intelligently routes computations between CPUs and GPUs in heterogeneous computing environments at runtime, in order to optimize performance and maximize efficient use of available hardware. The present invention meets this need by providing a method for evaluating a heterogeneous computing environment's capability and creating a model that can be used to decide where to route particular computations.
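By way of illustration only, one possible form of such a capability model can be sketched in Python. The sketch assumes, purely for the example, that each device's execution time is approximately a fixed overhead plus a per-item cost, and that the model is obtained by a least-squares fit to timing measurements. The function names (`fit_linear`, `build_model`, `route`) and all timing figures are hypothetical and are not taken from the method described herein.

```python
def fit_linear(samples):
    """Least-squares fit of seconds = overhead + per_item * n from (n, seconds) pairs."""
    count = len(samples)
    mean_n = sum(n for n, _ in samples) / count
    mean_t = sum(t for _, t in samples) / count
    variance = sum((n - mean_n) ** 2 for n, _ in samples)
    covariance = sum((n - mean_n) * (t - mean_t) for n, t in samples)
    per_item = covariance / variance
    overhead = mean_t - per_item * mean_n
    return overhead, per_item


def build_model(measurements):
    """Fit one (overhead, per_item) pair for each measured device."""
    return {device: fit_linear(samples) for device, samples in measurements.items()}


def route(model, n):
    """Return the device with the lowest predicted time for a workload of n items."""
    def predicted(device):
        overhead, per_item = model[device]
        return overhead + per_item * n
    return min(model, key=predicted)


# Hypothetical measurements: (workload size, seconds) per device.
measurements = {
    "cpu": [(1000, 0.0010), (2000, 0.0020), (4000, 0.0040)],
    "gpu": [(1000, 0.0011), (2000, 0.0012), (4000, 0.0014)],
}
model = build_model(measurements)
print(route(model, 100))      # small workload
print(route(model, 100000))   # large workload
```

With these assumed figures, the fixed overhead of the hypothetical GPU dominates for small workloads, so the model routes them to the CPU, while large workloads are routed to the GPU. Because the model is built from measurements, the same code adapts to whichever devices are present in a given system.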