Computing platforms that include multiple processors are used to improve the performance of applications that have high computational intensity requirements and/or high data throughput requirements. A multiple-processor computing platform may include a general-purpose central processing unit (CPU) that may act as a host device and one or more computing devices that the host CPU may use to offload the performance of computationally-intensive tasks, thereby improving performance of the overall system. In some cases, the one or more computing devices may be specifically designed to process certain types of tasks more efficiently than the host CPU, which may provide further performance improvements for the overall system. For example, the one or more computing devices may be specifically designed to execute parallel algorithms more efficiently than the host CPU.
One type of computing device that may be used in a multiple-processor computing system is a graphics processing unit (GPU). Traditionally, GPUs included fixed-function hardware that was specifically designed for the real-time rendering of 3-dimensional (3D) graphics to a display device, but that was not typically programmable, i.e., a compiled program could not be downloaded to the GPU and executed on the GPU. More recently, however, with the development of programmable shader units, much of the architecture of the GPU has shifted to a programmable architecture that includes many parallel processing elements. The programmable architecture allows the GPU to execute not only graphics operations, but also general-purpose computing tasks, in a highly-parallel manner.
Using a GPU to execute general-purpose, non-graphics specific computing tasks may be referred to herein as General-Purpose computation on Graphics Processing Units (GPGPU), or alternatively as GPU computing. In some cases, GPUs may make available an application programming interface (API) that is not graphics specific, thereby easing the programming of the GPU for the execution of general-purpose computing tasks. GPU computing tasks may include tasks that are computationally-intensive and/or include a high degree of parallelism, e.g., matrix calculations, signal processing calculations, statistical algorithms, molecular modeling applications, finance applications, medical imaging, cryptanalysis applications, etc.
A GPU is just one type of computing device that can be used in a multiple-processor computing platform, and other types of computing devices may also be used in addition to or in lieu of a GPU. For example, other types of computing devices that may be used in a multiple-processor computing platform include an additional CPU, a digital signal processor (DSP), a Cell Broadband Engine (Cell/BE) processor, or any other type of processing unit.
A multiple-processor computing platform with multiple computing devices may be either a homogeneous platform or a heterogeneous platform. In a homogeneous platform, all computing devices share a common instruction set architecture (ISA). In contrast, a heterogeneous platform may include two or more computing devices with different ISAs. In general, different types of computing devices may have different ISAs, and different brands of computing devices of the same type may also have different ISAs.
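The distinction above can be illustrated with a minimal Python sketch. The `ComputingDevice` record, the `classify_platform` helper, and the ISA labels `isa-a`, `isa-b`, and `isa-c` are hypothetical names introduced only for illustration; the sketch assumes that each device can be summarized by a single ISA label.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputingDevice:
    """Hypothetical description of one computing device in a platform."""
    name: str
    kind: str  # device type, e.g. "CPU", "GPU", "DSP"
    isa: str   # illustrative label for the instruction set architecture


def classify_platform(devices):
    """Return "homogeneous" if all devices share one ISA, else "heterogeneous"."""
    if not devices:
        raise ValueError("a platform needs at least one computing device")
    distinct_isas = {device.isa for device in devices}
    return "homogeneous" if len(distinct_isas) == 1 else "heterogeneous"


# Two CPUs that share an ISA: a homogeneous platform.
dual_cpu = [
    ComputingDevice("cpu0", "CPU", "isa-a"),
    ComputingDevice("cpu1", "CPU", "isa-a"),
]

# A host CPU and a GPU with a different ISA: a heterogeneous platform.
cpu_gpu = [
    ComputingDevice("cpu0", "CPU", "isa-a"),
    ComputingDevice("gpu0", "GPU", "isa-b"),
]

# Two devices of the same type from different brands may still differ in ISA.
two_gpus = [
    ComputingDevice("gpu0", "GPU", "isa-b"),
    ComputingDevice("gpu1", "GPU", "isa-c"),
]
```

Under these assumptions, the classification depends only on the set of ISAs present, not on the device type, which is why the two-GPU platform above is treated as heterogeneous.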
The performance of a multiple-processor computing platform may be further improved by utilizing multi-core computing devices and/or many-core computing devices. An example of a multi-core computing device is the GPU described above that contains a programmable shader unit having a plurality of processing cores. CPUs, however, may also be designed to include multiple processing cores. In general, any chip or die that includes multiple processing cores may be considered to be a multi-core processor. A processing core may refer to a processing unit that is capable of executing an instruction on a particular piece of data. For example, a single arithmetic logic unit (ALU) or vector processor within a GPU may be considered to be a processing core. Many-core processors generally refer to multi-core processors that have a relatively high number of cores, e.g., greater than ten cores, and are typically designed using different techniques than those which are used to design multi-core processors with a smaller number of cores. Multi-core processors provide performance improvement by allowing a software program to execute in parallel, e.g., concurrently, on multiple cores on a single chip.
A parallel programming model refers to a programming model that is designed to allow a program to be executed concurrently on multiple processing cores. The program may be a multi-threaded program, in which case, a single thread may operate on each processing core. In some examples, a single computing device may include all of the processing cores used to execute the program. In other examples, some of the processing cores used to execute the program may be located on different computing devices of the same type or of a different type.
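As a rough illustration of a multi-threaded program whose work is divided among processing cores, the following Python sketch splits a list into one chunk per worker thread and combines the partial results. The helper names `split_into_chunks`, `sum_of_squares`, and `parallel_sum_of_squares` are hypothetical. The sketch also rests on assumptions that should be stated plainly: the operating system, not the program, decides which core runs each thread, and in standard CPython the global interpreter lock may prevent these threads from executing Python bytecode truly concurrently, so the example shows the structure of the model rather than a guaranteed speedup.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, chunk_count):
    """Split items into at most chunk_count contiguous, near-equal chunks."""
    chunk_count = max(1, min(chunk_count, len(items)))
    base, extra = divmod(len(items), chunk_count)
    chunks, start = [], 0
    for index in range(chunk_count):
        # The first `extra` chunks each take one additional element.
        end = start + base + (1 if index < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """Work performed independently by a single thread on its own chunk."""
    return sum(value * value for value in chunk)


def parallel_sum_of_squares(items, worker_count=None):
    """Compute the sum of squares using one worker thread per chunk."""
    # Default to one worker per reported core; fall back to a single worker.
    worker_count = worker_count or os.cpu_count() or 1
    chunks = split_into_chunks(items, worker_count)
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        partial_sums = list(pool.map(sum_of_squares, chunks))
    # Combine the per-thread partial results into the final answer.
    return sum(partial_sums)
```

For example, `parallel_sum_of_squares(list(range(1000)), worker_count=4)` divides the input into four chunks of 250 elements, one per worker thread. The same decomposition could in principle be spread across cores located on different computing devices, although that would require a mechanism other than threads within one process.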
A cross-platform, cross-vendor, heterogeneous computing platform, parallel programming model API may be used to provide a common language specification for the parallel programming of a heterogeneous, multi-core computing platform that includes different types of computing devices, potentially made by different vendors, that implement different ISAs. Open Computing Language (OpenCL™) is an example of a cross-platform, cross-vendor, heterogeneous computing platform, parallel programming API. Such APIs may be designed to allow for more generalized data processing on a GPU. For example, beyond exposing the expanded shader subsystem capabilities via a compute language, these APIs may generalize the data flow and control paths into the GPU in a non-graphics specific manner. Presently, however, the instruction sets provided by such APIs are based on the hardware architecture of a GPU, and hence, limited to functionality that is compatible with existing GPU architectures.
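The data-parallel style that such APIs expose can be sketched in plain Python. The following is an emulation only and does not use the OpenCL API: `vector_add_kernel` and `enqueue_over_range` are hypothetical names, and the sketch assumes a model in which one kernel function is invoked once per index of a one-dimensional range, with each invocation writing a distinct output element.

```python
def vector_add_kernel(global_id, a, b, out):
    """Kernel body: each invocation computes exactly one output element."""
    out[global_id] = a[global_id] + b[global_id]


def enqueue_over_range(kernel, global_size, *args):
    """Invoke kernel once for every index in [0, global_size).

    A real device could execute these invocations concurrently on many
    processing cores. This emulation runs them one after another, which
    yields the same result because no invocation depends on another.
    """
    for global_id in range(global_size):
        kernel(global_id, *args)


a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
out = [0] * len(a)
enqueue_over_range(vector_add_kernel, len(a), a, b, out)
```

Because every invocation touches only its own index, the order of execution does not affect the contents of `out`, which is the property that allows this kind of workload to be distributed across parallel processing elements.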