Field of the Invention
This invention relates generally to the field of computer processors. More particularly, the invention relates to a method and apparatus for a highly efficient graphics processing unit (GPU) execution model.
Description of the Related Art
General-purpose computing on graphics processing units (GPGPU) involves the use of a graphics processing unit (GPU), which typically handles computer graphics computations, to perform computations in applications traditionally handled by the central processing unit (CPU). In principle, any GPU that provides a functionally complete set of operations on arbitrary bits can compute any computable value. Because GPUs typically include numerous execution units and because computer systems may include multiple GPU chips, current GPU platforms are well suited to executing certain types of parallel program code.
OpenCL is an industry-standard application programming interface (API) for GPGPU computing. Version 2.0 of the OpenCL specification introduced a new concept for executing work on the GPU, referred to as “nested parallelism,” which is directed at a particular type of data-parallel problem in which the scale and magnitude of the work are known only during execution of the workload. Graph traversal is a good example of this type of workload: the amount of processing required is known only after the nodes of the graph have been processed. Nested parallelism thus represents a new GPGPU processing paradigm.
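By way of illustration only, the following Python sketch shows why the amount of work in a graph traversal cannot be known before execution: the number of nodes to process at each depth is discovered only as earlier nodes are processed. The function name `frontier_sizes`, the adjacency-dictionary representation, and the example graph are assumptions made for this sketch and do not appear in the OpenCL specification.

```python
def frontier_sizes(adjacency, root):
    """Return the number of nodes discovered at each depth of a
    breadth-first traversal starting at `root`."""
    visited = {root}
    frontier = [root]
    sizes = []
    while frontier:
        sizes.append(len(frontier))
        next_frontier = []
        for node in frontier:
            # The neighbors, and hence the next batch of work, become
            # known only once this node has been processed.
            for neighbor in adjacency.get(node, ()):
                if neighbor not in visited:
                    visited.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return sizes


# Hypothetical example graph: node -> list of neighboring nodes.
example_graph = {
    0: [1, 2],
    1: [3, 4, 5],
    2: [5, 6],
    3: [],
    4: [7],
    5: [7],
    6: [],
    7: [],
}

print(frontier_sizes(example_graph, 0))  # prints [1, 2, 4, 1]
```

In this example the per-depth work grows from one node to four and then shrinks to one, and none of these sizes could have been supplied by the host before the traversal began.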
In current configurations, the host processor controls exactly what, when, and how instructions and data are processed by the GPU. Thus, GPUs are typically slave devices to the host processor, which acts as a master. Under the brute-force approach in which the host controls GPU execution for graph traversal, for example, the GPU processes one or a few nodes at a time and the host decides which nodes to process next. The GPU and the host must therefore communicate status and results back and forth, which degrades both power efficiency and performance.
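The host-controlled arrangement described above may be sketched, purely as a simulation, as follows. The function `device_process` is a hypothetical stand-in for a single GPU dispatch and does not call any real GPU API; `host_driven_traversal`, the `batch_size` parameter, and the example graph are likewise assumptions introduced for this sketch. The sketch counts one host/GPU round trip per dispatch.

```python
def device_process(adjacency, batch):
    """Hypothetical stand-in for one GPU dispatch: return the neighbors
    of every node in the batch."""
    return [n for node in batch for n in adjacency.get(node, ())]


def host_driven_traversal(adjacency, root, batch_size):
    """Simulate a host that hands the GPU at most `batch_size` nodes per
    dispatch and decides which nodes to process next.

    Returns the order in which nodes were processed and the number of
    host/GPU round trips."""
    visited = {root}
    pending = [root]
    order = []
    round_trips = 0
    while pending:
        batch, pending = pending[:batch_size], pending[batch_size:]
        # Host -> GPU -> host: results return before the next decision.
        discovered = device_process(adjacency, batch)
        round_trips += 1
        order.extend(batch)
        for n in discovered:
            if n not in visited:
                visited.add(n)
                pending.append(n)
    return order, round_trips


# Hypothetical example graph: node -> list of neighboring nodes.
graph = {0: [1, 2], 1: [3, 4, 5], 2: [5, 6], 3: [], 4: [7],
         5: [7], 6: [], 7: []}

print(host_driven_traversal(graph, 0, batch_size=1)[1])  # prints 8
print(host_driven_traversal(graph, 0, batch_size=4)[1])  # prints 4
```

With one node per dispatch, the simulated host and GPU exchange results eight times for an eight-node graph. Larger batches reduce the count, but every step still requires the results to return to the host before the next dispatch can be issued.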