Visually intensive computer graphics applications such as 3D (three-dimensional) computer games, flight simulators and other 3D imaging applications may involve user interaction, scene management and rendering, physics modeling, artificial intelligence and other relatively complex functions. While certain game applications can leverage the capabilities of a local GPU (graphics processing unit) by offloading graphical and non-graphical computation to the GPU in order to maintain interactive frame rates, there remains considerable room for improvement. For example, conventional approaches may use a 3D pipeline of the GPU for 2D (two-dimensional) operations such as clearing the color of render targets. Each time such a clear color operation is processed in the 3D pipeline, the 3D state may need to be saved before the operation and restored once the operation is complete. As a result, system stalls could occur during the 3D processing of commands, particularly in non-optimized application environments in which render target clear operations occur frequently. Currently, applications can program compute kernels only on specific GPUs that support application programming interfaces (APIs) such as DX11 (DirectX 11, Microsoft Corporation) and OpenCL (Khronos Group). Device drivers, by contrast, can use compute kernels irrespective of the API calls made by the application, as long as the GPU supports compute kernels.
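As a rough illustration of the overhead described above, the following Python sketch counts how many times the 3D state would be saved and restored when render target clears are routed through the 3D pipeline. It is a toy model under stated assumptions, not a driver implementation: the class `Pipeline3D`, the function `run_frame` and the command strings "draw" and "clear" are hypothetical names introduced here for illustration and do not correspond to any real GPU or driver interface.

```python
from dataclasses import dataclass


@dataclass
class Pipeline3D:
    """Hypothetical stand-in for a GPU 3D pipeline that tallies state traffic."""

    state_saves: int = 0
    state_restores: int = 0
    draws: int = 0
    clears: int = 0

    def draw(self) -> None:
        # An ordinary 3D command: no state save/restore is modeled.
        self.draws += 1

    def clear_color(self) -> None:
        # Assumption from the text: a 2D clear processed in the 3D pipeline
        # needs the 3D state saved before it and restored after it completes.
        self.state_saves += 1
        self.clears += 1
        self.state_restores += 1


def run_frame(pipeline: Pipeline3D, commands: list[str]) -> Pipeline3D:
    """Feed a list of hypothetical command names through the toy pipeline."""
    for command in commands:
        if command == "draw":
            pipeline.draw()
        elif command == "clear":
            pipeline.clear_color()
        else:
            raise ValueError(f"unknown command: {command!r}")
    return pipeline
```

Running `run_frame(Pipeline3D(), ["clear", "draw", "draw", "clear", "draw"])` in this model yields two saves and two restores for three draws, which suggests why a command stream with frequent render target clears could stall 3D processing.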