In a computing environment with an Operating System (OS), a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), a graphics driver, and a graphics memory, the OS schedules workloads from multiple clients to the GPU. These workloads are referred to herein as contexts. Some contexts are large enough to demand a non-trivial amount of execution time on the GPU. Other contexts are very time sensitive and should be executed almost immediately. Balancing the needs of multiple contexts in the management of the GPU helps to maintain a pleasing end-user experience.
The OS has techniques for graphics workflow management. Using these techniques, the OS schedules the contexts, typically based on user inputs, time slicing, priority, etc. The workload for each context is submitted as a DMA (Direct Memory Access) buffer to a software graphics driver. The graphics driver creates GPU-specific command buffers for execution, called batch buffers. The GPU-specific batch buffers are queued to the GPU hardware for execution. A typical batch buffer includes many commands, which constitute the workload. The batch buffer commands may keep a GPGPU (General Purpose Graphics Processing Unit) occupied for a long time, for example for several milliseconds.
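
The submission path described above (the OS schedules a context, its workload arrives at the driver as a DMA buffer, the driver produces a batch buffer, and the batch buffer is queued to the hardware) can be illustrated with a minimal sketch. Everything in it is hypothetical and chosen only for illustration: the names `DmaBuffer`, `BatchBuffer`, `OsScheduler`, `build_batch_buffer`, and `submit_all`, the `GPU_` command prefix, and the convention that a lower number means a higher priority. It models priority-based scheduling only, omits time slicing and user inputs, and does not represent any particular OS or driver interface.

```python
import heapq
from dataclasses import dataclass
from itertools import count


@dataclass
class DmaBuffer:
    """Workload of one context as handed to the driver (hypothetical type)."""
    context_id: str
    commands: list


@dataclass
class BatchBuffer:
    """GPU-specific command buffer built by the driver (hypothetical type)."""
    context_id: str
    commands: list


class OsScheduler:
    """Toy OS-side scheduler: orders contexts by priority only.

    Assumption: a lower number means a higher priority; ties are served
    in submission order.
    """

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker that preserves submission order

    def submit(self, priority, dma):
        heapq.heappush(self._heap, (priority, next(self._seq), dma))

    def next_context(self):
        """Return the next DMA buffer to run, or None if nothing is pending."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


def build_batch_buffer(dma):
    """Driver step: translate generic commands into GPU-specific ones.

    The 'GPU_' prefix is a stand-in for real hardware command encoding.
    """
    return BatchBuffer(dma.context_id, ["GPU_" + c for c in dma.commands])


def submit_all(scheduler):
    """Drain the scheduler and return the resulting hardware queue."""
    hw_queue = []
    while True:
        dma = scheduler.next_context()
        if dma is None:
            break
        hw_queue.append(build_batch_buffer(dma))
    return hw_queue


scheduler = OsScheduler()
scheduler.submit(5, DmaBuffer("compute", ["dispatch"] * 3))  # long-running
scheduler.submit(0, DmaBuffer("ui", ["draw"]))               # time sensitive
hw_queue = submit_all(scheduler)
```

In this sketch the time-sensitive `ui` context reaches the hardware queue ahead of the longer `compute` context even though it was submitted later. Once a batch buffer is queued, however, nothing in the sketch can interrupt it, which mirrors the observation that the commands of a batch buffer may occupy the GPU for a long time.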