With the increasing complexity of graphics workloads and expanding application domains, graphics architecture may be shifting towards more general purpose, fast, and responsive designs. Traditionally, a graphics processing unit (GPU) may be used to accelerate specific three dimensional (3D) graphics applications, wherein a new task waits for the previous context to be finished and drained from the pipeline before it can be serviced. Following the development of a graphics programming and multitasking driver model, recent GPUs tend to offer increasingly programmable execution units (EUs) that are useful not only for graphics purposes such as computing 3D shader functions, but also for media codec functions and other general purpose workloads offloaded from the central processing unit (CPU). While multiple tasks can use the GPU in a time-sharing manner, some applications, in particular touch user interfaces and real-time systems, demand that high-priority tasks submitted to the GPU be performed within a certain time budget. These applications typically involve preemption, which may allow a GPU to temporarily stall current work, switch to a different context in response to a preemption request, and resume the stalled work after it finishes the higher-priority task that prompted the request.
Existing GPU platforms may provide basic support for enabling preemption. When an execution unit receives an exception raised by the preemption request, it may stop issuing further instructions from the application thread, save the current instruction pointer, and load a system routine to handle the exception. To ensure functional correctness, the system routine may be responsible for saving the current application's execution state and restoring it later when execution is resumed. Since the preemption request may be raised while any instruction is running, conventional hardware may conservatively save all system state that may be altered during the current execution context. This approach may typically involve saving the contents of all registers contained in the general register file (GRF) and the architectural register file (ARF). The GRF includes general purpose read-write registers, while the ARF includes architectural registers defined for specific purposes such as address registers, accumulators, and flags. The majority of the overhead in supporting preemption may result from saving and restoring such execution state, which may significantly degrade overall system performance and responsiveness. Existing GPUs already provide a large register set in order to speed up computation, and as a result, each context switch may involve saving up to multiple megabytes of data. With the growing number of execution units integrated on emerging GPU platforms, the corresponding amount of state to be saved and restored, and the resulting preemption response time, can greatly impact system performance.
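By way of illustration only, the conservative save-and-restore approach described above may be modeled in software as follows. This is a minimal sketch and not a description of any particular hardware: the register counts, register widths, and the names `ThreadState`, `save_all`, `restore_all`, and `saved_bytes_per_switch` are hypothetical and were chosen solely to make the cost of saving every register concrete.

```python
from dataclasses import dataclass, field

# All sizes below are hypothetical and chosen only for illustration;
# they do not describe any particular GPU.
GRF_REGS = 128        # assumed number of general registers per thread
GRF_REG_BYTES = 32    # assumed width of each general register, in bytes
ARF_BYTES = 256       # assumed total size of the architectural registers


@dataclass
class ThreadState:
    """Execution state of one hardware thread (illustrative model)."""
    ip: int = 0
    grf: bytearray = field(
        default_factory=lambda: bytearray(GRF_REGS * GRF_REG_BYTES))
    arf: bytearray = field(default_factory=lambda: bytearray(ARF_BYTES))


def save_all(thread):
    """Conservative save: copy the instruction pointer and every GRF/ARF byte."""
    return (thread.ip, bytes(thread.grf), bytes(thread.arf))


def restore_all(thread, saved):
    """Restore the state captured by save_all so execution can resume."""
    ip, grf, arf = saved
    thread.ip = ip
    thread.grf[:] = grf
    thread.arf[:] = arf


def saved_bytes_per_switch(num_eus, threads_per_eu):
    """Bytes copied per context switch when every thread saves everything."""
    per_thread = GRF_REGS * GRF_REG_BYTES + ARF_BYTES
    return num_eus * threads_per_eu * per_thread
```

Under these assumed sizes, `saved_bytes_per_switch(64, 8)` evaluates to 2,228,224 bytes, or roughly 2.1 MiB, for a hypothetical device with 64 execution units and 8 threads per unit. Because the save is unconditional, the cost in this model grows linearly with the number of execution units regardless of how many registers the preempted thread actually modified.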