Processing units, such as graphics processing units (GPUs), typically implement multiple processing elements that concurrently execute instructions for one or more workloads. The processing elements in a GPU process three-dimensional (3-D) graphics using a graphics pipeline formed of a sequence of programmable shaders and fixed-function hardware blocks. For example, a 3-D model of an object that is visible in a frame is represented by a set of primitives, such as triangles, other polygons, or patches, which are processed in the graphics pipeline to produce values of pixels for display to a user. The fixed-function hardware blocks are used to fetch vertex information, construct the primitives, discard or cull some of the primitives, partition a screen, distribute workloads, perform amplification, and the like. States of the workloads executing in pipelines of the GPU are stored in locations such as vector general-purpose registers (VGPRs), local data shares (LDSs), and the like. A typical GPU is not dedicated to a single workload and, at any particular time, executes a variety of workloads of varying complexity and priority.
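As a purely illustrative aid, the discard or culling of primitives mentioned above can be sketched in software. The sketch below is not the hardware implementation and is not drawn from any particular GPU. It assumes triangles already projected to 2-D screen coordinates, a y-up coordinate system, and a counter-clockwise front-face winding convention. The names `signed_area` and `cull_primitives` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical types for illustration: a vertex is an (x, y) screen-space pair.
Vertex = Tuple[float, float]
Triangle = Tuple[Vertex, Vertex, Vertex]


def signed_area(tri: Triangle) -> float:
    """Return the signed area of a 2-D triangle.

    Positive for counter-clockwise winding (assumed front-facing here),
    negative for clockwise winding, zero for a degenerate triangle.
    """
    (x0, y0), (x1, y1), (x2, y2) = tri
    return 0.5 * ((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0))


def cull_primitives(triangles: List[Triangle]) -> List[Triangle]:
    """Keep only triangles with positive signed area.

    Back-facing (clockwise) and zero-area triangles are discarded, so that
    later pipeline stages do not spend work on them.
    """
    return [tri for tri in triangles if signed_area(tri) > 0.0]


# Example primitives (assumed data, not taken from the source text).
front = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))       # counter-clockwise
back = ((0.0, 0.0), (0.0, 1.0), (1.0, 0.0))        # clockwise
degenerate = ((0.0, 0.0), (1.0, 1.0), (2.0, 2.0))  # collinear vertices

visible = cull_primitives([front, back, degenerate])
```

In this sketch only the counter-clockwise triangle survives. A GPU performs comparable tests in dedicated hardware on many primitives concurrently, and the winding convention is typically configurable.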