1. Field of the Invention
Embodiments of the present invention generally relate to the field of graphics hardware and, more particularly, to a system for automatically creating a list of states that need to be context switched within a graphics processing unit (GPU).
2. Description of the Related Art
Modern graphics complexity and speed require specialized logic chips called graphics processing units (GPUs) devoted to the fast rendering of three-dimensional (3-D) colored, textured images. Somewhat similar to a central processing unit (CPU), a GPU has a highly parallel structure combined with a primitive, limited function set that enables the GPU to be much more efficient than a typical CPU at graphics rendering.
In order to perform the myriad of graphics processes required in a parallel, time-sharing fashion, contemporary GPUs must be capable of managing multiple states called contexts. A graphics context contains the current set of attributes for a particular process, which may include the matrix stack, light sources, shading control, spline basis matrices, and other hardware state information. The method of stopping a currently running process, storing the context of this process, restoring the context of an inactive process that had been previously stored, and activating the new process is called context switching. A state-of-the-art GPU must be able to quickly and efficiently context switch so that multiple operations can be executed seamlessly using only a single resource.
During a context switch, the storage elements used for storing and retrieving the context may be registers. Conventionally, the registers that are context switched need to be listed out, and a hardware state machine is created to generate this list within the silicon of the GPU. An example of such a hardware state machine 100 is illustrated in FIG. 1. In state 0 102, an instruction is performed to save register A to the list of registers that are context switched, and then the state machine moves to state 1 104, which includes a similar instruction for register B. This operation continues down the steps of the state machine for each register that is context switched until the last state n 106 is reached. This method is acceptable as long as the state machine 100 includes an instruction to save every register that needs to be context switched to the list.
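The sequential behavior of such a state machine can be sketched in software. The following is only an illustrative model, not the circuit of FIG. 1: register C, the register values, and the names `REGISTERS`, `STATE_SEQUENCE`, and `run_save_state_machine` are assumptions introduced for the example.

```python
# Hypothetical register contents; the description names only registers A and B.
REGISTERS = {"A": 0x11, "B": 0x22, "C": 0x33}

# One fixed state per register, mirroring state 0 through state n.
# In real hardware this sequence is frozen in silicon.
STATE_SEQUENCE = ["A", "B", "C"]


def run_save_state_machine(registers, state_sequence):
    """Walk the fixed states in order, saving one register per state."""
    saved = []
    state = 0
    while state < len(state_sequence):
        name = state_sequence[state]
        saved.append((name, registers[name]))  # the save instruction of this state
        state += 1                             # move to the next state
    return saved


saved_context = run_save_state_machine(REGISTERS, STATE_SEQUENCE)
print(saved_context)
```

A register that has no entry in `STATE_SEQUENCE` is never saved, which is the weakness of a fixed, hand-written sequence.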
With a long list of registers that need to be context switched, it is easy to overlook a few such registers and accidentally exclude them from instructions within the state machine 100. Furthermore, new registers are frequently added during the design phase of the GPU, and some of these may need to be context switched. In either case, adding even a single overlooked or new register to the list of registers that need to be context switched via the hardware state machine 100 requires a new instruction 108. However, once the GPU has been fabricated on a wafer, it is too late to add a new instruction 108, and any such GPUs that have made it into the field now have a context switching bug in which the contents of an overlooked register may bleed over into the next context (i.e., at least a portion of the original context erroneously remains in the register after a context switch). In this situation, GPU manufacturers may have to incur a costly semiconductor revision in order to fix the bug and replace defective units.
The high cost of overlooking new or existing registers that need to be context switched in a hardware state machine led GPU manufacturers to replace the hardware state machine with a microcode (μ-code) solution. Also known as firmware or a microprogram, μ-code may be thought of as software embedded in a hardware device, such as a read-only memory (ROM), a flash memory, or a memory block within a processing unit or an application specific integrated circuit (ASIC). Unlike the hardware state machine, μ-code can be compiled, stored within the GPU memory, and loaded at runtime during chip initialization. In this manner, the μ-code can be updated in the field to fix bugs, for example by adding registers that are missing from a list of registers that need to be context switched. The μ-code solution thus avoids the need for a silicon revision.
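The practical difference from a hardware state machine is that the register list becomes data that can be reloaded. A minimal sketch of that idea follows, under stated assumptions: the class `ContextSwitchEngine`, its methods, and the register names are hypothetical, and a plain list stands in for a compiled μ-code image.

```python
class ContextSwitchEngine:
    """Toy model in which the register list is loaded data, not fixed logic."""

    def __init__(self):
        self.switch_list = []

    def load_microcode(self, image):
        # Replace the list wholesale, as a reload at chip initialization would.
        self.switch_list = list(image)

    def save_context(self, registers):
        # Save only the registers named in the currently loaded list.
        return {name: registers[name] for name in self.switch_list}


registers = {"A": 1, "B": 2, "C": 3}
engine = ContextSwitchEngine()

engine.load_microcode(["A", "B"])        # original image: register C overlooked
before_fix = engine.save_context(registers)

engine.load_microcode(["A", "B", "C"])   # field update adds the missing register
after_fix = engine.save_context(registers)

print(before_fix, after_fix)
```

The fix here is a new list, not new logic, although the list itself is still written by hand.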
Regardless of the method used to generate a list of registers that need to be context switched, there are generally three recognized classes of state, or categories of registers, from the perspective of context switching for a given application, as shown in FIG. 2a. These classes are registers that must be context switched (202), registers that can be context switched (204), and registers that must not be context switched (206). The objective is to include all of the registers that must be context switched in the list of registers to be context switched, and nothing more, as shown by the bracket 208 in FIG. 2a. Registers from the other two classes 204, 206 are typically not included in the list of registers to be context switched when this inclusive approach to generating such a list is taken.
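The three classes and the inclusive list can be expressed as a small data structure. This is a sketch for illustration only: the enum `SwitchClass`, the register names, and their classifications are assumptions, and only the class meanings and reference numerals come from the description above.

```python
from enum import Enum


class SwitchClass(Enum):
    MUST = "must be context switched"          # class 202
    CAN = "can be context switched"            # class 204
    MUST_NOT = "must not be context switched"  # class 206


# Hypothetical registers and classifications.
REGISTER_CLASSES = {
    "REG_A": SwitchClass.MUST,
    "REG_B": SwitchClass.MUST,
    "REG_C": SwitchClass.CAN,
    "REG_D": SwitchClass.MUST_NOT,
}


def build_switch_list(register_classes):
    """Inclusive approach: keep only the MUST registers (bracket 208)."""
    return sorted(
        name for name, cls in register_classes.items() if cls is SwitchClass.MUST
    )


switch_list = build_switch_list(REGISTER_CLASSES)
print(switch_list)
```

The result is only as complete as the classification table: a register that is never entered, or is entered under the wrong class, is silently left out.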
As described above, however, some registers that must be context switched for another application may be overlooked and not manually added to the list of registers to be context switched. Such overlooked registers are illustrated in FIG. 2b by the inadequate “Switched” bracket 212. The “Forgotten” bracket 214 represents the missing portion of the registers that must be context switched 210, and registers within this bracket 214 are likely sources of context switching bugs. This register oversight may be more likely in applications with long lists of registers that must be context switched, and GPUs perform a great deal of context switching compared to other types of processors.
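The effect of a "Forgotten" register can be shown with a short simulation of one context switch. The sketch below is illustrative only: the register names, values, and the `context_switch` helper are assumptions, and the hand-written list `switched` deliberately omits one register that must be switched.

```python
def context_switch(hw, switch_list, outgoing_store, incoming_store):
    """Save the listed registers of the outgoing process, then restore the incoming one."""
    for name in switch_list:
        outgoing_store[name] = hw[name]
    for name in switch_list:
        hw[name] = incoming_store[name]


# Hypothetical hardware state while process 1 is running.
hw = {"REG_A": 10, "REG_B": 20, "REG_C": 30}
proc1_store = {}
proc2_store = {"REG_A": 1, "REG_B": 2, "REG_C": 3}  # previously stored context

must_switch = {"REG_A", "REG_B", "REG_C"}  # registers that must be switched (210)
switched = ["REG_A", "REG_B"]              # hand-written list (bracket 212)
forgotten = must_switch - set(switched)    # the missing portion (bracket 214)

context_switch(hw, switched, proc1_store, proc2_store)

print(forgotten)
print(hw)
```

After the switch, `REG_C` still holds the value left by process 1 even though process 2 is now active, which is the bleed-over that results when a register is omitted from the list.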
Accordingly, what is needed is a more automatic method of generating a complete list of registers to be context switched that avoids omitting registers that must be context switched.