Advanced Graphics Processing Units (“GPUs”) sometimes implement techniques for context switching. In general, context switching refers to switching execution among multiple contexts such that those contexts can share a common processing resource. The multiple contexts can correspond to distinct operation modes of the same application program or to multiple application programs.
In order to accelerate graphics processing, it is desirable to reduce the response time for context switching. As can be appreciated, this response time represents a performance penalty for context switching and typically depends on two factors, namely the amount of time needed to complete any pending work and the amount of execution state information to be stored and restored. Typically, a larger amount of time to complete pending work translates into a longer response time and thus a larger performance penalty. Similarly, a larger amount of execution state information to be stored and restored typically translates into a longer response time and a larger performance penalty. Unfortunately, current techniques for context switching can be deficient with respect to one or both of these factors, particularly when clipping graphics primitives.
One current technique for context switching is a “wait for idle” technique. In accordance with this technique, context switching is typically placed on hold so as to complete any pending work in connection with clipping a graphics primitive. While an amount of execution state information to be stored and restored is thus reduced, completing the pending work often takes an undesirable amount of time. Indeed, in some instances, completing the pending work can take hundreds of clock cycles, particularly when clipping the graphics primitive with respect to multiple clipping planes.
Another current technique for context switching is a “halt style” technique. In accordance with this technique, any pending work in connection with clipping a graphics primitive is halted prior to its completion to allow context switching. While an amount of time to complete the pending work is thus reduced, an amount of execution state information to be stored and restored is often undesirably large, particularly given the extensive amount of information that is typically maintained in hardware while clipping the graphics primitive. Indeed, in some instances, storing and restoring the execution state information can take numerous clock cycles.
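The trade-off between the two techniques can be illustrated with a toy cost model. The following is only a minimal sketch under stated assumptions: the names ClipWork, wait_for_idle_penalty, and halt_penalty, the fixed per-word transfer cost, and all numeric values are hypothetical and do not come from any actual hardware. The model simply adds the two factors discussed above, namely the time to complete pending work and the time to store and restore execution state information.

```python
from dataclasses import dataclass


@dataclass
class ClipWork:
    """Hypothetical snapshot of a clipper at the moment a switch is requested."""
    remaining_cycles: int   # assumed cycles still needed to finish clipping
    idle_state_words: int   # assumed state that must be saved even when idle
    live_state_words: int   # assumed extra in-flight state held mid-clip


def wait_for_idle_penalty(work: ClipWork, cycles_per_word: int = 1) -> int:
    """Finish pending work first, then store and restore only the idle state."""
    transfer = 2 * cycles_per_word * work.idle_state_words  # store + restore
    return work.remaining_cycles + transfer


def halt_penalty(work: ClipWork, cycles_per_word: int = 1) -> int:
    """Stop immediately, but store and restore idle plus in-flight state."""
    words = work.idle_state_words + work.live_state_words
    return 2 * cycles_per_word * words  # store + restore


if __name__ == "__main__":
    # Illustrative numbers only: a primitive clipped against several planes.
    work = ClipWork(remaining_cycles=300, idle_state_words=16, live_state_words=120)
    print("wait for idle:", wait_for_idle_penalty(work))
    print("halt style:   ", halt_penalty(work))
```

In this model, the "wait for idle" penalty is dominated by the remaining clipping work, while the "halt style" penalty is dominated by the in-flight state. Neither term vanishes for free, which is the deficiency described above.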
It is against this background that a need arose to develop the apparatus, system, and method described herein.