In the field of computer graphics, commands from an application are used to render primitives of one or more graphical objects. As used herein, rendering primitives of one or more graphical objects refers to the process of generating pixel data or pixel sample data. The rendered data is generally buffered and displayed on a suitable display device. Modern systems often, but not necessarily, incorporate a central processing unit (“CPU”) separate and distinct from a graphics processing unit (“GPU”). However, other systems may use different cores of one or more processors and dynamically allocate or otherwise divide the processing load among those cores, so that at any given time a particular core may act as a traditional CPU or a traditional GPU. Accordingly, references herein to a CPU and a GPU are not intended to limit any specific structure or function; they merely differentiate among traditional types of processing resources and/or processing. Generally, the CPU is responsible for sending commands to the GPU for rendering thereon. The GPU renders graphics objects for display on a display device more quickly than the CPU could draw the same objects. Commands from the CPU generally include a variety of state commands and their associated draw commands. For example, a CPU may issue a single draw command to render and draw one or more primitives associated with a graphics object. The state commands may “set” a particular state or condition for an associated draw command.
As is known, a primitive may include one or more vertices (e.g., three vertices for a triangle). At a minimum, a draw command includes the locations of the vertices (e.g., in world space coordinates) of the one or more primitives of the graphics object to be rendered. Associated state commands may indicate a variety of additional information relevant to the particular draw or rendering, including attribute values associated with primitive vertices and other constant data associated with the one or more primitives in the graphics object. For example, when a particular draw command requires a stencil test, one having ordinary skill in the art will recognize that a single stencil reference value for all primitives in the draw command (i.e., for all pixels or pixel samples in the graphics object) may be passed as, or as part of, a state command. The stencil reference value is generally an 8-bit value. Using the information contained within the draw and state commands, the GPU performs a variety of transformations on the data and may generate corresponding display data.
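The relationship between a draw command, its vertices, and a single per-draw stencil reference value can be sketched in Python. The class and method names below (`StencilState`, `DrawCommand`, `reference_for`) are hypothetical and chosen only for illustration; actual command formats are driver- and GPU-specific. The sketch assumes triangle primitives with three vertices each and enforces the 8-bit range of the reference value.

```python
from dataclasses import dataclass

# Hypothetical command structures for illustration only; real command
# formats are defined by the driver and the GPU.
Vertex = tuple[float, float, float]  # (x, y, z) in world space


@dataclass
class StencilState:
    """State shared by every primitive of one draw command."""
    reference: int  # single 8-bit stencil reference value

    def __post_init__(self) -> None:
        if not 0 <= self.reference <= 0xFF:
            raise ValueError("stencil reference must fit in 8 bits")


@dataclass
class DrawCommand:
    vertices: list[Vertex]  # assumed: three vertices per triangle primitive
    state: StencilState

    def primitive_count(self) -> int:
        return len(self.vertices) // 3

    def reference_for(self, pixel: tuple[int, int]) -> int:
        # Every pixel covered by this draw sees the same reference value.
        return self.state.reference


cmd = DrawCommand(
    vertices=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    state=StencilState(reference=0x2A),
)
print(cmd.primitive_count())        # 1
print(cmd.reference_for((10, 20)))  # 42
```

Because `reference_for` ignores its argument, the sketch makes explicit that one value covers the entire graphics object.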
In prior art systems, the CPU includes one or more host processors (collectively, the “host processor”) that execute instructions associated with an application and a driver. The instructions may be stored as modules in memory coupled to the host processor. With respect to the host processor, the memory may be, for example, on chip, off chip, dedicated, distributed, integrated and/or shared, as is known in the art. The state and draw commands are generally provided by the driver, executing on the host processor, in response to render commands received from the application, which also executes on the host processor. As the host processor executes instructions associated with the application, it generally generates a plurality of application render commands to display one or more graphical objects. The driver then translates or compiles these application render commands into commands that the GPU understands. The translated commands (i.e., the state and draw commands) are then communicated to the GPU (e.g., over a suitable bus) for processing thereon.
As is known, draw and/or state commands commonly call for the performance of a stencil test. The stencil test is performed per pixel or per pixel sample within the graphics object and may or may not require a comparison of the provided stencil reference value with the previously stored stencil value for the respective pixel or pixel sample. Based on the outcome of the Z and/or stencil test, as is known in the art, the GPU writes or otherwise updates the memory storing the stencil values using a corresponding stencil operation. Exemplary stencil tests include: “greater than?”, “less than?”, “greater than or equal to?”, “less than or equal to?”, “equal to?”, “not equal to?”, “always” and “never”. One having ordinary skill in the art will recognize that the “always” and “never” tests do not require a comparison of the current stencil reference value with a stored stencil value, because the test either always passes or never passes. Common stencil operations include, for example: “keep”, “increment”, “decrement”, “increment and clamp”, “decrement and clamp”, “replace”, “zero” and “invert”. Other stencil tests and operations may be employed. Accordingly, the state commands may either identify which stencil test and/or stencil operation to use or carry the instructions necessary to perform the stencil test and/or stencil operation. Alternatively, the GPU may be programmed or otherwise configured to run a given stencil test and/or stencil operation.
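The stencil tests and operations listed above can be modeled as small functions over 8-bit values. This is a minimal Python sketch under several assumptions not stated in the text: the reference value is the left operand of each comparison (the convention used by OpenGL's `glStencilFunc`), plain “increment” and “decrement” wrap around while the clamp variants saturate at 0 and 255, and stencil masks and the Z test are ignored. The names `STENCIL_TESTS`, `STENCIL_OPS`, and `stencil_update` are hypothetical.

```python
import operator
from typing import Callable

MAX_STENCIL = 0xFF  # 8-bit stencil values

# Stencil tests take (reference, stored); "always"/"never" ignore both.
STENCIL_TESTS: dict[str, Callable[[int, int], bool]] = {
    "greater": operator.gt,
    "less": operator.lt,
    "greater_equal": operator.ge,
    "less_equal": operator.le,
    "equal": operator.eq,
    "not_equal": operator.ne,
    "always": lambda ref, stored: True,
    "never": lambda ref, stored: False,
}

# Stencil operations take (stored, reference) and return the new stored value.
STENCIL_OPS: dict[str, Callable[[int, int], int]] = {
    "keep": lambda stored, ref: stored,
    "increment": lambda stored, ref: (stored + 1) & MAX_STENCIL,  # 255 wraps to 0
    "decrement": lambda stored, ref: (stored - 1) & MAX_STENCIL,  # 0 wraps to 255
    "increment_clamp": lambda stored, ref: min(stored + 1, MAX_STENCIL),
    "decrement_clamp": lambda stored, ref: max(stored - 1, 0),
    "replace": lambda stored, ref: ref,
    "zero": lambda stored, ref: 0,
    "invert": lambda stored, ref: ~stored & MAX_STENCIL,
}


def stencil_update(stored: int, ref: int, test: str,
                   pass_op: str, fail_op: str = "keep") -> tuple[bool, int]:
    """Run one per-pixel stencil test and return (passed, new stored value)."""
    passed = STENCIL_TESTS[test](ref, stored)
    op = pass_op if passed else fail_op
    return passed, STENCIL_OPS[op](stored, ref)


print(stencil_update(stored=3, ref=5, test="greater", pass_op="replace"))  # (True, 5)
```

Keeping tests and operations in lookup tables mirrors the way a state command might simply identify which test and operation to apply.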
Generally, the GPU interacts with memory (e.g., on chip, off chip, dedicated, distributed, integrated and/or shared) to store the data necessary to render and display the final pixels and pixel samples on the screen. This memory is often of limited size and may be shared with other processing units (e.g., the CPU). When performing a stencil operation, the GPU may write the resulting stencil value to a stencil buffer (i.e., memory or a portion of memory) for quick access. In some systems, the stencil buffer is termed a Z/stencil buffer because it may also store Z data in addition to stencil values and related stencil data (e.g., stencil metadata). Because of the attributes of the stencil buffer (i.e., its location and accessibility), it may be desirable in certain situations to write other types of data to it for use in the rendering process. However, as appreciated by those of ordinary skill in the art, stencil values cannot simply be overwritten if they are needed in a subsequent stencil test. Accordingly, before other data is written to the stencil buffer, the stencil values must be moved or copied to another memory location. Later, the moved stencil values must be moved or copied back to the stencil buffer for a subsequent stencil test or other known operation.
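The save-and-restore sequence described above can be sketched as follows. The sketch assumes, purely for illustration, an uncompressed stencil buffer holding one 8-bit value per pixel in linear order with no Z data; `StencilBuffer`, `save_stencil`, and `restore_stencil` are hypothetical names. Under that assumption a plain byte copy is sufficient.

```python
class StencilBuffer:
    """Hypothetical linear, uncompressed stencil buffer (no Z data)."""

    def __init__(self, width: int, height: int) -> None:
        self.width = width
        self.height = height
        self.data = bytearray(width * height)  # one 8-bit value per pixel

    def index(self, x: int, y: int) -> int:
        return y * self.width + x


def save_stencil(buf: StencilBuffer) -> bytes:
    # Copy the stencil values to another memory location before reuse.
    return bytes(buf.data)


def restore_stencil(buf: StencilBuffer, saved: bytes) -> None:
    # Copy the saved values back so stencil testing can continue.
    if len(saved) != len(buf.data):
        raise ValueError("saved stencil data does not match buffer size")
    buf.data[:] = saved


buf = StencilBuffer(4, 4)
buf.data[buf.index(1, 2)] = 7
saved = save_stencil(buf)
buf.data[:] = b"\xaa" * len(buf.data)  # buffer temporarily reused for other data
restore_stencil(buf, saved)
print(buf.data[buf.index(1, 2)])  # 7
```

The copy works only because the layout is assumed to be linear and uncompressed; it bypasses any stencil logic entirely.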
Similarly, other logic blocks in a computer graphics system might need the stencil values for other purposes. For example, an application (e.g., executing on a host processor) may need to view the stencil data via a command known in the art as a direct CPU access. In such cases, the stencil values might need to be moved to a location more convenient for that purpose (e.g., memory affiliated with the host processor) and then moved back to the stencil buffer when stencil testing resumes.
In the cases described above, the stencil values are moved from the stencil buffer using known memory access commands that directly access the stencil buffer and do not require use of the stencil block or stencil logic (i.e., the portion of the GPU that actually performs the stencil test and writes stencil values to the stencil buffer or otherwise updates stencil values previously stored in the stencil buffer). The stencil block/logic is described in further detail below.
Moving stencil values out of the stencil buffer and writing them back may prove difficult in graphics systems employing proprietary formatting and/or compression schemes. For example, a GPU may have a stencil block/logic that applies a proprietary tiling and/or compression scheme when it writes stencil values to the stencil buffer. As a result, only the stencil block/logic understands how to read the stencil values stored in the stencil buffer, and likewise only the stencil block/logic understands how to write stencil values to it. Accordingly, in graphics systems employing proprietary formatting and/or compression schemes, a need exists, after stencil values have been moved, to provide the previously moved stencil values to the stencil block/logic so that it can reformat and/or compress them and store them in the stencil buffer.
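Why a raw copy cannot simply be written back can be illustrated with a toy tiled layout. The 2x2 tiling below is a hypothetical stand-in for a proprietary scheme (no real tiling or compression scheme is described here), and the sketch assumes the buffer width and height are multiples of the tile size. Values written through the hypothetical `stencil_block_write` land at tiled offsets, so a reader that assumes a linear layout sees them in the wrong order.

```python
TILE = 2  # hypothetical 2x2 tiles; real schemes are proprietary


def tiled_offset(x: int, y: int, width: int) -> int:
    """Map pixel (x, y) to its offset in a tile-major buffer."""
    tiles_per_row = width // TILE
    tile_index = (y // TILE) * tiles_per_row + (x // TILE)
    within_tile = (y % TILE) * TILE + (x % TILE)
    return tile_index * TILE * TILE + within_tile


def stencil_block_write(buffer: bytearray, x: int, y: int, width: int, value: int) -> None:
    # Only the (hypothetical) stencil block knows this mapping.
    buffer[tiled_offset(x, y, width)] = value


def stencil_block_read(buffer: bytearray, x: int, y: int, width: int) -> int:
    return buffer[tiled_offset(x, y, width)]


WIDTH, HEIGHT = 4, 2
buffer = bytearray(WIDTH * HEIGHT)
for y in range(HEIGHT):
    for x in range(WIDTH):
        stencil_block_write(buffer, x, y, WIDTH, y * WIDTH + x)

print(list(buffer))  # [0, 1, 4, 5, 2, 3, 6, 7]
print(buffer[2])     # 4: a linear read of pixel (2, 0) gets the wrong value
print(stencil_block_read(buffer, 2, 0, WIDTH))  # 2
```

Values copied out in this layout must go back through the same mapping (and, in a real system, any compression) to be usable, which is why the stencil block/logic has to receive them.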
Additionally, as noted above, prior art draw commands were limited to passing a single stencil reference value per draw command (i.e., one value for all pixels or pixel samples of all primitives in the graphics object associated with the draw command). Providing a single stencil reference value per draw command, however, does not give an application designer maximum programming flexibility. Accordingly, a need exists to allow multiple stencil reference values to be provided for a draw command, such that each pixel or pixel sample has its own programmable stencil reference value on which stencil tests and operations may then be performed. Such a solution would give application designers additional flexibility in rendering graphics objects. For example, an application designer could use the programmable stencil reference values to tag certain pixels or pixel samples in a graphics object such that the tagged pixels or pixel samples are the only portions of the graphics object that are processed, rendered or displayed, or, conversely, the only portions that are not processed, rendered or displayed.
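The difference between a single per-draw reference value and per-pixel programmable reference values, and the tagging use case, can be sketched as follows. `stencil_pass_mask` and the tag map are hypothetical; the sketch assumes a stored stencil value of 1 for every pixel and uses an “equal to?” test so that only pixels tagged with reference value 1 pass.

```python
import operator
from typing import Callable

Pixel = tuple[int, int]


def stencil_pass_mask(
    pixels: list[Pixel],
    stored: dict[Pixel, int],
    reference_for: Callable[[Pixel], int],
    test: Callable[[int, int], bool],
) -> list[Pixel]:
    """Return the pixels whose stencil test (reference vs. stored) passes.

    A constant reference_for models the single per-draw reference value;
    any other function models per-pixel programmable reference values.
    """
    return [p for p in pixels if test(reference_for(p), stored[p])]


# A 4x1 graphics object whose stored stencil values are all 1.
pixels = [(x, 0) for x in range(4)]
stored = {p: 1 for p in pixels}

# Single reference value per draw: every pixel gets the same outcome.
single = stencil_pass_mask(pixels, stored, lambda p: 1, operator.eq)

# Hypothetical per-pixel references: even columns are tagged with 1.
tags = {p: 1 if p[0] % 2 == 0 else 0 for p in pixels}
tagged = stencil_pass_mask(pixels, stored, tags.__getitem__, operator.eq)

print(single)  # [(0, 0), (1, 0), (2, 0), (3, 0)]
print(tagged)  # [(0, 0), (2, 0)]
```

Swapping `operator.eq` for `operator.ne` selects the untagged pixels instead, mirroring the converse case in which the tagged pixels are the only portions not processed.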
A further need exists to allow application designers to write application-level machine-readable computer code (e.g., source code using OpenGL, D3D, etc.) whose compiled form directs the provision to the stencil block/logic of either: (1) previously moved stencil values (e.g., moved because of limited memory resources) or (2) generated programmable stencil reference values (e.g., as determined by the application designer). That is, a need exists for a new command, available to application designers, that provides the above functionality. An additional need exists for a driver that understands the new application-level command and is capable of translating it into a corresponding command understandable by a GPU.