As is known, the art and science of three-dimensional (“3-D”) computer graphics concerns the generation, or rendering, of two-dimensional (“2-D”) images of 3-D objects for display or presentation on a display device or monitor, such as a cathode ray tube (CRT) or a liquid crystal display (LCD). The object may be a composition of simple geometry primitives such as a point, a line segment, a triangle, or a polygon. More complex objects can be rendered onto a display device by representing the objects with a series of connected planar polygons, such as, for example, a series of connected planar triangles. All geometry primitives may eventually be described in terms of one vertex or a set of vertices, for example, coordinates (x, y, z, w) that define a point such as the endpoint of a line segment or a corner of a polygon.
To generate a data set for display as a 2-D projection representative of a 3-D primitive onto a computer monitor or other display device, the vertices of the primitive are processed through a series of operations, or processing stages in a graphics-rendering pipeline. A generic pipeline is merely a series of cascading processing units, or stages, wherein the output from a prior stage serves as the input for a subsequent stage. In the context of a graphics processing unit, these stages include, for example, per-vertex operations, primitive assembly operations, pixel operations, texture assembly operations, rasterization operations, and fragment operations.
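By way of illustration only, the cascading arrangement described above, in which the output of a prior stage serves as the input of a subsequent stage, might be sketched as follows. The stage functions (per_vertex, primitive_assembly, count_primitives) are hypothetical stand-ins chosen for brevity and do not correspond to any particular hardware stage.

```python
from functools import reduce


def run_pipeline(stages, data):
    """Pass data through each stage in order; each output feeds the next stage."""
    return reduce(lambda value, stage: stage(value), stages, data)


# Hypothetical stand-in stages (assumptions for illustration only).
def per_vertex(vertices):
    """Scale x and y of every vertex, standing in for a per-vertex operation."""
    return [(x * 2.0, y * 2.0, z) for (x, y, z) in vertices]


def primitive_assembly(vertices):
    """Group consecutive vertices into triangles, three at a time."""
    return [tuple(vertices[i:i + 3]) for i in range(0, len(vertices) - 2, 3)]


def count_primitives(triangles):
    """Terminal stage: report how many triangles reached the end of the pipeline."""
    return len(triangles)


vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
result = run_pipeline([per_vertex, primitive_assembly, count_primitives], vertices)
```

Here the three input vertices are scaled, assembled into a single triangle, and counted, so result holds the value 1.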
In a typical graphics display system, an image database (e.g., a command list) may store a description of the objects in the scene. The objects are described with a number of small polygons, which cover the surface of the object in the same manner that a number of small tiles can cover a wall or other surface. Each polygon is described as a list of vertex coordinates (x, y, z in “model” coordinates) and some specification of material surface properties (e.g., color, texture, shininess, etc.), as well as possibly the normal vectors to the surface at each vertex. For 3-D objects with complex curved surfaces, the polygons are typically triangles or quadrilaterals, and the latter can always be decomposed into pairs of triangles.
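As a minimal sketch of the decomposition of a quadrilateral into a pair of triangles, the following assumes a convex quadrilateral whose four vertices are listed in order around its boundary. The function name split_quad is hypothetical.

```python
def split_quad(quad):
    """Split a quadrilateral (a, b, c, d) into two triangles along diagonal a-c.

    Assumes the quadrilateral is convex and its vertices are given in
    boundary order; both are assumptions made for this sketch.
    """
    a, b, c, d = quad
    return [(a, b, c), (a, c, d)]


quad = ((0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0))
triangles = split_quad(quad)
```

The two resulting triangles share the edge between the first and third vertices and together cover the original quadrilateral.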
A transformation engine transforms the object coordinates in response to the angle of viewing selected by a user from user input. In addition, the user may specify the field of view, the size of the image to be produced, and the back end of the viewing volume to include or eliminate background as desired.
Once this viewing area has been selected, clipping logic eliminates the polygons (e.g., triangles) which are outside the viewing area and “clips” the polygons that are partly inside and partly outside the viewing area. These clipped polygons correspond to the portion of the polygon inside the viewing area with new edge(s) corresponding to the edge(s) of the viewing area. The polygon vertices are then transmitted to the next stage in coordinates corresponding to the viewing screen (in x, y coordinates) with an associated depth for each vertex (the z-coordinate). In a typical system, the lighting model is next applied, taking into account the light sources. In some implementations, a lighting model may be applied prior to clipping. The polygons with their color values are then transmitted to a rasterizer.
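The clipping of a polygon that is partly inside and partly outside the viewing area may be illustrated with one step of the well-known Sutherland-Hodgman approach, which is used here only as an example and is not asserted to be the technique of any particular clipping logic. The sketch works in 2-D and clips against a single boundary (the right edge, at a hypothetical coordinate x_max); a full clipper would repeat the step for each boundary of the viewing area.

```python
def clip_against_right_edge(polygon, x_max):
    """Clip a 2-D polygon, given as a list of (x, y) vertices, to x <= x_max."""
    clipped = []
    for i, current in enumerate(polygon):
        previous = polygon[i - 1]  # index -1 wraps around to the last vertex
        current_inside = current[0] <= x_max
        previous_inside = previous[0] <= x_max
        if current_inside != previous_inside:
            # The edge crosses the boundary: emit the intersection point,
            # which becomes a vertex of the new edge along the boundary.
            t = (x_max - previous[0]) / (current[0] - previous[0])
            clipped.append((x_max, previous[1] + t * (current[1] - previous[1])))
        if current_inside:
            clipped.append(current)
    return clipped


triangle = [(0, 0), (2, 0), (0, 2)]
clipped = clip_against_right_edge(triangle, 1)
```

In this example the triangle corner at (2, 0) lies outside the boundary, and the clipped result is a four-sided polygon with a new edge lying along x = 1.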
For each polygon, the rasterizer determines which pixels are positioned in the polygon and attempts to write the associated depth (z-value) into a depth buffer. The rasterizer produces blank pixels and compares the depth (z-value) for the pixels of the polygon being processed with the depth value of a pixel that is already written to the depth buffer. If the depth value of the new polygon pixel is smaller, indicating that it is in front of the polygon already written into the depth buffer, then its value replaces the value in the depth buffer because the new polygon obscures the polygon previously processed and written into the depth buffer. This process is repeated until all of the polygons have been rasterized and depth tested.
Subsequently, the associated texture is typically fetched and applied to visible pixels of polygons. The texture, or rather texture values, may be accompanied by color values interpolated from vertex colors. A pixel processing stage subsequently merges all these values into a final pixel value that is written to the frame buffer. At that point, a video controller displays the contents of the frame buffer on a display one scan line at a time in raster order.
With this general background provided, reference is now made to FIG. 1, which shows a functional flow diagram of certain components within a fixed function graphics pipeline in a graphics processor system. It should be appreciated that components within graphics pipelines may vary among different systems, and may be illustrated in a variety of ways. As is known, a host computer 10 (or a graphics application programming interface (API) running on a host computer) may generate a command list through a command stream processor 12. The command list comprises a series of graphics commands and data for rendering an “environment” on a graphics display. Components within the graphics pipeline may operate on the data and commands within the command list to render a screen in a graphics display.
In this regard, a parser 14 may receive commands from the command stream processor 12 and “parse” through the data to interpret commands and pass data defining graphics primitives along (or into) the graphics pipeline. Note that the command stream processor 12 may comprise parser functionality in some systems. Graphics primitives may be defined by location data (e.g., x, y, z, and w coordinates) as well as lighting and texture information. All of this information, for each primitive, may be retrieved by the parser 14 from the command stream processor 12, and passed to a vertex shader 16. As is known, the vertex shader 16 may perform various transformations on the graphics data received from the command list. For example, the data may be transformed from world coordinates into model view coordinates, into projection coordinates, and ultimately into screen coordinates. The functional processing performed by the vertex shader 16 is known and need not be described further herein. Thereafter, the graphics data may be passed on to a rasterizer 18, which operates as summarized above.
Thereafter, a z-test 20 (depth test) is performed on each pixel within the primitive. As is known, comparing a current z-value (i.e., a z-value for a given pixel of the current primitive) with a stored z-value for the corresponding pixel location comprises performing a z-test. The stored z-value provides the depth value for a previously rendered primitive for a given pixel location. If the current z-value indicates a depth that is closer to the viewer's eye than the stored z-value, then the current z-value replaces the stored z-value and the current graphic information (i.e., color) replaces the color information in the corresponding frame buffer pixel location (as determined by the pixel shader 22). If the current z-value is not closer to the current viewpoint than the stored z-value, then neither the frame buffer nor z-buffer (depth buffer) contents needs to be replaced, as a previously rendered pixel is hence deemed to be in front of the current pixel. For pixels within primitives that are rendered and determined to be closer to the viewpoint than previously-stored pixels, information relating to the primitive is passed on to the pixel shader 22, which determines color information for each of the pixels within the primitive that are determined to be closer to the current viewpoint. The pixel shader 22 also passes texture coordinates and other information to a first-in, first-out buffer (herein, FIFO) 26, which provides such data to a texture unit 28 for texture processing.
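The z-test described above may be sketched, under simplifying assumptions, as follows. Each fragment is assumed to arrive as a hypothetical (x, y, z, color) tuple with its color already known, whereas in the pipeline of FIG. 1 the color would be determined by the pixel shader 22 after the test. A smaller z-value is assumed to mean closer to the viewpoint.

```python
def z_test(fragments, width, height):
    """Apply a depth test to fragments and return the (depth, color) buffers."""
    far = float("inf")  # initial stored z-value: farther than any fragment
    depth_buffer = [[far] * width for _ in range(height)]
    frame_buffer = [[None] * width for _ in range(height)]
    for x, y, z, color in fragments:
        if z < depth_buffer[y][x]:
            # Current fragment is closer: replace stored depth and color.
            depth_buffer[y][x] = z
            frame_buffer[y][x] = color
        # Otherwise the previously rendered pixel is in front; keep both buffers.
    return depth_buffer, frame_buffer


fragments = [(0, 0, 0.8, "red"), (0, 0, 0.3, "blue"), (0, 0, 0.5, "green")]
depth_buffer, frame_buffer = z_test(fragments, width=2, height=1)
```

Of the three fragments competing for pixel location (0, 0), the one with z-value 0.3 is closest, so its color remains in the frame buffer; the untouched location (1, 0) retains its initial contents.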
The complexity and magnitude of graphics data in a pipeline suggest that pipeline inefficiencies, delays, and bottlenecks can significantly compromise the performance of the pipeline. For instance, one potential bottleneck involves what is commonly referred to as dependent reads, as provided with the loop comprising the pixel shader 22 and a texture pipeline comprising the FIFO 26 and texture unit 28. Typically, when texture sampling is to be performed within a pixel shader 22, a large amount of information needs to be swapped from the pixel shader 22 to the FIFO 26, which is then passed to the texture unit 28 and then ultimately returned to the pixel shader 22. That is, the FIFO 26 acts as a latency cache or buffer to store all such information while the pixel shader 22 is switched to another task or thread to hide the latency associated with the dependent read. Upon receipt of texture data, such information and the texture data are returned to the pixel shader 22 to resume processing of the pixels.
Such information that is passed to the FIFO 26 may include a return address for the sample request (e.g., return address upon completion of the dependent read), texture coordinates, pixel masks, task identifier, and the contents of a plurality of registers corresponding to a certain thread and processing related data.
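For illustration, the information listed above might be grouped into a single FIFO entry along the following lines. All field names and types are hypothetical and chosen only to mirror the listed items; actual hardware would store such data in a packed, implementation-specific layout.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DependentReadEntry:
    """One hypothetical FIFO entry saved while a dependent read is outstanding."""
    return_address: int      # where the shader resumes once the read completes
    texture_coords: tuple    # coordinates forwarded to the texture unit
    pixel_mask: int          # which pixels of the tile are active
    task_id: int             # identifies the suspended task or thread
    saved_registers: list = field(default_factory=list)  # register contents


latency_fifo = deque()
latency_fifo.append(DependentReadEntry(0x40, (0.25, 0.75), 0b1111, 7, [1.0, 2.0]))
latency_fifo.append(DependentReadEntry(0x58, (0.50, 0.50), 0b0101, 8))

# Entries leave in the order they entered (first in, first out).
resumed = latency_fifo.popleft()
```

After the popleft call, resumed holds the entry for task 7, and one entry remains queued.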
A conventional system and method for handling this multitude of information pertaining to a dependent read request in the pixel shader 22 is to break pixels into batches by software (e.g., driver software) or rasterizer hardware. Pixels (e.g., 2×2 tiles or in any other form) are received at the input to the pixel shader 22. A first portion of the pixel shader 22 (i.e., before the first dependent read request) is executed on these received pixels, then the above-mentioned information is sent to the texture pipeline comprising the FIFO 26 and the texture unit 28. The pixel shader 22 may continue processing pixels received at its input while waiting for the texture pipeline to complete the first batch of dependent read requests on the previous pixels. However, the pixel shader 22 stops receiving new pixels at a certain threshold point or capacity, and hence ends or completes a batch. The batch flows in the closed loop of the pixel shader 22, FIFO 26, and texture unit 28 a number of times (referred to as dependent read passes) until all pixels in the batch have completed all dependent reads and the thread is completed (pixel shader processing has completed for a given batch).
In conventional systems and methods, the batch size is calculated by the software (driver) with careful consideration to prevent deadlock. For instance, consider a FIFO 26 having a total storage of 3000 units (a “unit” herein referring to a bit or a byte), with each pixel (e.g., 2×2 tiles or in any other granularity) requiring, in the first pass, 8 units for storing the contents of temporary registers and 2 units for dependent read texture coordinates and, in the second pass, 15 units for storing the contents of registers and 5 units for coordinates. Accordingly, the batch size is set to 3000/max (first pass storage needed, second pass storage needed, . . . etc.)=3000/max (8+2, 15+5, . . . etc.)=150 pixels, for example. In this example, if more than 150 pixels are permitted to enter the loop, the system will lock up when these pixels start producing more data in the second pass than the FIFO 26 can hold. Further challenges to proper FIFO allocation may arise from variations in the size of pixel data. In addition, parallel pixel shaders 22 may be processing different objects with different numbers of pixel data types involved, so that, in the case of a common latency FIFO, a different number of entries may need to be reserved for every request.
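The batch size calculation may be sketched as follows, using only the two per-pass figures stated in the example (8+2 units for the first pass and 15+5 units for the second) and assuming no further passes. The function name max_batch_size and the (register units, coordinate units) tuple layout are assumptions made for illustration.

```python
def max_batch_size(fifo_capacity, per_pass_units):
    """Largest batch whose worst-case pass still fits within the FIFO.

    per_pass_units is a list of (register_units, coordinate_units) pairs,
    one pair per dependent read pass.
    """
    worst_case = max(registers + coords for registers, coords in per_pass_units)
    return fifo_capacity // worst_case


passes = [(8, 2), (15, 5)]              # first pass, second pass
batch = max_batch_size(3000, passes)    # 3000 // max(10, 20)

# One pixel more than the batch size overflows the FIFO in the worst-case pass.
overflow_units = (batch + 1) * 20
```

With these figures the worst-case pass needs 20 units per pixel, giving a batch of 150 pixels; a 151st pixel would require 3020 units in the second pass, exceeding the 3000 units available.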
Conventional systems and methods typically only support a fixed number of dependent read passes, such as by configuring a pass control register to limit the passes to a defined amount (e.g., up to 4 passes). However, graphics processing technology has grown more sophisticated, with APIs and/or applications having the ability to support unlimited dependent read passes and dynamic control flow, such as dynamic branches, loops, and subroutines (data-dependent code branches). With this increased graphics software sophistication comes a need for more sophisticated hardware control logic to support these and other advanced dependent read features.