Recent advances in computer performance have enabled graphic systems to provide more realistic graphical images using personal computers, home video game computers, handheld devices, and the like. In such graphic systems, a number of procedures are executed to “render” or draw graphic primitives to the screen of the system. A “graphic primitive” is a basic component of a graphic picture, such as a point, line, polygon, or the like. Rendered images are formed with combinations of these graphic primitives. Many procedures may be utilized to perform 3-D graphics rendering.
Specialized graphics processing units (GPUs) have been developed to optimize the computations required in executing the graphics rendering procedures. The GPUs are configured for high-speed operation and typically incorporate one or more rendering pipelines. Each pipeline includes a number of hardware-based functional units that are optimized for high-speed execution of graphics instructions/data, where the instructions/data are fed into the front end of the pipeline and the computed results emerge at the back end of the pipeline. The hardware-based functional units, cache memories, firmware, and the like, of the GPU are optimized to operate on the low-level graphics primitives and produce real-time rendered 3-D images.
The real-time rendered 3-D images are generated using rasterization technology. Rasterization technology is widely used in computer graphics systems, and generally refers to the mechanism by which the grid of multiple pixels comprising an image is influenced by the graphics primitives. For each primitive, a typical rasterization system steps from pixel to pixel and determines whether or not to “render” a given pixel (i.e., write it into a frame buffer or pixel map) as per the contribution of the primitive. This, in turn, determines how to write the data to the display buffer representing each pixel.
Various traversal algorithms and rasterization methods have been developed for computing all of the pixels covered by the primitive(s) comprising a given 3-D scene. For example, some traditional solutions generate the pixels in a unidirectional manner, row by row in a constant direction (e.g., left to right). The coverage for each pixel is evaluated to determine if the pixel is inside the primitive being rasterized. Because the direction is constant, upon finishing a row at a location on one side of the primitive, the sequence must shift across the primitive to a starting location on the opposite side before generating the next row.
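The unidirectional traversal described above can be sketched as follows. This is a minimal illustration only, not any particular GPU's method: the function names, the use of edge functions for the coverage test, and the sampling at pixel centers are all assumptions made for the example.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed area term: its sign tells which side of edge a->b the point p is on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def rasterize_unidirectional(tri, width, height):
    """Visit pixels row by row, always left to right (hypothetical helper).

    Returns the covered pixels in the order they were generated.
    """
    (x0, y0), (x1, y1), (x2, y2) = tri
    covered = []
    for y in range(height):          # one row at a time
        for x in range(width):       # constant direction: left to right
            cx, cy = x + 0.5, y + 0.5  # assumption: sample at the pixel center
            w0 = edge(x1, y1, x2, y2, cx, cy)
            w1 = edge(x2, y2, x0, y0, cx, cy)
            w2 = edge(x0, y0, x1, y1, cx, cy)
            # Inside when all three terms share a sign (either winding order).
            inside = (w0 >= 0 and w1 >= 0 and w2 >= 0) or \
                     (w0 <= 0 and w1 <= 0 and w2 <= 0)
            if inside:
                covered.append((x, y))
    return covered


pixels = rasterize_unidirectional([(0, 0), (4, 0), (0, 4)], 4, 4)
```

In the resulting order, the last covered pixel of one row is followed by the first covered pixel of the next row, which is the shift across the primitive noted above.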
Other traditional methods involve stepping pixels in a local region following a space filling curve such as a Hilbert curve. The coverage for each pixel is evaluated to determine if the pixel is inside the primitive being rasterized. This technique does not have the large shifts (which can cause inefficiency in the system) of the unidirectional solutions, but is typically more complicated to design than the unidirectional solution.
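As an illustration of a space filling traversal, the sketch below maps a distance along a Hilbert curve to pixel coordinates using the widely published iterative conversion. The function name and the `order` parameter are assumptions for the example, and the per-pixel coverage test is omitted for brevity.

```python
def hilbert_d2xy(order, d):
    """Map distance d along a Hilbert curve to (x, y) on a 2**order square grid."""
    x = y = 0
    t = d
    s = 1
    n = 1 << order
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Flip the quadrant before rotating it.
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x  # rotate by swapping axes
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


# Traversal order for a 4x4 local region (order 2).
path = [hilbert_d2xy(2, d) for d in range(16)]
steps = [abs(ax - bx) + abs(ay - by)
         for (ax, ay), (bx, by) in zip(path, path[1:])]
```

Every step in `steps` moves to an adjacent pixel, so the traversal never makes the large shifts of the unidirectional approach, at the cost of a more involved address computation.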
Once the primitives are rasterized into their constituent pixels, these pixels are then processed in pipeline stages subsequent to the rasterization stage where the rendering operations are performed. Typically, these rendering operations involve reading the results of prior rendering for a given pixel from the frame buffer, modifying the results based on the current operation, and writing the new values back to the frame buffer. For example, to determine if a particular pixel is visible, the distance from the pixel to the camera is often used. The distance for the current pixel is compared to the closest previous pixel from the frame buffer, and if the current pixel is visible, then the distance for the current pixel is written to the frame buffer for comparison with future pixels. Similarly, rendering operations that assign a color to a pixel often blend the color with the color that resulted from previous rendering operations. Operations in which a frame buffer value is read for a particular pixel, modified, and written back are generally referred to as R-M-W operations. Generally, rendering operations assign a color to each of the pixels of a display in accordance with the degree of coverage of the primitives comprising a scene. The per pixel color is also determined in accordance with texture map information that is assigned to the primitives, lighting information, and the like.
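The read, modify, and write steps can be made concrete with a small sketch. It assumes, purely for illustration, that a smaller depth value means closer to the camera and that colors are combined with a simple alpha blend; the function names and the list-of-lists buffers are hypothetical.

```python
def depth_test_rmw(depth_buffer, x, y, fragment_depth):
    """R-M-W depth test; returns True if the fragment is visible."""
    stored = depth_buffer[y][x]                # read the closest previous depth
    if fragment_depth < stored:                # assumption: smaller means closer
        depth_buffer[y][x] = fragment_depth    # write back for future comparisons
        return True
    return False


def blend_rmw(color_buffer, x, y, src_rgb, alpha):
    """R-M-W color blend of a new color over the previously rendered color."""
    dst_rgb = color_buffer[y][x]               # read the prior result
    blended = tuple(alpha * s + (1.0 - alpha) * d   # modify
                    for s, d in zip(src_rgb, dst_rgb))
    color_buffer[y][x] = blended               # write the new value


depth = [[float("inf")] * 2 for _ in range(2)]
color = [[(0.0, 0.0, 0.0)] * 2 for _ in range(2)]

first = depth_test_rmw(depth, 0, 0, 5.0)    # nothing rendered yet: visible
second = depth_test_rmw(depth, 0, 0, 7.0)   # farther than 5.0: hidden
third = depth_test_rmw(depth, 0, 0, 3.0)    # closer than 5.0: visible
blend_rmw(color, 1, 1, (1.0, 0.5, 0.0), 0.5)
```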
In many systems, the capability of performing R-M-W operations presents a hazard that must be overcome in the system design. In particular, many systems process multiple primitives concurrently. However, most graphics systems present the appearance that primitives are rendered in the order in which they are provided to the GPU. If two sequential primitives utilize R-M-W operations, then for any particular pixel the GPU must give the appearance that the value written by the first primitive is the value read by the second primitive. The hazard for the system is how to concurrently process primitives yet maintain the appearance of sequential processing as required by many graphics programming models (e.g., OpenGL or DirectX).
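The hazard can be demonstrated with a deliberately simplified, single-pixel simulation. It is a sketch under assumptions: an additive blend stands in for the rendering operation, and the overlapped case models concurrency by letting every read happen before any write. Both function names are hypothetical.

```python
def blend_sequential(dst, sources):
    """Each primitive reads the value written by the one before it."""
    for src in sources:
        current = dst          # read
        dst = current + src    # modify and write
    return dst


def blend_overlapped(dst, sources):
    """All primitives read before any of them writes (the hazard case)."""
    stale_reads = [dst for _ in sources]   # every read sees the original value
    for current, src in zip(stale_reads, sources):
        dst = current + src                # later writes overwrite earlier ones
    return dst


expected = blend_sequential(0, [1, 2])
hazard = blend_overlapped(0, [1, 2])
```

In the overlapped case the contribution of the first primitive is lost, so the result no longer matches the in-order result that the programming model requires.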
A variety of techniques exist to mitigate the R-M-W hazard depending on the application. A system may maintain a transaction log of the color updates required for a pixel. At the end of rendering a scene, the sorted transaction log may be used to create the final pixel color. Another common solution is referred to as a “scoreboard”. A scoreboard is an array of memory that is used to indicate all of the screen locations where rendering of R-M-W operations may be occurring at any given time. When a primitive is rasterized, each pixel is checked against the scoreboard and is only rendered if no other pixel is currently being rendered at the same location. When rendering for a pixel begins, the scoreboard is marked for the pixel location. Upon completion of rendering, the scoreboard for the location is cleared. In this way, the system can concurrently render pixels of primitives that do not overlap pixels of other primitives, and will serially render any pixels of primitives that do overlap.
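A scoreboard of this kind can be sketched in a few lines. The class and method names are hypothetical, and a real implementation would be a hardware structure rather than a Python list; the sketch only mirrors the mark, check, and clear steps described above.

```python
class Scoreboard:
    """One flag per screen location marking an in-flight R-M-W operation."""

    def __init__(self, width, height):
        self.busy = [[False] * width for _ in range(height)]

    def try_begin(self, x, y):
        """Check the location and mark it if free; False means the pixel must wait."""
        if self.busy[y][x]:
            return False
        self.busy[y][x] = True
        return True

    def complete(self, x, y):
        """Clear the location once rendering of the pixel has finished."""
        self.busy[y][x] = False


sb = Scoreboard(4, 4)
a_started = sb.try_begin(1, 1)   # pixel from primitive A begins rendering
b_blocked = sb.try_begin(1, 1)   # overlapping pixel from primitive B must wait
c_started = sb.try_begin(2, 1)   # non-overlapping pixel proceeds concurrently
sb.complete(1, 1)                # A finishes and clears its location
b_started = sb.try_begin(1, 1)   # B can now render the same location
```

Non-overlapping pixels proceed at once, while the overlapping pixel is held until the earlier one completes, which is what serializes overlapping work.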
A problem exists, however, with the ability of prior art scoreboard 3-D rendering architectures to function with the latency that occurs when accessing graphics memory. For example, as pixel fragments are updated in a graphics memory (e.g., a frame buffer), an undesirable amount of latency is incurred as the scoreboard mechanism functions to mitigate the R-M-W hazards. As described above, depending on the specifics of individual systems, a large amount of this latency is due to the scoreboard checking of concurrently rendered pixels.
Thus, a need exists for a rasterization process that can scale as graphics needs require and provide added performance while reducing the impact of graphics memory access latency.