Computer graphics systems, set top box systems or other graphics processing systems typically include a host processor, graphics (including video) processing circuitry, memory (e.g. frame buffer), and one or more display devices. The host processor may have a graphics application running thereon, which provides the graphics processing circuitry with vertex data for a primitive (e.g. triangle) to be rendered on the one or more display devices. The display device, for example a CRT display, includes a plurality of scan lines, each comprising a series of pixels. When appearance attributes (e.g. color, brightness, texture) are applied to the pixels, an object or scene is presented on the display device. The graphics processing circuitry receives the vertex data and generates pixel data including the appearance attributes, which may be presented on the display device according to a particular protocol. The pixel data is typically stored in the frame buffer in a manner that corresponds to each pixel's location on the display device.
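As a minimal sketch of how stored pixel data can correspond to screen location, the following assumes a row-major frame buffer holding one entry per pixel. The layout, the function name and the parameters are illustrative assumptions, not details of any particular system.

```python
def frame_buffer_index(x, y, screen_width):
    """Map pixel (x, y) to an index in a row-major frame buffer.

    Assumption: scan line y occupies entries y*screen_width .. (y+1)*screen_width - 1.
    """
    if not 0 <= x < screen_width or y < 0:
        raise ValueError("pixel lies outside the screen")
    return y * screen_width + x
```

Under this assumed layout, neighboring pixels on the same scan line occupy neighboring buffer entries.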
FIG. 1 illustrates a conventional display device 10, having a screen 12 partitioned into a series of vertical strips 13-18. The strips 13-18 are typically 1-4 pixels in width. In like manner, the frame buffer of conventional graphics processing systems is partitioned into a series of vertical strips having the same screen space width. Alternatively, the frame buffer and the display device may be partitioned into a series of horizontal strips. Graphics calculations, for example, lighting, color, texture and user viewing information are performed by the graphics processing circuitry on each of the primitives provided by the host. Once all calculations have been performed on the primitives, the pixel data representing the object to be displayed is written into the frame buffer. Once the graphics calculations have been repeated for all primitives associated with a specific frame, the data stored in the frame buffer is rendered to create a video signal that is provided to the display device.
The amount of time taken for an entire frame of information to be calculated and provided to the frame buffer becomes a bottleneck in graphics systems as the calculations associated with the graphics become more complicated. Contributing to this complexity is the growing need for higher resolution video, as well as for more complicated video, such as 3-D video. The video image observed by the human eye becomes distorted or choppy when the amount of time taken to render an entire frame of video exceeds the interval within which the display device must be refreshed with a new graphic or frame for the transition to go unnoticed by the human eye. To decrease processing time, graphics processing systems typically divide primitive processing among several graphics processing circuits where, for example, one graphics processing circuit is responsible for one vertical strip (e.g. 13) of the frame while another graphics processing circuit is responsible for another vertical strip (e.g. 14) of the frame. In this manner, the pixel data is provided to the frame buffer within the required refresh time.
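The division of a frame into vertical strips can be sketched as follows. The function names are hypothetical, and the round-robin assignment of strips to circuits is an assumption made for illustration; the description above only states that different circuits are responsible for different strips.

```python
def strip_index(x, strip_width):
    """Return the index of the vertical strip containing pixel column x."""
    return x // strip_width


def owning_circuit(x, strip_width, num_circuits):
    """Return the circuit responsible for pixel column x.

    Assumption: strips are handed to circuits in round-robin order,
    so strip s belongs to circuit s % num_circuits.
    """
    return strip_index(x, strip_width) % num_circuits
```

With strips four pixels wide and two circuits, columns 0 to 3 go to circuit 0, columns 4 to 7 to circuit 1, columns 8 to 11 back to circuit 0, and so on.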
Poor load balancing is a significant drawback of the partitioning systems described above. Load balancing problems occur, for example, when all of the primitives 20-23 of a particular object or scene are located in one strip (e.g. strip 13) as illustrated in FIG. 1. When this occurs, only the graphics processing circuit responsible for strip 13 is actively processing primitives; the remaining graphics processing circuits are idle. This results in a significant waste of computing resources, as at most half of the graphics processing circuits are operating. Consequently, graphics processing system performance is decreased, as the system is operating at a maximum of fifty percent capacity.
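The imbalance can be illustrated with a small sketch that counts how many primitives each circuit must process. Each primitive is reduced to the range of pixel columns it covers. The strip width, the coordinates, the two-circuit configuration and the strip-to-circuit mapping are all hypothetical values chosen for illustration.

```python
def strips_touched(x_min, x_max, strip_width):
    """Indices of the vertical strips overlapped by columns x_min..x_max."""
    return range(x_min // strip_width, x_max // strip_width + 1)


def circuit_loads(primitives, strip_width, num_circuits):
    """Count the primitives each circuit must process.

    Assumption: strip s is owned by circuit s % num_circuits.
    """
    loads = [0] * num_circuits
    for x_min, x_max in primitives:
        owners = {s % num_circuits for s in strips_touched(x_min, x_max, strip_width)}
        for circuit in owners:
            loads[circuit] += 1
    return loads


def busy_fraction(loads):
    """Fraction of circuits that have at least one primitive to process."""
    return sum(1 for n in loads if n > 0) / len(loads)


# Four primitives that all fall inside the first strip (columns 0..99).
primitives = [(10, 30), (20, 60), (40, 90), (5, 15)]
loads = circuit_loads(primitives, strip_width=100, num_circuits=2)
```

Here `loads` is `[4, 0]` and `busy_fraction(loads)` is `0.5`: one circuit handles every primitive while the other sits idle.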
Changing the width of the strips has been employed to counter the system performance problems. However, when the width of a strip is increased, the load balancing problem is exacerbated as more primitives are located within a single strip, thereby increasing the processing required of the graphics processing circuit responsible for that strip while the remaining graphics processing circuits remain idle. When the width of the strip is decreased (e.g. four bits to two bits), cache (e.g. texture cache) efficiency is decreased, as the number of cache lines employed in transferring data is reduced in proportion to the decreased width of the strip. In either case, graphics processing system performance is still decreased due to the idle graphics processing circuits.
Frame-based subdivision has been used to overcome the performance problems associated with conventional partitioning systems. In frame-based subdivision, each graphics processor is responsible for processing an entire frame, not strips within the same frame. The graphics processors then alternate frames. However, frame-based subdivision introduces one or more frames of latency between the user and the screen, which is unacceptable in real-time interactive environments such as a flight simulator application.
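The latency cost of alternating frames can be sketched with a simple schedule. The model below rests on an assumption not stated above: each processor needs as many display intervals to finish a frame as there are processors, so that together they still deliver one frame per interval. All names are hypothetical.

```python
def assigned_processor(frame, num_processors):
    """Processors alternate frames: frame f goes to processor f % num_processors."""
    return frame % num_processors


def schedule(num_frames, num_processors):
    """Return (frame, processor, start_interval, done_interval) tuples.

    Assumption: rendering a frame takes num_processors display intervals.
    """
    rows = []
    for frame in range(num_frames):
        start = frame
        done = start + num_processors
        rows.append((frame, assigned_processor(frame, num_processors), start, done))
    return rows


def added_latency(num_processors):
    """Extra display intervals of delay versus finishing a frame in one interval."""
    return num_processors - 1
```

Under this model, two processors add one frame of latency and each additional processor adds another, which is the delay that interactive applications cannot tolerate.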