1. Field of the Invention
The present invention relates generally to the management of limited resources in a computer graphics system and, in particular, to improving the performance of graphics systems through pre-loading and synchronization of draw commands and textures.
2. Description of Background Art
Computer games and other graphics-intensive software applications demand increasing performance from the graphics hardware upon which they run. However, hardware resources are limited, e.g., by memory, bus availability, and processor speeds, which in turn limit the performance of the software. Therefore, optimizing the use of the limited resources available to application developers is critical to the overall performance of the games and other applications themselves, and to their success.
One of the limited resources in a graphics system is texture memory. On game consoles, for example, a game application must typically move multiple textures through the texture memory many times to draw a single display frame. In typical systems, 60 to 100 or more display frames are drawn every second, resulting in a need for a large texture memory and high memory bandwidth.
Another limitation is the speed of the processors. Graphics-intensive applications typically have some CPU-limited calculations (e.g., scene-graph traversal, physics, animation) that generate data to be sent to the transformation and lighting (T&L) processor, and eventually to the pixel processor. Typical double-buffered graphics systems do not allow drawing to occur while the target frame buffer memory (i.e., the front buffer in a double-buffered system) is being displayed. Because of this limitation, applications must wait until a vertical retrace before processing graphics commands for the next frame. This waiting wastes a significant amount of CPU time and leaves the T&L and pixel processors underutilized during the wait.
FIG. 1 is a timing diagram of a typical graphics system, showing the processing events that occur for a particular frame between vertical retraces 10. FIG. 1 illustrates an example of a double-buffered graphics system, where a pixel processor writes the image of a frame onto a back buffer. The back buffer becomes the front buffer during a “swap event” (e.g., a vertical retrace 10), and the image stored in this buffer is displayed on the monitor or other display screen. (An example of a double-buffered graphics hardware architecture is described in greater detail in connection with FIG. 3.)
In existing systems, the application is typically held off from any application processing 15 until a buffer swap event occurs. This prevents the pixel drawing processor from writing to the displayed buffer and thus altering the image before it is displayed on the monitor. But it also wastes processing time because the application typically has some CPU-bound compute time (e.g., scene graph traversal, physics calculations, animation) before the application sends draw commands to the T&L processor and textures to the texture memory. During this time, represented as T1 in FIG. 1, the transformation and lighting (T&L) processor and the pixel processor are idle. After this time T1, the application sends draw commands to the T&L processor for processing 25 and begins to load 20 textures into texture memory for use with the draw commands. A short time after draw commands are sent and textures begin to load, the pixel processor begins to use the draw commands and textures to draw 30 pixels for the frame into the back buffer.
Buffer swap events occur in sync with the vertical retrace 10 of the monitor, and the period between buffer swap events may be any integral multiple of the vertical scan period (i.e., the time between vertical retraces 10). After each vertical retrace 10, the system displays 40 the contents of the front buffer. Importantly, if the pixel drawing 30 is not completed before the system's next vertical retrace 10 (as shown by the dotted extension arrow 30′ in FIG. 1), the system does not perform a swap of the back and front buffers because the next frame is still being drawn on the back buffer. In this case, the front buffer does not change, and the monitor continues to display 40 the same frame, thereby causing the frame rate performance to drop and thus diminishing the quality of the display. Alternatively, when the vertical retrace 10 occurs and the pixel drawing 30 is not finished for the frame, the system does perform a buffer swap. In this case, the partially drawn buffer is displayed on the monitor, which results in display artifacts such as “tearing.” On the monitor, tearing causes the next displayed frame to be composed of a portion of the last frame and a portion of the next frame, which is also undesirable.
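By way of illustration only, the effect of deferring a buffer swap to a later vertical retrace can be sketched in a few lines of Python. This is a simplified model and not part of any described system: the function names, the 60 Hz refresh rate, and the drawing times are hypothetical values chosen for the example, and the model covers only the case in which the swap is deferred (not the tearing case).

```python
import math


def retraces_until_swap(draw_done_ms: float, retrace_period_ms: float) -> int:
    """Count the vertical-retrace periods that elapse before a swap can occur.

    Assumption: the swap is deferred to the first retrace at or after the
    moment pixel drawing completes, so a frame always occupies a whole
    number of retrace periods (at least one).
    """
    return max(1, math.ceil(draw_done_ms / retrace_period_ms))


def effective_frame_rate(draw_done_ms: float, refresh_hz: float) -> float:
    """Frames actually presented per second under the deferred-swap model."""
    retrace_period_ms = 1000.0 / refresh_hz
    return refresh_hz / retraces_until_swap(draw_done_ms, retrace_period_ms)


# Hypothetical numbers: a 60 Hz display has a retrace period of about 16.7 ms.
on_time = effective_frame_rate(draw_done_ms=10.0, refresh_hz=60.0)
late = effective_frame_rate(draw_done_ms=20.0, refresh_hz=60.0)
```

Under these assumed values, drawing that completes within one retrace period sustains the full refresh rate, whereas drawing that overruns by even a few milliseconds occupies two retrace periods and halves the presented frame rate.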
A significant factor that leads to inefficiency in existing systems is the loss of processing time due to the system's waiting for a vertical retrace to begin before processing the next frame. Accordingly, it is desirable to reduce this inefficiency to improve performance of graphics-intensive software by attempting to complete the pixel drawing for any given frame before the next vertical retrace. This advantageously maintains a high frame rate. Moreover, it can be appreciated that reducing the wasted processing time, illustrated by T1, allows pixel drawing to begin earlier, which in turn increases the probability that the pixel drawing will be completed before the next vertical retrace.
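The relationship between the idle time T1 and timely completion of pixel drawing can be illustrated with a minimal sketch. The model below is an assumption made for illustration: it treats the frame as a simple sum of the idle time T1, a short start-up delay before pixel drawing begins, and the drawing time itself, and all names and numeric values are hypothetical.

```python
def draw_finish_time_ms(t1_ms: float, startup_delay_ms: float, draw_ms: float) -> float:
    """Time after the retrace at which pixel drawing completes.

    Assumed model: drawing cannot begin until the CPU-bound time T1 has
    elapsed, plus a short delay while draw commands and textures are sent.
    """
    return t1_ms + startup_delay_ms + draw_ms


def completes_before_retrace(t1_ms: float, startup_delay_ms: float,
                             draw_ms: float, retrace_period_ms: float) -> bool:
    """True if drawing finishes within a single vertical-retrace period."""
    return draw_finish_time_ms(t1_ms, startup_delay_ms, draw_ms) <= retrace_period_ms


# Hypothetical values for a 60 Hz display (retrace period of about 16.7 ms).
PERIOD_MS = 1000.0 / 60.0
with_long_t1 = completes_before_retrace(6.0, 1.0, 11.0, PERIOD_MS)   # finishes at 18 ms
with_short_t1 = completes_before_retrace(1.0, 1.0, 11.0, PERIOD_MS)  # finishes at 13 ms
```

With the same assumed drawing workload, shortening T1 from 6 ms to 1 ms moves completion from after the retrace to before it, which is the improvement the preceding paragraph describes.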
These constraints are complicated by the structure of typical graphics system hardware, which allows for asynchronous transformation and drawing processing and asynchronous texture loading. Having the application perform synchronization operations decreases available processing time for the CPU, T&L processor, and pixel processor. Moreover, implementing synchronization operations in the application is undesirably complex. In such a case, the application would have to determine how far in advance of a texture's use its loading must begin.