1. Field of the Invention
The present invention is related generally to computer graphics systems, and more particularly to a computer tool for making the graphics rendering process more efficient.
2. Related Art
In today's technological climate, the availability of computers, combined with their use in both business and pleasure, has led to an ever increasing demand for sophisticated computer tools, games, and processing techniques. The more sophisticated the desired operations, the more a computer's processing power is taxed, and the more valuable efficiency becomes. Software programmers, hardware designers, and operators alike are constantly looking for ways to improve speed, efficiency, and/or processing power from their software, hardware and applications. This desire is especially concentrated in the computer graphics field.
The popularity of the computer has led to an enormous rise in the popularity of computer applications. At the same time, an ever more sophisticated industry has demanded computer tools for applications such as flight simulators, sophisticated presentation applications, realtime navigation tools, computer aided design and testing, weather forecasting, and the like. Each presents new challenges to the computer graphics field.
A computer graphics rendering pipeline supports each of the above mentioned applications. An application passes graphics data to a computer graphics rendering pipeline. The pipeline then processes the graphics data for display on a computer monitor, television screen, or other similar device for displaying visual media. In applications where the output is rapidly changing, such as flight simulators and many computer games, the efficiency of the rendering pipeline is vital to overall system performance. Improving the processing efficiency in each stage of a graphics pipeline is desired. It is this graphics rendering pipeline that the current invention seeks to improve.
A graphics rendering pipeline consists of a number of stages. Processing is performed in each stage. Processing performed in subsequent or "downstream" stages is dependent upon processing performed in earlier or "upstream" stages. Stages can be implemented as units in software, firmware, and/or hardware. A stage can have a fixed amount of buffering.
In practice a tolerance is provided in a stage. Tolerance is the amount of extra capacity built into a stage to allow a guaranteed amount of work to be processed by the pipeline. The maximum tolerance of any unit in a pipeline determines the maximum effective tolerance of the pipeline as a whole. In some cases, a stall token is passed from an upstream unit to a downstream unit. The stall token is a marker that causes the subsequent pipeline unit to stall when the token is received at the subsequent pipeline unit. A period of inefficiency arises when a unit (or application) must wait for a downstream operation to complete while intervening unit(s) have unused processing capacity. A stalled pipeline unit (i.e., a unit that has received a stall token) can be inefficient when it is stalled waiting for a downstream operation to complete.
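By way of illustration only, the stall token behavior described above can be sketched as follows. The names `STALL`, `PipelineUnit`, `submit`, `step`, and `unstall` are hypothetical and do not appear in the original description; the sketch assumes a unit that consumes items in order and stops as soon as it receives a stall token.

```python
from collections import deque

STALL = object()  # hypothetical sentinel standing in for a stall token


class PipelineUnit:
    """Illustrative unit that processes queued work until a stall token arrives."""

    def __init__(self):
        self.inbox = deque()   # work passed down from the upstream unit
        self.stalled = False
        self.done = []         # items this unit has finished processing

    def submit(self, item):
        self.inbox.append(item)

    def step(self):
        """Process one item; return False if the unit is stalled or idle."""
        if self.stalled or not self.inbox:
            return False
        item = self.inbox.popleft()
        if item is STALL:
            # Receiving the token stalls this unit until it is released.
            self.stalled = True
            return False
        self.done.append(item)
        return True

    def unstall(self):
        self.stalled = False


unit = PipelineUnit()
for item in ["a", "b", STALL, "c"]:
    unit.submit(item)
while unit.step():
    pass
```

In this sketch the unit completes items "a" and "b", then stalls with "c" still queued. The queued item represents the unused processing capacity referred to above.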
Consider the example of a double-buffered system. In a non-double-buffered system, the increment by which one can increase the amount of data buffered is enough frame buffer memory to hold an additional complete frame. This increment (equal to the size of a frame) can be prohibitively inefficient. There is a motivation to increase processing efficiency in stages preceding a display stage with double-buffering, thus allowing processing to proceed even when the buffer currently being displayed has not been totally consumed. There is a further need to increase the efficiency of preceding stages, such as the geometry stage, to avoid unnecessary waiting or stall periods.
An example graphics rendering pipeline with double-buffering consists of three basic stages. This example is illustrative and not intended to limit the present invention. As shown in FIG. 1, this example pipeline consists of a geometry stage 110, a raster stage 120, and a display stage 130. Graphics data is submitted by an application to the pipeline for processing in pipeline operations. Each image, or scene, to be rendered and displayed at a given display interval is referred to as a frame.
Geometry data (also called geometric data, primitive data or primitives) representing a frame to be rendered is input to the geometry stage 110. Geometry data for each respective frame of graphics data is separated by a frame complete marker (fc). FIG. 1 shows an example of three frames of geometry data (Δ1-Δ3) 105, separated by frame complete markers (fc1-fc3), being passed from an application (not shown) to geometry stage 110. Frame markers are commands by which an application specifies frame boundaries. The geometry data is then processed on a frame by frame basis by the geometry stage 110, raster stage 120, and display stage 130 in succession. In such a double-buffered system, at the same time the current frame is being displayed from pixel data in a front buffer 129, rendering of the following frame occurs, with results drawn to the back buffer 127. A swap buffer switch 125 alternates which buffer is currently being drawn, and which buffer is currently being displayed. Once the current frame is rendered and displayed, a vertical retrace signal is sent to the head of the pipeline ordering the next frame to enter the rendering pipeline and begin the rendering process.
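As a non-limiting illustration, the front buffer, back buffer, and swap buffer switch described above can be modeled as follows. The class `DoubleBuffer` and its methods are hypothetical names introduced only for this sketch, and frames are represented as simple strings rather than pixel data.

```python
class DoubleBuffer:
    """Illustrative model of a front/back buffer pair with a swap switch."""

    def __init__(self):
        self.buffers = [None, None]
        self.front = 0  # index of the buffer currently being displayed

    @property
    def back(self):
        # The back buffer is whichever buffer is not being displayed.
        return 1 - self.front

    def draw(self, frame):
        """Rendering results are always drawn to the back buffer."""
        self.buffers[self.back] = frame

    def swap(self):
        """Alternate which buffer is drawn and which is displayed."""
        self.front = self.back

    def displayed(self):
        return self.buffers[self.front]


db = DoubleBuffer()
db.draw("frame 1")
db.swap()                      # frame 1 becomes visible
db.draw("frame 2")             # frame 2 is rendered while frame 1 is displayed
shown_during_render = db.displayed()
db.swap()                      # frame 2 becomes visible
shown_after_swap = db.displayed()
```

In a real system the swap would be synchronized to the vertical retrace signal; that timing is omitted from this sketch.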
Two timing diagrams of a normal scheme 150 and a problem scheme 160 that has incurred a frame drop 162 are also shown in FIG. 1. The diagrams further illustrate the motivation for extra capacity in stages of a multi-buffering pipeline. The display interval 152 controls the rate at which the display stage 130 processes data. The tolerance interval 156 is a predetermined, yet flexible, amount of time. In one example, the tolerance interval 156 is used by system designers to account for the differing frame rendering times 154 that often occur in complex applications. Setting an appropriate duration of the tolerance interval 156, however, involves trade-offs. Since the display interval 152 is typically fixed, increasing tolerance interval 156 reduces the frame rendering time (FRT) 154. On the other hand, decreasing tolerance interval 156 runs a risk of exceeding the display interval 152, resulting in dropped frames. A dropped frame 162 typically manifests itself as a flicker in the display medium, and it affects the visual quality of the running application.
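The trade-off described above can be illustrated with simple arithmetic. The sketch below assumes that the frame rendering time available to an application equals the fixed display interval minus the tolerance interval, which is one reading of the relationship stated above. The function names and the numeric values (a display interval of 16,667 microseconds, roughly a 60 Hz display) are illustrative assumptions and are not taken from FIG. 1.

```python
def frame_rendering_budget(display_interval_us, tolerance_interval_us):
    """Rendering time left per frame once the tolerance interval is reserved."""
    if not 0 <= tolerance_interval_us <= display_interval_us:
        raise ValueError("tolerance must lie within the display interval")
    return display_interval_us - tolerance_interval_us


def is_frame_dropped(actual_render_time_us, display_interval_us):
    """A frame is dropped when rendering overruns the display interval."""
    return actual_render_time_us > display_interval_us


DISPLAY_INTERVAL_US = 16_667  # assumed value for illustration only

# A larger tolerance interval leaves less time to render each frame.
budget_small_tolerance = frame_rendering_budget(DISPLAY_INTERVAL_US, 2_000)
budget_large_tolerance = frame_rendering_budget(DISPLAY_INTERVAL_US, 6_000)
```

Under these assumed values, raising the tolerance interval from 2,000 to 6,000 microseconds lowers the rendering budget by the same 4,000 microseconds. A frame that takes 17,000 microseconds to render exceeds the display interval and would be dropped.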
One problem that occurs in conventional graphics pipeline processing is that a high amount of dead time can occur during tolerance interval 156. For instance, dead time can occur when an application must wait for a vertical retrace signal before it starts loading a geometry stage 110. In this example pipeline, this results in dead time, or a stall, during which geometry stage 110 is idle. This dead time wastes host bandwidth available during this period, and also incurs startup lag when the pipeline is restarted.
The current invention addresses the above described problems that can occur in a multi-buffered system.
The present invention meets the above-mentioned needs by providing a system, method, and computer program product for providing increased processing efficiency in at least one stage of a multi-buffered graphics rendering pipeline. The present invention reduces or eliminates dead-time in a rendering pipeline. In one example, the present invention allows a geometry stage (and other processing stages) to be used more efficiently in that the geometry stage can process data even during a portion of a tolerance interval. This more efficient processing is also referred to as "buying back" what would otherwise be "deadtime" in a stage during the tolerance interval.
In one embodiment, a queue is instantiated to keep track of the sequence of frames that are allowed to go forward. This data can go forward even when a buffer such as a back buffer is waiting for a vertical retrace signal. The data for the current frame that is issued from an application is allowed to proceed through the first stage of the rendering pipeline, and is then stalled.
The system includes a frame boundary marker queue and a stall token installer established at the head of the rendering pipeline. The system also includes a stall controller comprising the software control logic for overall system coordination. The system increases the efficiency in a graphics rendering pipeline by effectively reducing the tolerance period.
The method increases the efficiency of at least one stage of a multi-buffered graphics rendering pipeline. According to an embodiment of the present invention, the method adds a queue to the head of the graphics rendering pipeline. The queue stores frame boundary markers. The method also installs at the same point a corresponding stall token. With this queue established, and the stall tokens inserted, the first stage of the rendering process continues for the data stream, regardless of the state of the remainder of the rendering pipeline. The pipeline is then stalled and subsequently unstalled, when appropriate, according to the stalling logic at the end of the geometry stage. By allowing the first stage of the rendering to proceed, the overall efficiency of the graphics rendering pipeline is improved by the amount of time it takes to complete the remaining stages of the pipeline minus the time the frame would have taken if the invention had not been implemented.
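For purposes of illustration only, the method described above can be sketched in simplified form. The names `install_stall_tokens`, `GeometryStage`, `run`, and `on_vertical_retrace` are hypothetical, as are the string values used for the markers and tokens. The sketch assumes one stall token per frame complete marker and a release on each vertical retrace; it is one possible reading of the method, not the implementation of any particular embodiment.

```python
from collections import deque

FC = "fc"        # frame complete marker issued by the application
STALL = "stall"  # stall token installed at the head of the pipeline


def install_stall_tokens(stream, marker_queue):
    """Record each frame marker in the queue and follow it with a stall token."""
    out = []
    frame_no = 0
    for item in stream:
        out.append(item)
        if item == FC:
            frame_no += 1
            marker_queue.append(frame_no)
            out.append(STALL)
    return out


class GeometryStage:
    """First stage: processes geometry data, then stalls at each stall token."""

    def __init__(self, tokens):
        self.pending = deque(tokens)
        self.output = []       # processed primitives awaiting downstream stages
        self.stalled = False

    def run(self):
        """Process data until a stall token is reached or input runs out."""
        while self.pending and not self.stalled:
            item = self.pending.popleft()
            if item == STALL:
                self.stalled = True
            elif item != FC:
                self.output.append(f"processed({item})")

    def on_vertical_retrace(self, marker_queue):
        """Release the stall and retire the oldest queued frame marker."""
        if self.stalled and marker_queue:
            marker_queue.popleft()
            self.stalled = False


markers = deque()
tokens = install_stall_tokens(["d1", FC, "d2", FC], markers)
stage = GeometryStage(tokens)
stage.run()                         # frame 1 geometry is processed before any retrace
stage.on_vertical_retrace(markers)  # retrace arrives; the stage is unstalled
stage.run()                         # frame 2 geometry is processed, then stalls again
```

In this sketch the geometry work for each frame completes before the corresponding vertical retrace arrives, so the time that would otherwise be spent idle is used for first-stage processing.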
One advantage of the present invention is that it takes advantage of the dead-time that would normally occur if the rendering time of the running application is less than the display time. By efficiently making use of this dead-time, the problem of a dropped frame can often be avoided.
Another advantage of the present invention is that when the swap buffer operation is called, the primitive data of the next frame has already cleared in the first stage of the pipeline. Any non-raster operations going towards the texture units may proceed at full speed.
Another advantage of the present invention is that it is easily implemented in any multi-buffered system that experiences similar dead-time, and has a stage in the process which lends itself to being stalled.
Further features and advantages of the invention, as well as the structure and operation of various embodiments of the present invention, are described in detail below with reference to the accompanying drawings.