1. Field of the Invention
The present invention relates to a technique for pipelining commands along with data in a computer graphics system, and more particularly, to a system whereby setup and control commands for pipelined processing circuits are passed into the pipeline along with the data and processed in the desired processing order such that a pipeline data flush is not necessary between reconfigurations of the pipelined processing circuits.
2. Description of the Prior Art
Pipelining is an implementation technique in which the execution of multiple instructions is overlapped. Each step in the pipeline completes a part of the instruction: the work to be done in an instruction is broken into smaller pieces, each of which takes a fraction of the time required to process the entire instruction. Each of these steps is called a pipe stage or a pipe segment and is referred to herein as a "pipelined processing circuit." The pipelined processing circuits are connected one to the next to form a pipeline in which the instructions enter at one end, are processed through the respective pipelined processing circuits, and exit at the other end. As known to those skilled in the art, if the pipelined processing circuits process the data at approximately the same speed, the speedup from pipelining approaches the number of pipe stages.
Pipelining exploits parallelism among the instructions in a sequential instruction stream to achieve this processing speed improvement. Since computer graphics instructions are highly parallel, they are ideally suited for pipelining. Pipelining is thus widely used in computer graphics systems to perform the substantial processing on input data which is necessary in order to render the desired image to the display screen.
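The speedup relationship stated above can be illustrated with a minimal sketch. It assumes an idealized pipeline in which every stage advances once per cycle and the cycle time is set by the slowest stage; the function names and the stage timings are hypothetical and chosen only for illustration.

```python
def unpipelined_time(n_items, stage_times):
    """Time to process n_items one after another, each passing through every stage."""
    return n_items * sum(stage_times)


def pipelined_time(n_items, stage_times):
    """Time when the stages overlap; the slowest stage sets the cycle time."""
    cycle = max(stage_times)
    # The first item needs len(stage_times) cycles; each later item adds one cycle.
    return (len(stage_times) + n_items - 1) * cycle


def speedup(n_items, stage_times):
    """Ratio of unpipelined to pipelined processing time."""
    return unpipelined_time(n_items, stage_times) / pipelined_time(n_items, stage_times)


# Five stages of equal speed versus five stages with one slow stage.
balanced = speedup(1000, [1, 1, 1, 1, 1])
unbalanced = speedup(1000, [1, 1, 3, 1, 1])
```

Under these assumptions, the balanced case comes close to the stage count of five for a long instruction stream, while a single slow stage holds the speedup well below it.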
FIG. 1 illustrates a simplified prior art graphics pipeline 100 for processing primitives and context data received by a graphics transform engine 102 for manipulation. As known to those skilled in the art, graphics transform engine 102 typically converts its input data to screen coordinates and performs the desired manipulations on the data. For example, graphics transform engine 102 may perform tasks such as graphics context management, matrix transformation calculations, spline tessellation, and lighting model computations. The graphics transform engine 102 also controls vector and polygon rendering hardware. The data processed by graphics transform engine 102 is output to a random access memory (RAM) 104 which typically comprises a first-in-first-out buffer. Graphics processing commands output by graphics transform engine 102, on the other hand, are sent via a control path to each of the subsequent pipelined processing circuits to handle setup and control for context switching and the like. The processed data stored in RAM 104 is then passed to a pixel processor 106 which performs functions such as Z interpolation, color and transparency interpolation and the like. The output of pixel processor 106 is then passed to a post processor 108 which performs functions such as gamma correction, dithering, window management and the like. The output of the post processor 108 is then passed to a pixel cache 110 for further manipulation before storage in a frame buffer 112 for display. Such a graphics pipeline improves data processing efficiency in that each pipelined processing circuit operates on its data at the same time the other processing circuits operate on their data for a particular input instruction.
However, the processing efficiency of graphics pipelines is significantly limited by the problem of context switching graphics hardware between processes. As known by those skilled in the art, context switching occurs when subsequent instructions require the pipelined processing circuits to be reconfigured to process the latter instruction. For example, the transform engine 102 may be instructed to represent subsequent data as anti-aliased vectors. This instruction and the associated data are passed through the graphics pipeline 100 and processed by each of the processing circuits 102-110 before being stored in the frame buffer 112. Then, if the instruction received immediately after the instruction to draw the anti-aliased vectors is a different type of instruction, such as a draw and shade polygon instruction, each of the processing circuits 102-110 must be reconfigured to perform the appropriate operation on the data following the draw and shade polygon instruction. This is typically accomplished by context switching the graphics hardware between such instructions.
As shown in FIG. 1, the context switching of the graphics hardware within the pipeline 100 previously has been conducted via a separate control line routed to each of the pipelined processing circuits separate from the data in the pipeline. Instructions are sent via this control line to plug the pipeline 100 and to instruct each of the pipelined processing circuits to complete processing of the data currently in the pipeline. Once all the data in the pipeline has been processed, the pipelined processing circuits are reconfigured by switching the contexts to those for the next instruction and then resynchronizing the pipelined processing circuits when processing of the next instruction is to start.
Context switching graphics hardware between instructions in this manner has had a significantly adverse effect on processing efficiency because, as just described, the data associated with the first instruction must be completely passed through the pipeline before the contexts of the pipelined processing circuits are switched to accommodate the subsequent instruction. In other words, the pipeline is "plugged up" until the data for the previous instruction has completely propagated through the pipeline (a so-called "pipeline flush"). The time required for a single primitive to traverse the pipeline is called the pipeline latency, and this latency determines the duration of the pipeline flush. The pipeline latency incurred during a pipeline flush degrades the responsiveness and interactivity of the graphics system, and as input graphics primitives become more complex and the pipelines become longer, the pipeline latency problem grows. Moreover, since the current trend of graphics pipelines is towards higher level, more complex primitives which require more processing time in the pipeline, the penalty for a pipeline flush has become unacceptable if the computer graphics system is to function at high speeds. An alternative to pipeline flushing and resynchronization has thus become necessary for good system performance.
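The cost of a pipeline flush described above can be estimated with a rough model. The sketch assumes one primitive enters the pipeline per cycle, that the pipeline latency equals the number of stages, and that reconfiguration itself takes no time; the function names and the workload are hypothetical.

```python
def cycles_with_flush(batches, depth):
    """Cycles needed when the pipeline is drained before every context switch.

    batches: number of primitives sharing each context, in order.
    depth:   number of pipeline stages (latency in cycles, by assumption).
    """
    # Each batch pays the fill/drain latency on its own.
    return sum(n + depth - 1 for n in batches)


def cycles_without_flush(batches, depth):
    """Idealized lower bound when contexts change without draining the pipeline."""
    # The latency is paid once for the whole stream.
    return sum(batches) + depth - 1


# Three contexts of ten primitives each in an eight-stage pipeline.
batches = [10, 10, 10]
flushed = cycles_with_flush(batches, depth=8)
unflushed = cycles_without_flush(batches, depth=8)
wasted = flushed - unflushed
```

In this model the cycles lost equal the number of context switches multiplied by the latency less one cycle, so the penalty grows with both the pipeline length and the frequency of context switches.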
Hardware solutions have been proposed for minimizing the effect of pipeline latency by eliminating the need for pipeline flushing and resynchronization without interrupting the flow of commands to the pipeline. For example, a technique is described in a paper by Rhoden et al. entitled "Hardware Acceleration for Window Systems", Computer Graphics, Vol. 23, No. 3, July 1989, in which a separate path or pipeline "bypass" is provided for window primitives that do not require the pipeline. The pipeline bypass allows the window system direct access to various components of the pipeline, including the frame buffer. The philosophy behind such a pipeline bypass is that window systems often require fast access for operations that are comparatively simple. By offering a bypass, the overhead of the graphics pipeline is avoided while providing the simple services required by the window system. The net result is a system which provides good window system interaction even in the middle of a complex rendering operation.
Synchronization of the pipeline bypass is provided using a pipeline valve which provides explicit control over pipeline access to the frame buffer. The pipeline valve turns off data coming from the rendering hardware into the frame buffer, and when the pipeline output is stopped, the window system is free to access the frame buffer. However, the pipeline valve does not stop the transform engines, which continue to process primitives until the entire pipeline backs up. Thus, significant processing may proceed before the pipeline fills up. Then, while the pipeline valve is closed, the window system may move, resize or otherwise manipulate the windows on the display screen without regard to the contents of the pipeline. When the pipeline valve is opened, rendering will continue to the modified window structure. By providing primitives which are window relative, the primitives being rendered will appear in the correct location. Also, since it is unnecessary to stop the pipeline or prevent processes from continuing to place commands and data into the pipeline as a result of this configuration, the window translation is completely transparent to the application.
Unfortunately, the processing improvements possible in accordance with the techniques of Rhoden et al. are primarily limited to window rendering. It is desired to improve such an approach so as to allow global variables within the pipeline to be changed for processing of subsequent instructions without stopping up or flushing of the pipeline, thereby preventing the loss of the many processing cycles typically used for a pipeline flush. However, such a technique must still maintain the input processes in the correct order so that the data is appropriately processed for rendering to the display screen. The present invention has been designed to meet these needs.
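The stated need, namely changing the configuration of the pipelined processing circuits without a flush while keeping the input in order, may be illustrated by the following minimal sketch. It is not the claimed implementation: the token layout (kind, target, payload), the Stage class, the stage names and the mode names are all assumptions made for illustration. Commands travel in the same stream as the data, and each stage applies a command addressed to it at the moment the command arrives, so data ahead of the command is processed under the old configuration and data behind it under the new one.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    """Hypothetical pipeline stage that records the mode used for each datum."""
    name: str
    mode: str = "default"
    log: list = field(default_factory=list)

    def step(self, token):
        kind, target, payload = token
        if kind == "cmd":
            # Reconfigure only if the command addresses this stage; no drain needed.
            if target == self.name:
                self.mode = payload
        else:
            self.log.append((payload, self.mode))


def run(stages, tokens):
    """Advance tokens one stage per cycle so that several are in flight at once."""
    slots = [None] * len(stages)  # token currently held by each stage
    pending = list(tokens)
    while pending or any(t is not None for t in slots):
        # Shift every token one stage downstream; a new token enters the first stage.
        slots = [pending.pop(0) if pending else None] + slots[:-1]
        for stage, token in zip(stages, slots):
            if token is not None:
                stage.step(token)


pixel, post = Stage("pixel"), Stage("post")
stream = [
    ("data", None, "a"),
    ("cmd", "pixel", "antialias"),
    ("data", None, "b"),
    ("cmd", "post", "dither"),
    ("data", None, "c"),
]
run([pixel, post], stream)
```

In this sketch the pipeline never empties between reconfigurations, yet each datum is processed by each stage under the configuration that was in force at its position in the input stream.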