Substantial improvements in graphics hardware, combined with standardized graphics languages, have allowed complex graphics functions to be used in many common applications. As these applications proliferate, more and more graphics functionality is moved from the central processing unit to the graphics processor in a computer system. As a result, the graphics processor now performs many functions itself so that they do not slow down the performance of the computer system.
Many prior art graphics subsystems are implemented with a host processor that constructs a display list of instructions, comprising graphics opcode instructions and their parameters, and sends it to a graphics processor. The graphics processor takes the list of instructions and generates graphics primitives for a display device. In these graphics systems, information for the graphics processor may be supplied as a series of graphics primitives, which the graphics processor processes by reading the display list data from the linear memory locations in which those primitives are stored.
These graphics primitives are assembled into the images displayed on a graphics display device. The primitives typically include points, lines, and polygons representing graphics objects rendered by the graphics processor.
Prior art display list processing methods read a display list instruction from a linear array of memory locations and, depending on the instruction, may continue to read the parameters needed to complete it from consecutive memory locations. After each instruction is completed by the graphics processor, the next display list instruction is fetched, and this repeats until the entire display list has been processed.
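The fetch loop described above can be sketched in a few lines. This is a minimal, hypothetical illustration: the opcode values, the parameter counts per opcode, the end-of-list marker, and the function name `process_display_list` are all assumptions made for the example, not an encoding used by any particular graphics processor.

```python
from typing import Dict, List, Tuple

# Hypothetical opcode encoding (assumption for illustration only):
# opcode -> (mnemonic, number of parameter words that follow it).
OPCODES: Dict[int, Tuple[str, int]] = {
    1: ("POINT", 2),     # x, y
    2: ("LINE", 4),      # x0, y0, x1, y1
    3: ("TRIANGLE", 6),  # three (x, y) vertices
}
OP_END = 0  # hypothetical end-of-list marker


def process_display_list(memory: List[int]) -> List[Tuple[str, List[int]]]:
    """Walk a display list stored in consecutive (linear) memory words.

    Each iteration fetches one opcode, then fetches as many parameter words
    as that opcode needs from the following locations, and "executes" the
    instruction (here it only records the primitive). Stops at OP_END or at
    the end of memory.
    """
    primitives = []
    pc = 0  # address of the next word to fetch
    while pc < len(memory):
        opcode = memory[pc]  # instruction fetch
        pc += 1
        if opcode == OP_END:
            break
        if opcode not in OPCODES:
            raise ValueError(f"unknown opcode {opcode} at word {pc - 1}")
        name, nparams = OPCODES[opcode]
        params = memory[pc:pc + nparams]  # parameter fetch from consecutive words
        if len(params) != nparams:
            raise ValueError(f"truncated parameters for {name}")
        pc += nparams
        primitives.append((name, params))  # a real processor would rasterize here
    return primitives


display_list = [1, 10, 20,
                2, 0, 0, 100, 100,
                3, 0, 0, 50, 0, 25, 40,
                OP_END]
for name, params in process_display_list(display_list):
    print(name, params)
```

Each pass through the loop corresponds to one fetch iteration in the text: one opcode read followed by a variable number of parameter reads from the next consecutive locations.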
Improvements have been made in the prior art to optimize the interface between graphics processors, which execute display list instructions, and host processors, in order to minimize latency in instruction fetching and processing by the graphics processor. One such improvement is the use of first-in-first-out (FIFO) data buffers, which typically store up to 32 bytes of data. The FIFOs are added to the graphics processor to improve the graphics interface and also to improve the bandwidth available to the host processor for supplying display list instructions.
The FIFOs further allow some level of concurrency between the host processor and the graphics processor. However, once a FIFO is full, the host processor may be forced into wait states until the graphics processor finishes processing the display list instructions already supplied by the host processor. Such wait states introduce inherent processing latency, making it difficult for the graphics processor to keep up with fast-paced computer games and other fast-paced interactive graphics programs.
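The effect of a full FIFO on the host can be illustrated with a toy cycle count. This is a minimal sketch under a simplified, hypothetical timing model: the host tries to write one word per cycle, the graphics processor removes one word every `drain_period` cycles, and the FIFO depth is counted in 32-bit words (so a 32-byte FIFO holds 8 words). The function name `simulate_host_writes` and the timing model are assumptions for illustration only.

```python
from collections import deque


def simulate_host_writes(num_words: int, fifo_depth: int, drain_period: int) -> int:
    """Return the number of host wait states needed to push num_words into a FIFO.

    Hypothetical timing model: each cycle the host tries to write one word;
    if the FIFO is full, the host spends that cycle in a wait state instead.
    The graphics processor removes one word every `drain_period` cycles.
    """
    fifo = deque()
    written = 0
    wait_states = 0
    cycle = 0
    while written < num_words:
        cycle += 1
        # Graphics processor side: consume one entry every drain_period cycles.
        if cycle % drain_period == 0 and fifo:
            fifo.popleft()
        # Host side: write if there is room, otherwise stall for this cycle.
        if len(fifo) < fifo_depth:
            fifo.append(written)
            written += 1
        else:
            wait_states += 1
    return wait_states


# A 32-byte FIFO holds eight 32-bit words (assumed word size).
for words in (8, 16, 32):
    print(words, "words ->", simulate_host_writes(words, fifo_depth=8, drain_period=4), "wait states")
```

In this model a burst that fits in the FIFO costs no wait states, but once the burst exceeds the FIFO depth and the graphics processor drains more slowly than the host writes, the host stalls on most cycles, which is the latency problem described above.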
Even if the graphics processor can operate at the high speeds required by graphics applications such as 3D animation, the system bus in many prior art computer systems may be too slow to transfer enough data between the host processor and the graphics processor to avoid an excessive number of wait states for the host processor.
In particular, if the host processor loads one command and then attempts to load additional commands and parameters into the graphics processor, the graphics processor may still be busy processing previous commands and parameters. The graphics processor may then assert a signal that forces the host processor into wait states. Consequently, the host processor may experience a significant number of wait states and can load the next command and parameters only when the graphics processor has completed its present command.
To reduce the number of wait states experienced by the host processor in sending display list instructions to the graphics processor, many prior art systems have the graphics processor render the display list into a frame buffer. The frame buffer may be on-chip and thus need not compete with the host processor or the graphics processor for the computer system's bus. Rendering graphics primitives from the display list into the frame buffer allows the host processor to continue processing without interruption from the graphics processor while the graphics processor renders instructions from a previously supplied display list into a display space, e.g., 2D or 3D space.
Despite the reduction in wait states gained by having the graphics processor render the display list into the frame buffer, many prior art graphics subsystems are manufactured with a fixed amount of frame buffer memory. This limits the amount of data that the graphics processor can render to the frame buffer, which often hinders sophisticated graphics operations such as polygon drawing, polygon filling, and texture mapping. To increase the amount of memory available to graphics operations, some prior art graphics systems allow the host processor to write display list information to portions of host system memory. This alleviates the display memory constraints imposed by the limited frame buffers of prior art graphics systems.
However, storing display list information in host system memory may impose further constraints on the graphics processor during display list processing. These constraints include the extra clock cycles needed to fetch and decode display list parameters from host system memory. For example, if the instruction fetch cycle and the parameter fetch cycle are executed separately, the display list logic may have to issue one request to fetch an instruction and then a second, multi-cycle transfer from host memory to fetch that instruction's multiple parameters.
Depending on the number of parameters needed, the graphics processor may have to make separate host memory requests to fetch a display list instruction and its parameters. Because each memory transfer requires a finite setup time before its first data transfer completes, fetching one instruction and its parameters from host memory would require at least two setup periods: one for fetching the display list instruction and another for fetching the parameters of that instruction.
Thus, in a request to host system memory, an instruction fetch may take about four clocks of setup time plus, e.g., one clock of data transfer for the requested instruction, and fetching its parameters may take about another four clocks of setup. If the separate setup period for the parameter fetch could be eliminated by fetching the parameters together with their opcode instruction, a total of about four clocks could be saved. Consequently, the number of clocks required to fetch a single instruction from a display list could be reduced.
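The clock arithmetic above can be checked with a short calculation. This is a minimal sketch that uses the approximate figures from the text (about four setup clocks per host-memory transfer and one clock to transfer the opcode) together with an added assumption of one clock per parameter word; the constant names and the function `fetch_clocks` are hypothetical.

```python
SETUP_CLOCKS = 4       # approximate setup time per host-memory transfer (from the text)
INSTR_XFER_CLOCKS = 1  # one clock to transfer the opcode word (from the text)
PARAM_XFER_CLOCKS = 1  # assumption: one clock per parameter word


def fetch_clocks(num_params: int, combined: bool) -> int:
    """Clocks to fetch one display-list instruction and its parameters.

    separate: one transfer (with its own setup) for the opcode, then a second
              transfer (with another setup) for the parameters.
    combined: a single transfer whose one setup covers both the opcode and
              the parameters that follow it.
    """
    data = INSTR_XFER_CLOCKS + num_params * PARAM_XFER_CLOCKS
    if combined or num_params == 0:
        return SETUP_CLOCKS + data  # only one setup period is paid
    return 2 * SETUP_CLOCKS + data  # opcode setup + parameter setup


for n in (1, 4, 8):
    sep = fetch_clocks(n, combined=False)
    comb = fetch_clocks(n, combined=True)
    print(f"{n} params: separate={sep} combined={comb} saved={sep - comb}")
```

Under these assumptions the saving is one setup period (four clocks) per instruction regardless of how many parameters it carries, so the relative benefit is largest for instructions with few parameters.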
It is therefore desirable to provide an improved graphics data and instruction processing method for a computer system that allows sophisticated graphics functions and operations to be processed without the inefficiencies and bottlenecks of prior art systems, e.g., without the extra cycles required to fetch and decode display list instructions from a host computer's main memory.