1. Field of the Invention
The present invention generally relates to computer systems. More particularly, the present invention relates to computer graphics processing hardware and to methods of caching data in graphics request buffer(s), where the graphics request buffers contain commands that direct the graphics hardware processing.
2. Description of the Related Art
Modern computer platforms often have one or more separate graphics hardware platforms, commonly called a “graphics card,” which have associated application-specific hardware and software for graphics data processing. The graphics hardware common in the industry includes one or more data buffers, referred to as “request buffers,” that receive graphics data from one or more host processors for processing by the graphics hardware. Request buffers can reside in either host memory or memory on the graphics hardware. The graphics hardware can access the request buffers with a direct memory access (DMA) mechanism for very fast throughput.
In a 3-dimensional (3D) graphics environment, the need for graphics data throughput is particularly acute due to the significant amount of data contained in complex 3D graphics. Graphics hardware requests originate from graphics processing calls made by the application executing on the host CPU, typically through graphics application programming interfaces (APIs) such as OpenGL or Direct3D.
A plurality of request buffers are often used so that while one request buffer is being written with data by the host, the data in the previous request buffer is being sent to the graphics hardware for processing, possibly through a DMA channel. The use of the plurality of request buffers thus improves performance in allowing overlap between the host writing to one request buffer and the graphics adapter processing graphics data from another request buffer.
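The overlap described above can be illustrated with a minimal sketch of two alternating request buffers. The class name `PingPongBuffers`, the `capacity` parameter, and the `submitted` list (which stands in for the graphics hardware) are hypothetical names chosen for illustration; a real driver would start an asynchronous DMA transfer rather than copy the buffer.

```python
class PingPongBuffers:
    """Two request buffers: the host fills one while the other is drained."""

    def __init__(self, capacity):
        self.capacity = capacity      # assumed maximum commands per buffer
        self.buffers = [[], []]
        self.active = 0               # index of the buffer the host writes to
        self.submitted = []           # stand-in for the graphics hardware

    def write(self, command):
        # When the active buffer is full, hand it off and switch buffers.
        if len(self.buffers[self.active]) == self.capacity:
            self.swap()
        self.buffers[self.active].append(command)

    def swap(self):
        full = self.active
        self.active = 1 - self.active
        # A real system would start a DMA transfer here and return at once,
        # so the host keeps writing to the other buffer in the meantime.
        self.submitted.append(list(self.buffers[full]))
        self.buffers[full].clear()


pp = PingPongBuffers(capacity=2)
for cmd in ["a", "b", "c", "d", "e"]:
    pp.write(cmd)
print(pp.submitted)              # buffers already handed to the hardware
print(pp.buffers[pp.active])     # commands still being accumulated
```

In this sketch the hand-off is synchronous, so it shows only the buffer alternation, not the timing benefit of the overlap.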
In some host CPU architectures, a mechanism called write-combining accelerates writes to the graphics hardware. Accordingly, allocating the request buffer in memory in the graphics hardware and using the write-combining mechanism can give extremely good graphics data writing performance. As the graphics data from the host is written into memory on the graphics hardware, no additional host bus transfers of the graphics data are required to process the graphics data held in the request buffer(s).
Graphics hardware that does not have local memory for the graphics CPU can still utilize write-combining to speed graphics data processing. The request buffers are allocated in host memory as non-cacheable. Write-combining transfers to the non-cached request buffers still produce good write performance, and since the buffers are non-cacheable, DMA transfers can be used to move the data to the graphics hardware, such as AGP 4× DMA transfers. Because the AGP 4× DMA transfers are not snooped by the host CPU cache, the graphics data must be guaranteed to be in memory by using either non-cached memory or by cache flushing.
However, write-combining does not accelerate reads of the request buffer. Even so, the reads of the request buffer(s) are not performance critical since the vast amount of graphics data being moved is from the host CPU to the graphics hardware.
There have been changes in industry-common host CPU architectures, such as the Pentium IV from Intel, which require alteration to the approach of constructing request buffers using write-combining, irrespective of whether the request buffer(s) is located in the graphics hardware memory or host memory. A particular characteristic of the modern CPU architecture is to send small bursts of graphics-related data to the graphics hardware for processing. As write-combining only works well if large bursts of data are sent across the graphics hardware bus or host bus, the small bursts of graphics data sent from the modern CPU can greatly reduce the performance of graphics-related data moves using write-combining. Write-combining therefore becomes a less efficient data movement mechanism to supply the graphics-related data to the graphics hardware for processing.
It would therefore be advantageous to provide a method for caching graphics-related data in the graphics request buffer(s) whereby the data is not flushed to the host memory if it is duplicative of graphics-related data already stored. Furthermore, such a method should account for changes in modern host CPU architectures wherein short bursts of graphics-related data are commonly sent from the host CPU to the graphics hardware. It is accordingly to the provision of such a methodology for caching graphics-related data in the graphics request buffer(s) that the present invention is primarily directed.
Briefly described, the present invention is a method for caching graphics-related data in one or more graphics request buffers wherein duplicative graphics-related data is not written to the graphics request buffers. The method for caching graphics-related data into at least one of a plurality of graphics request buffers includes the steps of initializing a flush start pointer in one of the plurality of graphics request buffers prior to the receipt of any graphics-related data at the request buffer(s), then receiving graphics-related data at the one of the plurality of graphics request buffers. The graphics-related data is preferably a frame comprised of setup data and model data, and the entire frame is held within the plurality of graphics request buffers.
The method further includes the step of repositioning the flush start pointer to the beginning memory location in the plurality of graphics request buffers where the incoming frame will be written. The location of the pointer can be handled either locally at the request buffer or through the graphics CPU, or managed through a combination of the request buffer and graphics CPU. Then the graphics-related data, such as the frame, is written to the memory location referenced by the flush start pointer, and upon the request buffer(s) receiving an additional frame of graphics-related data, a determination is made as to whether model data is present in the additional frame. If model data is present in the additional frame, the method includes the step of flushing the stored frame from the plurality of graphics request buffers for processing, and if model data is not present in the additional frame, then the method includes the step of writing the additional frame to the plurality of graphics request buffers.
If the model data was flushed from the plurality of request buffers, the model data from the additional frame (or graphics related data) is compared with the flushed model data from the stored frame, and if the model data from the additional frame does not match the flushed model data, the additional frame is written to the plurality of graphics request buffers. Otherwise, if the model data from the additional frame matches the flushed model data, the method includes the step of receiving, but not writing, the entire frame or graphics related data sequence. Finally, the flush start pointer is incremented to a new memory location where further graphics related data, such as an additional frame, would be written if received containing new data.
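The receive, flush, compare, and write-or-skip sequence described above can be sketched as follows. This is an illustrative model only, not the claimed implementation: the names `Frame`, `CachedRequestBuffer`, `pending`, and `sent_to_hw` are hypothetical, the request buffers are modeled as a single `bytearray`, the flush to the hardware is modeled as appending to a list, and the flush start pointer is advanced at write time for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    setup: bytes
    model: bytes = b""       # empty when the frame carries no model data


class CachedRequestBuffer:
    def __init__(self):
        self.flush_start = 0         # flush start pointer, set before any data
        self.memory = bytearray()    # stand-in for the request buffer memory
        self.pending = []            # frames written but not yet flushed
        self.flushed_model = None    # model data of the last flushed frame
        self.sent_to_hw = []         # stand-in for DMA to the graphics hardware

    def _write(self, frame):
        data = frame.setup + frame.model
        # Write at the location referenced by the flush start pointer,
        # then advance the pointer past the written data.
        self.memory[self.flush_start:self.flush_start + len(data)] = data
        self.flush_start += len(data)
        self.pending.append(frame)

    def receive(self, frame):
        """Return True if the frame was written, False if it was skipped."""
        if not frame.model:
            self._write(frame)       # no model data: write the frame
            return True
        # Model data present: flush the stored frames for processing.
        for stored in self.pending:
            self.sent_to_hw.append(stored)
            if stored.model:
                self.flushed_model = stored.model
        self.pending.clear()
        if frame.model == self.flushed_model:
            return False             # duplicate model data: receive, do not write
        self._write(frame)
        return True


buf = CachedRequestBuffer()
results = [
    buf.receive(Frame(b"S1", b"MODEL-A")),   # first frame
    buf.receive(Frame(b"S2", b"MODEL-A")),   # same model data
    buf.receive(Frame(b"S3", b"MODEL-B")),   # new model data
]
print(results, buf.flush_start)
```

The sketch covers only the decision of whether to write an incoming frame; how the hardware reuses the previously flushed data for a skipped frame is outside its scope.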
In the preferred method, the graphics-related data is sent in frames and each frame contains frame setup data and graphical model data. The model data is compared between the stored frame and the new frame to determine if there is new model data to be written to the graphics request buffers. Further, a plurality of reference pointers can be used such that this method includes the steps of writing the frame to the memory location referenced by the flush start pointer, and referencing a second pointer to a memory location in one of the plurality of graphics request buffers prior to the receipt of any additional graphics-related data (such as a frame). In such an embodiment, the step of writing the additional frame to the plurality of graphics request buffers is writing the additional frame to the memory location in the plurality of request buffers referenced by the second pointer.
The step of comparing the model data from the additional frame with the flushed model data from the stored frame preferably ceases the comparison upon locating a substantial non-matching data set within the model data from the additional frame. One preferable manner of determining whether the model data present in the additional frame corresponds to the stored graphics-related data is to determine whether the size of the additional frame is the same as the size of the stored frame. Furthermore, the step of flushing the stored frame from the plurality of graphics request buffers for processing is preferably performed by DMA to the graphics hardware.
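A minimal sketch of such a comparison, with a size pre-check and an early exit, might look as follows. The function name `model_data_matches` and the `chunk` parameter are hypothetical, and stopping at the first non-matching chunk is one assumed reading of "a substantial non-matching data set"; other thresholds are possible.

```python
def model_data_matches(new_model: bytes, flushed_model: bytes, chunk: int = 64) -> bool:
    """Compare model data chunk by chunk, stopping at the first mismatch."""
    # Cheap pre-check: data of different size cannot be identical.
    if len(new_model) != len(flushed_model):
        return False
    for offset in range(0, len(new_model), chunk):
        if new_model[offset:offset + chunk] != flushed_model[offset:offset + chunk]:
            return False    # cease the comparison at the first mismatch
    return True


same = model_data_matches(b"abc" * 100, b"abc" * 100)
different_size = model_data_matches(b"abc", b"abcd")
different_tail = model_data_matches(b"a" * 100 + b"x", b"a" * 100 + b"y")
print(same, different_size, different_tail)
```

The size check avoids reading any model data when the frames cannot match, and the chunked loop bounds the work done on frames that diverge early.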
The present inventive methodology further provides for additional data optimization once part of the graphics-related data has been determined to be static. Further analysis of the static data can reveal optimal methods for request buffer management, such as altering the data organization, one example being lossless data reduction of static elements. Static graphics-related data could also be cached within the graphics hardware memory to enhance throughput with the repeated processing of the common graphics-related data.
The present invention therefore provides a graphics-related data processing methodology through the caching of the graphics-related data in one or more request buffers wherein the graphics processing throughput is greatly enhanced due to the elimination of duplicative data being held in the request buffers. The present invention can be utilized in modern CPU architectures that provide small bursts of graphics-related data from the host CPU to the graphics hardware, as the plurality of cached request buffers can sort through the increased amount of incoming graphics-related data. Because existing graphics hardware includes one or more request buffers, the present methodology can be implemented as a data management tool on existing request buffer architectures, without the need for additional hardware controls. Moreover, existing request buffers can also have hardware modification to better support the caching method if so desired.