Typical computer systems employ a graphics accelerator card to enhance the resolution and display of graphics. Displaying graphics requires a two-part process: rendering and geometry acceleration. In prior art graphics cards, the geometry phase was performed by the central processing unit (CPU) of the computer system, while the rendering phase was performed by the graphics card. The CPU is often referred to as a host processor. This arrangement often overloaded the CPU, since graphics processing competed for processor time with other applications. Currently, high-end graphics cards are configured to perform both the rendering phase and the geometry phase. This arrangement improves performance and graphics rendering because the CPU is free to perform other processes while the graphics are being processed on the graphics card.
Although having the graphics card perform both rendering and geometry acceleration increases processing performance, the graphics request must still be sent to the graphics card through the CPU, which involves significant memory swaps between main memory (RAM) and the cache memory associated with the CPU.
See FIG. 1 for a schematic diagram of the components involved in an exemplary prior art graphics card. FIG. 1 shows a host processor 9 of a computer system which is connected to a bus 1. The bus 1 is used for transporting information to and from various components of the computer system, including main memory 7. The host processor 9 receives a request from an application level program to create a graphics display. The request may be in the form of a group of instructions which accesses an application program interface ("API") 11. The API converts the instructions into a graphics request stream 10 which is capable of being understood by the graphics accelerator. The graphics request stream 10 is transmitted to a cache 8 associated with the host processor, and placed into a cache line via bus 1. The graphics request stream is transported from the cache 8 across the bus 1 and deposited in a graphics memory location 106 of the graphics card 104. The graphics request stream 10 is processed by a graphics processor 105 and then sent to a display device.
FIG. 2 shows a prior art method of receiving the graphics request and transporting the graphics request stream to the graphics accelerator card for processing. The process begins at step 302, in which an application level program makes a request for a graphics display. This causes the appropriate functions of the API 11 to be called. In step 304, the results of the API functions form a graphics request stream 10 based on the request from the application level program.
The host processor 9 writes the graphics request stream 10 to main memory 7 in step 306, which requires the graphics request stream to pass across the system bus. Cache reads and writes are indicated by subscript numerals in FIG. 1. Because the position in main memory 7 that is written to is typically not in the cache 8, and the cache line usually holds data that is not coherent with main memory 7, a cache line swap must take place. This involves writing the current cache line contents into an associated location in main memory 7 (step 308), and writing the newly addressed cache line 12 containing the graphics request stream into the cache (step 310). Thus, writing the graphics request stream to the cache of the CPU requires the graphics request stream to pass across the system bus twice. Once the data of the graphics request stream 10 is held in the cache memory, it still must be moved into the graphics system before rendering can occur, thus requiring a third crossing of the system bus (step 312). To do this, a graphics processor 105 on the graphics card 104 is controlled by driver software. The driver software causes the host processor to read the graphics request stream 10 from the cache memory 8 and pass it to the graphics processor 105 of the graphics card, which writes it into a memory location 106 for processing (step 314). Once initiated, the graphics processor 105 proceeds without further intervention by the host processor 9, and the processed graphics request stream is displayed by a display device (step 316).
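The bus traffic described in steps 308 through 314 can be illustrated with a small counting model. The following is a minimal sketch only, not part of the prior art system itself: the `SystemBus` class, the `send_request_stream` function, and the simplifying assumption that every word of the request stream triggers a full cache line swap are all hypothetical and introduced purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SystemBus:
    """Hypothetical model that counts word-sized transfers across the system bus."""
    crossings: int = 0
    log: list = field(default_factory=list)

    def transfer(self, words: int, description: str) -> None:
        # Record one bus crossing per word moved.
        self.crossings += words
        self.log.append((description, words))


def send_request_stream(bus: SystemBus, words: int) -> None:
    """Model the prior art path for a request stream of `words` words.

    Assumption: each word incurs a cache line swap, as in the worst case
    described in the text.
    """
    # Step 308: write the current cache line contents back to main memory.
    bus.transfer(words, "write old cache line contents to main memory (step 308)")
    # Step 310: write the newly addressed line holding the stream into the cache.
    bus.transfer(words, "write request stream into cache line (step 310)")
    # Steps 312-314: the driver moves the stream from cache to graphics memory.
    bus.transfer(words, "move request stream to graphics memory (steps 312-314)")


stream_words = 8
bus = SystemBus()
send_request_stream(bus, stream_words)
crossings_per_word = bus.crossings / stream_words

for description, words in bus.log:
    print(f"{words} words: {description}")
print(f"bus crossings per word: {crossings_per_word}")
```

Under these assumptions the model reports three bus crossings for each word of the request stream, regardless of the stream length chosen.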
In summary, each word of data of the graphics request stream that is moved into the graphics accelerator requires two bus transactions for storage in cache memory 8 and one transaction to move it from cache memory 8 to the graphics pipeline 106. Processing data in this way thus requires at least three reads or writes across the system bus per word, consequently limiting the rendering speed to no more than about thirty-three percent (one third) of the system bus rate.
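The one-third figure follows directly from dividing the bus rate by the number of crossings each word needs. The sketch below works through that arithmetic; the function name `effective_render_rate` and the example bus rate of 100 million words per second are assumptions chosen for illustration, not values taken from the system described.

```python
def effective_render_rate(bus_rate: float, crossings_per_word: int = 3) -> float:
    """Upper bound on the rate at which words reach the graphics card.

    Assumes the system bus is the only bottleneck and that each word
    needs `crossings_per_word` separate bus transactions.
    """
    if crossings_per_word < 1:
        raise ValueError("each word must cross the bus at least once")
    return bus_rate / crossings_per_word


bus_rate = 100e6  # hypothetical bus rate, in words per second
rate = effective_render_rate(bus_rate)
fraction = rate / bus_rate

print(f"effective rate: {rate:.3e} words per second")
print(f"fraction of bus rate: {fraction:.1%}")
```

With three crossings per word the bound is one third of the bus rate. The same function shows that a path needing only one crossing per word could, in principle, approach the full bus rate.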