The present invention relates generally to systems for computer graphics. More specifically, the present invention includes a method and apparatus for increasing the graphics throughput of systems that use an Accelerated Graphics Port (AGP) or other memory interconnect.
FIG. 1 is a block diagram of a typical PCI-based personal computer system. In this type of architecture, a PCI bridge sits at the center of the system and connects a processor/cache combination, a system memory, and a PCI bus. Each of these components has its own connection to the PCI bridge. This allows the PCI bridge to act as a gateway between the processor/cache combination, the system memory, and the PCI bus. The PCI bus provides points of attachment for a graphics processor and other I/O devices.
When they were introduced, PCI-based personal computers were a vast improvement over more traditional PC architectures. The improvement was due largely to the high speed of the PCI bus, which greatly increased the bandwidth available for I/O tasks. This increased bandwidth led to the introduction of faster I/O devices designed to further exploit the capabilities of the PCI bus.
Unfortunately, the bandwidth required by these improved I/O devices has largely consumed the capacity of the PCI bus. Graphics processors are a key part of this problem. They are increasingly used in ways that require large amounts of data to be transferred between the system memory and the graphics processor. These transfers tend to tie up the PCI bus and starve the remaining I/O devices.
To overcome these limitations, many PC manufacturers have adopted architectures based on the Accelerated Graphics Port (AGP). As shown in FIG. 2, the AGP architecture is similar to the PCI architecture of FIG. 1. The AGP architecture improves on it because the graphics processor is no longer attached to the PCI bus. Instead, the graphics processor has its own dedicated connection to the AGP bridge.
The dedicated connection between the graphics processor and the AGP bridge increases the rate at which data can be transferred to the graphics processor. At the same time, contention on the PCI bus is reduced. As a result, AGP-based personal computers offer a significant performance increase over PCI-based systems.
Even so, AGP-based systems suffer from performance bottlenecks. One such bottleneck arises when a graphics application creates large textures and stores them in the system memory. For many graphics applications these textures can be quite large, extending over many megabytes. The textures are also subject to interactive modification, which allows them to be changed interactively for simulation and other environments. Generating and modifying textures creates considerable traffic between the processor/cache combination and the system memory. Meanwhile, the graphics processor may be accessing the generated textures or other data within the system memory. As a result, a performance bottleneck arises at the interface between the system memory and the AGP bridge.
For these reasons, a need exists for a system that reduces or alleviates the performance bottleneck at the interface between the system memory and the AGP bridge. This need is particularly great for systems that are intended for high-performance graphics applications where large textures are stored, modified and accessed in a system memory.
The present invention provides a graphics cache memory that accelerates throughput within a memory bridge, such as an AGP bridge. The graphics cache is positioned within the memory bridge and intercepts graphics data that a processor/cache combination generates for storage in a system memory. The graphics cache is searched each time the graphics processor requests data from the system memory. A successful search results in the requested data being retrieved from the graphics cache; an unsuccessful search results in the requested data being retrieved directly from the system memory.
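The search-then-fallback data path can be modeled in a few lines. The following Python sketch is illustrative only: the class and method names (`GraphicsCacheBridge`, `processor_write`, `graphics_read`) are hypothetical, integer keys stand in for cache-line addresses, and two policies are assumptions not stated in the text: processor writes are written through to system memory, and a full graphics cache simply stops capturing new data.

```python
from typing import Dict, Tuple


class GraphicsCacheBridge:
    """Hypothetical model of a memory bridge containing a graphics cache.

    Assumptions: writes go through to system memory, the cache holds a
    copy of intercepted data, and a full cache stops capturing new data.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.system_memory: Dict[int, bytes] = {}
        self.graphics_cache: Dict[int, bytes] = {}

    def processor_write(self, address: int, data: bytes) -> None:
        # Assumed write-through: system memory always receives the data.
        self.system_memory[address] = data
        # Intercept a copy if there is room, or refresh an entry that is
        # already cached so the graphics processor never sees stale data.
        if address in self.graphics_cache or len(self.graphics_cache) < self.capacity:
            self.graphics_cache[address] = data

    def graphics_read(self, address: int) -> Tuple[bytes, str]:
        # Search the graphics cache first.
        if address in self.graphics_cache:
            return self.graphics_cache[address], "cache"
        # Unsuccessful search: retrieve the data from system memory.
        return self.system_memory[address], "memory"
```

With a one-entry cache, the first texture written is served from the bridge and the second falls back to system memory.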
The graphics cache is preferably updated after each successful search and retrieval to indicate that the retrieved data is no longer cached. This read-once strategy simplifies the cache structure and allows the memory within the graphics cache to be reused rapidly. Because of this rapid reuse, a relatively small cache can yield a relatively large performance increase.
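A minimal sketch of the read-once strategy, assuming a cache with a fixed number of slots managed through a free list: a successful read returns the data and immediately returns its slot to the free list. The names `ReadOnceGraphicsCache`, `insert`, and `read_once` are hypothetical, and the policy of declining new data when no slot is free is an assumption rather than something the text specifies.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple


class ReadOnceGraphicsCache:
    """Hypothetical read-once graphics cache with a fixed number of slots."""

    def __init__(self, num_slots: int) -> None:
        self.slots: List[Optional[Tuple[int, bytes]]] = [None] * num_slots
        self.free_slots: Deque[int] = deque(range(num_slots))
        self.index: Dict[int, int] = {}  # address -> slot number

    def insert(self, address: int, data: bytes) -> bool:
        """Capture processor-written data; return False if no slot is free."""
        if address in self.index:
            # Overwrite an existing entry in place.
            self.slots[self.index[address]] = (address, data)
            return True
        if not self.free_slots:
            return False  # assumed policy: the data bypasses the cache
        slot = self.free_slots.popleft()
        self.slots[slot] = (address, data)
        self.index[address] = slot
        return True

    def read_once(self, address: int) -> Optional[bytes]:
        """Return cached data and free its slot; None signals a miss."""
        slot = self.index.pop(address, None)
        if slot is None:
            return None  # miss: the caller reads system memory instead
        entry = self.slots[slot]
        assert entry is not None
        self.slots[slot] = None
        self.free_slots.append(slot)  # the slot is reusable immediately
        return entry[1]
```

Because a slot is freed as soon as its data is read, this sketch needs no replacement policy (such as LRU) to choose a victim, which is one plausible reading of how the read-once strategy simplifies the cache structure.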
In this way, the present invention reduces the overall amount of traffic passing between the memory bridge and the system memory. This reduces or alleviates the bottleneck created by graphics data traversing this interface.
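The traffic reduction can be illustrated by counting transfers across the bridge/system-memory interface for a trace of operations. This hypothetical model assumes that every processor write is written through (one transfer each), that a graphics read crosses the interface only on a cache miss, and that entries are freed when read (read-once). The function name and event format are illustrative, and the counts are not measurements of any real system.

```python
from typing import Iterable, Tuple


def memory_interface_transfers(events: Iterable[Tuple[str, int]], cache_slots: int) -> int:
    """Count transfers between the memory bridge and system memory.

    Hypothetical model: each ("write", addr) crosses the interface once;
    each ("read", addr) crosses it only if addr is not in the graphics cache.
    """
    cached = set()
    transfers = 0
    for op, address in events:
        if op == "write":
            transfers += 1  # write-through to system memory
            if address in cached or len(cached) < cache_slots:
                cached.add(address)  # intercepted copy kept in the bridge
        elif op == "read":
            if address in cached:
                cached.remove(address)  # hit: served in the bridge, then freed
            else:
                transfers += 1  # miss: fetched from system memory
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return transfers
```

For the trace `[("write", 0), ("write", 1), ("read", 0), ("read", 1)]`, the function returns 4 with no cache slots and 2 with two slots, because both graphics reads are served inside the bridge.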