1. Field of the Invention
This invention relates to computer systems and, more particularly, to integrated bus bridge designs for use in high performance computer systems. The invention also relates to snooping optimizations in computer systems.
2. Description of the Related Art
Computer architectures generally include a plurality of devices interconnected by one or more buses. For example, conventional computer systems typically include a CPU coupled through bridge logic to an external main memory. A main memory controller is thus typically incorporated within the bridge logic to generate various control signals for accessing the main memory. An interface to a high bandwidth local expansion bus, such as the Peripheral Component Interconnect (PCI) bus, may also be included as a portion of the bridge logic. Examples of devices which can be coupled to the local expansion bus include network interface cards, video accelerators, audio cards, SCSI adapters, telephony cards, etc. An older-style expansion bus may be supported through yet an additional bus interface to provide compatibility with earlier-version expansion bus adapters. Examples of such expansion buses include the Industry Standard Architecture (ISA) bus, also referred to as the AT bus, the Extended Industry Standard Architecture (EISA) bus, and the Micro Channel Architecture (MCA) bus. Various devices may be coupled to this second expansion bus, including a fax/modem card, sound card, etc.
The bridge logic can link or interface more than simply the CPU bus, a peripheral bus such as a PCI bus, and the memory bus. In applications that are graphics intensive, a separate peripheral bus optimized for graphics related transfers may be supported by the bridge logic. A popular example of such a bus is the AGP (Accelerated Graphics Port) bus. AGP is generally considered a high performance, component level interconnect optimized for three-dimensional graphical display applications, and is based on a set of performance extensions or enhancements to PCI. AGP came about, in part, from the increasing demands placed on memory bandwidth by three-dimensional rendering. AGP provided an order of magnitude bandwidth improvement for data transfers between a graphics accelerator and system memory. This allowed some of the three-dimensional rendering data structures to be effectively shifted into main memory, reducing the cost of incorporating large amounts of memory local to the graphics accelerator or frame buffer.
AGP uses the PCI specification as an operational baseline, yet provides three significant performance extensions or enhancements to that specification. These extensions include deeply pipelined read and write operations, demultiplexing of address and data on the AGP bus, and AC timing specifications for faster data transfer rates.
Since computer systems were originally developed for business applications including word processing and spreadsheets, among others, the bridge logic within such systems was generally optimized to provide the CPU with relatively good performance with respect to its access to main memory. The bridge logic generally provided relatively poor performance, however, with respect to main memory accesses by other devices residing on peripheral busses, and similarly provided relatively poor performance with respect to data transfers between the CPU and peripheral busses as well as between peripheral devices interconnected through the bridge logic.
Recently, however, computer systems have been increasingly utilized in the processing of various real time applications, including multimedia applications such as video and audio, telephony, and speech recognition. These systems require not only that the CPU have adequate access to the main memory, but also that devices residing on various peripheral busses such as an AGP bus and a PCI bus have fair access to the main memory. Furthermore, it is often important that transactions between the CPU, the AGP bus and the PCI bus be efficiently handled. The bus bridge logic for a modern computer system should accordingly include mechanisms to efficiently prioritize and arbitrate among the varying requests of devices seeking access to main memory and to other system components coupled through the bridge logic.
One important aspect associated with bus bridge performance involves snooping operations on the processor bus when a memory write request or a memory read request from a peripheral device such as a PCI device is received. In the case of a memory write by the PCI device, the snoop cycle on the processor bus is required to determine whether a valid line corresponding to the write data exists in the cache of the processor and, if present, to invalidate the line. Furthermore, if the line is modified, the data in the cache may need to be written back to main memory. Similarly, in the case of a memory read by the PCI device, if the line corresponding to the read is modified in the cache, the data in the cache must typically be written back to main memory to allow the data to be read by the PCI device.
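The snoop outcomes described above can be summarized in a small model. The following is a minimal sketch, assuming a simplified MESI-style cache line state; the names `LineState`, `SnoopResult` and `snoop` are hypothetical, and the downgrade of a valid line to SHARED on a peripheral read is an assumption drawn from typical MESI behavior rather than from this description.

```python
from dataclasses import dataclass
from enum import Enum


class LineState(Enum):
    """Simplified MESI states for one processor cache line (assumption)."""
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


@dataclass(frozen=True)
class SnoopResult:
    writeback: bool       # cache must supply its dirty data to the bridge
    new_state: LineState  # state of the line after the snoop cycle


def snoop(state: LineState, is_write: bool) -> SnoopResult:
    """Hypothetical outcome of a processor-bus snoop caused by a peripheral
    (e.g. PCI) memory access; models only the cases described in the text."""
    dirty = state is LineState.MODIFIED
    if is_write:
        # Peripheral write: any valid copy is invalidated; a modified copy
        # must first be written back (or merged with the write data).
        return SnoopResult(writeback=dirty, new_state=LineState.INVALID)
    if state is LineState.INVALID:
        # Peripheral read that misses the cache: nothing to do.
        return SnoopResult(writeback=False, new_state=LineState.INVALID)
    # Peripheral read hit: a modified line is written back so the peripheral
    # sees current data. Assumption: the line is downgraded to SHARED.
    return SnoopResult(writeback=dirty, new_state=LineState.SHARED)


print(snoop(LineState.MODIFIED, is_write=True))
```

Only a hit on a modified line forces a writeback; every write hit, however, still costs an invalidation, which is why the snoop cycle cannot be skipped for peripheral writes.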
Substantial overhead and latency may be associated with the effectuation of a snoop cycle on the processor bus and with related functionality of the bus bridge. Before the bus bridge can initiate the snoop cycle it must arbitrate for the processor bus and wait for any locked transactions to complete. In addition, in the case of a memory write operation by the PCI device, if writeback data is received by the bus bridge from the cache, the writeback data may either need to be written to memory before the data from the PCI bus is written, or be merged with the PCI write data. In the case of a memory read operation by the PCI device, if writeback data is received by the bus bridge from the cache, the writeback data may need to be written to memory before the data can be read by the PCI device, or the writeback data may be snarfed while it is pending in the bridge but prior to its write to main memory. In any of these cases, the snoop cycle must typically be completed before the PCI read or write can be completed by the bus bridge. If the PCI device performs a subsequent read or write operation to an additional cache line address, the bus bridge must initiate another snoop cycle on the processor bus by repeating the foregoing process. That is, the bus bridge must again arbitrate for the processor bus, wait for any locked transactions to complete, and effectuate the additional snoop cycle. Again, the PCI read or write typically cannot be completed by the bus bridge until the snoop cycle is completed. The arbitration phase required for obtaining the processor bus upon each snoop cycle can consume considerable bandwidth of the CPU bus. Additionally, the required effectuation and completion of the snoop cycle on the processor bus can limit performance of devices residing on the PCI bus, particularly in situations where a PCI device performs multiple reads and/or writes to main memory. Similar problems are associated with devices residing on other buses, such as an AGP bus.
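The per-cache-line cost described above can be illustrated by counting how many arbitration-plus-snoop sequences a conventional bridge would issue for one PCI burst. This is a minimal sketch under stated assumptions: the 32-byte `LINE_SIZE` and the helper `count_snoop_cycles` are hypothetical, and a new snoop is assumed to be required whenever an access targets a cache line other than the one most recently snooped.

```python
LINE_SIZE = 32  # bytes; assumed cache line size, not specified in the text


def count_snoop_cycles(addresses: list[int], line_size: int = LINE_SIZE) -> int:
    """Count processor-bus snoop cycles, each preceded by an arbitration phase,
    that a conventional bridge issues for a sequence of peripheral accesses.

    Hypothetical model: a new snoop is needed whenever an access targets a
    cache line other than the one most recently snooped.
    """
    cycles = 0
    last_line = None
    for addr in addresses:
        line = addr // line_size
        if line != last_line:
            # Arbitrate for the processor bus, wait for any locked
            # transactions, then run the snoop before the access completes.
            cycles += 1
            last_line = line
    return cycles


# A 256-byte PCI burst of 4-byte (DWORD) transfers starting at 0x1000.
burst = list(range(0x1000, 0x1000 + 256, 4))
print(count_snoop_cycles(burst))
```

Under these assumptions the burst spans eight cache lines, so the bridge arbitrates for the processor bus eight separate times even though the PCI device issued a single contiguous transfer.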
It would accordingly be desirable to provide a system and method in a computer system wherein the snoop functionality is optimized. It would particularly be desirable to optimize performance of devices that initiate multiple consecutive accesses to memory, and to optimize the bandwidth of the CPU bus.