1. Field of the Invention
The present invention generally relates to sharing data among processors using cache memories in a computer system with multiple processors. More particularly, the present invention relates to maintaining the coherence of data in cache memories by using a cache coherence directory and bus snooping. Still more particularly, the invention relates to a computer system in which a sideband signal identifies memory requests as non-cacheable to minimize cache coherence directory lookups and bus snoops.
2. Background of the Invention
Modern-day computer systems can include a single processor or multiple processors for higher performance. A host bridge unit coupled to each processor of the multiprocessing computer system allows the computer system to support many different kinds of devices attached to a multitude of different buses. The host bridge unit may connect to processor buses, a main memory bus, and an I/O bus and, through an I/O bridge unit, to an accelerated graphics port (“AGP”) bus, peripheral component interconnect (“PCI”) bus or peripheral component interconnect extended (“PCIX”) bus. Each of the processor buses can support a maximum number of processors (e.g., 4, 6, 8, 12, etc.) connected to the processor bus while still maintaining bus communication bandwidth for sufficiently high performance.
Each processor of the computer system includes a memory cache either integrated into the processor chip itself or external to the processor chip. The memory cache stores data and instructions and improves processor performance by allowing high-speed access to the needed data and instructions, resulting in reduced program execution time. In a computer system with multiple processors, each unit of data is identified as being owned by a particular processor. Requestor processors in the computer system may request a unit of data from an owner processor. The requesting processor may access data to perform either read or write operations. If a requesting processor modifies the data by performing a write, other processors of the computer system may have access to old, unmodified versions of the data. To remedy this problem, each processor maintains a local record of the addresses cached on the various processors and the particular “state” of each unit of data associated with the address in a cache coherence directory.
A “state” describes the copies of the data unit stored in the memory caches of the particular system. The computer system, using a cache coherence directory, implements a coherency protocol that enforces the consistency of data in the cache memories. The coherency protocol describes the different states of a data unit. A data unit may be in a shared state that corresponds to processors having a read only copy of the data unit. Alternatively, a data unit may be in an exclusive state in which only one requestor processor contains a copy of the data unit that it may modify.
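The two states described above can be pictured with a minimal sketch. It is illustrative only: the names `State`, `DirectoryEntry`, `holders`, and `may_modify` are hypothetical and do not come from the specification, and a real coherency protocol would typically define additional states.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    SHARED = "shared"        # processors hold read-only copies of the data unit
    EXCLUSIVE = "exclusive"  # only one processor holds a copy, and it may modify it


@dataclass
class DirectoryEntry:
    """Hypothetical record kept per data unit in a cache coherence directory."""
    state: State
    holders: set = field(default_factory=set)  # ids of processors caching the unit

    def may_modify(self, cpu: int) -> bool:
        # Writing is allowed only for the sole holder of an exclusive copy.
        return self.state is State.EXCLUSIVE and self.holders == {cpu}


shared_entry = DirectoryEntry(State.SHARED, {0, 2})
exclusive_entry = DirectoryEntry(State.EXCLUSIVE, {1})
```

In this sketch, neither processor 0 nor processor 2 may write to the shared unit, while processor 1 may write to the unit it holds exclusively.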
Use of a coherence protocol requiring a cache coherence directory may call for excessive utilization of the processor bus interconnecting the processors. A “bus snoop” involves accessing the bus to communicate with processors on the processor bus to monitor and maintain coherency of data. A bus snoop is needed whenever a requestor processor needs access to data for which it neither holds an exclusive copy nor is the owner. Large amounts of snoop traffic can seriously impact computer system performance. One solution to this problem is to compare the address of the data to the cache coherence directory to determine if one of the other processors owns the address or has an exclusive copy. If the cache coherence directory indicates ownership of the address or an exclusive copy by a different processor, a bus snoop is performed. If the requesting processor owns the address or has an exclusive copy, a bus snoop is not performed, thus preserving processor bus bandwidth.
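The directory comparison described above can be sketched as a simple filter. This is a hedged illustration, not the patented logic: `needs_snoop` and the dictionary layout are hypothetical, and the treatment of an address that has no directory entry (no snoop) is an assumption, since the text does not address that case.

```python
def needs_snoop(directory: dict, address: int, requester: int) -> bool:
    """Decide whether a bus snoop is required for a memory request.

    `directory` is a hypothetical mapping from address to the id of the
    processor that owns the address or holds an exclusive copy of it.
    """
    holder = directory.get(address)
    if holder is None:
        # Assumption: an address absent from the directory needs no snoop.
        return False
    # Snoop only when a *different* processor owns or exclusively holds the data.
    return holder != requester


directory = {0x1000: 0, 0x2000: 3}
snoop_same_owner = needs_snoop(directory, 0x1000, requester=0)   # requester owns it
snoop_other_owner = needs_snoop(directory, 0x2000, requester=0)  # processor 3 owns it
```

Only the second request in this example would generate snoop traffic on the processor bus.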
Hardware to maintain the coherency of the data includes a cache coherence controller and cache coherence directory. The cache coherence directory preferably includes enough Random Access Memory (“RAM”) to maintain a sufficient record of the addresses cached on the various processors and the particular state of each unit of data associated with the address. It would be advantageous if the cache coherence directory and cache coherence protocol could be implemented in such a way as to be able to quickly service memory requests from the processor and peripheral devices. To implement a fast cache coherence directory, interleaved banks of RAM can be used. To further reduce the access time for processor and peripheral device memory requests, the cache coherence protocol could be implemented to reduce the number of memory requests that must be compared to the cache coherence directory. One way to reduce memory request access times would be for the host bridge unit to identify memory requests from peripheral devices as non-cacheable and then skip the cache coherence directory lookup and bus snoop.
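As a rough illustration of why interleaved banks help, the sketch below maps consecutive cache lines to different directory banks so that back-to-back lookups can proceed in separate banks. The specification does not state an interleaving scheme; the low-order interleaving shown here, the 64-byte line size, the four banks, and the name `bank_for` are all assumptions made for the example.

```python
LINE_BYTES = 64  # assumed cache-line size in bytes
NUM_BANKS = 4    # assumed number of interleaved directory RAM banks


def bank_for(address: int) -> tuple:
    """Return (bank, row) for an address under low-order interleaving."""
    line = address // LINE_BYTES  # index of the cache line containing the address
    bank = line % NUM_BANKS       # consecutive lines rotate through the banks
    row = line // NUM_BANKS       # position of the line inside its bank
    return bank, row


# Four consecutive cache lines fall in four different banks.
banks_hit = [bank_for(line * LINE_BYTES)[0] for line in range(4)]
```

With these assumed parameters, the fifth consecutive line wraps around to bank 0 again, one row further down.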
The current generation of host bridge units has no dedicated hardware support for the identification of non-cacheable memory requests. Furthermore, the next generation of AGP bus implementations in computer systems will have graphics devices coupled to an I/O bridge in order to permit greater flexibility in the I/O subsystem. Thus, graphics devices will send requests for data stored in memory to the I/O bridge that will then forward the request to the host bridge. Even if the graphics device and AGP bus are implemented to inform the I/O bridge that the data requested is non-cacheable, because host bridge units have no dedicated hardware support for identification of non-cacheable memory requests, the information that the data for the memory request is non-cacheable will not be provided to the host bridge.
For the reasons discussed above, it would be advantageous to design a computer system with dedicated hardware capable of informing the host bridge that a memory request from a graphics device is for non-cacheable data so as to bypass the cache coherence directory lookup and bus snoop. Despite the apparent performance advantages of such a system, to date no such system has been implemented.
The deficiencies of the prior art described above are solved in large part by an apparatus for identifying memory requests originating on remote I/O devices as non-cacheable in a computer system with multiple processors. The apparatus includes a main memory, memory cache, processor and cache coherence directory all coupled to a host bridge unit (North bridge or memory controller). An I/O device transmits requests for data to an I/O bridge unit. The I/O bridge unit forwards the request for data to the host bridge unit and asserts a sideband signal to the host bridge unit if the request is for non-cacheable data. The host bridge unit includes a cache coherence controller that implements a protocol to maintain the coherence of data stored in each of the processor caches in the computer system. The cache coherence directory connects to the cache coherence controller. If the host bridge unit determines that the data is cacheable (i.e., the sideband signal is not asserted), then it requests the cache coherence controller to perform a cache coherence directory lookup to maintain the coherence of the data. If the data is non-cacheable (i.e., the sideband signal is asserted), then the host bridge unit does not request the cache coherence controller to perform a cache coherence directory lookup. Various I/O devices can be coupled to the I/O bridge unit through an AGP bus, PCI bus, or PCIX bus.
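The request flow summarized above can be modeled in software as a minimal sketch. It is a behavioral illustration only, not the hardware implementation: the class names, the `sideband_asserted` field, and the lookup counter are hypothetical, and main memory is modeled as a plain dictionary.

```python
from dataclasses import dataclass


@dataclass
class MemoryRequest:
    address: int
    sideband_asserted: bool  # models the I/O bridge flagging non-cacheable data


class CoherenceController:
    """Hypothetical stand-in that only counts directory lookups."""

    def __init__(self):
        self.lookups = 0

    def directory_lookup(self, address: int) -> None:
        self.lookups += 1  # a real controller would consult the directory here


class HostBridge:
    def __init__(self, controller: CoherenceController, memory: dict):
        self.controller = controller
        self.memory = memory

    def handle(self, request: MemoryRequest):
        # Cacheable data (sideband not asserted) requires a directory lookup;
        # non-cacheable data bypasses the lookup entirely.
        if not request.sideband_asserted:
            self.controller.directory_lookup(request.address)
        return self.memory.get(request.address)


controller = CoherenceController()
bridge = HostBridge(controller, memory={0x10: "texture", 0x20: "shared data"})
first = bridge.handle(MemoryRequest(0x10, sideband_asserted=True))    # lookup skipped
second = bridge.handle(MemoryRequest(0x20, sideband_asserted=False))  # lookup performed
```

After both requests, the controller in this sketch has performed exactly one directory lookup, reflecting the bypass for the request marked non-cacheable.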
The preferred embodiment of the invention comprises a combination of features and advantages that enable it to overcome various problems of prior devices. The various characteristics described above, as well as other features, will be readily apparent to those skilled in the art upon reading the following detailed description of the preferred embodiments of the invention, and by referring to the accompanying drawings.