1. Field of the Invention
The present invention generally relates to sharing data among processors using cache memories in a computer system with multiple processors. More particularly, the present invention relates to a computer system in which processor cache memory coherence is maintained by cache coherence directory lookups and bus snooping. Still more particularly, the present invention relates to a system that identifies memory accesses as non-cacheable to minimize cache coherence directory lookups and bus snoops.
2. Background of the Invention
Modern computer systems may include a single processor or, for higher performance, multiple processors. A host bridge unit coupled to each processor of a multiprocessing computer system allows the computer system to support many different kinds of devices attached to a multitude of different buses. The host bridge unit may connect to processor buses, a main memory bus, and an I/O bus and, through an I/O bridge unit, to an accelerated graphics port (“AGP”) bus, a peripheral component interconnect (“PCI”) bus, or a peripheral component interconnect extended (“PCIx”) bus. Each processor bus can support a maximum number of processors (e.g., 4, 6, 8, or 12) while still providing enough bus bandwidth for sufficiently high performance.
Each processor of the computer system includes a memory cache, either integrated into the processor chip itself or external to it. The memory cache stores data and instructions and improves processor performance by providing high-speed access to them, which reduces program execution time. In a computer system with multiple processors, each unit of data is identified as being owned by a particular processor. Requestor processors in the computer system may request a unit of data from an owner processor in order to perform either read or write operations on it. If a requestor processor modifies the data by performing a write, other processors of the computer system may still hold old, unmodified copies of the data. To remedy this problem, each processor maintains, in a cache coherence directory, a local record of the addresses cached on the various processors and the particular “state” of each unit of data associated with those addresses.
A “state” describes the copies of a data unit stored in the memory caches of the system. Using a cache coherence directory, the computer system implements a coherency protocol that enforces the consistency of data in the cache memories. The coherency protocol defines the different states a data unit may be in. A data unit may be in a shared state, in which one or more processors hold a read-only copy of the data unit. Alternatively, a data unit may be in an exclusive state, in which only one requestor processor holds a copy of the data unit, and that processor may modify it.
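The directory record and the shared and exclusive states described above can be illustrated with a short Python sketch. It is a simplified model under stated assumptions, not the patented hardware: the class and method names are hypothetical, directory entries are keyed by data-unit address, and an extra INVALID state is assumed for addresses that no processor caches.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    INVALID = "invalid"      # assumed extra state: no processor caches the unit
    SHARED = "shared"        # one or more processors hold read-only copies
    EXCLUSIVE = "exclusive"  # exactly one processor holds a modifiable copy


@dataclass
class DirectoryEntry:
    state: State = State.INVALID
    holders: set = field(default_factory=set)  # IDs of processors caching the unit


class CoherenceDirectory:
    """Hypothetical cache coherence directory keyed by data-unit address."""

    def __init__(self):
        self.entries = {}

    def lookup(self, addr):
        # Addresses with no entry are reported as uncached (INVALID).
        return self.entries.get(addr, DirectoryEntry())

    def record_read(self, addr, cpu):
        entry = self.entries.setdefault(addr, DirectoryEntry())
        if entry.state is State.EXCLUSIVE and cpu not in entry.holders:
            # A second reader downgrades the exclusive copy to a shared one.
            entry.state = State.SHARED
        elif entry.state is State.INVALID:
            entry.state = State.SHARED
        entry.holders.add(cpu)

    def record_write(self, addr, cpu):
        # A writer must hold the only copy, so all other copies are dropped.
        self.entries[addr] = DirectoryEntry(State.EXCLUSIVE, {cpu})
```

For example, two reads of address `0x1000` by processors 0 and 1 leave the unit SHARED with holders `{0, 1}`; a subsequent write by processor 1 makes it EXCLUSIVE with holder `{1}`.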
Use of a coherence protocol that requires a cache coherence directory may lead to excessive utilization of the processor bus interconnecting the processors. A “bus snoop” involves using the processor bus to communicate with the other processors on that bus in order to monitor and maintain the coherency of data. A bus snoop is needed whenever a requestor processor needs access to data that it neither owns nor holds an exclusive copy of. Large amounts of snoop traffic can seriously degrade computer system performance. One solution to this problem is to compare the address of the data against the cache coherence directory to determine whether another processor owns the address or holds an exclusive copy. If the cache coherence directory indicates that a different processor owns the address or holds an exclusive copy, a bus snoop is performed. If the requestor processor itself owns the address or holds an exclusive copy, no bus snoop is performed, thus preserving processor bus bandwidth.
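The directory-filtered snoop decision described above reduces to a single comparison. The Python sketch below assumes, for illustration only, that the directory is a dictionary mapping each address to the ID of the processor that owns it or holds it exclusively; the function name is hypothetical, and invalidation of shared copies is deliberately not modelled because the rule above does not address it.

```python
from typing import Dict, Optional


def needs_bus_snoop(directory: Dict[int, Optional[int]], addr: int, requestor: int) -> bool:
    """Return True if `requestor` must snoop the processor bus to access `addr`.

    `directory` (a hypothetical structure) maps an address to the ID of the
    processor that owns it or holds an exclusive copy; a missing entry or
    None means no processor does.
    """
    holder = directory.get(addr)
    # Snoop only when a *different* processor owns the address or holds it exclusively.
    return holder is not None and holder != requestor
```

With `directory = {0x1000: 0, 0x2000: 3}`, processor 0 needs no snoop for `0x1000` (it is the holder) or for `0x3000` (no holder is recorded), but must snoop for `0x2000`, which processor 3 holds.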
Hardware to maintain the coherency of the data includes a cache coherence controller and a cache coherence directory. The cache coherence directory preferably includes enough Random Access Memory (“RAM”) to maintain a sufficient record of the addresses cached on the various processors and the particular state of each unit of data associated with those addresses. It would be advantageous if the cache coherence directory and cache coherence protocol could be implemented so as to service memory requests from the processor and peripheral devices quickly. To implement a fast cache coherence directory, interleaved banks of RAM can be used. To further reduce the access time for processor and peripheral device memory requests, the cache coherence protocol could be implemented to reduce the number of memory requests that must be compared against the cache coherence directory. One way to reduce memory request access times would be for the host bridge unit to identify memory requests as non-cacheable and then skip the cache coherence directory lookup and bus snoop. Despite the apparent performance advantages of such a system, to date no such system has been implemented.
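Interleaving the directory RAM means spreading consecutive coherence units across separate banks so that lookups for neighbouring addresses can proceed in parallel. The Python sketch below shows one common low-order interleaving scheme; the 64-byte unit size, the four banks, and the function name are assumptions for illustration, since the text does not specify them.

```python
from typing import Tuple

LINE_BYTES = 64  # assumed size of one coherence unit (cache line)
NUM_BANKS = 4    # assumed number of interleaved directory RAM banks


def directory_bank(addr: int) -> Tuple[int, int]:
    """Map a physical address to (bank, row) in a hypothetical interleaved directory.

    Consecutive cache lines fall in consecutive banks, so back-to-back
    lookups for neighbouring lines land in different banks.
    """
    line = addr // LINE_BYTES
    return line % NUM_BANKS, line // NUM_BANKS
```

Walking through the first eight lines (addresses 0 to 448 in steps of 64) visits banks 0, 1, 2, 3, 0, 1, 2, 3, and address `0x1040` maps to bank 1, row 16.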
The deficiencies of the prior art described above are solved in large part by an apparatus for identifying non-cacheable requests to main memory in a computer system with multiple processors. The apparatus includes a main memory, a memory cache, a processor, and a cache coherence directory, all coupled to a host bridge unit (North bridge or memory controller). The processor transmits requests for data to the main memory via the host bridge unit. The host bridge unit includes a cache coherence controller that implements a protocol to maintain the coherence of data stored in each of the processor caches in the computer system. The cache coherence directory connects to the cache coherence controller and contains the addresses of data stored in each of the processor caches and the state of that data. After receiving a request for data from main memory, the host bridge unit identifies the request as cacheable or non-cacheable. If the host bridge unit determines that the data is cacheable, it requests the cache coherence controller to perform a cache coherence directory lookup to maintain the coherence of the data. If the data is non-cacheable, the host bridge unit does not request the cache coherence controller to perform a cache coherence directory lookup.
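The host bridge filtering step described above can be sketched in Python as follows. The text does not say how the host bridge decides that a request is non-cacheable, so this sketch assumes a hypothetical table of non-cacheable address ranges; the class names, the range table, and the returned strings are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MemoryRequest:
    addr: int  # physical address requested from main memory


class HostBridge:
    """Sketch of the host bridge's cacheable/non-cacheable filter.

    Assumption: a request is non-cacheable if its address falls in one of a
    set of hypothetical [start, end) ranges.
    """

    def __init__(self, non_cacheable: List[Tuple[int, int]]):
        self.non_cacheable = non_cacheable
        self.directory_lookups = 0  # lookups requested from the coherence controller

    def is_cacheable(self, req: MemoryRequest) -> bool:
        return not any(start <= req.addr < end for start, end in self.non_cacheable)

    def handle(self, req: MemoryRequest) -> str:
        if self.is_cacheable(req):
            # Cacheable: ask the cache coherence controller for a directory lookup.
            self.directory_lookups += 1
            return "directory lookup, then main memory"
        # Non-cacheable: no processor cache can hold the data, so skip the lookup.
        return "main memory only"
```

With `HostBridge(non_cacheable=[(0xF000_0000, 0x1_0000_0000)])`, a request for `0x1000` triggers a directory lookup while a request for `0xF000_0040` goes straight to main memory, leaving `directory_lookups` at 1.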
An I/O bridge unit (South bridge or I/O controller) is coupled to the host bridge unit. The I/O bridge unit connects to various peripheral buses and receives requests for data from peripheral devices over those buses. The I/O bridge unit then transmits each peripheral device request for data to the host bridge unit, which identifies the request as cacheable or non-cacheable. If the data is non-cacheable, the host bridge unit does not request the cache coherence controller to perform a cache coherence directory lookup. The peripheral bus may be a PCI bus, PCIx bus, or AGP bus.
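The peripheral request path described above (peripheral bus, then I/O bridge, then host bridge) can be modelled in a few lines of Python. In this sketch the I/O bridge simply forwards requests and the host bridge performs the classification; the class names and the per-request `no_snoop` flag used to mark data as non-cacheable are hypothetical, since the text does not specify how the classification is made.

```python
from dataclasses import dataclass


@dataclass
class PeripheralRequest:
    bus: str        # originating peripheral bus: "PCI", "PCIx" or "AGP"
    addr: int
    no_snoop: bool  # hypothetical flag marking the requested data as non-cacheable


class HostBridge:
    def __init__(self):
        self.directory_lookups = 0

    def handle(self, req: PeripheralRequest) -> bool:
        """Return True if a cache coherence directory lookup was requested."""
        if req.no_snoop:
            return False  # non-cacheable: skip the directory lookup
        self.directory_lookups += 1
        return True


class IOBridge:
    """Receives requests from peripheral buses and forwards them upstream."""

    def __init__(self, host: HostBridge):
        self.host = host

    def forward(self, req: PeripheralRequest) -> bool:
        # The I/O bridge does not classify requests; the host bridge does.
        return self.host.handle(req)
```

Forwarding a cacheable PCI request, a non-cacheable AGP request, and a cacheable PCIx request through `IOBridge(host)` yields `[True, False, True]`, and the host bridge records two directory lookups.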
The preferred embodiment of the invention comprises a combination of features and advantages that enable it to overcome various problems of prior devices. The various characteristics described above, as well as other features, will be readily apparent to those skilled in the art upon reading the following detailed description of the preferred embodiments of the invention, and by referring to the accompanying drawings.