1. Technical Field of the Invention
The present invention relates to computer systems and, more particularly, to a multi-node computer system with a snoop filter.
2. Background Art
Large multi-processor computer systems are often organized in nodes in which at least some of the nodes include main memory, some number of processors, and associated caches. Multiple processors can access and modify a cache line. An access is a read or write transaction (request) with respect to the cache line. A write request may be handled indirectly, such as through a read for ownership. A cache coherency protocol allows the processors to use the most recently updated version of the cache line. A popular cache coherency protocol is the MESI (modified, exclusive, shared, or invalid) protocol, of which there are various variants.
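To illustrate how MESI states move for one cached line, a minimal sketch follows. It assumes a simplified textbook MESI model seen from a single cache. The function names (`on_local_read`, `on_local_write`, `on_snoop_read`, `on_snoop_invalidate`) are hypothetical, and write-backs and bus messages are not modeled.

```python
from enum import Enum


class State(Enum):
    M = "modified"
    E = "exclusive"
    S = "shared"
    I = "invalid"


def on_local_read(state, others_have_copy):
    # A read miss fills the line as Exclusive if no other cache holds it,
    # otherwise as Shared. A read hit (M, E, or S) leaves the state unchanged.
    if state is State.I:
        return State.S if others_have_copy else State.E
    return state


def on_local_write(state):
    # Any local write leaves the line Modified. From S or I this first needs
    # a read for ownership that invalidates other copies (not modeled here).
    return State.M


def on_snoop_read(state):
    # Another agent reads the line: a Modified line (after its write-back,
    # not modeled) and an Exclusive line both drop to Shared.
    if state in (State.M, State.E):
        return State.S
    return state


def on_snoop_invalidate(state):
    # Another agent requests ownership (e.g., a read for ownership),
    # so this cache's copy becomes Invalid.
    return State.I
```

In this sketch, a line read by one processor starts as Exclusive, and a second processor's read drives both copies to Shared.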
Snoop filters/directories have been designed to help maintain cache coherency between nodes. For example, FIG. 1 illustrates a multi-node processor system including a node 0 and a node 1. Multi-node systems may include more than two nodes. Node 0 includes a processor bus 0 (sometimes called a front side bus), four processors P0, P1, P2, and P3, a memory controller hub 14, and main memory 16. The processor bus has traditionally been a multi-drop bus, but using a point-to-point interconnect has been suggested. Node 1 includes a processor bus 1, four processors P4, P5, P6, and P7, a memory controller hub 24, and main memory 26. Processors P0, P1, . . . P7 have corresponding caches 18-0, 18-1, . . . 18-7. For some processors, the caches are called L0, L1, and L2 caches, but the names are not important, and there may be more or fewer than three caches per processor. The L2 caches may be on the same die as the processor or on a different die. A coherency controller switch 30 is coupled to memory controller hubs 14 and 24 as well as to I/O hub 38 and I/O hub 40. Memory controller hubs 14 and 24 are sometimes referred to as North bridges. Memory controller hub 14 is the local (home) memory controller hub for node 0, and memory controller hub 24 is the local memory controller hub for node 1. I/O hubs 38 and 40 are sometimes referred to as South bridges. I/O hubs 38 and 40 also have caches 42 and 44, respectively. The caches of the I/O hubs and the caches of the processors are called caching agents.
An individual node includes circuitry to maintain cache coherency inside that node through a cache coherency protocol such as the MESI protocol or a variant of it. For example, the circuitry to maintain cache coherency in node 0 is distributed among the interfaces for memory controller hub 14 and processors P0-P3.
Coherency controller switch 30 routes transactions between nodes, tracks requests, and broadcasts snoops. Coherency controller switch 30 includes a snoop filter 34. Snoop filter 34 tracks the state and location of cache lines held in the processor caches and I/O hub caches. A benefit of the snoop filter is that it eliminates unneeded snoop requests that would otherwise be broadcast to all caching agents, thus reducing the latency of coherent memory accesses, decreasing bandwidth utilization, and improving system performance. If an access misses in snoop filter 34, a memory read is issued to the local memory controller hub, and an entry in snoop filter 34 is allocated to track the cache line. Because a miss indicates that no caching agent holds the line, it is safe to fetch the data from memory without snooping the processor bus.
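The hit/miss behavior described above can be sketched as a lookup table from line address to the set of caching agents holding that line. This is a minimal sketch under stated assumptions: the class `SnoopFilter` and method `access` are hypothetical, capacity is unbounded, only location is tracked (not MESI state), and the invalidation of other holders on a write is not modeled.

```python
class SnoopFilter:
    """Hypothetical, unbounded snoop filter: line address -> set of holders."""

    def __init__(self):
        self.entries = {}  # line address -> set of caching-agent ids

    def access(self, line, requester):
        """Return (snoop_targets, miss) for a request to `line`."""
        holders = self.entries.get(line)
        if holders is None:
            # Miss: no caching agent holds the line, so memory is read
            # without snooping and a new entry is allocated to track it.
            self.entries[line] = {requester}
            return set(), True
        # Hit: snoop only the agents that actually hold the line,
        # instead of broadcasting to every caching agent.
        targets = holders - {requester}
        holders.add(requester)
        return targets, False
```

For example, if P0 and then P1 read line `0x40`, the second request snoops only P0 rather than all seven other processor caches and both I/O hub caches.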
Because snoop filter 34 has a finite number of entries, a miss may find no available entry to allocate. In such a case, a victim entry is selected for eviction, and the corresponding cache line is back invalidated in the caches that hold it. A drawback of snoop filter 34 is that, to be effective, it must be sized to match the cumulative size of all the caches in the system. If the snoop filter is undersized, frequent replacements in the snoop filter cause the processor caches to receive an excessive number of back invalidates. This limits the cache utilization of the processors, and the system underperforms.
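To show why an undersized snoop filter causes back invalidates, here is a minimal sketch of a fixed-capacity filter. The victim-selection policy (least recently used) is an assumption chosen for illustration; the text does not specify one. The names `BoundedSnoopFilter`, `capacity`, and `back_invalidates` are hypothetical.

```python
from collections import OrderedDict


class BoundedSnoopFilter:
    """Hypothetical fixed-size snoop filter with (assumed) LRU victim choice."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # line -> set of holders, oldest first
        self.back_invalidates = []    # (line, agent) pairs sent to caches

    def access(self, line, requester):
        if line in self.entries:
            # Hit: mark the entry most recently used and record the holder.
            self.entries.move_to_end(line)
            self.entries[line].add(requester)
            return
        if len(self.entries) >= self.capacity:
            # No free entry: evict the least recently used line and
            # back-invalidate every cache that still holds it.
            victim, holders = self.entries.popitem(last=False)
            for agent in sorted(holders):
                self.back_invalidates.append((victim, agent))
        self.entries[line] = {requester}
```

With `capacity=2` and one processor cycling through three lines (0, 1, 2, 0, 1, 2), the last four accesses each evict a line still in use, producing four back invalidates even though a three-entry filter would produce none.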
Snoop filter 34 may include multiple snoop filters that are physically different. For example, one snoop filter could be for even cache lines and another could be for odd cache lines. The multiple snoop filters do not have to be in a centrally located snoop filter, but rather may be distributed (e.g., in memory controller hubs and/or in memory interfaces integrated with the processor). In a uniform memory access (UMA) system, all memory locations have an essentially equal access time for each processor.
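The even/odd split mentioned above amounts to selecting a physical snoop filter from the cache-line number. A minimal sketch, assuming a 64-byte cache line (not stated in the text) and a hypothetical helper `filter_for`:

```python
LINE_SIZE = 64  # assumed cache-line size in bytes (not given in the text)


def filter_for(address, num_filters=2):
    """Pick which physical snoop filter tracks the line holding `address`."""
    line_number = address // LINE_SIZE
    # With two filters: 0 tracks even cache lines, 1 tracks odd cache lines.
    return line_number % num_filters
```

Because every byte of a line maps to the same line number, all accesses to one line consult the same filter, so the filters never need to coordinate on a single line.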
In a non-uniform memory access (NUMA) system, memory locations (addresses of cache lines) are shared by the processors, but some memory locations are accessed more quickly by some processors than by others. For example, in FIG. 1, processors in node 0 can access locations in main memory 16 more quickly than processors in node 0 can access locations in main memory 26. Further, a particular range of memory locations may be assigned to node 0 and another range may be assigned to node 1. The programmer of the operating system (OS) or other programs may take advantage of this locality by having processors in node 0 tend to use memory locations in the range associated with node 0 and processors in node 1 tend to use memory locations in the range associated with node 1.
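The assignment of address ranges to nodes can be sketched as a lookup from an address to its home node, which is how software could tell local from remote accesses. The two 4 GiB ranges below are a hypothetical layout for illustration only, and `home_node` and `is_local` are hypothetical helpers.

```python
from bisect import bisect_right

GIB = 1 << 30
# Hypothetical layout: node 0 owns [0, 4 GiB), node 1 owns [4 GiB, 8 GiB).
RANGES = [(0, 4 * GIB, 0), (4 * GIB, 8 * GIB, 1)]  # (start, end, node)
STARTS = [start for start, _, _ in RANGES]


def home_node(address):
    """Return the node whose memory holds `address`."""
    i = bisect_right(STARTS, address) - 1
    if i < 0 or address >= RANGES[i][1]:
        raise ValueError(f"unmapped address {address:#x}")
    return RANGES[i][2]


def is_local(processor_node, address):
    """True if a processor in `processor_node` accesses `address` locally."""
    return home_node(address) == processor_node
```

Under this layout, a processor in node 0 touching an address at 5 GiB makes a remote (slower) access to main memory 26, whereas the same access from node 1 is local.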