The present invention pertains to snoop filters used in multi-processor systems. More particularly, the present invention relates to a technique for replacing cache line entries in a snoop filter so as to reduce back invalidates in a multi-node architecture.
A snoop filter is a device used to reduce bus traffic in certain computer systems, particularly multiple-processor ("multi-processor") systems. In a multi-processor system, the snoop filter generally forms part of an interface between two or more "nodes". Each node contains one or more processors, a node controller, and memory, including one or more levels of cache memory associated with each processor. The snoop filter is essentially a specialized cache for tracking cache coherency state information relating to the cache memories of the processors. The snoop filter keeps track of the coherency state of each cache line of each of the processors. The state information is used by the snoop filter to decide which bus transactions received from the various nodes need to be passed on to other nodes in the system. The snoop filter filters unnecessary bus transactions by preventing them from reaching those nodes for which they are not needed. Hence, a snoop filter can have a dramatic positive impact on the overall system performance by reducing bus traffic.
FIG. 1 shows an example of a four-node, eight-processor system 1. The four nodes 2 ("node#0", "node#1", "node#2", and "node#3") are coupled to each other through a multi-node interface 10, which includes a snoop filter 5 and a switch 9. The switch 9 controls the routing of communications traffic between nodes 2. Each node 2 includes two processors 3 coupled on a local bus 8. The two processors 3 of each node 2 are also coupled to a Random Access Memory (RAM) 4 of that node through a node controller 6. The node controller 6 of each node 2 is coupled to the snoop filter 5. In addition, associated with each processor 3 is a cache memory (or "cache") 7. The caches 7 may be located within their respective processors 3, or they may be separate from but coupled to their respective processors 3 (e.g., off-chip or outside the processor but on the same chip).
Now consider a simple example of how the snoop filter 5 conserves bus bandwidth for the four-node system. Assume that a particular cache line, address A, is present only in one node of the system, node#0. If a processor 3 in node#3 wants to write to this cache line, the request first comes to the snoop filter 5, and the snoop filter 5 will send an invalidation request only to node#0, since that cache line is only resident in node#0. The request is "filtered", i.e., not forwarded to the two remaining nodes, node#1 and node#2, eliminating unnecessary transactions on the local buses 8 of node#1 and node#2.
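The filtering decision in this example can be sketched in a few lines of Python. This is only an illustrative model, not the circuit described here: the class `SnoopFilter`, its methods, and the numeric value chosen for address A are all hypothetical, and coherency state is reduced to a per-line set of holding nodes.

```python
from typing import Dict, Set

NUM_NODES = 4  # the four-node system of FIG. 1


class SnoopFilter:
    """Toy model (hypothetical): maps a cache line address to the nodes holding it."""

    def __init__(self) -> None:
        self.presence: Dict[int, Set[int]] = {}

    def record_read(self, node: int, addr: int) -> None:
        # A read miss from `node` makes that node a holder of the line.
        self.presence.setdefault(addr, set()).add(node)

    def handle_write(self, requester: int, addr: int) -> Set[int]:
        """Return the nodes that must receive an invalidation request."""
        holders = self.presence.get(addr, set())
        targets = holders - {requester}
        # After the write, the requester is the only node holding the line.
        self.presence[addr] = {requester}
        return targets


ADDRESS_A = 0xA000  # arbitrary value standing in for "address A"
sf = SnoopFilter()
sf.record_read(node=0, addr=ADDRESS_A)                  # line is resident only in node#0
targets = sf.handle_write(requester=3, addr=ADDRESS_A)  # node#3 writes the line
filtered = set(range(NUM_NODES)) - targets - {3}        # nodes spared the request
print(sorted(targets), sorted(filtered))
```

Under these assumptions only node#0 receives the invalidation, while node#1 and node#2 see no traffic for this write.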
Because the cache lines of each processor 3 have an entry in the snoop filter 5, the snoop filter 5 will send a "back invalidate" when it is forced to replace a snoop filter entry that corresponds to a valid entry in some processor's cache 7. A back invalidate is simply a signal to one or more nodes 2 to invalidate, in their caches 7, the line that has been replaced in the snoop filter. Continuing the example above, if the entry corresponding to cache line address A is replaced, a back invalidate signal is sent to node#3 before the snoop filter entry is allocated to a new cache line. When a node 2 receives a back invalidate signal, the node 2 marks that cache line as invalid. If another access is made to the cache line that was invalidated, a miss will occur and the accessing processor 3 will be forced to send a bus request to re-read the line.
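A minimal sketch of this sequence follows, assuming a snoop filter with only two entries so that a replacement is forced quickly. The names `TinySnoopFilter` and `NodeCache` are hypothetical, and the victim is simply the oldest-allocated entry; that choice is made for brevity and is not a policy taken from the text.

```python
from collections import OrderedDict
from typing import Dict, List, Set, Tuple


class TinySnoopFilter:
    """Hypothetical fixed-capacity filter: address -> set of holding nodes."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: Dict[int, Set[int]] = OrderedDict()

    def allocate(self, node: int, addr: int) -> List[Tuple[int, int]]:
        """Track `addr` for `node`; return the (node, addr) back invalidates issued."""
        back_invalidates: List[Tuple[int, int]] = []
        if addr not in self.entries and len(self.entries) >= self.capacity:
            # Filter is full: replace the oldest-allocated entry (assumed policy)
            # and tell every node holding that line to invalidate it.
            victim_addr, holders = self.entries.popitem(last=False)
            back_invalidates = [(n, victim_addr) for n in sorted(holders)]
        self.entries.setdefault(addr, set()).add(node)
        return back_invalidates


class NodeCache:
    """Hypothetical per-node cache reduced to a set of valid line addresses."""

    def __init__(self) -> None:
        self.valid: Set[int] = set()

    def fill(self, addr: int) -> None:
        self.valid.add(addr)

    def back_invalidate(self, addr: int) -> None:
        self.valid.discard(addr)

    def hit(self, addr: int) -> bool:
        return addr in self.valid


A, B, C = 0xA000, 0xB000, 0xC000  # arbitrary line addresses
caches = {n: NodeCache() for n in range(4)}
sf = TinySnoopFilter(capacity=2)

events: List[Tuple[int, int]] = []
for node, addr in [(3, A), (1, B), (2, C)]:  # the third allocation forces a replacement
    caches[node].fill(addr)
    for target, victim in sf.allocate(node, addr):
        caches[target].back_invalidate(victim)
        events.append((target, victim))

line_a_still_cached = caches[3].hit(A)  # False: node#3 must re-read line A
```

In this run the allocation for line C displaces the entry for line A, so node#3 receives a back invalidate and its next access to A misses even though its own cache had room to keep the line.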
In this way, back invalidates from a snoop filter increase bus traffic and can cause cache misses. Known snoop filter replacement methods do not address this problem adequately. A conventional method for choosing a replacement line in a snoop filter is to use a temporal-based replacement algorithm such as Least Recently Used (LRU). In LRU, the least recently accessed entry is chosen for replacement based on the premise that if the line has not been accessed recently, it is unlikely to be accessed in the near future. Other temporal-based algorithms, such as Pseudo-LRU (PLRU) or First In First Out (FIFO), work in a similar manner but are less expensive to implement. However, all known temporal-based algorithms suffer from the same limitation, i.e., the lack of temporal information available to the snoop filter. This lack of temporal information stems from the fact that the snoop filter is only updated on cache misses. The snoop filter is unaware of cache hits and thus receives only a fraction of the temporal information available in the cache. Perhaps more importantly, the temporal correlation between the access streams from different processors in a multi-node architecture is weak at best.
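The effect of seeing only misses can be illustrated with a small sketch. It assumes a single processor whose cache never evicts during the run, so that only the first touch of each line is a miss; the helper names `lru_victim` and `misses_only` and the access stream are hypothetical.

```python
from typing import Dict, List, Set


def lru_victim(observed: List[str]) -> str:
    """Return the least recently used address in an observed access stream."""
    last_seen: Dict[str, int] = {addr: i for i, addr in enumerate(observed)}
    return min(last_seen, key=lambda addr: last_seen[addr])


def misses_only(stream: List[str]) -> List[str]:
    """Accesses visible to the snoop filter, assuming the cache never evicts:
    only the first touch of each line misses and reaches the filter."""
    seen: Set[str] = set()
    visible: List[str] = []
    for addr in stream:
        if addr not in seen:
            seen.add(addr)
            visible.append(addr)
    return visible


# Line A is touched once, then line B, then A is hit repeatedly in the cache.
stream = ["A", "B", "A", "A", "A"]

cache_choice = lru_victim(stream)                # LRU with full hit information
filter_choice = lru_victim(misses_only(stream))  # LRU with misses only
print(cache_choice, filter_choice)
```

With full information LRU would replace B, but the snoop filter, which never observes the hits on A, ranks A as oldest and would replace the line that is actually in active use, triggering a back invalidate for it.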