Systems with multiple processors typically must share resources, especially memory. A main memory may be shared among several processors by transferring portions of the main memory to caches near the processors. However, sometimes more than one processor may request the same line in main memory. Keeping the shared memory line coherent may require tracking which processor caches have a copy of the cache line at any time. As the line is written to, other copies of the old line must be updated or invalidated.
FIG. 1 shows three caches using a snoop bus for coherency. Snoop bus 16 connects to caches 10, 12, 14, which cache data for three processors. When cache 10 has a read miss, new data is read from a shared main memory (not shown) and loaded into cache 10. Cache 10 also sends the address of the new data onto snoop bus 16 as the snoop address. Other caches 12, 14 examine the snoop address on snoop bus 16 to determine whether any of their lines are for the same address. Cache 12 has no matching cache lines and does no further processing. However, cache 14 has a cache line that matches the snoop address. Cache 14 invalidates its matching cache line so that cache 10 has the only cached copy of the cache line.
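The broadcast behavior of FIG. 1 can be illustrated with a small model. This is a minimal sketch, assuming each cache is represented only by the set of line addresses it holds; the class names `Cache` and `SnoopBus` and the method `read_miss` are hypothetical, and main memory is not modeled.

```python
class Cache:
    """Hypothetical model: a cache is just the set of line addresses it holds."""

    def __init__(self, cache_id):
        self.cache_id = cache_id
        self.lines = set()

    def snoop(self, snoop_address):
        # Each cache compares the snoop address against its own tags.
        if snoop_address in self.lines:
            self.lines.discard(snoop_address)  # invalidate the matching line
            return True
        return False


class SnoopBus:
    """Hypothetical broadcast bus: every snoop address reaches every other cache."""

    def __init__(self, caches):
        self.caches = caches

    def read_miss(self, requester, address):
        # The requester loads the line from main memory (not modeled here)...
        requester.lines.add(address)
        # ...and broadcasts the address; every other cache snoops it.
        return [c.cache_id for c in self.caches
                if c is not requester and c.snoop(address)]


c10, c12, c14 = Cache(10), Cache(12), Cache(14)
c14.lines.add(0xA0)                      # cache 14 already holds line 0xA0
bus = SnoopBus([c10, c12, c14])
invalidated = bus.read_miss(c10, 0xA0)   # cache 12 finds no match; cache 14 invalidates
```

After the miss, only cache 10 holds line 0xA0, matching the sequence described for FIG. 1.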
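While such a snoop bus has been useful in the prior art, it does not scale well. Every snoop address is broadcast to all caches on the bus, so when the number of caches connected to snoop bus 16 increases to 8, 16, 32, 64, or more, snoop bus 16 becomes overloaded with snoop addresses sent by the many caches, and each cache must compare every one of those addresses against its own tags.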
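The growth in snoop work can be estimated with simple arithmetic. This is a rough sketch, assuming each cache issues the same number of misses per interval and every miss is broadcast once and checked by every other cache; the function name `broadcast_snoop_load` is hypothetical.

```python
def broadcast_snoop_load(num_caches, misses_per_cache=1):
    """Return (snoop addresses on the bus, tag lookups) for one interval.

    Assumes every miss is broadcast once and checked by all other caches.
    """
    addresses = num_caches * misses_per_cache
    lookups = addresses * (num_caches - 1)
    return addresses, lookups


for n in (3, 8, 16, 32, 64):
    print(n, broadcast_snoop_load(n))
```

Under these assumptions, bus traffic grows linearly with the number of caches, while the total number of tag comparisons grows roughly with its square.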
FIG. 2 shows a distributed snoop directory. Rather than having each cache monitor snoop bus 16 and compare each snoop address to its own cache tags, the caches send snoop requests over snoop bus 16 to a central directory of snoop tags. When a cache miss occurs in any of caches 10, 12, 14, that cache sends a snoop request over snoop bus 16 to snoop tag directory 18. Snoop tag directory 18 holds a duplicate set of cache tags that indicate which of caches 10, 12, 14 has a copy of the cache line for each range of memory addresses.
When another cache has a copy of a requested cache line, an invalidate command may be sent from snoop tag directory 18 to the other cache having the copy of the cache line. The other cache invalidates the cache line to maintain coherency.
For example, cache 10 has a cache miss and sends a request to snoop tag directory 18 with snoop address A. Snoop tag directory 18 looks up snoop address A in its set of snoop tags and finds that cache 14 has a copy of this same cache line. Snoop tag directory 18 sends an invalidate command over snoop bus 16 to cache 14, which invalidates the cache line. Then cache 10 can safely load the cache line and have the only valid cached copy of that line.
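The directory lookup described above can be sketched as a mapping from line address to the set of caches holding that line. This is a minimal sketch, assuming the duplicate tags are kept as sharer sets; `SnoopTagDirectory`, `record_fill`, and `handle_miss` are hypothetical names. It updates the tags immediately instead of waiting for the invalidations to complete.

```python
from collections import defaultdict


class SnoopTagDirectory:
    """Hypothetical duplicate-tag directory: line address -> caches holding it."""

    def __init__(self):
        self.sharers = defaultdict(set)

    def record_fill(self, line_address, cache_id):
        # Note that a cache has loaded a copy of this line.
        self.sharers[line_address].add(cache_id)

    def handle_miss(self, requester, snoop_address):
        """Look up the snoop address and return (cache_id, address) invalidates."""
        others = sorted(self.sharers[snoop_address] - {requester})
        invalidates = [(cache_id, snoop_address) for cache_id in others]
        # Simplification: the tags are updated at once; a real directory
        # would wait until the invalidations had completed.
        self.sharers[snoop_address] = {requester}
        return invalidates


directory_18 = SnoopTagDirectory()
A = 0xA0                                  # snoop address A from the example
directory_18.record_fill(A, 14)           # cache 14 holds a copy of line A
commands = directory_18.handle_miss(10, A)
```

In this example, `commands` holds a single invalidate addressed to cache 14, and afterward the directory records cache 10 as the only holder of line A.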
Rather than having a single snoop tag directory 18, a system may use multiple snoop tag directories 18, 19. Each snoop tag directory 18, 19 covers a different range of memory addresses. The snoop address determines which of snoop tag directories 18, 19 the request is routed to over snoop bus 16. See the co-pending application for “Duplicate Snoop Tags Partitioned Across Multiple Processor/Cache Chips in a Multi-Processor System”, U.S. Ser. No. 10/711,387, filed Sep. 15, 2004.
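Routing by address range can be sketched as a lookup in a sorted table of range start addresses. This is a minimal sketch, assuming contiguous ranges, each running up to the next range's start. The class `SnoopDirectoryRouter` and the 0x8000_0000 split point are hypothetical. An interleaved split on low-order line-address bits would be another way to partition the addresses.

```python
import bisect


class SnoopDirectoryRouter:
    """Hypothetical router: picks the directory whose address range covers an address."""

    def __init__(self, ranges):
        # ranges: (start_address, directory_id) pairs; each range extends
        # up to the next start address (the last one is open-ended).
        ranges = sorted(ranges)
        self.starts = [start for start, _ in ranges]
        self.directory_ids = [dir_id for _, dir_id in ranges]

    def route(self, snoop_address):
        i = bisect.bisect_right(self.starts, snoop_address) - 1
        if i < 0:
            raise ValueError(f"address {snoop_address:#x} is below every range")
        return self.directory_ids[i]


router = SnoopDirectoryRouter([(0x0000_0000, 18), (0x8000_0000, 19)])
low_target = router.route(0x1000)          # falls in directory 18's range
high_target = router.route(0x9000_0000)    # falls in directory 19's range
```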
Although dividing snoop tags and processing over several snoop tag directories 18, 19 reduces the snoop processing load on any one of them, most of the snoop processing tasks are still performed by snoop tag directories 18, 19. Each snoop tag directory 18, 19 must order requests from different caches into the correct sequence, and must ensure that other caches invalidate lines or forward dirty data to the new cache before the new cache can operate on the requested cache line. Bottlenecks can occur as requests, messages, and acknowledgements from the many caches funnel to snoop tag directory 18 or snoop tag directory 19.
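The ordering and acknowledgement duties described above can be sketched to show how much message traffic passes through a single directory. This is a minimal sketch, assuming per-line serialization in which later requests for a busy line wait in a queue until every invalidated cache has acknowledged. `OrderingDirectory` and its methods are hypothetical. Dirty-data forwarding is not modeled.

```python
from collections import deque


class OrderingDirectory:
    """Hypothetical directory that serializes requests per line and collects acks."""

    def __init__(self):
        self.holders = {}   # line -> set of cache ids holding a copy
        self.pending = {}   # line -> (requester, caches that have not yet acked)
        self.queues = {}    # line -> deque of requesters waiting for the line
        self.messages = 0   # every message sent or received by this directory
        self.granted = []   # (line, requester) in the order access was granted

    def request(self, line, requester):
        self.messages += 1                      # incoming request
        if line in self.pending:                # line busy: preserve arrival order
            self.queues.setdefault(line, deque()).append(requester)
            return
        self._start(line, requester)

    def _start(self, line, requester):
        others = self.holders.get(line, set()) - {requester}
        self.messages += len(others)            # one invalidate per other holder
        self.pending[line] = (requester, set(others))
        if not others:
            self._grant(line)

    def ack(self, line, cache_id):
        self.messages += 1                      # incoming acknowledgement
        _, waiting = self.pending[line]
        waiting.discard(cache_id)
        if not waiting:
            self._grant(line)

    def _grant(self, line):
        requester, _ = self.pending.pop(line)
        self.holders[line] = {requester}
        self.granted.append((line, requester))
        self.messages += 1                      # grant sent to the requester
        queue = self.queues.get(line)
        if queue:                               # start the next queued request
            self._start(line, queue.popleft())


d = OrderingDirectory()
d.holders[0xA0] = {14}
d.request(0xA0, 10)   # invalidate sent to cache 14
d.request(0xA0, 12)   # queued behind cache 10's request
d.ack(0xA0, 14)       # cache 10 granted; cache 10's copy is then invalidated for 12
d.ack(0xA0, 10)       # cache 12 granted
```

Even this two-request exchange routes eight messages through the one directory: two requests, two invalidates, two acknowledgements, and two grants.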
What is desired is a multi-processor, multi-cache system with distributed cache coherency processing.