This invention relates to multi-processor systems, and more particularly to snoop tags for caches in a multi-processor system.
One challenge in designing systems with multiple processors is the sharing of resources, especially memory. A main memory may be shared among several processors by transferring portions of the main memory to caches near the processors. However, sometimes more than one processor may request the same line in main memory. Keeping the shared memory line coherent may require tracking which processor caches have a copy of the cache line at any time. As the line is written to, other copies of the old line must be updated or invalidated.
FIG. 1 is a diagram of a multi-processor system with a memory-coherency directory. Central processing unit (CPU) processor 20 temporarily stores instructions and data in cache 14, while second CPU 22 temporarily stores its instructions and data in cache 16. Eventually data is written back to main memory 10, which also supplies new instructions and data for processing to caches 14, 16.
Each memory line 18 in main memory 10 can have an entry 28 in memory directory 12. Entry 28 indicates which of caches 14, 16 has a copy of line 18, and whether the line has been written (dirty) and possibly other information. Caches 14, 16 store a tag portion of the address along with the data in cache lines 24, 26. Cache lines 24, 26 can have a different size than memory line 18.
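The directory entry described above can be illustrated with a minimal sketch. The class names, the sharer bit-mask encoding, the cache indices 0 and 1 (standing in for caches 14, 16), and the write-invalidate policy are assumptions made for illustration only; they are not taken from FIG. 1.

```python
from dataclasses import dataclass


@dataclass
class DirectoryEntry:
    """Hypothetical directory entry for one memory line (field names assumed)."""
    sharers: int = 0      # bit i set means cache i holds a copy of the line
    dirty: bool = False   # True once some cache has written the line


class MemoryDirectory:
    """Hypothetical central directory keyed by memory-line number."""

    def __init__(self, line_size):
        self.line_size = line_size
        self.entries = {}  # memory-line number -> DirectoryEntry

    def _entry(self, address):
        line = address // self.line_size
        return self.entries.setdefault(line, DirectoryEntry())

    def record_read(self, cache_id, address):
        # Note that cache `cache_id` now holds a copy of the line.
        self._entry(address).sharers |= 1 << cache_id

    def record_write(self, cache_id, address):
        """Mark the line dirty; return the other caches whose copies are stale."""
        entry = self._entry(address)
        others = entry.sharers & ~(1 << cache_id)
        stale = [i for i in range(others.bit_length()) if (others >> i) & 1]
        entry.sharers = 1 << cache_id  # assumed policy: only the writer keeps a copy
        entry.dirty = True
        return stale


directory = MemoryDirectory(line_size=64)      # 64-byte memory line is an assumption
directory.record_read(0, 0x1000)
directory.record_read(1, 0x1010)               # same 64-byte line as 0x1000
stale_caches = directory.record_write(0, 0x1020)
```

In this sketch every coherency decision passes through the single `MemoryDirectory` object, which mirrors the central role of memory directory 12.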
Memory directory 12 is a central location to track the states and locations of copies of all memory lines in the system. All coherency is typically resolved at memory directory 12. As more processors and caches share main memory 10, bottlenecks can increase at this central coherency directory.
The size of memory directory 12 can be quite large. Main memory 10 is usually much larger than caches 14, 16. Memory directory 12 may store one entry for each line in main memory 10, often requiring a huge number of entries for a large memory such as a 4-gigabyte memory. Directories may have bandwidth requirements that increase in proportion to main memory bandwidth. Since directories are designed with memory controllers, this can be synergistic: adding more memory bandwidth also provides more directory bandwidth. Another characteristic is that the directory size is proportional to the memory size. Because main memories can be very large, directories can also become very large, perhaps too large to place on-chip.
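As a rough illustration of the directory size, the entry count can be worked out for the 4-gigabyte example. The 64-byte memory line and the one byte of state per entry are assumptions chosen only to make the arithmetic concrete, and the memory is taken as 4 x 2^30 bytes.

```python
def directory_entries(memory_bytes, line_bytes):
    """One directory entry per memory line."""
    return memory_bytes // line_bytes


GIB = 1 << 30
MIB = 1 << 20

entries = directory_entries(4 * GIB, 64)   # assumed 64-byte memory line
directory_bytes = entries * 1              # assumed 1 byte of state per entry

print(entries)                   # 67108864
print(directory_bytes // MIB)    # 64 (MiB of directory storage)
```

Under these assumptions the directory needs about 67 million entries, and doubling the main memory doubles that count.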
FIG. 2 is a multi-processor system with duplicate tags for coherency snooping. Rather than tracking memory line coherency on a per-memory-line basis in memory directory 12, the caches may themselves track cache-line coherency on a per-cache-line basis. When a cache requests to write to a line, it broadcasts the line's address to all other caches in the multi-processor system. The other caches compare the line's address to tags in the local caches and invalidate the cache line if a tag matches.
Since each cache is often busy supplying instructions and data to its local processor, there may be little available bandwidth to the cache tags for comparing broadcast snoops. A second copy of the tags can be kept for each cache to provide access to these broadcast snoops from other caches.
Duplicate tags 30 are a copy of the tags in cache 14 for addresses of valid cache lines in cache 14 used by processor 20. Duplicate tag 34 can contain a copy of the tag address and other coherency information, such as whether the line is dirty, valid, or owned by the local processor cache. Other caches such as second cache 16 also have duplicate tags 32 for cache lines stored in second cache 16.
When processor 20 desires to write to cache line 24 in cache 14, cache 14 broadcasts the tag in a request to all other caches. Second cache 16 receives the request and compares the broadcast address tag to its duplicate tags 32, finding a match with duplicate tag 36. Cache line 26 in second cache 16 is then invalidated to maintain coherency. More complex coherency schemes may also be implemented.
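The broadcast-and-invalidate sequence just described might be sketched as follows. The class and function names are hypothetical, only tags are modeled (not data), and the simple write-invalidate policy is an assumption rather than the full coherency scheme of any particular system.

```python
class SnoopingCache:
    """Hypothetical cache model that tracks tags only."""

    def __init__(self, name):
        self.name = name
        self.tags = set()            # tags consulted by the local processor
        self.duplicate_tags = set()  # second copy, consulted only by snoops

    def fill(self, tag):
        # A newly cached line is recorded in both tag copies.
        self.tags.add(tag)
        self.duplicate_tags.add(tag)

    def snoop_invalidate(self, tag):
        """Handle a broadcast snoop; return True if a local copy was invalidated."""
        # The comparison uses the duplicate tags, so a snoop that misses
        # never touches the tags serving the local processor.
        if tag not in self.duplicate_tags:
            return False
        self.duplicate_tags.discard(tag)
        self.tags.discard(tag)
        return True


def write_line(writer, all_caches, tag):
    """Broadcast `tag` to every other cache; return names of caches that invalidated."""
    writer.fill(tag)
    return [c.name for c in all_caches
            if c is not writer and c.snoop_invalidate(tag)]


cache_14 = SnoopingCache("cache 14")
cache_16 = SnoopingCache("cache 16")
cache_14.fill(0x2A)
cache_16.fill(0x2A)   # both caches hold the same line
invalidated = write_line(cache_14, [cache_14, cache_16], 0x2A)
```

Here `invalidated` lists only second cache 16, and cache 14 keeps its copy. Every write is still compared at every other cache, which is the broadcast cost of this arrangement.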
Since each cache maintains its own duplicate tags 30, 32, coherency processing is distributed among caches 14, 16, preventing a bottleneck. A much smaller number of duplicate tag entries is needed than for memory directory 12 (FIG. 1) since caches 14, 16 are much smaller than main memory 10. Snoop tags have a size that is proportional to the size of the caches. Since snoop tags tend to be designed with caches, this is synergistic: adding more cache also adds snoop tag capacity. Also, the snoop tag bandwidth is proportional to the memory bandwidth and the number of caches. One failing of traditional snoop tag designs is that communication bandwidth is also proportional to both memory bandwidth and the number of caches. This quickly turns into a bottleneck as systems become larger.
FIG. 3 is a multi-processor system with a central location for duplicate tags for coherency snooping. Rather than having duplicate tags with each cache, a central snoop directory can store all duplicate tags. Cache line 24 in cache 14 has an entry in central duplicate tags 40 that indicates that the cache line is present in cache 14, and whether the line has been written (dirty). Likewise, cache line 26 in second cache 16 has an entry in central duplicate tags 40. When a cache line exists in more than one cache, entry 38 in central duplicate tags 40 can indicate which caches have the line, and which copies have been written.
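A minimal sketch of such a central structure follows. The class name, the method names, the per-tag dictionary of holders, and the write-invalidate policy are assumptions for illustration; the cache identifiers 14 and 16 simply reuse the reference numerals of the figure.

```python
class CentralDuplicateTags:
    """Hypothetical central store of duplicate tags for all caches."""

    def __init__(self):
        self.entries = {}  # tag -> {cache_id: dirty flag}

    def record_fill(self, cache_id, tag):
        # The line is now present, unwritten, in cache `cache_id`.
        self.entries.setdefault(tag, {})[cache_id] = False

    def record_write(self, cache_id, tag):
        """Mark the writer's copy dirty; return the caches whose copies are stale."""
        holders = self.entries.setdefault(tag, {})
        stale = sorted(c for c in holders if c != cache_id)
        self.entries[tag] = {cache_id: True}  # assumed policy: writer is sole holder
        return stale

    def record_evict(self, cache_id, tag):
        # Entries exist only while some cache holds the line.
        holders = self.entries.get(tag, {})
        holders.pop(cache_id, None)
        if not holders:
            self.entries.pop(tag, None)


central = CentralDuplicateTags()
central.record_fill(14, 0x2A)
central.record_fill(16, 0x2A)
stale = central.record_write(14, 0x2A)   # cache 16 must be invalidated
entry_after_write = dict(central.entries[0x2A])
central.record_evict(14, 0x2A)
```

Because an entry is dropped once no cache holds the line, the number of entries in this sketch tracks the number of cached lines rather than the number of memory lines. All lookups still funnel through the one `CentralDuplicateTags` object.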
Since coherency operations are performed by central duplicate tags 40, caches 14, 16 do not have to be interrupted for comparing snoop addresses to cache tags. However, a bottleneck at central duplicate tags 40 can result, especially when the number of caches and processors is expanded.
What is desired is a multi-processor system that can expand the number of processors and local caches, while still providing cache-line coherency. Cache-line-based rather than memory-line-based coherency is desired to reduce the size of the coherency directory.