A typical multiprocessor integrated circuit (i.e., chip) uses multiple processor cores that are interconnected by an interconnection bus. In general, one or more caches support each processor core. Each cache typically stores data that is transferred between a main memory and the caches in fixed-size blocks, which are typically called “cache lines.” In conventional directory-based approaches, each cache includes a directory that contains all the addresses associated with the data cached therein. The data cached at each processor core can be shared with all other processor cores on the interconnection bus. Accordingly, a multiprocessor system can potentially hold many copies of the same data: one copy in the main memory, which may be on-chip or off-chip, and one copy in each processor core cache. Moreover, because each processor core can share the data in its local cache with any other processor core on the interconnection bus, a fundamental issue in modern multiprocessor systems is how to ensure that all copies of a given memory location remain consistent, or coherent, as observed by all the processors whenever any processor needs to update that memory location. In general, the interconnection bus includes hardware mechanisms that handle the coherency traffic among the various processor cores and caches in order to maintain cache coherency.
Although other variations are possible, a common method to achieve cache coherency is to have all caches that contain copies of the target memory location stop using their current copies, which may be achieved by invalidating the cache line that contains the target memory location. Once all cached copies of the target memory location have been invalidated, the processor that desires to update the target memory location is free to do so. Any other processor that subsequently accesses that memory location will then obtain the updated value, either from the processor that made the update or from the main memory. One mechanism to maintain cache coherency in a multiprocessor system uses “snooping,” whereby a processor core that needs a particular cache line first looks in its local cache. If the processor core finds the cache line in the local cache, a cache “hit” has occurred. If the processor core does not find the cache line in the local cache, a cache “miss” has occurred, in which case the processor may “snoop” the caches associated with the other processors to determine whether any of them has the requested cache line. If the requested cache line is located in the cache associated with another processor core, that cache can “intervene” to provide the cache line to the requesting processor core, so that the requesting processor core does not have to fetch the data from main memory.
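The invalidate-then-write rule and the hit, miss, snoop, and intervention sequence described above can be sketched as a toy model. This is a minimal illustration under stated assumptions, not the method of any particular chip: the 64-byte line size, the `SnoopingSystem` and `Cache` classes, and the `line_of` helper are all hypothetical, and the model omits evictions, write-backs, and bus timing.

```python
from dataclasses import dataclass, field

LINE_SIZE = 64  # hypothetical fixed cache-line size in bytes


def line_of(addr: int) -> int:
    """Map a byte address to the address of the cache line that contains it."""
    return addr // LINE_SIZE


@dataclass
class Cache:
    cid: int
    lines: dict = field(default_factory=dict)  # line address -> cached value


class SnoopingSystem:
    """Toy write-invalidate model: caches on a shared bus plus main memory."""

    def __init__(self, n_cores: int):
        self.caches = [Cache(i) for i in range(n_cores)]
        self.memory = {}  # line address -> value held in main memory
        self.snoops = 0   # number of lookups performed in other cores' caches

    def read(self, core: int, addr: int):
        line = line_of(addr)
        local = self.caches[core]
        if line in local.lines:              # cache hit in the local cache
            return local.lines[line], "hit"
        for other in self.caches:            # miss: snoop the other caches
            if other.cid == core:
                continue
            self.snoops += 1
            if line in other.lines:          # another cache intervenes
                local.lines[line] = other.lines[line]
                return local.lines[line], "intervention"
        value = self.memory.get(line, 0)     # no cached copy: read main memory
        local.lines[line] = value
        return value, "memory"

    def write(self, core: int, addr: int, value):
        line = line_of(addr)
        for other in self.caches:            # invalidate every other copy first
            if other.cid != core:
                other.lines.pop(line, None)
        self.caches[core].lines[line] = value  # then update the local copy
```

For example, after `bus = SnoopingSystem(4)` and `bus.memory[line_of(0x100)] = 7`, core 0 reads the value from memory, core 1 then obtains it by intervention from core 0, and a write by core 2 removes both earlier copies so that core 3's next read is served by core 2 with the updated value.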
Snooping techniques generally work well when only two processor cores and their associated caches are attached to the interconnection bus. For example, if the first processor core requests a cache line and the cache associated with the second processor core contains the requested cache line, then the cache associated with the second processor core provides the requested cache line to the first processor core. Otherwise, the cache associated with the first processor core fetches the requested cache line from main memory. However, as the interconnection bus supports more and more processor cores that may have the requested data in a local cache, more complex arbitration mechanisms are needed to decide which cache is to provide the requested cache line to the requesting processor core. For example, one arbitration mechanism may include a snoop filter implemented on the interconnection bus, wherein the snoop filter maintains entries that represent the cache lines held by each of the processor core caches on the interconnection bus. Rather than broadcasting the snoop request to all processor caches on the interconnection bus, the snoop filter may then direct the interconnection bus to snoop only the processor caches that could possibly have a copy of the data. In a “snoopy” coherency protocol, when a processor desires to modify a target memory location, the modifying processor may be called the “master” and the other processors may be called “snoopers.” Every other processor that has a coherent cache is notified that the master intends to modify the target memory location, so that the snoopers can take appropriate action upon seeing the request from the master.
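A snoop filter of the kind described above can be sketched as a map from each line address to the set of caches that may hold it, so that a request is forwarded only to those caches instead of being broadcast. This is a hypothetical, minimal sketch: the `SnoopFilter` class, its method names, and the choice of an exact presence set per line are assumptions for illustration, not a description of any specific hardware design.

```python
from collections import defaultdict


class SnoopFilter:
    """Hypothetical presence-tracking snoop filter.

    Maps a line address to the set of cache IDs that may hold that line,
    so the bus snoops only those caches rather than every cache.
    """

    def __init__(self):
        self.presence = defaultdict(set)  # line address -> set of cache IDs

    def record_fill(self, line: int, cache_id: int):
        """A cache has loaded the line."""
        self.presence[line].add(cache_id)

    def record_eviction(self, line: int, cache_id: int):
        """A cache has dropped the line and reported it to the filter."""
        holders = self.presence.get(line)
        if holders is not None:
            holders.discard(cache_id)
            if not holders:
                del self.presence[line]

    def snoop_targets(self, line: int, requester: int) -> set:
        """Caches to snoop for a request; an empty set means go to memory."""
        return self.presence.get(line, set()) - {requester}

    def record_write(self, line: int, master: int) -> set:
        """A master intends to modify the line: return the snoopers whose
        copies must be invalidated, then mark the master as the sole holder."""
        victims = self.snoop_targets(line, master)
        self.presence[line] = {master}
        return victims


def broadcast_targets(n_caches: int, requester: int) -> set:
    """Without a filter, every other cache on the bus is snooped."""
    return set(range(n_caches)) - {requester}
```

With eight caches, a broadcast request from cache 0 snoops seven caches, whereas the filter snoops only the caches recorded as holding the line (none at all for a line no cache holds). The same `snoop_targets` lookup serves both reads and the master's write request; `record_write` simply narrows the entry to the master once the snoopers have been told to invalidate.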
Although snoopy protocols generally scale better than directory-based protocols, snoopy protocols nonetheless have a scaling weakness: an increase in the number of active processors results in a corresponding increase in the amount of snoop traffic that each active processor receives. Accordingly, the overarching goal of snoop filtering is generally to eliminate as many unnecessary snoops as possible without introducing area or latency costs, and without the filter's effectiveness diminishing over time as false positives accumulate.
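Both concerns in the paragraph above can be made concrete with simple arithmetic and a small simulation. Under plain broadcast snooping, every miss by any other core snoops a given core, so a core in an N-core system receives (N - 1) × m snoops when each core issues m misses; per-core snoop traffic grows linearly with N. The second function illustrates one possible source of accumulating false positives: if a cache evicts lines silently, without notifying the filter, stale entries remain and the filter keeps forwarding snoops for lines the cache no longer holds. The function names, the FIFO replacement policy, and the single-cache model are all hypothetical assumptions chosen for clarity.

```python
from collections import deque


def snoops_received_per_core(n_cores: int, misses_per_core: int) -> int:
    """Broadcast snooping: each miss by any *other* core snoops this core."""
    return (n_cores - 1) * misses_per_core


def count_false_positives(fills, probes, capacity, report_evictions):
    """Hypothetical model of one cache and the filter entries that track it.

    fills: line addresses the cache loads, in order (FIFO replacement).
    probes: line addresses later requested by other cores.
    Returns how many probes the filter forwards to the cache that then miss.
    """
    resident = deque()    # lines actually held by the cache, oldest first
    filter_entries = set()  # lines the filter believes the cache holds
    for line in fills:
        if line in resident:
            continue
        if len(resident) == capacity:
            victim = resident.popleft()
            if report_evictions:       # precise filter: entry is removed
                filter_entries.discard(victim)
            # otherwise the eviction is silent and the entry goes stale
        resident.append(line)
        filter_entries.add(line)
    return sum(1 for line in probes
               if line in filter_entries and line not in resident)
```

For instance, going from 4 to 16 cores at 100 misses per core raises the snoops each core receives from 300 to 1500. Filling lines 0 through 9 into a 4-line cache and then probing all ten lines yields 0 false positives when evictions are reported but 6 when they are silent, because the entries for lines 0 through 5 are never cleared.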