Multiple processors are increasingly being used in computational systems to achieve higher computational performance, for example, by facilitating parallel processing of computational tasks. According to some typical configurations, a multiprocessor chip includes multiple processors coupled to multiple levels of cache, and the chip is coupled to a memory. For example, each of the multiple processors can be coupled to its own level one (L1) cache, the L1 caches can be coupled to multiple level two (L2) caches, and the L2 caches can be coupled to (i.e., share) a single level three (L3) cache. The lowest-level (e.g., L3) cache can be coupled to a memory of the computational system. The caches can be used to improve access speeds for instructions and/or other data by allowing the processors to perform memory accesses through a hierarchy of caches (i.e., from highest-level to lowest-level cache). For example, rather than going out to the memory to look for data and/or instructions, a processor can look first in its local L1 cache, then (if the data is not present in the L1 cache) in an L2 cache, and so on.
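The hierarchical lookup described above can be illustrated with a minimal sketch. It is a simplified software model under stated assumptions, not a description of any particular hardware: the class name `CacheLevel`, its `read` method, and the dictionary standing in for memory are all hypothetical, and the sketch assumes that a missed value is copied into each cache level on its way back to the requester.

```python
class CacheLevel:
    """Hypothetical model of one cache level (e.g., L1, L2, or L3)."""

    def __init__(self, name, lower=None):
        self.name = name    # label used to report where a hit occurred
        self.lower = lower  # next lower-level cache, or None for the lowest level
        self.lines = {}     # address -> cached value

    def read(self, address, memory):
        """Return (value, source), where source names the level that supplied the value."""
        if address in self.lines:
            return self.lines[address], self.name
        if self.lower is not None:
            # Miss at this level: ask the next lower-level cache.
            value, source = self.lower.read(address, memory)
        else:
            # Miss at the lowest-level cache: go out to memory.
            value, source = memory[address], "memory"
        # Assumption: the value is copied into this level on the way back up.
        self.lines[address] = value
        return value, source


memory = {0x10: 42}
l3 = CacheLevel("L3")
l2 = CacheLevel("L2", lower=l3)
l1_a = CacheLevel("L1-A", lower=l2)  # L1 cache of one processor
l1_b = CacheLevel("L1-B", lower=l2)  # L1 cache of another processor sharing the same L2

first = l1_a.read(0x10, memory)   # misses everywhere, served by memory
second = l1_a.read(0x10, memory)  # now served by the local L1 cache
third = l1_b.read(0x10, memory)   # misses in its own L1, served by the shared L2
```

In this sketch, the first read is served by memory, the second by the local L1 cache, and the read from the other processor by the shared L2 cache.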
If data is not present in cache, it may be copied from the memory into cache (e.g., from memory to L3 cache, then to L2 cache, then to L1 cache). However, if the same data is copied redundantly (e.g., into multiple L1 or L2 caches) and one copy is subsequently modified, the caches may hold different versions of the same data (i.e., a “coherency” issue). One conventional approach for addressing coherency issues is to establish a coherency protocol that detects when cached data is modified and updates or invalidates all other cached copies of the data accordingly. Such an approach typically involves broadcasting update or invalidation messages across a cache data bus (e.g., between the L1 and L2 caches, and/or between the L2 and L3 caches), which can strain bus resources and degrade performance when such messages are frequent.
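The conventional invalidation approach can be sketched as follows. This is an illustrative model only, under several assumptions that are not drawn from the text above: the names `Bus`, `L1Cache`, and `broadcast_invalidate` are hypothetical, writes are assumed to go straight through to memory for simplicity, and each invalidation delivered to another cache is counted as one bus message.

```python
class Bus:
    """Hypothetical shared bus that carries invalidation messages between caches."""

    def __init__(self):
        self.caches = []
        self.messages = 0  # number of invalidation messages delivered

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast_invalidate(self, sender, address):
        # Every cache other than the writer receives a message and drops its copy.
        for cache in self.caches:
            if cache is not sender:
                self.messages += 1
                cache.lines.pop(address, None)


class L1Cache:
    """Hypothetical per-processor cache using a write-invalidate policy."""

    def __init__(self, bus, memory):
        self.lines = {}
        self.bus = bus
        self.memory = memory
        bus.attach(self)

    def read(self, address):
        if address not in self.lines:
            # Miss: copy the value in from memory.
            self.lines[address] = self.memory[address]
        return self.lines[address]

    def write(self, address, value):
        # Invalidate all other cached copies before modifying the local one.
        self.bus.broadcast_invalidate(self, address)
        self.lines[address] = value
        self.memory[address] = value  # assumption: write-through to memory


memory = {0: 1}
bus = Bus()
cache_a = L1Cache(bus, memory)
cache_b = L1Cache(bus, memory)

cache_a.read(0)      # both caches now hold a copy of address 0
cache_b.read(0)
cache_a.write(0, 2)  # cache_b's copy is invalidated over the bus
stale_copy_present = 0 in cache_b.lines
reloaded = cache_b.read(0)
```

In this model, every write generates one message per other attached cache, whether or not that cache holds the data, which illustrates how frequent writes to shared data can consume bus resources.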