Memory coherence is a condition in which corresponding memory locations for each processing element in a multiprocessor system contain the same cached data. A memory coherence protocol notifies all of the processing elements of changes to shared memory values to ensure that all copies of the data remain consistent. Memory coherence in symmetric multiprocessing (SMP) systems can be maintained either by a directory-based coherency protocol, in which coherence is resolved by reference to one or more memory directories, or by a snooping-based coherency protocol, in which coherence is resolved by message passing between caching agents. As SMP systems scale to ever-larger n-way systems, snooping coherency protocols become subject to at least two design constraints: a limit on the depth of the queuing structures that caching agents use to track requests and associated coherence messages, and a limit on the communication bandwidth available for message passing.
To address the limitation on the depth of queuing structures within the caching agents, some designs have adopted non-blocking snooping protocols that do not require caching agents to implement message tracking mechanisms, such as message queues. Instead, in non-blocking snooping protocols, caching agents' requests are temporally bounded (meaning snoopers will respond within a fixed time) and are source throttled (to ensure a fair division of available communication bandwidth). For example, the total system bandwidth can be divided evenly (e.g., via time-division multiplexing) among all possible processing nodes in the system to ensure the coherency buses have sufficient bandwidth in a worst-case scenario in which all processing nodes are issuing requests. However, equal allocation of coherency bus bandwidth in this manner limits the coherency bandwidth available to any particular processing node to no more than a predetermined fraction of the overall available coherency bandwidth. Furthermore, the coherency bandwidth of the system can be under-utilized when only a few processing nodes require high bandwidth.
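The cost of equal allocation can be illustrated with a small numeric sketch. The function names, the bandwidth units (requests per time window), and the demand figures below are hypothetical and chosen only for illustration; they are not taken from any particular system. The sketch assumes each node is throttled at the source to an equal share of the total, regardless of what other nodes are doing.

```python
from fractions import Fraction


def equal_share(total_bandwidth, max_nodes):
    """Per-node cap when the total is divided evenly among all possible nodes."""
    return Fraction(total_bandwidth, max_nodes)


def throttled_grants(total_bandwidth, max_nodes, demands):
    """Bandwidth each node actually receives under equal source throttling.

    `demands` holds one requested bandwidth value per node (hypothetical units).
    A node never receives more than its equal share, even if other nodes are idle.
    """
    cap = equal_share(total_bandwidth, max_nodes)
    return [min(Fraction(d), cap) for d in demands]


def utilization(total_bandwidth, max_nodes, demands):
    """Fraction of the total coherency bandwidth that is actually used."""
    grants = throttled_grants(total_bandwidth, max_nodes, demands)
    return sum(grants) / total_bandwidth


# Illustrative scenario: 8 possible nodes, 64 requests per window in total,
# but only two nodes are busy and each wants 30 requests per window.
TOTAL = 64
NODES = 8
demands = [30, 30, 0, 0, 0, 0, 0, 0]

grants = throttled_grants(TOTAL, NODES, demands)
used = utilization(TOTAL, NODES, demands)
print("per-node cap:", equal_share(TOTAL, NODES))  # 8
print("grants:", [int(g) for g in grants])         # [8, 8, 0, 0, 0, 0, 0, 0]
print("utilization:", used)                        # 1/4
```

In this scenario the two busy nodes are each held to 8 requests per window although 60 of the 64 would satisfy both, so three quarters of the coherency bandwidth goes unused. That is the under-utilization described above.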