Advanced computer systems are being developed with a point-to-point (PTP) interconnect technology between processors such as central processing units (CPUs) and between CPUs and other system agents such as an input/output (I/O) hub (IOH) for speed, performance and scalability.
In systems that implement a source-snooping protocol, a requesting node (e.g., a processor node) that wants ownership of a cache line must broadcast a snoop to all nodes in the system, and all snoop responses must be collected before ownership of the cache line can be granted to the requesting node. The snoop responses are collected by a so-called home agent, which is the owner of the data. For a broadcast snoop, the home agent cannot send the data until all snoop responses have been received. Some agents may have a relatively long snoop latency, which degrades performance: cache line ownership cannot be decided until all snoop responses are received, so other requests targeting the same cache line are blocked, and the request itself cannot be evicted to make room for a new request.
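The collection step described above can be illustrated with a minimal sketch. The `SnoopTracker` class, its fields, the agent names, and the latency values are all hypothetical and chosen only for illustration; they do not describe any particular home agent implementation. The sketch shows that a request stays unresolved until the last response arrives, so the slowest agent sets the resolution time.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class SnoopTracker:
    """Hypothetical per-request entry kept by a home agent (illustrative only)."""
    address: int
    pending: Set[str]                    # peers whose snoop response is outstanding
    forwarded_by: Optional[str] = None   # peer that forwarded the line, if any

    def record_response(self, agent: str, forwarded: bool = False) -> None:
        # Mark one peer's snoop response as received.
        self.pending.discard(agent)
        if forwarded:
            self.forwarded_by = agent

    def resolved(self) -> bool:
        # Ownership can be decided only once no responses are outstanding.
        return not self.pending


def resolve_time(snoop_latencies: dict) -> int:
    # The request resolves when the slowest peer has responded.
    return max(snoop_latencies.values())


# Assumed example: two fast processor nodes and one slow I/O hub (arbitrary units).
latencies = {"cpu1": 40, "cpu2": 45, "ioh": 200}
tracker = SnoopTracker(address=0x40, pending=set(latencies))
tracker.record_response("cpu1")
tracker.record_response("cpu2")
still_blocked = not tracker.resolved()   # the slow agent is still outstanding
tracker.record_response("ioh")
done = tracker.resolved()
```

In this sketch the two fast responses do not help: the entry remains occupied, and other requests to the same address remain blocked, until the slowest agent answers.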
As the number of caching agents in a platform increases, snoop latency starts to dominate over memory latency. In a source-snooping protocol, snoop latency becomes the critical path in the load-to-use latency when none of the peer agents has cached the line (and thus none can forward it), because the home agent must wait until all snoop responses have been received before it knows that the line needs to be obtained from memory. In a system that is not fully interconnected, loaded snoop latencies can become very high because of the sheer number of snoops passing through shared links.
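A toy calculation can make this scaling argument concrete. The linear queuing model, the function names, and every number below are assumptions in arbitrary time units, not measurements of any real platform. The sketch only shows how a per-snoop delay on shared links can push the snoop latency past a fixed memory latency as agents are added.

```python
def loaded_snoop_latency(num_agents: int, base: int = 60, queue_delay: int = 10) -> int:
    """Hypothetical model: each additional snooped peer adds one queuing delay
    on a shared link. Real interconnects need not behave linearly."""
    peers = num_agents - 1               # the requester snoops every other agent
    return base + queue_delay * (peers - 1)


def first_snoop_bound(memory_latency: int, max_agents: int = 64):
    """Smallest agent count at which the modeled snoop latency exceeds the
    memory latency, or None if that never happens within max_agents."""
    for n in range(2, max_agents + 1):
        if loaded_snoop_latency(n) > memory_latency:
            return n
    return None


MEMORY_LATENCY = 100                     # assumed, arbitrary units
crossover = first_snoop_bound(MEMORY_LATENCY)
```

Below the crossover count, a miss in all peer caches is limited by memory in this model; above it, the wait for the last snoop response is the longer of the two delays.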