Many distributed computing systems, including high-performance computing systems, communicate by passing messages between networked computing nodes. Typically, the processor cores of a target node must determine that messages have arrived. For systems that send a high rate of small messages, per-message overhead may reduce message processing rates. To reduce per-message overhead, typical messaging solutions may count events (e.g., the number of messages sent or acknowledgements received). Certain systems may count events using networking hardware, which may reduce or eliminate per-message software overhead. For example, certain systems may support hardware event counters. An event counter may notionally be stored in memory but, in practice, may be held in a cache so that the target node can update the counter quickly without a full round-trip to memory.
Processor cores may poll the value of the counter to determine when the counter value has changed. However, polling on the event counter by the processor core may cause the cache line of the event counter to be invalidated (or downgraded to a shared state) every time the processor attempts to read the event counter. When the next message arrives, the counter increment may be delayed while the networking hardware re-acquires the cache line of the event counter in a writable state. Thus, in some systems, software polling on the event counter may cause the cache line to “bounce” between the networking hardware and the processor cache, which may reduce message processing rates. Certain systems may reduce the processor core polling rate so that the cache line stays writable in the networking hardware's cache for a greater proportion of the time. Reducing the polling rate may allow the networking hardware to accept messages at a high rate, but may require a relatively long delay between successive polling events.