Modern computer systems are realized by the interconnection of various components including processors, memory devices, peripheral devices and so forth. To enable communication between these different components, various links may be present to interconnect one or more of the devices together. Systems can include many different types of interconnects or links. Typically, there is a given communication protocol for each particular type of link, and communications occurring on such a link follow this protocol. In many systems, links may include coherent links and non-coherent links. A coherent link is typically used for tightly coupled components, where the corresponding protocol provides for coherent transactions such that a consistent view of data that may be cached in various locations can be maintained. In contrast, in a non-coherent communication protocol, communications may not occur in a cache coherent manner.
Atomic operations enable synchronization mechanisms that can be useful in situations with multiple producers and/or consumers that are to be synchronized in a non-blocking fashion. Atomic operations also enable lock-free statistics counters, for example, where a device atomically increments a counter, and host software atomically reads and clears the counter. Compared to locking transactions, atomic operations can provide lower latency and higher scalability, with less impact on other interconnect traffic.
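The lock-free statistics counter described above can be sketched in software. The following is a minimal illustration using C11 atomics on a single host, not a PCIe™ implementation; the names `stats_counter`, `stats_record` and `stats_read_and_clear` are hypothetical and do not come from any specification, and the relaxed memory ordering is an assumption that suits a plain event counter.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared statistics counter; the names are illustrative only. */
typedef struct {
    _Atomic uint64_t events;
} stats_counter;

/* Producer side (the "device" in the text): atomically add n to the counter. */
static inline void stats_record(stats_counter *c, uint64_t n)
{
    assert(c != NULL);
    atomic_fetch_add_explicit(&c->events, n, memory_order_relaxed);
}

/* Consumer side (the "host software"): atomically read and clear.
 * The exchange returns the old value and stores zero as one indivisible
 * step, so no increment can be lost between the read and the clear. */
static inline uint64_t stats_read_and_clear(stats_counter *c)
{
    assert(c != NULL);
    return atomic_exchange_explicit(&c->events, 0, memory_order_relaxed);
}
```

Because neither side takes a lock, a producer is never blocked while the consumer harvests the count; every increment is reported in exactly one harvest.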
In the Peripheral Component Interconnect Express (PCIe™) protocol, atomic operations were first introduced as an Engineering Change Notice entitled “Atomic Operations” dated Jan. 15, 2008 (ECN). While the benefits of atomic operations (also referred to herein as “atomics”) are highly dependent on the application and usage models, it is expected that accelerators, high-end graphics and high-performance computing (HPC) would benefit from platform support for atomics.
In general, atomic operations according to the PCIe™ protocol provide for a single transaction to target a location in memory space, read the location's value, potentially write a new value to the location, and return the original value. This read-modify-write sequence to the location is performed atomically and at a lower latency than locking operations. In many instances, however, performing an atomic operation in a complex system may cause a very large and non-deterministic latency, as completion of the atomic operation may require a number of remote memory transactions of unknown latency.
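The read, conditional write and return-original behavior described above can be modeled in software. The sketch below uses C11 atomics to imitate three operation types; the PCIe™ ECN is understood to define fetch-and-add, unconditional swap and compare-and-swap operations, although the text above does not name them. The `model_*` function names and the 32-bit operand width are assumptions made for illustration, and the code models only the semantics at the target location, not the transaction format on the link.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Add operand to the location; return the value it held beforehand. */
static inline uint32_t model_fetch_add(_Atomic uint32_t *loc, uint32_t operand)
{
    assert(loc != NULL);
    return atomic_fetch_add(loc, operand);
}

/* Unconditionally write operand; return the value it replaced. */
static inline uint32_t model_swap(_Atomic uint32_t *loc, uint32_t operand)
{
    assert(loc != NULL);
    return atomic_exchange(loc, operand);
}

/* Write desired only if the location currently equals expected.
 * The original value is returned whether or not the write happened. */
static inline uint32_t model_compare_and_swap(_Atomic uint32_t *loc,
                                              uint32_t expected,
                                              uint32_t desired)
{
    assert(loc != NULL);
    uint32_t original = expected;
    /* On a mismatch, the call stores the current value into original. */
    atomic_compare_exchange_strong(loc, &original, desired);
    return original;
}
```

In all three cases the caller receives the original value, so a requester can tell from the returned data alone whether a compare-and-swap took effect: it did if and only if the returned value equals the expected value.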