An atomic memory operation is a memory access operation during which a processor core reads a location, modifies it, and writes it back in what appears to other cores as a single bus operation. Atomic memory operations are typically performed inside a processing core. However, some processors or systems may support the execution of atomics outside of a core, in which case they can be referred to as remote atomic operations (RAOs). RAOs are useful in a diverse set of applications, including packet processing, high-performance computing, machine learning, and, more generally, dynamic scheduling algorithms.
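As a software-level illustration of the read-modify-write sequence described above, the following C++ sketch spells out an atomic add as its three steps using a compare-and-swap loop. The function name atomic_add and the example values are hypothetical and chosen only for illustration; in practice, std::atomic::fetch_add would typically be used, and a core may implement the operation as a single instruction.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: an atomic add written out as read, modify, and write.
// compare_exchange_weak stores the new value only if the location still holds
// the value that was read, so other threads never observe a partial update.
std::uint64_t atomic_add(std::atomic<std::uint64_t>& location,
                         std::uint64_t delta) {
    std::uint64_t observed = location.load(std::memory_order_relaxed);  // read
    while (!location.compare_exchange_weak(observed,          // expected value
                                           observed + delta,  // modify + write
                                           std::memory_order_acq_rel,
                                           std::memory_order_relaxed)) {
        // On failure, `observed` now holds the current value; retry with it.
    }
    return observed;  // contents of the location before the update
}

// Usage: the return value is the old contents; the location holds the sum.
void atomic_add_example() {
    std::atomic<std::uint64_t> counter{5};
    std::uint64_t before = atomic_add(counter, 3);
    assert(before == 5);
    assert(counter.load() == 8);
}
```

The retry loop is what makes the three steps appear indivisible to other threads: a concurrent update between the read and the write causes the write to fail rather than overwrite the newer value.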
Posted RAOs, also called fire-and-forget atomics, are a class of RAO instructions that return no architectural information to software; they instruct the hardware to perform an atomic read-modify-write operation, but do not return a result to a register. Posted RAO instructions are weakly ordered, to allow the core to offload the operations (e.g., to cache control circuitry) and continue execution.
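The distinction between an atomic whose result is consumed and a fire-and-forget atomic can be approximated in software as follows. This is only an analogy: standard C++ has no posted atomic operations, and whether the update is performed in the core or offloaded is up to the compiler and hardware. The names Histogram, kBuckets, take_ticket, and count_event are hypothetical. Relaxed memory ordering stands in here for the weak ordering of posted RAOs.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kBuckets = 8;  // illustrative size, not from the text
using Histogram = std::array<std::atomic<std::uint32_t>, kBuckets>;

// Result-returning style: the caller needs the old value (its ticket number),
// so the result of the atomic must travel back to a register.
std::uint32_t take_ticket(std::atomic<std::uint32_t>& next) {
    return next.fetch_add(1, std::memory_order_relaxed);
}

// Fire-and-forget style: the old value is discarded and the ordering is
// relaxed, so nothing later in the program depends on the atomic's result.
// This is the usage pattern that a posted RAO targets.
void count_event(Histogram& histogram, std::size_t bucket) {
    assert(bucket < kBuckets);
    histogram[bucket].fetch_add(1, std::memory_order_relaxed);
}
```

Counters, histograms, and similar statistics updates fit the fire-and-forget pattern because the updating thread never reads the value it has just modified.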
Unfortunately, execution of RAOs, posted or otherwise, can suffer inefficiencies, especially when a single thread executes multiple RAOs to the same cache line in quick succession. The RAO instructions may be serialized, forcing each one to finish and receive acknowledgement of completion before the next in the sequence can begin execution.