An atomic operation may involve reading data from a memory location, modifying the data, and writing the modified data back to the memory location, without any other accesses to the memory location being permitted in between. If back-to-back atomic operations are requested, each subsequent atomic operation may wait for its predecessor to complete, so that the operations execute in sequence. In certain circumstances, an arithmetic logic unit (ALU) is used to modify the data according to the atomic operation, wherein the ALU is separate from the memory (e.g., bulk storage) housing the data that is being operated on. Thus, for each atomic operation, the data may be fetched from the memory, modified by the ALU, and returned to the memory once the modification is complete. Each fetch and each return may take a relatively long time (e.g., tens of clock cycles) to complete. Indeed, when several atomic operations line up back-to-back, the transfer latency may be incurred twice between every two atomic operations: once to return the modified data to the memory and once to fetch the data again for the next operation. As a result, a relatively long loop of atomic accesses may be experienced, which may in turn degrade performance and/or battery life.
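By way of illustration only, the serialized fetch, modify, and return loop described above may be approximated with a simple cycle-count model. The sketch below is an assumption-laden illustration rather than a description of any particular hardware: the names `Latency` and `run_serialized_atomics` are hypothetical, the latencies of 20 cycles per transfer and 1 cycle per ALU modification are chosen only to match the "tens of clock cycles" example, and the model assumes that no part of one atomic operation overlaps with the next.

```python
from dataclasses import dataclass


@dataclass
class Latency:
    """Assumed per-step costs in clock cycles (illustrative values only)."""
    fetch: int = 20      # memory -> ALU transfer
    modify: int = 1      # ALU computes the new value
    writeback: int = 20  # ALU -> memory transfer


def run_serialized_atomics(memory, address, operands, latency):
    """Apply atomic adds back-to-back to one location.

    Each operation waits for its predecessor, so the cycle costs simply
    accumulate. Returns the total number of cycles spent.
    """
    cycles = 0
    for operand in operands:
        value = memory[address]   # fetch the data from the memory
        cycles += latency.fetch
        value = value + operand   # modify the data in the separate ALU
        cycles += latency.modify
        memory[address] = value   # return the modified data to the memory
        cycles += latency.writeback
    return cycles


memory = {0x10: 0}
total = run_serialized_atomics(memory, 0x10, [1] * 8, Latency())
print(memory[0x10], total)  # 8 328
```

Under these assumed latencies, eight back-to-back atomic additions take 8 × (20 + 1 + 20) = 328 cycles, of which only 8 cycles are spent modifying the data in the ALU; the remaining 320 cycles are spent moving the data between the memory and the ALU.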