This disclosure relates generally to memory and more specifically to a method, computer program and computer system for atomically accumulating memory updates with an addressable accumulator.
The number of central processing unit (CPU) cores on a chip and the number of CPU cores connected to a shared memory continues to grow significantly to support growing workload capacity demand. The increasing number of CPUs cooperating to process the same workloads puts a significant burden on software scalability; for example, shared queues or data-structures protected by traditional semaphores become hot spots and lead to sub-linear n-way scaling curves. Traditionally this has been countered by implementing finer-grained locking in software, and with lower latency/higher bandwidth interconnects in hardware. Implementing fine-grained locking to improve software scalability can be very complicated and error-prone, and at today's CPU frequencies, the latencies of hardware interconnects are limited by the physical dimension of the chips and systems, and by the speed of light.
Implementations of hardware Transactional Memory (HTM, or in this discussion, simply TM) have been introduced, wherein a group of instructions—called a transaction—operate in an atomic manner on a data structure in memory, as viewed by other central processing units (CPUs) and the I/O subsystem (atomic operation is also known as “block concurrent” or “serialized” in other literature). The transaction executes optimistically without obtaining a lock, but may need to abort and retry the transaction execution if an operation, of the executing transaction, on a memory location conflicts with another operation on the same memory location. Previously, software transactional memory implementations have been proposed to support software Transactional Memory (TM). However, hardware TM can provide improved performance aspects and ease of use over software TM.
U.S. Patent Application Publication US20100235587 A1 titled “Staged Software Transactional memory” filed 2009 Mar. 16 and incorporated by reference herein teaches a new form of software transactional memory based on maps for which data goes through three stages. Updates to shared memory are first redirected to a transaction-private map which associates each updated memory location with its transaction-private value. Maps are then added to a shared queue so that multiple versions of memory can be used concurrently by running transactions. Maps are later removed from the queue when the updates they refer to have been applied to the corresponding memory locations. This design offers a very simple semantic where starting a transaction takes a stable snapshot of all transactional objects in memory. It prevents transactions from aborting or seeing inconsistent data in case of conflict. Performance is interesting for long running transactions as no synchronization is needed between a transaction's start and commit, which can themselves be lock free.
U.S. Patent Application Publication US20110225376 A1 titled “Memory Manager for a Network Communications Processor Architecture” filed 2010 Dec. 9 and incorporated by reference herein teaches described embodiments provide a memory manager for a network processor having a plurality of processing modules and a shared memory. The memory manager allocates blocks of the shared memory to requesting ones of the plurality of processing modules. A free block list tracks availability of memory blocks of the shared memory. A reference counter maintains, for each allocated memory block, a reference count indicating a number of access requests to the memory block by ones of the plurality of processing modules. The reference count is located with data at the allocated memory block. For subsequent access requests to a given memory block concurrent with processing of a prior access request to the memory block, a memory access accumulator (i) accumulates an incremental value corresponding to the subsequent access requests, (ii) updates the reference count associated with the memory block, and (iii) updates the memory block with the accumulated result.