The number of central processing unit (CPU) cores on a chip and the number of CPU cores connected to a shared memory continues to grow significantly to support growing workload capacity demand. The increasing number of CPUs cooperating to process the same workloads puts a significant burden on software scalability; for example, shared queues or data-structures protected by traditional semaphores become hot spots and lead to sub-linear n-way scaling curves. Traditionally, this has been countered by implementing finer-grained locking in software, and with lower latency/higher bandwidth interconnects in hardware. Implementing fine-grained locking to improve software scalability can be very complicated and error-prone, and at today's CPU frequencies, the latencies of hardware interconnects are limited by the physical dimension of the chips and systems, and by the speed of light.
Implementations of hardware Transactional Memory (HTM, or in this discussion, simply TM) have been introduced, wherein a group of instructions—called a transaction—operate in an atomic manner on a data structure in memory, as viewed by other central processing units (CPUs) and the I/O subsystem (atomic operation is also known as “block concurrent” or “serialized” in other literature). The transaction executes optimistically without obtaining a lock, but may need to abort and retry the transaction execution if an operation, of the executing transaction, on a memory location conflicts with another operation on the same memory location. Previously, software transactional memory implementations have been proposed to support software Transactional Memory (TM). However, hardware TM can provide improved performance aspects and ease of use over software TM.
U.S. Patent Publication No. 2007/0028056 titled “Direct-Update Software Transactional Memory” filed Jul. 29, 2005, incorporated herein by reference in its entirety, teaches a transactional memory programming interface that allows a thread to directly and safely access one or more shared memory locations within a transaction while maintaining control structures to manage memory accesses to those same locations by one or more other concurrent threads. Each memory location accessed by the thread is associated with an enlistment record, and each thread maintains a transaction log of its memory accesses. Within a transaction, a read operation is performed directly on the memory location and a write operation is attempted directly on the memory location, as opposed to some intermediate buffer. The thread can detect inconsistencies between the enlistment record of a memory location and the thread's transaction log to determine whether the memory accesses within the transaction are not reliable and the transaction should be re-tried.
U.S. Patent Publication No. 2011/0119452 titled “Hybrid Transactional Memory System (Hybrid™) and Method” filed Nov. 6, 2009, incorporated herein by reference in its entirety, teaches a computer processing system having memory and processing facilities for processing data with a computer program is a Hybrid Transactional Memory multiprocessor system with modules 1 . . . n coupled to a system physical memory array, I/O devices via a high speed interconnection element. A CPU is integrated as in a multi-chip module with microprocessors which contain or are coupled in the CPU module to an assist thread facility, as well as a memory controller, cache controllers, cache memory, and other components which form part of the CPU which connects to the high speed interconnect which functions under the architecture and operating system to interconnect elements of the computer system with physical memory, various I/O devices and the other CPUs of the system. The current hybrid transactional memory elements support for a transaction memory system that has a simple/cost effective hardware design that can deal with limited hardware resources, yet one which has a transactional facility control logic providing for a backup assist thread that can still allow transactions to reference existing libraries and allows programmers to include calls to existing software libraries inside of their transaction, and which will not make a user code use a second lock based solution.