This disclosure relates generally to execution of instructions by a computer, and more specifically to execution of instructions in a transactional execution environment.
The number of central processing unit (CPU) cores on a chip and the number of CPU cores connected to a shared memory continues to grow significantly to support growing workload capacity demand. The increasing number of CPUs cooperating to process the same workloads puts a significant burden on software scalability; for example, shared queues or data-structures protected by traditional semaphores become hot spots and lead to sub-linear n-way scaling curves. Traditionally this has been countered by implementing finer-grained locking in software, and with lower latency/higher bandwidth interconnects in hardware. Implementing fine-grained locking to improve software scalability can be very complicated and error-prone, and at today's CPU frequencies, the latencies of hardware interconnects are limited by the physical dimension of the chips and systems, and by the speed of light.
Implementations of hardware Transactional Memory (HTM, or in this discussion, simply TM) have been introduced, wherein a group of instructions—called a transaction—operate in an atomic manner on a data structure in memory, as viewed by other central processing units (CPUs) and the I/O subsystem (atomic operation is also known as “block concurrent” or “serialized” in other literature). The transaction executes optimistically without obtaining a lock, but may need to abort and retry the transaction execution if an operation, of the executing transaction, on a memory location conflicts with another operation on the same memory location. Previously, software transactional memory implementations have been proposed to support software Transactional Memory (TM). However, hardware TM can provide improved performance aspects and ease of use over software TM.
According to U.S. Patent Application Publication No. 2012/0233411 A1, titled “Protecting large objects within an advanced synchronization facility”, filed Mar. 7, 2011, by Pohlack et al., incorporated herein by reference in its entirety, a system and method are disclosed for allowing protection of larger areas than memory lines by monitoring accessed and dirty bits in page tables. More specifically, in some embodiments, a second associative structure with a different granularity is provided to filter out a large percentage of false positives. By providing the associative structure with sufficient size, the structure exactly specifies a region in which conflicting cache lines lie. If entries within this region are evicted from the structure, enabling the tracking for the entire index filters out a substantial number of false positives (depending on a granularity and a number of indices present). In some embodiments, this associative structure is similar to a translation look aside buffer (TLB) with 4 k, 2M entries.
According to U.S. Patent Application Publication No. 2012/0079215 A1, titled “Performing mode switching in an unbounded transactional memory (UTM) system”, filed Nov. 30, 2011, incorporated herein by reference in its entirety, a method for selecting a first transaction execution mode to begin a first transaction in an unbounded transactional memory (UTM) system having a plurality of transaction execution modes is disclosed. These transaction execution modes include hardware modes to execute within a cache memory of a processor, a hardware assisted mode to execute using transactional hardware of the processor and a software buffer, and a software transactional memory (STM) mode to execute without the transactional hardware. The first transaction execution mode can be selected to be a highest p of the hardware modes if no pending transaction is executing in the STM mode, otherwise a lower performance mode can be selected.