The present embodiment relates generally to memory performance, and more specifically to improving memory performance when speculation control is enabled.
The number of central processing unit (CPU) cores on a chip and the number of CPU cores connected to a shared memory continues to grow significantly to support growing workload capacity demand. The increasing number of CPUs cooperating to process the same workloads puts a significant burden on software scalability; for example, shared queues or data-structures protected by traditional semaphores become hot spots and lead to sub-linear n-way scaling curves. Traditionally this has been countered by implementing finer-grained locking in software, and with lower latency/higher bandwidth interconnects in hardware. Implementing fine-grained locking to improve software scalability can be very complicated and error-prone, and at today's CPU frequencies, the latencies of hardware interconnects are limited by the physical dimension of the chips and systems, and by the speed of light.
Implementations of hardware Transactional Memory (TM) have been introduced, wherein a group of instructions, called a transaction, operate atomically and in isolation (sometimes called “serializability”) on a data structure in memory. The transaction executes optimistically without obtaining a lock, but may need to abort and retry the transaction execution if an operation, of the executing transaction, on a memory location conflicts with anther operation on the same memory location. Previously, software transactional memory implementations have been proposed to support software Transactional Memory (TM). However, hardware TM can provide improved performance aspects and ease of use over software TM.
U.S. Patent Application Publication No 2012/01599461 titled “Program Optimizing Apparatus, Program Optimizing Method, And Program Optimizing Article Of Manufacture” filed 2012 Jun. 21 and incorporated by reference herein teaches An apparatus having a transactional memory enabling exclusive control to execute a transaction. The apparatus includes: a first code generating unit configured to interpret a program, and generate first code in which a begin instruction to begin a transaction and an end instruction to commit the transaction are inserted before and after an instruction sequence including multiple instructions to execute designated processing in the program; a second code generating unit configured to generate second code at a predetermined timing by using the multiple instructions according to the designated processing; and a code write unit configured to overwrite the instruction sequence of the first code with the second code or to write the second code to a part of the first code in the transaction.
U.S. Patent 2011/0246725 titled “System and Method for Committing Results of a Software Transaction Using a Hardware Transaction” filed 2011 Oct. 6 and incorporated by reference herein teaches The system and methods described herein may exploit hardware transactional memory to improve the performance of a software or hybrid transactional memory implementation, even when an entire user transaction cannot be executed within a hardware transaction. The user code of an atomic transaction may be executed within a software transaction, which may collect read and write sets and/or other information about the atomic transaction. A single hardware transaction may be used to commit the atomic transaction by validating the transaction's read set and applying the effects of the user code to memory, reducing the overhead associated with commitment of software transactions. Because the hardware transaction code is carefully controlled, it may be less likely to fail to commit. Various remedial actions may be taken before retrying hardware transactions following some failures. If a transaction exceeds the constraints of the hardware, it may be committed by the software transactional memory alone.