Over the years, the number of central processing unit (CPU) cores on a chip and the number of CPU cores connected to a shared memory have grown significantly to support growing workload capacity demand. For example, the IBM zEC12 enterprise server supports operating system images with up to 101 CPUs. The increasing number of CPUs cooperating to process the same workloads puts significant burden on software scalability; for example, shared queues or data-structures protected by traditional semaphores become hot spots and lead to sub-linear n-way scaling curves. Traditionally this has been countered by implementing finer-grained locking in software, and with lower latency/higher bandwidth interconnects in hardware. Implementing fine-grained locking to improve software scalability can be very complicated and error-prone, and at today's CPU's frequency, the latency of hardware interconnects is limited by the physical dimension of the chips and systems, and by the speed of light.
IBM Corporation and Intel Corporation have each recently introduced implementations of hardware Transactional Memory wherein, a group of instructions called a transaction is operating atomically and in isolation (sometimes called “serializability”) on a data structure in memory. The transaction executes optimistically without obtaining a lock, but may need to abort and retry if the operation conflicts with other operations on the same memory locations. Previously, software Transactional Memory implementations have been proposed to support software Transactional Memory (TM). Hardware TM provides far superior performance and ease of use over software TM.
US Patent Application Publication No. 2012/0227045A1 “Method, Apparatus, and System for Speculative Execution Event Counter Checkpointing and Restoring”, filed Feb. 2, 2012, incorporated by reference herein teaches an apparatus, method, and system for providing programmable control of performance/event counters. An event counter is programmable to track different events, as well as to be checkpointed when speculative code regions are encountered. So when a speculative code region is aborted, the event counter is able to be restored to its pre-speculation value. Moreover, the difference between a cumulative event count of committed and uncommitted execution and the committed execution, represents an event count/contribution for uncommitted execution. From information on the uncommitted execution, hardware/software may be tuned to enhance future execution to avoid wasted execution cycles.
U.S. Pat. No. 8,171,262 “Method and apparatus for clearing hazards using jump instructions”, filed Nov. 21, 2005, incorporated by reference herein teaches a method and apparatus for overlaying hazard clearing with a jump instruction within a pipeline microprocessor. The apparatus includes hazard logic to detect when a jump instruction specifies that hazards are to be cleared as part of a jump operation. If hazards are to be cleared, the hazard logic disables branch prediction for the jump instruction, thereby causing the jump instruction to proceed down the pipeline until it is finally resolved, and flushing the pipeline behind the jump instruction. Disabling of branch prediction for the jump instruction effectively clears all execution and/or instruction hazards that preceded the jump instruction. Alternatively, hazard logic causes issue control logic to stall the jump instruction for n-cycles until all hazards are cleared. State tracking logic may be provided to determine whether any instructions are executing in the pipeline that create hazards. If so, hazard logic performs normally. If not, state tracking logic disables the effect of the hazard logic.