This disclosure relates generally to execution of instructions by a computer, and more specifically to execution of instructions in a transactional execution environment.
The number of central processing unit (CPU) cores on a chip and the number of CPU cores connected to a shared memory continues to grow significantly to support growing workload capacity demand. The increasing number of CPUs cooperating to process the same workloads puts a significant burden on software scalability; for example, shared queues or data-structures protected by traditional semaphores become hot spots and lead to sub-linear n-way scaling curves. Traditionally this has been countered by implementing finer-grained locking in software, and with lower latency/higher bandwidth interconnects in hardware. Implementing fine-grained locking to improve software scalability can be very complicated and error-prone, and at today's CPU frequencies, the latencies of hardware interconnects are limited by the physical dimension of the chips and systems, and by the speed of light.
Implementations of hardware Transactional Memory (HTM, or in this discussion, simply TM) have been introduced, wherein a group of instructions—called a transaction—operate in an atomic manner on a data structure in memory, as viewed by other central processing units (CPUs) and the I/O subsystem (atomic operation is also known as “block concurrent” or “serialized” in other literature). The transaction executes optimistically without obtaining a lock, but may need to abort and retry the transaction execution if an operation, of the executing transaction, on a memory location conflicts with another operation on the same memory location. Previously, software transactional memory implementations have been proposed to support software Transactional Memory (TM). However, hardware TM can provide improved performance aspects and ease of use over software TM.
U.S. Patent Publication No. 2013/0024625 A1, titled “Prefetching Tracks Using Multiple Caches”, filed May 24, 2012, incorporated by reference herein in its entirety, includes a computer program product, sequential access storage device, and a method for managing data in a sequential access storage device receiving read requests and write requests from a system with respect to tracks stored in a sequential access storage medium. A prefetch request indicates prefetch tracks in the sequential access storage medium to read from the sequential access storage medium. The accessed prefetch tracks are cached in a non-volatile storage device integrated with the sequential access storage device, wherein the non-volatile storage device is a faster access device than the sequential access storage medium. A read request is received for the prefetch tracks following the caching of the prefetch tracks, wherein the prefetch request is designated to be processed at a lower priority than the read request with respect to the sequential access storage medium. The prefetch tracks are returned from the non-volatile storage device to the read request.
According to U.S. Pat. No. 8,364,902 B2, titled “Microprocessor With Repeat Prefetch Indirect Instruction”, filed Oct. 15, 2009, incorporated herein by reference in its entirety, a microprocessor includes an instruction decoder for decoding a repeat prefetch indirect instruction that includes address operands used to calculate an address of a first entry in a prefetch table having a plurality of entries, each including a prefetch address. The repeat prefetch indirect instruction also includes a count specifying a number of cache lines to be prefetched. The memory address of each of the cache lines is specified by the prefetch address in one of the entries in the prefetch table. A count register, initially loaded with the count specified in the prefetch instruction, stores a remaining count of the cache lines to be prefetched. Control logic fetches the prefetch addresses of the cache lines from the table into the microprocessor and prefetches the cache lines from the system memory into a cache memory of the microprocessor using the count register and the prefetch addresses fetched from the table.