This disclosure relates generally to execution of instructions by a computer, and more specifically to execution of instructions in a transactional execution environment.
The number of central processing unit (CPU) cores on a chip and the number of CPU cores connected to a shared memory continues to grow significantly to support growing workload capacity demand. The increasing number of CPUs cooperating to process the same workloads puts a significant burden on software scalability; for example, shared queues or data-structures protected by traditional semaphores become hot spots and lead to sub-linear n-way scaling curves. Traditionally this has been countered by implementing finer-grained locking in software, and with lower latency/higher bandwidth interconnects in hardware. Implementing fine-grained locking to improve software scalability can be very complicated and error-prone, and at today's CPU frequencies, the latencies of hardware interconnects are limited by the physical dimension of the chips and systems, and by the speed of light.
Implementations of hardware Transactional Memory (HTM, or in this discussion, simply TM) have been introduced, wherein a group of instructions—called a transaction—operate in an atomic manner on a data structure in memory, as viewed by other central processing units (CPUs) and the I/O subsystem (atomic operation is also known as “block concurrent” or “serialized” in other literature). The transaction executes optimistically without obtaining a lock, but may need to abort and retry the transaction execution if an operation, of the executing transaction, on a memory location conflicts with another operation on the same memory location. Previously, software transactional memory implementations have been proposed to support software Transactional Memory (TM). However, hardware TM can provide improved performance aspects and ease of use over software TM.
According to U.S. Pat. No. 8,078,806 B2, titled “Microprocessor With Improved Data Stream Prefetching”, by Diefendorff et al., filed on Oct. 25, 2010, incorporated herein by reference in its entirety, a microprocessor coupled to a system memory by a bus includes an instruction decode unit that decodes an instruction that specifies a data stream in the system memory and a stream prefetch priority. The microprocessor also includes a load/store unit that generates load/store requests to transfer data between the system memory and the microprocessor. The microprocessor also includes a stream prefetch unit that generates a plurality of prefetch requests to prefetch the data stream from the system memory into the microprocessor. The prefetch requests specify the stream prefetch priority. The microprocessor also includes a bus interface unit (BIU) that generates transaction requests on the bus to transfer data between the system memory and the microprocessor in response to the load/store requests and the prefetch requests. The BIU prioritizes the bus transaction requests for the prefetch requests relative to the bus transaction requests for the load/store requests based on the stream prefetch priority.
US Patent Publication No. 2005/0071542 A1, titled “Prefetch Mechanism For Use In A System Including A Host Connected To A Plurality of Memory Modules Via Serial Memory Interconnect”, by Weber, et al., filed May 10, 2004, incorporated herein by reference in its entirety, discloses a system including a host coupled to a serially connected chain of memory modules. In one embodiment, the host includes a memory controller that may be configured to issue a memory read request for data stored within the memory modules. The memory controller may further request that data be prefetched from the memory modules by encoding prefetch information within the memory read request. The memory controller may also be configured to issue a memory write request to write data to the memory modules and to selectively request that one or more pages of memory within a given one of the memory modules remain open by encoding the prefetch information within the memory write request.