The present disclosure relates generally to a computer-implemented method and system for allowing non-cacheable loads in a hardware transactional memory environment. The number of central processing unit (CPU) cores on a chip and the number of CPU cores connected to a shared memory continue to grow significantly to support growing workload capacity demand. The increasing number of CPUs cooperating to process the same workloads puts a significant burden on software scalability; for example, shared queues or data structures protected by traditional semaphores become hot spots and lead to sub-linear n-way scaling curves. Traditionally this has been countered by implementing finer-grained locking in software, and with lower latency/higher bandwidth interconnects in hardware. Implementing fine-grained locking to improve software scalability can be very complicated and error-prone. In addition, at today's CPU frequencies, the latencies of hardware interconnects are limited by the physical dimensions of the chips and systems, and by the speed of light.
Implementations of hardware transactional memory (HTM, or in this discussion, simply TM) have been introduced, wherein a group of instructions, called a transaction, operates in an atomic manner on a data structure in memory, as viewed by other central processing units (CPUs) and the I/O subsystem (atomic operation is also known as block concurrent or serialized in other literature). The transaction executes optimistically without obtaining a lock, but may need to abort and retry if an operation of the executing transaction on a memory location conflicts with another operation on the same memory location. Previously, software transactional memory (STM) implementations have been proposed. However, hardware TM can provide improved performance and ease of use over software TM.
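The optimistic execute, abort, and retry behavior described above may be illustrated by the following minimal software sketch. It is an analogy only and does not represent how any hardware TM implementation detects conflicts. The names `VersionedMemory`, `Transaction`, `TransactionAborted`, and `run_transaction`, the per-location version counters, and the retry limit are all assumptions introduced for illustration.

```python
class TransactionAborted(Exception):
    """Raised when a conflicting access is detected at commit time."""


class VersionedMemory:
    """Toy shared memory (hypothetical): each location has a version counter."""

    def __init__(self):
        self.values = {}
        self.versions = {}

    def read(self, addr):
        # Returns (value, version); untouched locations read as (0, 0).
        return self.values.get(addr, 0), self.versions.get(addr, 0)

    def write(self, addr, value):
        self.values[addr] = value
        self.versions[addr] = self.versions.get(addr, 0) + 1


class Transaction:
    """Buffers stores and remembers the version of every location it loads."""

    def __init__(self, memory):
        self.memory = memory
        self.read_set = {}   # addr -> version first observed
        self.write_set = {}  # addr -> buffered value, invisible until commit

    def load(self, addr):
        if addr in self.write_set:
            return self.write_set[addr]
        value, version = self.memory.read(addr)
        self.read_set.setdefault(addr, version)
        return value

    def store(self, addr, value):
        self.write_set[addr] = value

    def commit(self):
        # Abort if any location read by the transaction changed since the read.
        for addr, seen in self.read_set.items():
            if self.memory.read(addr)[1] != seen:
                raise TransactionAborted(addr)
        for addr, value in self.write_set.items():
            self.memory.write(addr, value)


def run_transaction(memory, body, max_retries=10):
    """Run body optimistically (no lock), retrying after each abort."""
    for attempt in range(1, max_retries + 1):
        tx = Transaction(memory)
        try:
            result = body(tx)
            tx.commit()
            return result, attempt
        except TransactionAborted:
            continue
    raise RuntimeError("transaction did not commit")
```

In this sketch a transaction that increments a shared counter commits on its first attempt when no other store intervenes, and is re-executed from the start when another store to the same location occurs between its load and its commit.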
U.S. Pat. No. 6,321,302 titled “Stream read buffer for efficient interface with block oriented devices”, filed Apr. 15, 1998, and incorporated by reference, teaches a system for improving the efficiency of data transactions to a non-cacheable address, or to a block-accessed device. A stream read buffer and associated logic is used to temporarily store the non-cacheable data, or to store large blocks of data from a block-accessed device. The stream read buffer loads the data upon the occurrence of certain predefined events, as determined by the associated state logic. Similarly, the stream read buffer flushes its contents when the stored data is not being accessed, or after the expiration of a particular time frame.
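The load and flush behavior summarized above may be pictured with the following minimal sketch. It is not taken from the referenced patent. The class name `StreamReadBuffer`, the choice of a buffer miss as the load-triggering event, the idle timeout as the flush condition, and the `fetch_block` callback are all assumptions made for illustration.

```python
import time


class StreamReadBuffer:
    """Hypothetical single-block read buffer in front of a slow device."""

    def __init__(self, fetch_block, block_size=64, idle_timeout=1.0,
                 clock=time.monotonic):
        self.fetch_block = fetch_block    # callable(base, size) -> bytes
        self.block_size = block_size
        self.idle_timeout = idle_timeout  # seconds without access before flush
        self.clock = clock                # injectable so tests can control time
        self.base = None
        self.data = None
        self.last_access = None

    def flush(self):
        """Discard the buffered block."""
        self.base = None
        self.data = None

    def _expire_if_idle(self):
        if self.data is None:
            return
        if self.clock() - self.last_access >= self.idle_timeout:
            self.flush()

    def read(self, addr):
        """Return the byte at addr, loading its block on a miss."""
        self._expire_if_idle()
        base = addr - addr % self.block_size
        if self.data is None or base != self.base:
            # Assumed load event: the requested address is not buffered.
            self.data = self.fetch_block(base, self.block_size)
            self.base = base
        self.last_access = self.clock()
        return self.data[addr - base]
```

Consecutive reads within one block are served from the buffer with a single device fetch; a read after the idle timeout, or a read to a different block, causes a new fetch.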
U.S. Pat. No. 7,676,636 titled “Method and apparatus for implementing virtual transactional memory using cache line marking”, filed Jul. 10, 2007, and incorporated by reference, teaches embodiments which implement virtual transactional memory using cache line marking. The system starts by executing a starvation-avoiding transaction for a thread. While executing the starvation-avoiding transaction, the system places starvation-avoiding load-marks on cache lines which are loaded from and places starvation-avoiding store-marks on cache lines which are stored to. Next, while swapping a page out of a memory and to a disk during the starvation-avoiding transaction, the system determines if one or more cache lines in the page have a starvation-avoiding load-mark or a starvation-avoiding store-mark. If so, upon swapping the page into the memory from the disk, the system places a starvation-avoiding load-mark on each cache line that had a starvation-avoiding load-mark and places a starvation-avoiding store-mark on each cache line that had a starvation-avoiding store-mark.
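The preservation of marks across a page swap, as summarized above, may be pictured with the following minimal sketch. It is not taken from the referenced patent. The names `CacheLine` and `PagedMemory`, the fixed number of lines per page, and the side table `saved_marks` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CacheLine:
    load_mark: bool = False    # starvation-avoiding load-mark
    store_mark: bool = False   # starvation-avoiding store-mark


class PagedMemory:
    """Hypothetical model: marks are saved at swap-out, restored at swap-in."""

    def __init__(self, lines_per_page=4):
        self.lines_per_page = lines_per_page
        self.resident = {}     # page number -> list of CacheLine
        self.saved_marks = {}  # page number -> [(load_mark, store_mark), ...]

    def map_page(self, page):
        self.resident[page] = [CacheLine() for _ in range(self.lines_per_page)]

    def load(self, page, line):
        self.resident[page][line].load_mark = True

    def store(self, page, line):
        self.resident[page][line].store_mark = True

    def swap_out(self, page):
        lines = self.resident.pop(page)
        marks = [(ln.load_mark, ln.store_mark) for ln in lines]
        # Record marks only if at least one line in the page carries one.
        if any(ld or st for ld, st in marks):
            self.saved_marks[page] = marks

    def swap_in(self, page):
        lines = [CacheLine() for _ in range(self.lines_per_page)]
        marks = self.saved_marks.pop(page, None)
        if marks is not None:
            for ln, (ld, st) in zip(lines, marks):
                ln.load_mark, ln.store_mark = ld, st
        self.resident[page] = lines
```

After a page containing marked lines is swapped out and back in, each line carries the same load-mark and store-mark it had before the swap, and unmarked lines remain unmarked.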