Hardware Transactional Memory (HTM) is a computer-architecture mechanism for supporting parallel programming. With HTM, a programmer may simply declare a group of instructions to be a transaction, and the HTM system may then guarantee that the instructions in the transaction execute atomically and in isolation. Atomicity means that, with respect to all other concurrent threads of execution, the instructions of the transaction execute as a single indivisible block. Isolation means that no intermediate result of the transaction is exposed to the rest of the system until the transaction completes. HTM systems may allow transactions to run in parallel as long as they do not conflict. Two transactions conflict when both access the same memory area and at least one of them writes to that area.
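The conflict rule for transactions can be illustrated with a minimal Python sketch. It assumes each transaction is summarized by a hypothetical `Transaction` record holding the set of addresses it read and the set it wrote; real HTM systems typically track accesses at cache-line granularity rather than per address.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    """Hypothetical summary of a transaction: addresses it read and wrote."""
    reads: set[int] = field(default_factory=set)
    writes: set[int] = field(default_factory=set)


def conflicts(a: Transaction, b: Transaction) -> bool:
    """Return True if a and b share an address and at least one of them writes it."""
    # Write/read, read/write, and write/write overlaps are conflicts;
    # overlaps where both transactions only read are not.
    return bool(a.writes & (b.reads | b.writes) or b.writes & a.reads)


t1 = Transaction(reads={0x100}, writes={0x200})
t2 = Transaction(reads={0x100})   # shares only a read with t1
t3 = Transaction(writes={0x100})  # writes an address that t1 read
print(conflicts(t1, t2))  # False
print(conflicts(t1, t3))  # True
```

Only the read/read overlap between `t1` and `t2` is tolerated, so those two transactions may run in parallel, while `t1` and `t3` may not.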
To support atomicity and isolation, some HTM approaches modify the cache structure to manage transactional data and metadata. For example, some HTM systems add one or more “dirty” bits to each cache line to indicate that the data in that line has been accessed by an active transaction. For atomicity, data modified by a transaction may be buffered in the cache as speculative values and marked as dirty. If the transaction commits, the speculative data is written to shared memory; if the transaction aborts (e.g., due to a conflict), the speculative values are discarded.
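A toy model of this buffering may help. The sketch below assumes a hypothetical `SpeculativeCache` class in which each cached address holds a single value and a plain dictionary stands in for shared memory; the `tx_dirty` set plays the role of the per-line dirty bits. It is a simplified illustration, not a description of any particular processor.

```python
class SpeculativeCache:
    """Toy cache that buffers a transaction's writes until commit or abort.

    Hypothetical model: lines are keyed by address and hold one value each;
    `memory` stands in for shared memory.
    """

    def __init__(self, memory: dict[int, int]):
        self.memory = memory
        self.lines: dict[int, int] = {}  # address -> cached value
        self.tx_dirty: set[int] = set()  # addresses written by the active transaction

    def read(self, addr: int) -> int:
        # Serve from the cache (possibly a speculative value); otherwise fill from memory.
        if addr not in self.lines:
            self.lines[addr] = self.memory.get(addr, 0)
        return self.lines[addr]

    def tx_write(self, addr: int, value: int) -> None:
        # Buffer the new value in the cache only, and mark the line dirty.
        self.lines[addr] = value
        self.tx_dirty.add(addr)

    def commit(self) -> None:
        # Make the speculative values visible by writing them to shared memory.
        for addr in self.tx_dirty:
            self.memory[addr] = self.lines[addr]
        self.tx_dirty.clear()

    def abort(self) -> None:
        # Discard the speculative values; later reads refill from memory.
        for addr in self.tx_dirty:
            del self.lines[addr]
        self.tx_dirty.clear()


mem = {0x40: 1}
cache = SpeculativeCache(mem)
cache.tx_write(0x40, 99)
print(cache.read(0x40), mem[0x40])  # 99 1
cache.abort()
print(cache.read(0x40))  # 1
```

In this model a transactional write is visible to the writing thread through the cache but does not reach `memory` until `commit()`; `abort()` simply drops the buffered lines so that later reads see the last committed value.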
For isolation, a cache-coherence protocol may be used to keep the values seen by the various concurrent threads and/or processors in the system consistent. Cache-coherence messages, also known as probes, may be exchanged among physical and/or logical processors in response to any of them reading or writing shared memory. In some systems, a processor may detect conflicts by checking whether incoming probes of various types concern transactionally accessed data buffered in its cache.
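The probe check can be sketched as follows, assuming two hypothetical probe kinds: `READ`, received when another processor reads a line, and `INVALIDATE`, received when another processor intends to write it. The function applies the transaction conflict rule to a single incoming probe; actual coherence protocols define more message types than these two.

```python
from enum import Enum, auto


class Probe(Enum):
    """Hypothetical probe kinds: a remote read vs. a remote write (invalidation)."""
    READ = auto()
    INVALIDATE = auto()


def probe_conflicts(probe: Probe, addr: int,
                    tx_read: set[int], tx_written: set[int]) -> bool:
    """Return True if an incoming probe for `addr` conflicts with the local transaction."""
    if probe is Probe.READ:
        # A remote read conflicts only with data this transaction has written,
        # because that speculative value must not be exposed.
        return addr in tx_written
    # A remote write conflicts with anything this transaction has read or written.
    return addr in tx_read or addr in tx_written
```

A remote read of a line the local transaction has only read is harmless, which is why the `READ` case ignores `tx_read`.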
While the cache-based transaction buffer described above can provide a large transaction buffer at low additional hardware cost, it is very inefficient at providing a minimum guarantee on supported transaction size (i.e., the number of distinct memory addresses that a single transaction may access). For example, consider a cache-based transaction buffer implemented on a 4-way set-associative cache. If a transaction accesses five different memory bytes, each residing in a different cache line that maps to the same set, then at least one of the cache lines holding transactional data must be evicted from that set, because the set can hold only four lines. In other words, the cache-based transaction buffer overflows. Thus, the cache-based transaction buffer may fail to support a transaction whose memory footprint is only 5 bytes.
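The overflow in this example can be checked with a small model. The line size (64 bytes) and number of sets (64) below are assumptions chosen for illustration; only the 4-way associativity comes from the example. Addresses spaced `LINE_SIZE * NUM_SETS` bytes apart fall in different cache lines that map to the same set.

```python
LINE_SIZE = 64  # bytes per cache line (assumed)
NUM_SETS = 64   # number of sets in the cache (assumed)
WAYS = 4        # 4-way set-associative, as in the example


def set_index(addr: int) -> int:
    """Index of the set that the cache line containing `addr` maps to."""
    return (addr // LINE_SIZE) % NUM_SETS


def overflows(addresses: list[int]) -> bool:
    """Return True if buffering every line touched by `addresses` exceeds some set's ways."""
    lines_per_set: dict[int, set[int]] = {}
    for addr in addresses:
        line = addr // LINE_SIZE
        lines_per_set.setdefault(set_index(addr), set()).add(line)
    return any(len(lines) > WAYS for lines in lines_per_set.values())


# Five single-byte accesses in five different lines of the same set.
stride = LINE_SIZE * NUM_SETS
five_bytes = [k * stride for k in range(5)]
print(overflows(five_bytes))      # True
print(overflows(five_bytes[:4]))  # False
```

With these parameters the first four addresses still fit in their set, while the fifth forces an eviction. Five adjacent bytes, by contrast, share one cache line and do not overflow. The model counts only transactional lines and ignores any additional structures a real design might use to absorb evictions.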
Such shortcomings of cache-based transaction buffers pose significant challenges to application programmers, who must design their applications around a given processor's small minimum guaranteed transaction size.