1. Field of the Disclosure
This disclosure relates generally to concurrent access to shared objects, and more particularly to a system and method for implementing a transactional memory that exploits locality to improve transactional memory performance.
2. Description of the Related Art
The multi-core revolution currently in progress is making it increasingly important for applications to exploit concurrent execution in order to take advantage of advances in technology. Shared-memory systems allow multiple threads to access and operate on the same memory locations. To maintain consistency, threads must often execute a series of instructions as one atomic block, or critical section. In these cases, care must be taken to ensure that other threads do not observe memory values from a partial execution of such a block. Such assurances are important for practical and productive software development because, without them, it can be extremely difficult to manage the interactions of concurrent threads. Traditional constructs, such as mutual exclusion and locks, may be used by a thread to ensure correctness by excluding all other threads from concurrent access to a critical section. For example, no thread may enter a critical section without holding the section's lock. While one thread holds the lock, all other threads wishing to execute the critical section must await the lock's release and acquire it before proceeding.
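By way of illustration only, the lock-based approach described above may be sketched in Python as follows. The names `Account`, `section_lock`, `transfer`, and `run_transfers` are hypothetical and are introduced solely for this example; the sketch assumes a single coarse-grained lock guarding one critical section.

```python
import threading

class Account:
    """Hypothetical shared object, used only for this illustration."""
    def __init__(self, balance):
        self.balance = balance

# One coarse-grained lock guards the whole critical section.
section_lock = threading.Lock()

def transfer(src, dst, amount):
    # A thread must hold the lock to run the two updates below, so any
    # other thread that also takes the lock never sees a partial transfer.
    with section_lock:
        src.balance -= amount
        dst.balance += amount

def run_transfers(num_threads=4, per_thread=1000):
    a, b = Account(10_000), Account(0)

    def worker():
        for _ in range(per_thread):
            transfer(a, b, 1)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a.balance, b.balance
```

In this sketch every transfer is serialized behind the same lock, even when two transfers touch unrelated accounts, which is the scalability cost of coarse-grained locking.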
The pitfalls of these constructs are numerous and well known. They include deadlock, priority inversion, software complexity, and performance limitations. Locking large sections of code and/or code that accesses large amounts of data is a heavy-handed approach to concurrency control. A fine-grain locking approach can be more scalable than a coarse-grain approach, but it significantly increases programming complexity because the programmer has to acquire and release the correct locks for the correct data, while avoiding deadlocks, composing critical sections for operations at a higher level of abstraction, etc.
Alternatively, it may be possible to increase parallelism by allowing multiple threads to execute a critical section at one time if the executions do not rely on overlapping memory locations. This may increase performance and mitigate many of the pitfalls normally associated with traditional locking mechanisms. However, it may be difficult (if not impossible) and cumbersome to generate code such that interleaved executions are guaranteed to be correct, i.e., such that concurrently executing critical sections do not access memory locations in common.
Transactional memory is a mechanism that can be leveraged to enable concurrent and correct execution of a critical section by multiple threads. As typically defined, a transactional memory interface allows a programmer to designate certain sequences of operations as “atomic blocks” or “transactions,” which are guaranteed by the transactional memory implementation either to take effect atomically and in their entirety (in which case they are said to succeed), or to be aborted, such that they have no externally visible effect (in which case they are said to fail). Thus, with transactional memory, it may be possible in many cases to complete multiple operations with no possibility of another thread observing partial results, even without holding any locks. The transactional memory paradigm can significantly simplify the design of concurrent programs. In general, transactional memory can be implemented in hardware (HTM), in software (STM), or in any of a variety of hardware-assisted software implementations or other hybrid hardware-software transactional memories (HyTM).
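By way of illustration only, the succeed-or-abort behavior described above may be sketched as a toy software transactional memory in Python. The names `TVar`, `Transaction`, `Abort`, and `atomically` are hypothetical and are not taken from any particular transactional memory implementation. As a simplifying assumption, this sketch buffers writes, records a version number for each location read, and validates those versions at commit time under a single internal lock; practical implementations may use quite different mechanisms.

```python
import threading

class TVar:
    """Hypothetical transactional variable: a value plus a version number."""
    def __init__(self, value):
        self.value = value
        self.version = 0

class Abort(Exception):
    """Raised when a transaction observes an inconsistent read."""

# Simplifying assumption: one internal lock makes commits and reads atomic.
_commit_lock = threading.Lock()

class Transaction:
    def __init__(self):
        self.reads = {}   # TVar -> version observed on first read
        self.writes = {}  # TVar -> buffered value, invisible until commit

    def read(self, tvar):
        if tvar in self.writes:
            return self.writes[tvar]
        with _commit_lock:
            value, version = tvar.value, tvar.version
        # Abort early if this location changed since we first read it.
        if self.reads.setdefault(tvar, version) != version:
            raise Abort()
        return value

    def write(self, tvar, value):
        self.writes[tvar] = value

    def commit(self):
        with _commit_lock:
            # Fail (no visible effect) if anything we read has changed.
            for tvar, seen in self.reads.items():
                if tvar.version != seen:
                    return False
            # Succeed: publish all buffered writes in one atomic step.
            for tvar, value in self.writes.items():
                tvar.value = value
                tvar.version += 1
            return True

def atomically(body):
    """Run body(tx) repeatedly until its transaction commits."""
    while True:
        tx = Transaction()
        try:
            result = body(tx)
        except Abort:
            continue
        if tx.commit():
            return result
```

A caller would pass a function such as `lambda tx: tx.write(b, tx.read(a) + 1)` to `atomically`; a failed attempt leaves no externally visible effect and is simply retried.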
To guarantee atomicity, an STM runtime typically mediates its transactions' shared memory accesses through specialized transactional read/write fences. However, these read/write fences are expensive and introduce significant latencies in the shared memory accesses of STM transactions. Furthermore, these fences can significantly increase cache pressure by accessing special transactional metadata, which may not be co-located with the data objects that are the true target of the transactions' accesses. Such excessive cache pressure is detrimental to the performance of applications that are highly sensitive to program locality. For example, some current transactional memory systems employ ownership records that must be acquired and/or updated in conjunction with accessing shared memory locations.
One recent proposal for reducing latency associated with updating transactional metadata (e.g., ownership records) relies on a hierarchical clustering of ownership records. In that proposal, a shared memory space is initially partitioned coarsely into a small number of memory areas, each associated with an ownership record, and conflict detection may be performed on these coarse-grained ownership records. If one or more of these initial partitions becomes a source of conflict, it may be fragmented into two or more finer-grained memory areas, each associated with a finer-grained ownership record.
Another recent proposal for reducing latency associated with updating transactional metadata involves the augmentation of transactional metadata (e.g., lock records) with forwarding pointers. In that proposal, the runtime attempts to cluster lock records by atomically adding forwarding pointers to the lock records that point to a common cluster head lock. Subsequently, only these cluster head locks need to be acquired and/or included in a transaction's read and/or write set.
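By way of illustration only, the relationship between shared memory locations and ownership records may be sketched in Python as follows. The names `Orec`, `OrecTable`, `orec_for`, `try_acquire`, and `release`, and the address-to-record mapping by integer division and modulus, are assumptions introduced for this example and do not describe any particular system. The sketch is single-threaded; an actual runtime would update ownership records with atomic operations.

```python
from dataclasses import dataclass

@dataclass
class Orec:
    """Hypothetical ownership record kept apart from the data it guards."""
    version: int = 0
    owner: object = None  # id of the owning transaction, or None if free

class OrecTable:
    def __init__(self, num_orecs, region_bytes):
        # region_bytes sets the granularity: large regions mean coarse
        # partitions, small regions mean fine partitions.
        self.region_bytes = region_bytes
        self.orecs = [Orec() for _ in range(num_orecs)]

    def orec_for(self, address):
        # Assumed mapping: every address in a region shares one record.
        index = (address // self.region_bytes) % len(self.orecs)
        return self.orecs[index]

    def try_acquire(self, address, tx_id):
        # Write-side metadata access: an extra memory touch that is not
        # the data location the transaction actually wants to update.
        orec = self.orec_for(address)
        if orec.owner not in (None, tx_id):
            return False  # conflict detected at this granularity
        orec.owner = tx_id
        return True

    def release(self, address, tx_id, committed):
        orec = self.orec_for(address)
        if orec.owner == tx_id:
            if committed:
                orec.version += 1  # lets readers detect the update
            orec.owner = None
```

With a coarse table such as `OrecTable(4, 4096)`, the addresses `0x1000` and `0x1008` share one ownership record, so two transactions writing them conflict even though the data is distinct; with a fine table such as `OrecTable(1024, 8)`, they map to different records. This trade-off between metadata traffic and false conflicts is what the clustering proposals above attempt to manage.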