Field of the Disclosure
This disclosure relates generally to synchronization mechanisms for use in concurrent programming, and more particularly to systems and methods for implementing transactional lock elision.
Description of the Related Art
Over the past decade, the focus of the computing industry has shifted from making faster computing cores to building systems with more cores per processor chip and more processor chips per system. To continue to benefit from advances in technology, therefore, applications must be able to exploit increasing numbers of cores concurrently. Mutual exclusion locks and monitors represent two traditional concurrent programming synchronization mechanisms. Locks and monitors typically protect shared resources by separating access to them in time. For example, in one implementation, as long as a given thread of execution retains a lock on an object or resource, no other thread of execution may modify the object, and any other thread attempting to modify the object may be blocked from further execution until the lock is released.
However, traditional locking techniques are known to suffer from several limitations. Coarse-grained locks, which protect relatively large amounts of data, often do not scale. For example, threads of execution on a multiprocessor system may block each other even when they do not actually require concurrent access to the same addresses. Fine-grained locks may resolve some of these contention issues, but in traditional locking techniques, this may be achieved only at the cost of added programming complexity and the increased likelihood of other problems, such as deadlocks. Locking schemes may also lead to an increased vulnerability to thread failures and delays—e.g., a thread that is preempted or that performs expensive input/output operations while holding a lock may obstruct other threads for relatively long periods, thereby potentially reducing the overall throughput of the system.
Transactional Memory (TM) is a promising concurrency control technology that aids programmers writing parallel programs to perform correct data sharing between concurrent computations (which commonly manifest as “threads”). Transactional memory is widely considered to be the most promising avenue for addressing issues encountered in concurrent programming and execution. Using transactional memory, programmers may specify what should be done atomically, rather than how this atomicity should be achieved. The transactional memory implementation may then be responsible for guaranteeing the atomicity, largely relieving programmers of the complexity, tradeoffs, and software engineering problems typically associated with concurrent programming and execution. In general, transactional memory may be implemented in hardware, with the hardware transactional memory (HTM) directly ensuring that a transaction is atomic, or as software transactional memory (STM) that provides the “illusion” that a transaction is atomic, even though in fact it may actually be executed in smaller atomic steps by underlying hardware. HTM solutions are generally faster than STM ones, but so-called “best-effort” HTM implementations may not guarantee the ability to commit any particular transaction. Recently developed Hybrid Transactional Memory (HyTM) implementations may allow transactions to be executed using hardware transactional memory if it is available (and when it is effective), or using software transactional memory otherwise.
Traditional Transactional Lock Elision (TLE) generally uses Hardware Transactional Memory (HTM) to execute unmodified critical sections concurrently, even if they are protected by the same lock. To ensure correctness, the transactions used to execute these critical sections “subscribe” to the lock by reading it and checking that it is available. Traditional Transactional Lock Elision may also exploit hardware transactional memory (HTM) to introduce concurrency into sequential code. It achieves this by attempting to execute each critical section protected by a lock in an atomic hardware transaction, reverting to the lock if these attempts fail. A significant drawback of traditional TLE is that it disables hardware speculation once there is a thread running under lock.