Field of the Disclosure
This disclosure relates generally to synchronization mechanisms for use in concurrent programming, and more particularly to systems and methods for implementing adaptive lock elision.
Description of the Related Art
Over the past decade, the focus of the computing industry has shifted from making faster computing cores to building systems with more cores per processor chip and more processor chips per system. To continue to benefit from advances in technology, therefore, applications must be able to exploit increasing numbers of cores concurrently. Constructing scalable applications that can do so becomes increasingly challenging as the number of cores grows, and the challenge is exacerbated by other issues, including the increasing latency gap between “local” and “remote” resources such as caches and memory.
In response to these growing challenges, researchers have sought techniques to support effective development of scalable concurrent programs, including techniques for improving lock scalability. Although a number of such techniques have proved useful in some contexts, each is typically well suited only to particular use cases and workloads, and is ineffective in improving scalability (or even harms scalability) for other use cases. Furthermore, some techniques (e.g., transactional lock elision) depend on specific hardware support that is not available on all target platforms, while others use software techniques that may be difficult or impossible to apply to some problems.
Exacerbating the problem further, software is often required to target a variety of hardware platforms, system sizes and system configurations. Developers who hope to select and tune the best strategies and mechanisms for each context often find that these choices depend on the workload, which can change over time even for a given application on a given platform. It is difficult and expensive to build and maintain variants that are optimized for a variety of environments and workloads. Therefore, developers are often forced to optimize for a small number of “most important” configurations, at the cost of significantly degraded performance in other contexts.
Transactional Memory (TM) is a concurrency control technology that helps programmers who write parallel programs to share data correctly between concurrent computations (which commonly manifest as “threads”). Transactional memory is widely considered to be the most promising avenue for addressing issues encountered in concurrent programming and execution. Using transactional memory, programmers may specify what should be done atomically, rather than how this atomicity should be achieved. The transactional memory implementation may then be responsible for guaranteeing the atomicity, largely relieving programmers of the complexity, tradeoffs, and software engineering problems typically associated with concurrent programming and execution. In general, transactional memory may be implemented in hardware, with the hardware transactional memory (HTM) directly ensuring that a transaction is atomic, or as software transactional memory (STM) that provides the “illusion” that a transaction is atomic, even though in fact it is executed in smaller atomic steps by the underlying hardware. HTM solutions are generally faster than STM solutions, but so-called “best-effort” HTM implementations may not guarantee that any particular transaction will be able to commit. Recently developed Hybrid Transactional Memory (HyTM) implementations may allow transactions to be executed using hardware transactional memory if it is available (and when it is effective), or using software transactional memory otherwise.
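By way of illustration only, the hybrid behavior described above may be sketched as a bounded number of attempts on a best-effort fast path followed by a software fallback that always completes. The sketch below is a single-process simulation, not an implementation of any particular HTM, STM, or HyTM system: the names `HybridExecutor`, `hw_aborts`, `max_hw_attempts`, and `atomically` are hypothetical, the "hardware" path is emulated by updating a private copy and publishing it only on commit, and one lock is used throughout solely to keep the simulation thread-safe.

```python
import threading


class HybridExecutor:
    """Hypothetical hybrid scheme: best-effort fast path, then software fallback."""

    def __init__(self, hw_aborts, max_hw_attempts=3):
        # hw_aborts(attempt) -> bool is an assumed stand-in for a best-effort
        # HTM that may fail to commit; real hardware reports an abort status.
        self.hw_aborts = hw_aborts
        self.max_hw_attempts = max_hw_attempts
        self.hw_commits = 0
        self.sw_commits = 0
        # A single lock keeps this simulation thread-safe. Real HTM would not
        # take a lock on the fast path.
        self._lock = threading.Lock()

    def atomically(self, state, update):
        """Apply update(dict) to state so that all of its writes appear together."""
        for attempt in range(self.max_hw_attempts):
            with self._lock:
                scratch = dict(state)      # speculative private copy
                update(scratch)
                if not self.hw_aborts(attempt):
                    state.clear()
                    state.update(scratch)  # "commit": publish all writes at once
                    self.hw_commits += 1
                    return "hardware"
            # Aborted: the private copy is discarded and state is unchanged.
        with self._lock:                   # software path always completes
            update(state)
            self.sw_commits += 1
            return "software"


def transfer(accounts):
    # The programmer states what must be atomic, not how atomicity is achieved.
    accounts["a"] -= 10
    accounts["b"] += 10


accounts = {"a": 100, "b": 0}
# Assumed abort pattern: the first fast-path attempt aborts, the second commits.
tm = HybridExecutor(hw_aborts=lambda attempt: attempt < 1)
path = tm.atomically(accounts, transfer)
print(path, accounts)
```

In this sketch an aborted attempt leaves the shared state untouched, which mirrors the property that a transaction either takes effect entirely or not at all. The retry bound and the abort pattern are arbitrary choices made for the example.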