1. Field of the Invention
The present invention is directed to computer systems. More particularly, it is directed to coordination mechanisms for concurrent programming in computer systems.
2. Description of the Related Art
A traditional goal of multi-processor software design has been to introduce parallelism into software applications by allowing operations to proceed concurrently if they do not conflict with each other when accessing memory. For instance, mutual exclusion locks and monitors represent two traditional concurrent programming synchronization mechanisms that may protect shared resources by separating accesses to them in time. As long as a given thread retains a lock on an object or resource, no other thread may modify the object, and any other thread attempting to modify the object may be blocked from further execution until the lock is released.
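By way of illustration only, the mutual exclusion behavior described above may be sketched as follows. The names Counter and run_workers are hypothetical and are introduced solely for this example; the sketch assumes Python's standard threading module.

```python
import threading


class Counter:
    """Shared resource protected by a single mutual exclusion lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def increment(self):
        # Only one thread at a time may hold the lock; any other thread
        # calling increment() blocks here until the lock is released.
        with self._lock:
            self.value += 1


def run_workers(num_threads=4, increments=1000):
    """Start several threads that all update the same counter."""
    counter = Counter()

    def work():
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

Because every update is separated in time by the lock, no increment is lost, at the cost of serializing all threads on that one lock.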
Traditional locking techniques are known to suffer from several limitations, however. For example, coarsely grained locking schemes may protect relatively large amounts of data, but may decrease performance due to limited parallelism. Coarsely grained locking schemes are generally not scalable either. Threads may block each other from accessing a single, large block of memory even if they do not access the same addresses. Conversely, finely grained locking schemes, such as those employing lock-based concurrent data structures, may perform well (e.g., prevent unnecessary blocking), but that performance may be offset by increased programming complexity and increased risk of deadlock.
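The trade-off between coarse and fine lock granularity may be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any particular prior system: CoarseBank and FineBank are hypothetical names, and acquiring locks in index order is one common convention for avoiding deadlock, shown here to indicate the extra programming care that finely grained locking may require.

```python
import threading


class CoarseBank:
    """One lock guards every balance: simple, but all transfers serialize."""

    def __init__(self, balances):
        self._lock = threading.Lock()
        self.balances = list(balances)

    def transfer(self, src, dst, amount):
        # Transfers between unrelated accounts still block each other.
        with self._lock:
            self.balances[src] -= amount
            self.balances[dst] += amount


class FineBank:
    """One lock per balance: more parallelism, more room for error."""

    def __init__(self, balances):
        self.balances = list(balances)
        self._locks = [threading.Lock() for _ in balances]

    def transfer(self, src, dst, amount):
        if src == dst:
            return
        # Always take the lower-numbered lock first. Without a fixed
        # order, transfer(0, 1) and transfer(1, 0) running concurrently
        # could each hold one lock and wait forever for the other.
        first, second = sorted((src, dst))
        with self._locks[first], self._locks[second]:
            self.balances[src] -= amount
            self.balances[dst] += amount
```

In the finely grained variant, transfers that touch disjoint accounts may proceed in parallel, but correctness now depends on every caller following the lock ordering rule.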
Concurrent software designs generally try to ensure that threads do not observe partial results of an operation concurrently executed by other threads. Frequently, locks are used to prevent one thread from accessing the data affected by an ongoing operation performed by other threads. Such locking may involve a “balanced” locking scheme that maintains correctness without restricting access to unnecessarily large amounts of unrelated data (e.g., by using too coarse a locking granularity and thereby potentially causing other threads to wait). Preventing deadlock (e.g., a condition that may cause software to freeze up) is also generally a goal of locking schemes. Furthermore, concurrent software designs frequently have to contend with delays caused by issues unrelated to blocking. For example, a thread holding a lock may be preempted or may perform expensive input/output (I/O) operations, thereby potentially causing an overall reduction in throughput.
Transactional memory (TM) may be considered a paradigm that allows a programmer to design code as if multiple locations can be accessed and/or modified in a single, atomic step (e.g., whether or not they are contiguous in memory). As typically defined, a transactional memory interface may allow a programmer to designate certain sequences of operations as “atomic” blocks, which may be guaranteed by the transactional memory implementation to either take effect atomically and in their entirety (e.g., they succeed) or have no externally visible effect (e.g., they fail). Thus, with transactional memory, it may be possible to complete multiple operations with no possibility of another thread observing the partial results.
In general, in transactional memory implementations the concurrent execution of different atomic blocks by different threads does not appear to be interleaved. To execute an atomic block, according to a transactional paradigm, the underlying system may begin a transaction, execute the atomic block's memory accesses using that transaction and then attempt to commit the transaction (e.g., make the memory modifications visible to other threads). If the transaction commits successfully, the atomic block's execution appears to take effect atomically at the transaction's commit point. If the transaction fails, the atomic block does not seem to take effect at all and memory may appear as if the atomic block had not been executed at all. It is generally considered the responsibility of the TM implementation to guarantee the atomicity of operations executed by transactions.
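The begin, execute, and commit sequence described above may be sketched as follows. This is a simplified illustration only: Memory, Transaction, AbortTransaction, and atomic are hypothetical names, writes are buffered privately until commit, and the sketch deliberately omits conflict detection between concurrent transactions, which a practical implementation would also need.

```python
import threading


class AbortTransaction(Exception):
    """Raised inside an atomic block to make the transaction fail."""


class Memory:
    """Shared memory modeled as a dictionary of address -> value."""

    def __init__(self, initial):
        self._cells = dict(initial)
        self._lock = threading.Lock()

    def read(self, addr):
        with self._lock:
            return self._cells[addr]

    def begin(self):
        return Transaction(self)


class Transaction:
    def __init__(self, memory):
        self._memory = memory
        self._writes = {}  # private write buffer, invisible to other threads

    def read(self, addr):
        # A transaction sees its own pending writes first.
        if addr in self._writes:
            return self._writes[addr]
        return self._memory.read(addr)

    def write(self, addr, value):
        self._writes[addr] = value

    def commit(self):
        # Commit point: all buffered writes become visible in one step.
        with self._memory._lock:
            self._memory._cells.update(self._writes)


def atomic(memory, block):
    """Run block(tx) as a transaction; return True if it committed."""
    tx = memory.begin()
    try:
        block(tx)
    except AbortTransaction:
        return False  # buffered writes are discarded; memory is unchanged
    tx.commit()
    return True
```

For example, a block that swaps two locations either publishes both new values at commit or, if it raises AbortTransaction, leaves both locations exactly as they were.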
TM may, in general, be implemented in hardware or software. For instance, TM may be implemented using special hardware support (e.g., by enhancing existing hardware memory management mechanisms to support atomic, programmer-specified transactions) and/or using software-only techniques (e.g., using extensions to conventional programming languages). A hardware transactional memory (HTM) implementation may directly ensure that a transaction is atomic, while a software transactional memory (STM) implementation may provide the illusion of atomicity (e.g., the transaction appears atomic to other, concurrently executing threads), even if actually executed in multiple, smaller atomic steps by the underlying hardware. In other words, single-threaded sequences of concurrent operations may be combined into non-blocking atomic transactions. Executing threads may indicate the transaction boundaries, such as by specifying when a transaction starts and ends, but may not have to acquire locks on any objects. Transactional memory programming techniques may allow transactions that do not overlap in data accesses to run uninterrupted in parallel while aborting and/or retrying transactions that do overlap.
While HTM solutions are generally faster than STM solutions, HTM solutions traditionally do not support certain operations or events while executing a transaction (e.g., context switches, interrupts, function/method entry code, etc.). When such an event happens while executing a hardware transaction, the transaction may have to be aborted. Operations that cause hardware transactions to fail or abort may be referred to as Non-Hardware-Transactionable (NHT) operations. Traditionally, systems implement or support only a single type of transactional memory implementation. Moreover, a programmer generally must know about, and write code to support, the particular interfaces for implementing a system's transactional memory. Even if a system supports a particular transactional memory implementation, that implementation may not support, or guarantee to support, all transactions. For example, a system may support a “best-effort” HTM implementation in which most transactions succeed, but that does not guarantee support for all transactions. Thus, programmers must frequently also include a more flexible, if slower, implementation that guarantees support for all transactions. The programmer may have to specifically write code to support both the faster best-effort implementation and the slower, fallback, implementation at every location in the application for which the programmer wishes to execute instructions atomically.
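The burden described above, in which the programmer supplies both a best-effort path and a guaranteed fallback at each atomic location, may be sketched as follows. Python offers no access to hardware transactions, so the fast path here is merely simulated by a callable that may raise the hypothetical HardwareAbort exception; run_atomically, fast_path, slow_path, and max_attempts are likewise illustrative names and do not correspond to any real HTM interface.

```python
import threading


class HardwareAbort(Exception):
    """Simulates a best-effort transaction that could not complete,
    for example because it encountered an NHT operation."""


# Single lock used by the slower fallback path, which always succeeds.
_fallback_lock = threading.Lock()


def run_atomically(fast_path, slow_path, max_attempts=3):
    """Try the best-effort path a few times, then use the fallback.

    fast_path: callable standing in for the hardware transaction; it
               either returns a result or raises HardwareAbort.
    slow_path: callable implementing the same operation in a way that
               is guaranteed to complete.
    """
    for _ in range(max_attempts):
        try:
            return fast_path()
        except HardwareAbort:
            continue  # best effort failed; retry up to max_attempts
    # Guaranteed path. In a real system the fast path would also have to
    # check that no thread is currently executing this fallback.
    with _fallback_lock:
        return slow_path()
```

Note that the caller must write and maintain two versions of the same operation, and must repeat this pattern wherever atomic execution is desired.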
Since STM implementations may involve multiple, smaller sets of operations combined into a transaction, STM implementations generally need to validate that the set of memory locations involved in a transaction (e.g., either read or written) is in a coherent state before the transaction is committed. For example, an STM implementation may verify that none of the memory locations read have been changed by another thread while the STM transaction was executing. If another thread has modified a location read by the current STM transaction, committing the current STM transaction may cause memory to become corrupt or otherwise incoherent.
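One way such validation may be performed is sketched below, assuming a per-location version number that is incremented on every committed write. This is an illustrative technique chosen for the example, not necessarily the one used by any given STM; VersionedCell, Stm, Tx, and Conflict are hypothetical names.

```python
import threading


class Conflict(Exception):
    """Raised when a location read by the transaction has since changed."""


class VersionedCell:
    """A memory location paired with a version counter."""

    def __init__(self, value):
        self.value = value
        self.version = 0


class Stm:
    def __init__(self):
        self._commit_lock = threading.Lock()

    def begin(self):
        return Tx(self)


class Tx:
    def __init__(self, stm):
        self._stm = stm
        self._read_versions = {}  # cell -> version observed at first read
        self._writes = {}         # cell -> pending value

    def read(self, cell):
        if cell in self._writes:
            return self._writes[cell]
        with self._stm._commit_lock:
            value, version = cell.value, cell.version
        # Remember the version from the first read of this cell.
        self._read_versions.setdefault(cell, version)
        return value

    def write(self, cell, value):
        self._writes[cell] = value

    def commit(self):
        with self._stm._commit_lock:
            # Validation: every cell read must still carry the version
            # that this transaction observed.
            for cell, seen in self._read_versions.items():
                if cell.version != seen:
                    raise Conflict("a location in the read set was modified")
            # Only after validation succeeds are the writes published.
            for cell, value in self._writes.items():
                cell.value = value
                cell.version += 1
```

If a second transaction commits a write to a cell after the first transaction has read it, the first transaction's commit raises Conflict and none of its writes are applied; the caller may then retry.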