1. Field of the Invention
The present invention relates to computer memory systems. More specifically, the present invention relates to a method and an apparatus for facilitating concurrent non-transactional execution in a transactional memory system.
2. Related Art
Computer system designers are presently developing mechanisms to support multi-threading within the latest generation of Chip-Multiprocessors (CMPs) as well as more traditional Shared Memory Multiprocessors (SMPs). With proper hardware support, multi-threading can dramatically increase the performance of numerous applications. However, as microprocessor performance continues to increase, the time spent synchronizing between threads (processes) is becoming a large fraction of overall execution time. In fact, as multi-threaded applications begin to use even more threads, this synchronization overhead becomes the dominant factor in limiting application performance.
From a programmer's perspective, synchronization is generally accomplished through the use of locks. A lock is typically acquired before a thread enters a critical section of code, and is released after the thread exits the critical section. If another thread wants to enter a critical section protected by the same lock, it must first acquire that lock. If it is unable to acquire the lock because a preceding thread already holds it, the thread must wait until the preceding thread releases the lock. (Note that a lock can be implemented in a number of ways, such as through atomic operations or semaphores.)
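For illustration only, the acquire/release pattern described above can be sketched in Python using the standard `threading.Lock`. The names `counter`, `counter_lock`, `increment` and `run` are hypothetical and do not appear in the referenced systems; this is a minimal sketch of lock-based synchronization in general, not of any particular hardware mechanism.

```python
import threading

counter = 0                      # shared state (hypothetical example data)
counter_lock = threading.Lock()  # lock protecting `counter`


def increment(times):
    """Increment the shared counter `times` times under the lock."""
    global counter
    for _ in range(times):
        with counter_lock:       # acquire the lock before the critical section
            counter += 1         # critical section
        # the lock is released when the with-block exits


def run(num_threads=4, times=1000):
    """Start several threads that contend for the same lock; return the total."""
    global counter
    counter = 0
    threads = [threading.Thread(target=increment, args=(times,))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

A thread that reaches the `with counter_lock:` statement while another thread holds the lock waits there until the lock is released, which is the waiting behavior described above.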
Unfortunately, the process of acquiring a lock and the process of releasing a lock are very time-consuming in modern microprocessors. They involve atomic operations, which typically flush the load buffer and store buffer, and can consequently require hundreds, if not thousands, of processor cycles to complete.
One technique to reduce the overhead involved in manipulating locks is to “transactionally” execute a critical section, wherein changes made during the transactional execution are not committed to the architectural state of the processor until the transactional execution completes without encountering an interfering data access from another thread. This technique is described in U.S. Pat. No. 6,862,664, entitled, “Method and Apparatus for Avoiding Locks by Speculatively Executing Critical Sections,” by inventors Shailender Chaudhry, Marc Tremblay and Quinn A. Jacobson, issued on 1 Mar. 2005.
Proposed transactional memory systems typically hold in-progress transactional state in a “transaction buffer” alongside a normal level-one (L1) cache. During transactional execution, memory operations that are directed to the L1 cache are intercepted by the transaction buffer. The transaction buffer holds the results of these operations until the transaction is committed.
When the transaction is committed, the values in the transaction buffer are committed as a group to the cache, so that all of the involved memory locations are updated together. If the transaction instead aborts, none of the involved memory locations is updated, and each retains its original value.
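The buffering and group-commit behavior described above can be modeled, purely as a software sketch, as follows. The class `TransactionBuffer` and its method names are hypothetical, and a Python dictionary stands in for the L1 cache; an actual transaction buffer is a hardware structure whose design is not specified here.

```python
class TransactionBuffer:
    """Toy model: holds stores made during a transaction until commit or abort."""

    def __init__(self, cache):
        self.cache = cache   # dict standing in for the L1 cache (assumption)
        self.pending = {}    # address -> speculative value, not yet visible

    def store(self, addr, value):
        # Intercept the store: the cache itself is left unchanged.
        self.pending[addr] = value

    def load(self, addr):
        # The transaction observes its own buffered stores first.
        if addr in self.pending:
            return self.pending[addr]
        return self.cache.get(addr)

    def commit(self):
        # Apply all buffered values to the cache as a group.
        self.cache.update(self.pending)
        self.pending.clear()

    def abort(self):
        # Discard the buffered values; the cache keeps its original contents.
        self.pending.clear()
```

For example, after `store(0x10, 99)` followed by `abort()`, the cache entry at `0x10` still holds its original value, whereas the same store followed by `commit()` makes the new value visible in the cache.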
Unfortunately, performance problems can arise when a remote process attempts to access one of the memory locations related to the transaction, particularly when the remote process attempts to store new information in one of the memory locations. In response to the attempted access, the transaction is aborted and must be restarted from the beginning. Alternatively, the remote store may be rejected or stalled until the completion of the transaction. Either response can degrade performance: in the first case, the work accomplished between the start of the transaction and the abort is lost and must be repeated; in the second case, progress of the remote process is impeded.
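The two responses to an interfering remote store can be contrasted with the following sketch. The `Transaction` class, the `remote_store` function and the `policy` parameter are hypothetical names introduced for illustration, and the model tracks conflicts per address in software; it is not a description of how any particular system detects interference.

```python
class Transaction:
    """Toy model of an in-progress transaction and the locations it touches."""

    def __init__(self, memory):
        self.memory = memory     # dict standing in for shared memory (assumption)
        self.read_set = set()    # addresses loaded during the transaction
        self.writes = {}         # buffered (uncommitted) stores
        self.aborted = False
        self.ops_done = 0        # work that is wasted if the transaction aborts

    def load(self, addr):
        self.read_set.add(addr)
        self.ops_done += 1
        return self.writes.get(addr, self.memory.get(addr))

    def store(self, addr, value):
        self.writes[addr] = value
        self.ops_done += 1

    def touches(self, addr):
        return addr in self.read_set or addr in self.writes


def remote_store(memory, txn, addr, value, policy="abort"):
    """Handle a store from another thread; return True if it was performed."""
    if txn is not None and not txn.aborted and txn.touches(addr):
        if policy == "stall":
            return False         # remote store is held off; remote progress is impeded
        txn.aborted = True       # policy "abort": the transaction must restart
        txn.writes.clear()       # its buffered stores are discarded
    memory[addr] = value
    return True
```

Under the "abort" policy the counter `ops_done` represents work that must be repeated when the transaction restarts, while under the "stall" policy the caller of `remote_store` must retry later, which corresponds to the impeded remote progress described above.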
Hence, what is needed is a method and an apparatus to facilitate concurrent non-transactional execution in a transactional memory system without the problems described above.