1. Field
The present embodiments relate to techniques for improving performance within computer systems. More specifically, the present embodiments relate to a method and system for providing feedback about transactions in a transactional memory system that fail due to misspeculation.
2. Related Art
Computer system designers are presently developing mechanisms to support multi-threading within the latest generation of Chip-Multiprocessors (CMPs) as well as more traditional Shared Memory Multiprocessors (SMPs). With proper hardware support, multi-threading can dramatically increase computational performance. However, as microprocessor performance continues to increase, the time spent synchronizing between threads (processes) is becoming a large fraction of overall execution time. In fact, as multi-threaded applications begin to use even more threads, this synchronization overhead often becomes the dominant factor in limiting application performance.
From a programmer's perspective, synchronization is typically accomplished through the use of locks. A lock is usually acquired before a thread enters a critical section of code, and is released after the thread exits the critical section. If another thread wants to enter a critical section protected by the same lock, it must acquire the same lock. If it is unable to acquire the lock, because a preceding thread has acquired the lock, the thread must wait until the preceding thread releases the lock. (Note that a lock can be implemented in a number of ways, such as through atomic operations or semaphores.)
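The acquire-then-release pattern described above can be illustrated with a minimal sketch. The names `counter_lock`, `increment`, and `run` are hypothetical and chosen only for illustration; the sketch assumes Python's standard `threading.Lock` as the lock implementation.

```python
import threading

# Hypothetical shared state protected by a single lock (illustrative names).
counter_lock = threading.Lock()
counter = 0


def increment(times):
    """Increment the shared counter, taking the lock around each update."""
    global counter
    for _ in range(times):
        with counter_lock:   # acquire the lock before entering the critical section
            counter += 1     # critical section: one thread at a time
        # the lock is released when the `with` block exits


def run(num_threads=4, times=1000):
    """Run several threads that contend for the same lock; return the final count."""
    global counter
    counter = 0
    threads = [threading.Thread(target=increment, args=(times,))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

A thread that reaches the `with` statement while another thread holds `counter_lock` waits there until the lock is released, which is the waiting behavior described above.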
Unfortunately, acquiring a lock and releasing a lock can both be very time-consuming in modern microprocessors. Both operations typically involve atomic operations, which flush load and store buffers and can consequently require hundreds, if not thousands, of processor cycles to complete.
Moreover, as multi-threaded applications use more threads, more locks are required. For example, if multiple threads need to access a shared data structure, it is often impractical for performance reasons to use a single lock for the entire data structure. Instead, it is preferable to use multiple fine-grained locks to lock small portions of the data structure. This allows multiple threads to operate on different portions of the data structure in parallel. However, it may also require a single thread to acquire and release multiple locks in order to access different portions of the data structure. It also introduces other concerns, such as avoiding deadlock.
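As a rough sketch of fine-grained locking, the following hypothetical `StripedTable` partitions a table into stripes, each guarded by its own lock. The class, its method names, and the stripe count are assumptions made for illustration. The `transfer` method shows one common way to address the deadlock concern when a single thread needs several locks: always acquire them in a fixed (here, ascending index) order.

```python
import threading


class StripedTable:
    """Hypothetical key/value table with one lock per stripe (illustrative only)."""

    def __init__(self, num_stripes=8):
        self._locks = [threading.Lock() for _ in range(num_stripes)]
        self._buckets = [dict() for _ in range(num_stripes)]

    def _stripe(self, key):
        # Map a key to the stripe (and lock) that protects it.
        return hash(key) % len(self._locks)

    def put(self, key, value):
        i = self._stripe(key)
        with self._locks[i]:            # only this stripe is locked
            self._buckets[i][key] = value

    def get(self, key, default=None):
        i = self._stripe(key)
        with self._locks[i]:
            return self._buckets[i].get(key, default)

    def transfer(self, src, dst, amount):
        """Move `amount` from src to dst; may need two different locks."""
        i, j = self._stripe(src), self._stripe(dst)
        order = sorted({i, j})          # fixed global order avoids deadlock
        for k in order:
            self._locks[k].acquire()
        try:
            self._buckets[i][src] = self._buckets[i].get(src, 0) - amount
            self._buckets[j][dst] = self._buckets[j].get(dst, 0) + amount
        finally:
            for k in reversed(order):
                self._locks[k].release()
```

Threads that touch keys in different stripes proceed in parallel, while `transfer` illustrates the added cost noted above: it may have to acquire and release more than one lock for a single logical operation.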
To reduce the overhead involved in lock-based execution of critical sections, a critical section may be transactionally executed. In particular, changes made during transactional execution of the critical section may not be committed to the architectural state of the processor until the transactional execution successfully completes. Furthermore, the transactional execution may be carried out using “best effort” transactional execution mechanisms, which do not guarantee that a transaction will complete: architecture-specific limitations of the processor may still cause transactions to fail.
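The commit-on-success behavior can be modeled in software with a simplified sketch. This is not a description of any hardware mechanism: the `Transaction` class, the `TransactionAborted` exception, and the `run_transaction` helper are hypothetical, the model is single-threaded, and it performs no conflict detection. It only illustrates that stores are buffered and become visible in the shared state solely when the transaction completes.

```python
class TransactionAborted(Exception):
    """Hypothetical signal that a transaction failed before completing."""


class Transaction:
    """Buffers stores so they stay invisible until commit (illustrative model)."""

    def __init__(self, memory):
        self._memory = memory   # stands in for committed architectural state
        self._writes = {}       # speculative stores, not yet committed

    def load(self, addr):
        # A transaction sees its own buffered stores first.
        if addr in self._writes:
            return self._writes[addr]
        return self._memory[addr]

    def store(self, addr, value):
        self._writes[addr] = value

    def commit(self):
        # Publish all buffered stores to the committed state.
        self._memory.update(self._writes)


def run_transaction(memory, body):
    """Run `body` transactionally; return True on commit, False on failure."""
    txn = Transaction(memory)
    try:
        body(txn)
    except TransactionAborted:
        return False            # buffered stores are simply discarded
    txn.commit()
    return True
```

If `body` raises `TransactionAborted`, `memory` is left exactly as it was before the attempt, mirroring the requirement that uncommitted changes never reach the architectural state.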
Mechanisms for supporting hardware transactional memory have much in common with mechanisms used for speculation. For example, a processor may perform a load that results in a cache miss. Rather than simply waiting for the load to complete, the processor may continue executing subsequent instructions until the result of the load is needed. Even when the result of the load is needed (e.g., the value from the load is needed to determine the outcome of a branch instruction), the processor may be able to continue execution: rather than waiting for the load to complete, the processor may assume that the branch is correctly predicted and continue executing under that assumption. If the assumption later proves incorrect, the work performed under it must be discarded, which constitutes a failure due to misspeculation.
Unfortunately, failures in transactional and/or speculative execution may be difficult to diagnose. For example, a failed transaction may be caused by a conflicting memory access or by an architecture-specific limitation of the processor on which the transaction is executing. In addition, the optimal response to failed transactional and/or speculative execution may depend on the cause(s) of the failure. For example, a transaction failure that occurs due to a conflicting memory access or misspeculation may be remedied by retrying the transaction, while a transaction failure caused by an instruction that is not supported by “best effort” transactional execution mechanisms may require bypassing the transaction and using an alternative code path.
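The cause-dependent response described above can be sketched as a small retry policy. Everything named here is an assumption for illustration: the `FailureCause` values, the `TransactionFailed` exception, the `execute` helper, and the retry limit do not correspond to any particular processor interface. The sketch assumes the failure cause is available to software, retries on a conflict, and falls back to a lock-protected path when retrying cannot help.

```python
import enum
import threading


class FailureCause(enum.Enum):
    """Hypothetical failure causes reported to software (illustrative)."""
    CONFLICT = "conflicting memory access"
    UNSUPPORTED_INSTRUCTION = "instruction not supported transactionally"


class TransactionFailed(Exception):
    def __init__(self, cause):
        super().__init__(cause.value)
        self.cause = cause


fallback_lock = threading.Lock()   # guards the alternative (non-transactional) path


def execute(try_transaction, locked_path, max_retries=3):
    """Attempt the transactional path; choose a response based on failure cause."""
    for _ in range(max_retries):
        try:
            return try_transaction()
        except TransactionFailed as failure:
            if failure.cause is FailureCause.UNSUPPORTED_INSTRUCTION:
                break              # retrying cannot succeed; bypass immediately
            # CONFLICT: transient, so loop around and retry the transaction
    with fallback_lock:            # alternative code path using a lock
        return locked_path()
```

Without information about the cause, a policy like this could only guess: it would either waste retries on a transaction that can never succeed or give up too early on one that would succeed on the next attempt.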
Hence, what is needed is a mechanism for facilitating the diagnosis of and response to failures associated with “best effort” transactional execution and/or speculative execution.