1. Field of the Invention
Embodiments of the present invention relate to mechanisms that facilitate transactional memory in computer systems. More specifically, embodiments of the present invention relate to techniques for improving commit latency for transactional memory.
2. Related Art
Some computer systems provide a special mode of execution for critical sections of program code. Generally, a critical section is a special section of the program code that is to be protected against interference from other threads or processors in the computer system. For example, while executing a critical section, the computer system may prevent another thread or processor from accessing cache lines that have been accessed by instructions in the critical section. Depending on the computer system, critical sections can range from single instructions to long, complex sequences of instructions.
In some systems, when executing a critical section, cache lines (or cache structures) accessed by instructions within the critical section are locked to protect the cache lines from interfering access by other threads or processors. Unfortunately, locking cache lines can cause system performance to degrade because other threads or processors that need access to the cache lines must stall, waiting until the execution of the critical section has completed and they can gain access.
To avoid stalling the other threads or processors, computer system designers have proposed executing the critical section as a transaction (i.e., “transactional execution”). When executing a transaction, a processor executes a critical section for a thread, but prevents the results from affecting the architectural state of the system until the entire critical section successfully completes. For example, some systems buffer transactional stores in a store buffer and load-mark and store-mark the cache lines loaded and stored by the transaction. When the transaction successfully completes, the processor atomically commits the results of the transaction for the thread to the architectural state of the system.
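The buffering and marking described above can be illustrated with a minimal single-threaded sketch. The class name `Transaction`, its methods, and the use of plain addresses in place of cache lines are all assumptions made for illustration; the sketch also assumes that a load inside the transaction sees the transaction's own buffered stores, which the description above does not specify.

```python
class Transaction:
    """Toy model of transactional execution (illustrative names only)."""

    def __init__(self, memory):
        self.memory = memory        # architectural state: address -> value
        self.store_buffer = []      # buffered (address, value) pairs, program order
        self.load_marks = set()     # lines loaded by the transaction
        self.store_marks = set()    # lines stored by the transaction

    def load(self, addr):
        self.load_marks.add(addr)
        # Assumption: a load sees the transaction's own newest buffered store.
        for buffered_addr, value in reversed(self.store_buffer):
            if buffered_addr == addr:
                return value
        return self.memory.get(addr, 0)

    def store(self, addr, value):
        # The store is buffered; architectural state is not touched yet.
        self.store_marks.add(addr)
        self.store_buffer.append((addr, value))

    def commit(self):
        # Single-threaded model, so applying the whole buffer in one call
        # stands in for the atomic commit.
        for addr, value in self.store_buffer:
            self.memory[addr] = value
        self._reset()

    def abort(self):
        # Discard buffered results; architectural state is unchanged.
        self._reset()

    def _reset(self):
        self.store_buffer.clear()
        self.load_marks.clear()
        self.store_marks.clear()


memory = {0x10: 1}
tx = Transaction(memory)
tx.store(0x10, tx.load(0x10) + 1)
before_commit = memory[0x10]   # architectural state not yet updated
tx.commit()
after_commit = memory[0x10]    # buffered store is now visible
```

In this sketch the architectural state changes only inside `commit()`, which mirrors the requirement that transactional results remain invisible until the entire critical section completes.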
In systems that support transactional execution, other threads or processors are permitted limited access to the marked cache lines as the transaction is executing. However, if another thread or processor attempts to perform an interfering access to a marked cache line, the transaction may fail or the system may force the other thread or processor to stall until the transaction is completed.
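One possible way to decide whether an access by another thread or processor interferes with a marked cache line is sketched below. The rule used here (a store interferes with any marked line, while a load interferes only with a store-marked line) is a common convention assumed for illustration and is not stated in the description above; the names `Outcome` and `resolve_access` are likewise hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    PROCEED = "proceed"                    # access does not interfere
    FAIL_TRANSACTION = "fail_transaction"  # the transaction fails
    STALL_REQUESTER = "stall_requester"    # the other thread or processor waits


def is_interfering(access, load_marked, store_marked):
    """Assumed rule: stores conflict with any mark, loads only with store-marks."""
    if access == "store":
        return load_marked or store_marked
    if access == "load":
        return store_marked
    raise ValueError(f"unknown access type: {access!r}")


def resolve_access(access, load_marked, store_marked,
                   policy=Outcome.FAIL_TRANSACTION):
    """Return what happens when another thread accesses a (possibly marked) line."""
    if policy is Outcome.PROCEED:
        raise ValueError("policy must be FAIL_TRANSACTION or STALL_REQUESTER")
    if not is_interfering(access, load_marked, store_marked):
        return Outcome.PROCEED
    return policy
```

For example, under these assumptions `resolve_access("load", load_marked=True, store_marked=False)` returns `Outcome.PROCEED`, reflecting the limited access that other threads retain, while a store to the same line returns the configured policy.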
In an exemplary system, when atomically committing the results of the transaction to the architectural state, the processor signals the L2 cache to lock the store-marked cache lines. The processor then individually commits each buffered store operation to the architectural state of the system (i.e., stores the transactional result in the corresponding cache line in the L2 cache) and removes the store-mark from the cache line, after which the L2 cache unlocks the cache line. When all the buffered stores have been committed, the processor resumes non-transactional execution for the thread. Committing the transactional results in this way preserves the memory atomicity of the transaction.
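The commit sequence of this exemplary system can be sketched as follows. The `L2Cache` class, the `commit_transaction` function, and the event log are hypothetical names introduced only for illustration. The sketch also assumes that a line holding several buffered stores is unlocked after the last of those stores is committed, a detail the description leaves open.

```python
from collections import Counter


class L2Cache:
    """Toy L2 cache model (illustrative; not a real hardware interface)."""

    def __init__(self):
        self.data = {}             # line address -> committed value
        self.store_marked = set()  # lines store-marked by the transaction
        self.locked = set()        # lines currently locked for commit
        self.log = []              # events, to make the signaling visible

    def lock_store_marked_lines(self):
        self.log.append("lock_signal")    # processor signals the L2 cache
        self.log.append("search_marks")   # L2 cache searches for store-marks
        self.locked = set(self.store_marked)

    def write(self, line, value):
        if line not in self.locked:
            raise RuntimeError("store committed to an unlocked line")
        self.data[line] = value
        self.log.append(f"write {line:#x}")

    def clear_mark_and_unlock(self, line):
        self.store_marked.discard(line)
        self.locked.discard(line)
        self.log.append(f"unlock {line:#x}")


def commit_transaction(l2, store_buffer):
    """Commit buffered (line, value) stores one at a time, then resume."""
    l2.lock_store_marked_lines()
    # Assumption: unlock a line only after its last buffered store commits.
    remaining = Counter(line for line, _ in store_buffer)
    for line, value in store_buffer:
        l2.write(line, value)
        remaining[line] -= 1
        if remaining[line] == 0:
            l2.clear_mark_and_unlock(line)
    store_buffer.clear()
    return "non-transactional"   # execution mode after the commit
```

In this sketch the lock signal and the search for store-marks are issued at the start of every commit, regardless of how many stores the buffer holds.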
Unfortunately, if no stores were buffered during the transaction, the processor still signals the L2 cache, and the L2 cache still searches for the (non-existent) store-marks on cache lines. These operations unnecessarily consume memory system bus bandwidth and cause delay.
Hence, what is needed is a system that supports transactional execution without incurring this unnecessary signaling and searching when a transaction that buffered no stores is committed.