Developers of parallel programs often spend great effort minimizing the amount of code inside a critical region, because critical regions are traditionally implemented using locks and are typically points of serialization in the parallel program. Recent advances in hardware support for transactional memory (TM) offer a lock-free mechanism for implementing a critical region. That is, threads optimistically execute the critical region in parallel, and only where there is a conflict does one thread survive while the others abort. The use of transactional memory therefore typically provides a capability to parallelize the critical region. Even in a worst-case scenario, processing performance typically becomes no worse than serializing the region using a lock.
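The control flow described above may be illustrated by the following minimal sketch in C. It is a single-threaded model only, not a hardware TM implementation: the callback types, `run_region`, `region_stats`, and the `demo_*` helpers are hypothetical names introduced for illustration, and the retry-then-lock fallback is an assumed policy rather than one stated in the text. A real system would replace the callbacks with the platform's transaction-begin and transaction-end instructions and an actual lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callbacks standing in for real TM instructions and a real lock. */
typedef bool (*try_transaction_fn)(void *ctx); /* true = committed, false = aborted */
typedef void (*locked_region_fn)(void *ctx);   /* runs the region under the lock */

typedef struct {
    int commits;   /* regions that committed transactionally */
    int aborts;    /* transaction attempts that aborted */
    int fallbacks; /* regions that ended up serialized under the lock */
} region_stats;

/* Attempt the region optimistically up to max_attempts times (assumed policy).
 * If every attempt aborts, run the region under the lock instead, which
 * corresponds to the worst case of plain serialization. */
void run_region(try_transaction_fn try_tx, locked_region_fn locked,
                void *ctx, int max_attempts, region_stats *stats)
{
    assert(try_tx != NULL && locked != NULL && stats != NULL);
    assert(max_attempts >= 0);

    for (int i = 0; i < max_attempts; ++i) {
        if (try_tx(ctx)) {
            stats->commits++;
            return;
        }
        stats->aborts++;
    }
    locked(ctx);
    stats->fallbacks++;
}

/* Toy model of a shared counter: the first conflicts_left attempts abort. */
typedef struct {
    int conflicts_left;
    long counter;
} demo_ctx;

bool demo_try_tx(void *p)
{
    demo_ctx *c = p;
    if (c->conflicts_left > 0) {
        c->conflicts_left--; /* simulated conflict: this attempt aborts */
        return false;
    }
    c->counter++;            /* the critical region body */
    return true;
}

void demo_locked(void *p)
{
    demo_ctx *c = p;
    c->counter++;            /* same body, executed while holding the lock */
}
```

In this model, a context with one simulated conflict and three permitted attempts records one abort and one commit, whereas a context with five simulated conflicts records three aborts and then one lock fallback. In both cases the counter is incremented exactly once.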
However, typical results indicate that the overhead of entering and exiting a hardware transaction is on the order of 3-4 times the overhead of a conventional load-and-reserve/store-conditional (larx/stcx) lock. Saving and restoring register context adds further to the observed overhead. Therefore, when a naïve developer creates a parallel program and simply replaces a lock with TM on critical regions, the developer may often observe either no improvement or even a significant degradation in processing performance, even in conflict-free situations. This behavior may occur because existing critical regions tend to be fairly small, and the TM overhead cannot be properly amortized over such small regions.
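The amortization argument above may be made concrete with a simple cost model, sketched here in C. The model is an illustrative assumption, not a measurement: it treats the cost of one uncontended execution of a critical region as a fixed entry and exit overhead plus the work inside the region, and it ignores any benefit from executing regions in parallel. The function names and the sample values are hypothetical, with costs in arbitrary units.

```c
#include <assert.h>

/* Assumed model: cost of one uncontended execution of a critical region
 * is the entry/exit overhead plus the work done inside the region. */
double region_cost(double entry_exit_overhead, double work)
{
    assert(entry_exit_overhead >= 0.0 && work >= 0.0);
    return entry_exit_overhead + work;
}

/* Ratio of TM cost to lock cost for the same region. tm_factor is how many
 * times larger the TM entry/exit overhead is than the lock overhead
 * (3 to 4 according to the text above). A result above 1 means TM is slower. */
double tm_slowdown(double lock_overhead, double tm_factor, double work)
{
    assert(lock_overhead > 0.0 && tm_factor > 0.0);
    return region_cost(tm_factor * lock_overhead, work)
         / region_cost(lock_overhead, work);
}
```

With a hypothetical lock overhead of 100 units and a TM factor of 3.5, `tm_slowdown(100.0, 3.5, 50.0)` evaluates to roughly 2.67 for a small region of 50 units of work, while `tm_slowdown(100.0, 3.5, 5000.0)` evaluates to about 1.05 for a region of 5000 units. Under this model the TM overhead becomes negligible only when the region is large relative to the entry and exit cost.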