Bottlenecks may exist in conventional multithreaded programs, arising when parallel threads are serialized at critical sections. Threads become serialized by the locks that programmers use to protect shared data, yet the locking convention may itself introduce errors or produce deadlock scenarios. In some cases, serialization may be unnecessary because no data collision actually occurs within the critical section; critical sections that have no inter-thread dependencies could therefore execute concurrently without locks. Unfortunately, conventional processors lack mechanisms to dynamically ignore such false inter-thread dependencies.
Previous attempts at speculative lock elision targeted general-purpose elision, in which the critical section (CS) could be of any length; a speculative versioning cache and a register checkpoint were therefore employed to support an arbitrarily long CS. Analysis of CS length, however, reveals that many critical sections consume fewer instructions than a reorder buffer (ROB) can hold and fewer cycles than are incurred by the cache operations associated with acquiring a lock. In particular, ping-ponging a lock's cache line back and forth between caches may consume more processor cycles than the entire CS.