The present invention relates generally to multi-threaded software and, more particularly, to a system for performing an atomic operation by a thread in a multi-threaded binary translation system.
Binary translation is the simulation of one (target) Instruction Set Architecture (ISA) on another (host) ISA. Binary translation (target simulation) can optionally be accompanied by optimization and code instrumentation, and the host and target ISAs may be the same or different architectures.
For multi-core architectures, sequential target simulation is prohibitively slow, which motivates the use of parallel simulation in which multiple threads simulate the target ISA concurrently. In this regard, the target hardware architecture provides hardware-guaranteed atomic instructions for implementing synchronization primitives in a shared-memory, cache-coherent multi-core system. More specifically, when an atomic store instruction is performed on data stored at a shared memory address, any resulting modification to that data appears to the rest of the multi-core system to have occurred “instantaneously.”
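The "instantaneous" property described above can be illustrated with a minimal C++ sketch. It assumes one word of target shared memory is modeled by a host `std::atomic`; the names `TargetWord`, `target_atomic_store`, `target_atomic_swap`, `try_acquire`, and `release` are hypothetical and are used only for illustration, not taken from any particular translator.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical model of one word of target shared memory.
using TargetWord = std::atomic<std::uint32_t>;

// Atomic store: any other thread observes either the old value or the
// new value, never a partially written mix of the two.
inline void target_atomic_store(TargetWord& addr, std::uint32_t value) {
    addr.store(value, std::memory_order_seq_cst);
}

// Atomic swap: reads the previous value and writes the new one as a
// single indivisible step.
inline std::uint32_t target_atomic_swap(TargetWord& addr, std::uint32_t value) {
    return addr.exchange(value, std::memory_order_seq_cst);
}

// A synchronization primitive built on the swap: a test-and-set lock.
// Returns true only for the one caller that changed the word from 0 to 1.
inline bool try_acquire(TargetWord& lock_word) {
    return target_atomic_swap(lock_word, 1u) == 0u;
}

// Releases the lock by atomically storing 0.
inline void release(TargetWord& lock_word) {
    target_atomic_store(lock_word, 0u);
}
```

Because the swap is indivisible, two threads calling `try_acquire` on the same word cannot both see the previous value 0, which is the guarantee a parallel simulation has to preserve.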
One challenge with the parallel simulation of atomic instructions is the complexity of parallel access to shared memory locations by multiple contending threads. As a result, blocking algorithms that rely on mutual-exclusion (mutex) software primitives are often used for parallel simulation of atomic instructions. However, mutual-exclusion software primitives incur unnecessarily large performance overhead. As an alternative, non-blocking algorithms have been developed for parallel simulation of atomic instructions. Lock-free and wait-free algorithms are two types of non-blocking algorithms. Lock-free algorithms guarantee system-wide progress (at least one thread always makes progress), whereas wait-free algorithms guarantee per-thread progress (every thread makes progress). Hence, wait-free algorithms are preferable to lock-free algorithms; however, wait-free algorithms typically have inherent race conditions and will not work correctly for multi-core target systems.
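The contrast between the blocking and non-blocking approaches can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not the method of the invention: it simulates a hypothetical target fetch-and-add instruction, and the names `BlockingMemory`, `fetch_add_blocking`, `fetch_add_lock_free`, and `run_contended` are invented for the example.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Blocking variant: the simulated memory word is guarded by a host mutex.
struct BlockingMemory {
    std::mutex guard;
    std::uint32_t word = 0;
};

// Every thread must take the mutex, even when there is no contention.
inline std::uint32_t fetch_add_blocking(BlockingMemory& mem, std::uint32_t delta) {
    std::lock_guard<std::mutex> lock(mem.guard);
    std::uint32_t old = mem.word;
    mem.word = old + delta;
    return old;
}

// Lock-free variant: a compare-and-swap retry loop. Some thread always
// succeeds (system-wide progress), but a given thread may retry an
// unbounded number of times, so this loop is not wait-free.
inline std::uint32_t fetch_add_lock_free(std::atomic<std::uint32_t>& word,
                                         std::uint32_t delta) {
    std::uint32_t old = word.load(std::memory_order_relaxed);
    while (!word.compare_exchange_weak(old, old + delta,
                                       std::memory_order_seq_cst,
                                       std::memory_order_relaxed)) {
        // On failure, 'old' is refreshed with the current value; retry.
    }
    return old;
}

// Hypothetical driver: runs 'op' 'iterations' times on each of 'threads'
// host threads to create contention on the shared word.
template <typename Fn>
void run_contended(unsigned threads, unsigned iterations, Fn op) {
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([=] {
            for (unsigned i = 0; i < iterations; ++i) op();
        });
    }
    for (auto& th : pool) th.join();
}
```

Calling, for example, `run_contended(4, 10000, [&] { fetch_add_lock_free(word, 1); })` on a zero-initialized `std::atomic<std::uint32_t> word` should leave the word at 40000, and the blocking variant should produce the same total. The two variants differ in how contending threads wait, not in the result they compute.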