Modern trends in computer architecture have seen a move toward multi-processing, where a single system and/or processor includes multiple processing cores that share memory and are each capable of independent concurrent execution. It is now relatively common to see chip multi-processors (CMPs) with 2, 4, or 8 processing cores on a single chip, or general-purpose graphics processing units (GPGPUs) with many more processing cores. Additionally, the number of processing cores on each chip and/or system is likely to increase even further in the future.
To exploit the increased parallelism of modern processors, software programmers rely on various synchronization facilities, such as atomic instructions supported by the instruction set architecture (ISA). A processing core can execute such an instruction atomically with respect to the other processing cores in the system, even though the instruction itself may comprise multiple microinstructions. For example, CMPXCHG (compare and exchange) in x86 architectures, when used with the LOCK prefix, is a general-purpose atomic instruction that instructs a processing core to atomically compare the contents of a given memory location to a given value and, only if the two values are the same, replace the contents of that memory location with a given new value.
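The compare-and-exchange behavior described above can be illustrated with a minimal sketch in Rust. The helper name `try_swap` is hypothetical and chosen only for illustration; `compare_exchange` is the standard library operation, which on x86 targets is typically compiled to a locked CMPXCHG, although that mapping is a compiler detail rather than a guarantee.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical helper (not from the source text): stores `new` into
/// `location` only if `location` currently holds `expected`.
/// Returns true if the store happened, false if the values differed.
fn try_swap(location: &AtomicU32, expected: u32, new: u32) -> bool {
    location
        // The comparison and the conditional store happen as one atomic step.
        .compare_exchange(expected, new, Ordering::SeqCst, Ordering::SeqCst)
        .is_ok()
}
```

A call such as `try_swap(&counter, 5, 9)` leaves `counter` untouched unless it holds 5 at the instant of the comparison, so a caller can detect that another core changed the value first.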
ISAs sometimes provide a limited number of specific-purpose atomic instructions, such as XADD (exchange and add) and BTS (bit test and set) in x86. Where no specific-purpose instruction exists for the functionality a programmer needs, the programmer may attempt to construct it from general-purpose instructions such as CMPXCHG. However, such constructions can be complex, difficult to implement correctly, and slow to execute, since they typically must retry whenever another processing core modifies the memory location in the meantime.
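As an illustration of such a construction, the following sketch builds an atomic multiply out of a compare-and-exchange retry loop. Atomic multiply is used here only as an assumed example of an operation with no dedicated atomic instruction, and `atomic_multiply` is a hypothetical name, not something defined by the source text.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical helper: atomically multiplies the value in `location`
/// by `factor` (wrapping on overflow) and returns the previous value.
fn atomic_multiply(location: &AtomicU32, factor: u32) -> u32 {
    // Take an initial snapshot of the current value.
    let mut observed = location.load(Ordering::Relaxed);
    loop {
        let desired = observed.wrapping_mul(factor);
        // Publish `desired` only if no other thread changed the value
        // since the snapshot; otherwise retry with the fresh value.
        match location.compare_exchange_weak(
            observed,
            desired,
            Ordering::SeqCst,
            Ordering::Relaxed,
        ) {
            Ok(previous) => return previous,
            Err(current) => observed = current,
        }
    }
}
```

Even for this simple operation the programmer has to manage a snapshot, a retry loop, and memory-ordering choices, and under contention each failed attempt repeats the computation.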