The disclosed system relates generally to synchronization mechanisms for concurrently executing computer programs, and more specifically to a system for providing an efficient, platform independent double compare and swap operation.
In contemporary computer systems, there is a desire for software programs to execute using multiple "threads", which each perform a specific sequential set of activities. Such "multi-threading" is for example desirable to take advantage of computer systems having multiple CPU's, by permitting concurrent execution of different threads on different CPU's. Multi-threading to provide concurrent execution using multiple CPU's may be obtained using threads defined either explicitly by a programmer or in an automated fashion by a software tool such as a compiler.
Dividing a program into multiple threads may also be useful where the resulting code is executed on a single processor system. Programmers often find that the problems they are asked to solve involve sets of activities that can be performed independently by decomposing the general problem into multiple threads. Accordingly, multi-threading is also considered to be an important programming tool.
For concurrently executing threads to operate correctly on shared data, a mechanism or protocol must be provided that ensures exclusive access to the shared data for the duration of an indivisible operation. Such an indivisible operation is known as an "atomic" operation which provides "mutual exclusion" with respect to accessing the shared data.
Various types of software locks have been developed to provide mutual exclusion. Existing software locks typically allow a thread wishing to access shared data to request a lock associated with the data. If the thread successfully obtains the lock, it is guaranteed exclusive access to the data until it explicitly releases the lock. No thread may access the shared data unless it holds the associated lock.
Existing software locks generally rely on an instruction of the underlying hardware which indivisibly tests and modifies a "lock word" of memory associated with the shared data. If a thread attempts to obtain a lock for data that is already locked, the lock request may either loop indefinitely testing the lock word ("busy waiting"), or alternatively suspend execution until it detects a signal generated by an "unlock" instruction issued by the thread holding the lock.
When many threads frequently try to access the same data, existing software lock mechanisms may become a bottleneck because only one thread can make progress at a time. Moreover, significant computational resources can be consumed managing a pool of waiting threads.
Some concurrent programs can be written to directly utilize certain indivisible hardware instructions in order to ensure data synchronization without using software locks. This approach to data synchronization is known as "wait-free" synchronization. With this approach the software is advantageously designed so that execution of threads can be arbitrarily interleaved without adversely affecting the computation. However, the effectiveness of this approach depends on the power of the specific indivisible operations provided by the underlying hardware.
An example of an existing hardware implemented instruction that can be used to support synchronization of concurrent threads is the "compare-and-swap" instruction (CAS). The CAS instruction atomically compares the contents of a memory location with a test value and stores a new value into the location if there is a match. Unfortunately, it is awkward to write programs using CAS, because it only updates one location indivisibly. Many useful algorithms require larger indivisible operations than CAS. For example, it is difficult to insert a node in a doubly-linked list using CAS because two pointers must be updated in a single atomic operation.
A few existing hardware processors have implemented a "double-compare-and-swap" operation (DCAS). A DCAS instruction atomically compares the contents of two separate locations against test values, and if both contents match the test values, replaces the contents of the locations with new values. DCAS is more useful that CAS in certain applications because it can indivisibly express many more data structure operations. It is highly valuable for computer programs to be developed or generated such that they are platform independent, in that they may be executed without modification on a variety of hardware platforms. Because the DCAS hardware instruction is not universally implemented, programs which depend on DCAS to provide synchronization must today be modified to execute on systems which only provide CAS. Such modification may be prohibitively expensive to implement. While certain software simulations of DCAS have been proposed, these proposals have typically been based on conventional software locks. Accordingly the power of the DCAS instruction is practically unavailable to existing programs which are required to be platform independent.
For the reasons stated above, it would be desirable to have a system which provides an efficient, platform independent DCAS operation. Such a new system would provide a powerful tool for synchronizing concurrent threads of execution and for improving overall system performance.