One performance benchmark of a computer system is process response time, which depends in large part on how the process handles resources. In a computer system, a process includes two components: a thread and one or more resources. A thread is a dynamic object that represents a control point in the process and executes a sequence of instructions. Multithreaded systems allow more than one thread in each process, and the threads typically share resources such as a common address space or shared variables. Threads are instantiated either as system-level threads or as application-level threads, also known as user-level threads. When an application is executing, the system-level threads schedule the underlying user-level process, which in turn uses library functions to schedule its own threads. Hence, the implementation of user-level threads includes the use of libraries that provide the functionality for creating, synchronizing, scheduling and managing threads.
A stack is a LIFO (last-in-first-out) data structure and is often implemented as a linked list (hence the term linked stack). A LIFO is a data structure that allows only element insertion and removal, and in which the last element inserted is the first element removed. A basic linked list is one in which each node includes an element and a pointer to the next element. A LIFO linked list is a linked list that allows element insertion and removal only at the front of the list, so that the last element inserted is the first element removed. An element X1 is inserted by creating a new node (X1, next), attaching a utility entry pointer to the new node, and then assigning the contents of the utility entry pointer to the entry pointer. As shown in FIG. 1, the entry pointer 12 points to the beginning of the linked list (and the first element 14a), the next pointer 14b points to the second element 16b, and the next pointer in the last node 22b is null.
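The insertion and removal steps described above can be illustrated with a minimal single-threaded sketch. The names Node, LinkedStack, entry and utility are hypothetical and chosen only to mirror the entry pointer and utility entry pointer in the text; the sketch makes no attempt at thread safety.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Node:
    element: Any                    # the stored element
    next: Optional["Node"] = None   # pointer to the next node, None in the last node


class LinkedStack:
    """Single-threaded LIFO linked list (illustrative names, not from the source)."""

    def __init__(self) -> None:
        self.entry: Optional[Node] = None  # entry pointer: front of the list

    def push(self, element: Any) -> None:
        # Create the new node with its next pointer set to the current front,
        # then assign the utility pointer to the entry pointer.
        utility = Node(element, self.entry)
        self.entry = utility

    def pop(self) -> Any:
        if self.entry is None:
            raise IndexError("pop from empty stack")
        node = self.entry
        self.entry = node.next  # the second node becomes the new front
        return node.element


stack = LinkedStack()
for x in ("X1", "X2", "X3"):
    stack.push(x)
popped = [stack.pop(), stack.pop(), stack.pop()]
```

Because both operations touch only the front of the list, the elements come back in the reverse of their insertion order.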
In a multithreaded system, each thread has its own private objects, its own program counter and its own local state information. Multithreading allows concurrency, although if any shared resource can be accessed concurrently by multiple threads, such access must be synchronized. Accordingly, because the aforementioned libraries provide synchronization objects, applications can provide concurrency of user-level threads by scheduling and managing the user threads through these libraries. When multiple threads run concurrently, the synchronization objects protect shared resources by employing a blocking technique. To that end, the synchronization objects include a lock variable and often a queue of blocked threads (threads that are blocked while control is with another, non-blocked thread). The blocking scheme isolates or locks part or all of the resource (e.g., a data structure) to prevent interference from other threads. However, a deadlock may occur if a thread fails or is halted while it holds the lock.
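As a rough illustration of the blocking technique, the following sketch guards a shared variable with a lock. It assumes Python's standard threading.Lock as the lock variable; LockedCounter and worker are hypothetical names, and the queue of blocked threads is kept internally by the lock implementation rather than shown explicitly.

```python
import threading


class LockedCounter:
    """Shared counter protected by a lock variable (hypothetical example class)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.value = 0

    def increment(self) -> None:
        # A thread that finds the lock held blocks here until it is released.
        with self._lock:
            self.value += 1


def worker(counter: LockedCounter, times: int) -> None:
    for _ in range(times):
        counter.increment()


counter = LockedCounter()
threads = [threading.Thread(target=worker, args=(counter, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
final_value = counter.value
```

If a thread were halted inside the with block, every other thread calling increment would wait indefinitely, which is the weakness of blocking schemes noted above.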
Consequently, non-blocking techniques are often used in place of conventional blocking techniques and can outperform them. With non-blocking techniques the resource is always accessible to the threads, and at least one of the threads is guaranteed to complete its operations in a finite number of steps whether or not other threads have failed or are halted. Generally, non-blocking techniques require a universal atomic primitive such as ‘compare-and-swap’ that supports, for example, insertion and removal of elements from a data structure. Atomicity implies that concurrent threads are protected from accessing preliminary data and that any change made during the operation is revoked if anything goes wrong with the operation, in which case the operation can be retried or returns an error code. Atomic operations either return a specified result or no result at all, but will not change any data structure or parameter in an unpredictable way, and in a multithreaded environment atomic operations will not return a partially updated or intermediate value.
The ‘compare-and-swap’ operation is a synchronization primitive that can resolve, in a wait-free fashion, contention among a finite number of threads. Because of this property, compare-and-swap is used to implement wait-free schemes that do not use locks, and it is often used for synchronization and memory updates by concurrent threads. Compare-and-swap, also known as ‘CAS’, is a three-operand atomic instruction of the form CAS (S, O, C), where S, O, and C are word variables (or possibly other variable types). S is the shared variable, O is usually a private copy of the ‘old’ value of S, which the thread made sometime earlier, and C is the new value that the CAS operation attempts to store in S (i.e., updating S by replacing S's old value with C). The operation is allowed to do so only if S still (or again) has the value O. If the attempt succeeds, the operation returns the Boolean value TRUE. If the attempt fails, the operation returns the Boolean value FALSE. The CAS operation can be outlined as follows:
old = shared;
Boolean CAS (shared, old, new)
  if (shared == old)
    shared = new;
    return TRUE;
  else
    return FALSE;
end.
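The outline above can be exercised with a small sketch of the usual retry loop built on CAS. Python has no native compare-and-swap, so the CASCell class below is an assumption: it emulates the atomicity of a hardware CAS with an internal lock and is not itself a lock-free implementation. The names CASCell, atomic_add and worker are hypothetical.

```python
import threading


class CASCell:
    """Shared variable with an emulated compare-and-swap.

    Assumption: the internal lock only stands in for the atomicity that a
    hardware CAS instruction would provide.
    """

    def __init__(self, value):
        self._value = value
        self._guard = threading.Lock()

    def load(self):
        with self._guard:
            return self._value

    def cas(self, old, new) -> bool:
        # Store `new` only if the cell still (or again) holds `old`.
        with self._guard:
            if self._value == old:
                self._value = new
                return True
            return False


def atomic_add(cell: CASCell, delta: int) -> int:
    # Retry loop: take a private copy, compute the new value, attempt the swap.
    while True:
        old = cell.load()
        if cell.cas(old, old + delta):
            return old + delta


def worker(cell: CASCell, times: int) -> None:
    for _ in range(times):
        atomic_add(cell, 1)


shared = CASCell(0)
threads = [threading.Thread(target=worker, args=(shared, 500)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
result = shared.load()
stale_attempt = shared.cas(0, 99)  # uses an outdated private copy, so it fails
```

A failed CAS leaves the shared variable untouched, so the caller simply refreshes its private copy and tries again.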
Notably, the update of S (shared) may be inconsistent as a result of the ‘A-B-A’ problem. The inconsistent update of shared occurs if, in the time between making the private copy and the attempt to update shared, intervening threads modify shared to another value and then modify it back to the old value. Assume, for example, that shared's old value is A and the CAS operation is intended to update it to C. Then, if during the time between making the private copy of A (old) and the attempt to update shared from A to C, other threads modify shared from A to B and back to A, the comparison (shared == A) will succeed and the CAS operation will allow the update even though there was an intervening change to the data structure, and other information examined since shared was copied may have changed (hence the term ‘A-B-A’ problem).
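The following sketch replays one possible A-B-A interleaving on a linked stack. It is a deterministic, single-threaded replay with the two threads' steps written out in order, not a real concurrent run, and it uses a different scenario from the A-to-C value update in the text: a pop that races with other pops and a push. The names Node, Top, a, b and c are hypothetical, and the cas method is a plain emulation that compares node identity.

```python
class Node:
    def __init__(self, name, next=None):
        self.name = name
        self.next = next


class Top:
    """Entry pointer with an emulated CAS (no lock needed in this replay)."""

    def __init__(self, node):
        self.node = node

    def cas(self, old, new) -> bool:
        if self.node is old:
            self.node = new
            return True
        return False


# Initial stack: A -> B -> C
c = Node("C")
b = Node("B", c)
a = Node("A", b)
top = Top(a)

# Thread 1 starts a pop and takes its private copies.
old = top.node   # A
new = old.next   # B

# Thread 2 intervenes: pops A, pops B, then pushes node A back.
top.cas(a, a.next)   # top is now B
top.cas(b, b.next)   # top is now C; B has been removed
a.next = top.node    # A now points at C
top.cas(c, a)        # top is A again: A -> C

# Thread 1 resumes. The comparison succeeds because top is A again,
# but `new` still refers to B, which is no longer part of the stack.
succeeded = top.cas(old, new)
top_name = top.node.name
```

The final CAS succeeds and installs the removed node B as the front of the list, even though the correct successor of A at that moment was C.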