1. Field of the Invention
The present invention relates to the field of computers. More specifically, the present invention relates to shared first-in-first-out data structures.
2. Description of the Related Art
A first-in-first-out (FIFO) queue supports an enqueue operation, which places a value at the end of the queue, and a dequeue operation, which removes the first value (if any) from the front of the queue. Concurrent FIFO queues are widely used in a variety of systems and applications. For example, queues are an essential building block of concurrent data structure libraries such as JSR-166, the Java® Concurrency Package described in Concurrent Hash Map in JSR166 Concurrency Utilities by D. Lea, which can be found online at gee.cs.oswego.edu/dl/concurrency-interest/index.html. It is therefore important to have robust concurrent queue implementations that perform well across a wide range of loads.
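The enqueue and dequeue semantics described above can be illustrated with a short (sequential, non-concurrent) sketch using the standard JDK ArrayDeque class; the class and variable names are illustrative only:

```java
import java.util.ArrayDeque;

// Illustrative only: a sequential FIFO queue demonstrating the two
// operations described above. ArrayDeque is a standard JDK class.
public class FifoDemo {
    public static void main(String[] args) {
        ArrayDeque<Integer> queue = new ArrayDeque<>();
        queue.addLast(1);                      // enqueue: place value at the end
        queue.addLast(2);
        queue.addLast(3);
        System.out.println(queue.pollFirst()); // dequeue: removes first value -> 1
        System.out.println(queue.pollFirst()); // -> 2
        System.out.println(queue.pollFirst()); // -> 3
        System.out.println(queue.pollFirst()); // empty queue -> null
    }
}
```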
In recent years, good progress has been made towards practical lock-free FIFO queue implementations that avoid the numerous problems associated with traditional lock-based implementations. In particular, the lock-free FIFO queue implementation of Michael and Scott described in Simple, fast, and practical non-blocking and blocking concurrent queue algorithms in Proceedings of the 15th Annual ACM Symposium on Principles of Distributed Computing, pages 219-228 (1996) (hereinafter referred to as MS-queue) outperforms previous concurrent FIFO queues across a wide variety of loads. A key reason is that the MS-queue algorithm allows enqueue and dequeue operations to complete without synchronizing with each other when the queue is nonempty. In contrast, many previous algorithms have the disadvantage that enqueue operations interfere with concurrent dequeue operations. Nonetheless, the MS-queue still requires concurrent enqueue operations to synchronize with each other, and similarly for concurrent dequeue operations. As a result, as the number of threads concurrently accessing the queue increases, the head and the tail of the queue become bottlenecks, and performance suffers. Therefore, while the MS-queue algorithm provides good performance on small-to-medium machines, it does not scale well to larger machines.
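The structure described above can be sketched in Java in the style of the MS-queue: a singly linked list with a sentinel node, in which the head and tail pointers are advanced with compare-and-set (CAS) so that enqueuers and dequeuers need not synchronize with each other when the queue is nonempty. The class and field names below are illustrative, not taken from the cited paper, and the sketch relies on garbage collection to avoid the ABA problem:

```java
import java.util.concurrent.atomic.AtomicReference;

// A simplified MS-queue-style lock-free FIFO queue. Enqueuers contend only
// at the tail; dequeuers contend only at the head. A failed CAS indicates
// interference from another operation of the same kind, which is retried.
public class MSQueue<E> {
    private static final class Node<E> {
        final E value;
        final AtomicReference<Node<E>> next = new AtomicReference<>(null);
        Node(E value) { this.value = value; }
    }

    private final AtomicReference<Node<E>> head;
    private final AtomicReference<Node<E>> tail;

    public MSQueue() {
        Node<E> dummy = new Node<>(null);       // sentinel node
        head = new AtomicReference<>(dummy);
        tail = new AtomicReference<>(dummy);
    }

    public void enqueue(E value) {
        Node<E> node = new Node<>(value);
        while (true) {
            Node<E> last = tail.get();
            Node<E> next = last.next.get();
            if (last == tail.get()) {            // tail still consistent?
                if (next == null) {
                    // Link the new node after the current last node.
                    if (last.next.compareAndSet(null, node)) {
                        tail.compareAndSet(last, node); // swing tail; may fail harmlessly
                        return;
                    }
                } else {
                    tail.compareAndSet(last, next);     // help a lagging tail, then retry
                }
            }
        }
    }

    public E dequeue() {
        while (true) {
            Node<E> first = head.get();
            Node<E> last = tail.get();
            Node<E> next = first.next.get();
            if (first == head.get()) {
                if (first == last) {
                    if (next == null) return null;      // queue is empty
                    tail.compareAndSet(last, next);     // help a lagging tail
                } else {
                    E value = next.value;
                    if (head.compareAndSet(first, next)) {
                        return value;                   // next node becomes sentinel
                    }
                }
            }
        }
    }
}
```

Note that the only shared synchronization points are the head and tail references themselves, which is precisely why they become bottlenecks as the number of concurrent threads grows.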
Although not previously available for FIFO queues, an elimination technique has been introduced for stacks by Shavit and Touitou as described in Elimination trees and the construction of pools and stacks in Theory of Computing Systems, 30:645-670 (1997). Their elimination technique is used to implement a scalable stack. A stack data structure supports a push operation, which adds a new element to the stack, and a pop operation, which removes and returns the most recently added element in the stack (if any). The elimination technique is based on the observation that if a push operation on a stack is immediately followed by a pop operation, then there is no net effect on the contents of the stack. Therefore, if a push and a pop operation can somehow “pair up,” the pop operation can return the element being added by the push operation, and both operations can return without making any modification to the stack: we “pretend” that the two operations instantaneously pushed the value onto the stack and then popped it. The mechanism by which push and pop operations can pair up without synchronizing on centralized data allows exploitation of this observation.
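The pairing-up observation can be reduced to a minimal sketch: a single exchange slot in which a push deposits its value and a pop claims it, so that the pair completes with no change to the underlying stack. This is a deliberately simplified illustration under assumed names (EliminationSlot, tryOffer, tryTake); a practical elimination mechanism must additionally let the push wait for confirmation and handle timeouts and retries:

```java
import java.util.concurrent.atomic.AtomicReference;

// A single elimination slot. A push that deposits a value here and a pop
// that later claims it together have no net effect on the stack, as if the
// value were pushed and immediately popped.
public class EliminationSlot<E> {
    private final AtomicReference<E> slot = new AtomicReference<>(null);

    // A push offers its value; returns true if the slot was empty and the
    // value was deposited for a pop to claim.
    public boolean tryOffer(E value) {
        return slot.compareAndSet(null, value);
    }

    // A pop tries to claim a waiting value; returns null if no push is waiting.
    public E tryTake() {
        return slot.getAndSet(null);
    }
}
```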
Shavit and Touitou implement a stack that uses a tree-like structure that operations use to attempt to pair up and eliminate each other. The implementation is lock-free and scalable, but is not linearizable, a correctness condition discussed in Linearizability: A Correctness Condition for Concurrent Objects by M. Herlihy and J. Wing in ACM Transactions on Programming Languages and Systems, 12(3):463-492 (1990).
Shavit and Zemach introduced combining funnels in Combining funnels: a dynamic approach to software combining, Journal of Parallel Distributed Computing, 60(11):1355-1387 (2000), and used them to provide scalable stack implementations. Combining funnels employ both combining and elimination to achieve good scalability. They improve on elimination trees by being linearizable, but unfortunately they are blocking.
Both the elimination tree approach and the combining funnels approach are directed at scalability under high load, but their performance is substantially worse than other stack implementations under low loads. This is a significant disadvantage, as it is often difficult to predict load ahead of time. Indeed, load may be variable over the lifetime of a particular data structure, so we need data structures that are competitive under low load, and are scalable with increasing load.
Hendler, Shavit, and Yerushalmi introduced a scalable stack implementation in A scalable lock-free stack algorithm, Proceedings of the 16th Annual ACM Symposium on Parallelism in Algorithms and Architectures, pages 206-215, ACM Press (2004). Their stack implementation performs well at low loads and remains scalable under increasing load. Their implementation builds on a conventional (non-scalable) lock-free stack implementation. Such implementations typically use an optimistic style in which an operation succeeds if it does not encounter interference, and retries when it does. Since repeated retries on such stacks can lead to very poor performance, they are typically used with some form of backoff technique, in which operations wait for some time before retrying. However, as the load increases, the amount of backoff required to reduce contention increases. Thus, while backoff can improve performance, it does not achieve good scalability.
The stack implementation introduced by Hendler et al. includes a stack (the central stack) and a “collision array.” In the implementation introduced by Hendler et al., operations first attempt to access the conventional stack, but in the case of interference, rather than simply waiting for some time, they attempt to find another operation to pair up with for elimination. To achieve good scalability, this pairing up must be achieved without synchronizing on any centralized data. Therefore, a push operation chooses a location at random in the collision array and attempts to “meet” a pop operation at that location. Under higher load, the probability of meeting an operation with which to eliminate is higher, and this is key to achieving good scalability. However, the implementation introduced by Hendler et al. does not satisfy the properties of a FIFO queue. Accordingly, a lock-free scalable implementation of a FIFO queue is desired.
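The collision-array idea described above can be reduced to a sketch: after encountering interference on the central stack, a push parks its value at a randomly chosen slot, and a pop that chooses the same slot claims the value, so both finish without touching the central stack. The names below (CollisionArray, tryDeposit, tryCollide) are illustrative assumptions, and the waiting and timeout protocol of the cited algorithm is omitted:

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicReferenceArray;

// An array of independent elimination slots. Random slot choice avoids any
// centralized synchronization point; under higher load, more slots are
// occupied, so collisions (successful eliminations) become more likely.
public class CollisionArray<E> {
    private final AtomicReferenceArray<E> slots;

    public CollisionArray(int size) {
        slots = new AtomicReferenceArray<>(size);
    }

    // Called by a push after interference on the central stack: try to park
    // the value at a random empty slot. Returns the slot index, or -1 on failure.
    public int tryDeposit(E value) {
        int i = ThreadLocalRandom.current().nextInt(slots.length());
        return slots.compareAndSet(i, null, value) ? i : -1;
    }

    // Called by a pop after interference: try to claim a value at a random
    // slot. Returns null if the chosen slot held no waiting push.
    public E tryCollide() {
        int i = ThreadLocalRandom.current().nextInt(slots.length());
        return slots.getAndSet(i, null);
    }
}
```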