1. Field of the Invention
The present invention relates to the design of lock-free data structures to facilitate multi-threaded processing within computer systems. More specifically, the present invention relates to a method and apparatus for implementing a practical, lock-free double-ended queue.
2. Related Art
Computational operations involving data structures become more complicated in a multi-threaded environment, because concurrently executing threads can potentially interfere with each other while accessing the same data structures. To prevent such interference, some systems use locks to control access to shared data structures. Unfortunately, locks often cause processes to stall, which can lead to significant performance problems, especially in systems that support large numbers of concurrently executing processes.
Because of the performance problems that arise from locks, a number of researchers have developed “lock-free” data structures, such as linked lists, that operate efficiently in a multi-threaded environment. Harris describes a way to build and modify a lock-free linked list that can be constructed using only load-linked (LL)/store-conditional (SC) or compare-and-swap (CAS) instructions (see Timothy L. Harris, “A Pragmatic Implementation of Non-Blocking Linked-Lists,” Proceedings of the 15th International Symposium on Distributed Computing, October 2001, pp. 300-14). Michael uses a variant of the Harris linked-list as the underlying structure for a lock-free hash table (see Maged M. Michael, “High Performance Dynamic Lock-Free Hash Tables and List-Based Sets,” The 14th Annual ACM Symposium on Parallel Algorithms and Architectures, pages 73-82, August 2002).
Additionally, a number of researchers have developed lock-free double-ended queues (deques). A deque is an important data structure for achieving computational efficiency in a diverse range of applications. A deque allows data to be pushed onto or popped from either end, and a “lock-free” deque allows these operations to be performed concurrently by independent threads.
The simplest deques have a static size that is determined at the start. For some examples, see Ole Agesen et al., “DCAS-based Concurrent Deques,” Proceedings of the Twelfth Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 137-146, 2000.
In a dynamically sized deque, by contrast, nodes are allocated as the deque grows and deallocated as it shrinks. The first published lock-free dynamic deque appears in “Even Better DCAS-Based Deques,” by Detlefs et al., Proceedings of the Fourteenth International Symposium on Distributed Computing, pp. 59-73, October 2000. However, these dynamic memory allocation and deallocation operations can be very time-consuming to perform in a multi-threaded system.
Hence, what is needed is a method and an apparatus for implementing a deque that is lock-free and is able to grow and shrink without having to perform as many time-consuming memory allocation and deallocation operations. Such a design has been published in a technical report from Sun Microsystems Labs, TR-2002-111, “DCAS-based Concurrent Deques Supporting Bulk Allocation,” by Paul Martin et al., 2002. This design (called “HatTrick”) allows the same memory to be used repeatedly to hold the items of the deque, rather than requiring an allocation and release for each item. The underlying data structure is linear, however, so the best re-use occurs when the numbers of pushes and pops on a specific end of the deque during a modest period of time are roughly the same. This condition is met when most items are eventually popped from the same end of the deque to which they were originally pushed, that is, when the deque is used primarily like two stacks.
If the usage is less regular, or if items are most commonly pushed on one end and popped from the other (queue-like usage), then the reuse is reduced because the active portion of the deque steadily shifts away from the end that experiences the majority of pops. Memory must then be allocated at the end that experiences the majority of the pushes and recovered from the end that experiences the excess pops.
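By way of illustration only, the drift described above can be modeled with a short single-threaded sketch. It is not lock-free and is not the HatTrick algorithm; the function name `simulate_linear`, the block size, and the block-counting scheme are assumptions introduced here solely to show how queue-like usage forces repeated allocation at one end and release at the other when the storage is linear.

```python
def simulate_linear(steps, block_size=8, initial_items=4):
    """Hypothetical model of a deque laid out on an unbounded line of cells.

    Live items occupy cells [left, right). Cells are grouped into blocks of
    block_size cells; a block must be allocated before any cell in it is used
    and can be released once every cell in it has been abandoned.
    Assumes initial_items >= 1.
    """
    left, right = 0, initial_items
    first_block = 0                          # lowest block still holding live cells
    last_block = (right - 1) // block_size   # highest block allocated so far
    allocations = 0
    releases = 0

    for _ in range(steps):
        # Queue-like usage: push on the right end ...
        right += 1
        if (right - 1) // block_size > last_block:
            last_block += 1
            allocations += 1                 # fresh memory needed at the push end
        # ... and pop from the left end.
        left += 1
        if left // block_size > first_block:
            first_block += 1
            releases += 1                    # abandoned memory recovered at the pop end

    return allocations, releases


if __name__ == "__main__":
    allocs, frees = simulate_linear(steps=100)
    print(f"block allocations: {allocs}, block releases: {frees}")
```

In this model the number of live items never changes, yet the count of allocations and releases grows in proportion to the number of operations, because the active region keeps moving along the line.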
An underlying data structure with a ring topology allows re-use of nodes limited only by the relative stability of the size of the structure. It offers all the advantages of the linear bulk-allocation system, and can also re-use its storage indefinitely when the deque is being used in an unbalanced queue-like manner: the live data simply cycles around the ring of available storage. The present invention, which is described below, embodies these features.
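The re-use property of a ring topology can be sketched, purely as an illustration, with a single-threaded fixed-capacity ring. This sketch is not lock-free and is not the structure of the present invention; the class name `RingDeque`, its methods, and the array-with-modular-indices layout are assumptions chosen only to show live data cycling around a fixed set of cells.

```python
class RingDeque:
    """Hypothetical sequential deque over a fixed ring of cells."""

    def __init__(self, capacity):
        self.cells = [None] * capacity   # storage allocated once, up front
        self.head = 0                    # index of the leftmost live item
        self.size = 0                    # number of live items

    def _check_not_full(self):
        if self.size == len(self.cells):
            # A practical design would grow the ring here; this sketch does not.
            raise OverflowError("ring is full")

    def push_right(self, item):
        self._check_not_full()
        self.cells[(self.head + self.size) % len(self.cells)] = item
        self.size += 1

    def push_left(self, item):
        self._check_not_full()
        self.head = (self.head - 1) % len(self.cells)
        self.cells[self.head] = item
        self.size += 1

    def pop_left(self):
        if self.size == 0:
            return None                  # empty deque
        item = self.cells[self.head]
        self.cells[self.head] = None
        self.head = (self.head + 1) % len(self.cells)
        self.size -= 1
        return item

    def pop_right(self):
        if self.size == 0:
            return None                  # empty deque
        index = (self.head + self.size - 1) % len(self.cells)
        item = self.cells[index]
        self.cells[index] = None
        self.size -= 1
        return item


if __name__ == "__main__":
    ring = RingDeque(capacity=8)
    # Queue-like usage: push on the right, pop from the left, many times over.
    for value in range(1000):
        ring.push_right(value)
        assert ring.pop_left() == value
    # The same eight cells served all 1000 items.
    print(len(ring.cells))
```

Under the same queue-like workload that forces a linear layout to keep allocating and releasing memory, the ring in this sketch never requests additional storage as long as the number of live items stays within its capacity.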