1. Field of the Invention
The present invention relates to coordination amongst processors or execution threads in a multiprocessor computer, and more particularly, to structures and techniques for facilitating non-blocking access to concurrent shared objects.
2. Description of the Related Art
It is becoming evident that non-blocking algorithms can deliver significant performance benefits to parallel systems in which concurrent shared objects are employed. See generally, Greenwald, Non-Blocking Synchronization and System Design, PhD thesis, Stanford University Technical Report STAN-CS-TR-99-1624, Palo Alto, Calif., (1999); Greenwald and Cheriton, The Synergy Between Non-Blocking Synchronization and Operating System Structure, in 2nd Symposium on Operating Systems Design and Implementation (Oct. 28–31 1996), pp. 123–136. Seattle, Wash.; Massalin and Pu, A Lock-free Multiprocessor OS Kernel, Tech. Rep. TRCUCS-005-9, Columbia University, New York, N.Y., 1991; Arora, Blumofe and Plaxton, Thread Scheduling for Multiprogrammed Multiprocessors, in Proceedings of the 10th Annual ACM Symposium on Parallel Algorithms and Architectures (1998); and LaMarca, A Performance Evaluation of Lock-Free Synchronization Protocols in Proceedings of the 13th Annual ACM Symposium on Principles of Distributed Computing (Aug. 14–17 1994), pp. 130–140. Los Angeles, Calif. Because linked-lists are one of the most basic data structures used in modern program design, a simple and effective non-blocking linked-list implementation could serve as the basis for improving the performance of many data structures currently implemented using locks.
Valois presented a non-blocking implementation of linked-lists. See Valois, Lock-Free Linked Lists Using Compare-and-Swap, in Proceedings of the Fourteenth ACM Symposium on Principles of Distributed Computing, Ottawa, Canada (August 1995). His implementation uses compare-and-swap (CAS) operations and is highly distributed. However, it relies on several specialized mechanisms for explicit garbage collection (GC), which can result in severe performance penalties. Furthermore, the implementation itself has never been proven to be linearizable, even for simple set operations such as insert, delete, and find.
Indeed, Michael and Scott found various bugs in Valois' lock-free method, specifically in the explicit memory management component. They corrected these bugs for a linked-list encoded queue implementation, but the resulting algorithm could easily run out of memory under certain circumstances. See Michael and Scott, Correction of a Memory Management Method for Lock-Free Data Structures, TR599, Department of Computer Science, University of Rochester (December 1995). Michael and Scott later published another algorithm that overcomes the remaining memory leak problem. See Michael and Scott, Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms, in 15th ACM Symposium on Principles of Distributed Computing (May 1996). However, these corrected implementations provide only enqueue and dequeue operations on a linked-list encoded queue, rather than more flexible operations (such as insertion, deletion and find operations) on a set, multi-set, ordered set or other collection of elements encoded using a linked-list.
To overcome the complexity of building linearizable lock-free linked-lists using CAS, Greenwald advocated supporting a stronger double compare-and-swap (DCAS) operation in hardware, and presented a simple linearizable concurrent linked-list algorithm using DCAS. See Greenwald, Non-Blocking Synchronization and System Design, PhD thesis, Stanford University Technical Report STAN-CS-TR-99-1624, Palo Alto, Calif., (1999). His work was an extension of earlier DCAS-based linked-list algorithms of Massalin and Pu, whose algorithms have the advantage of running in an environment without built-in garbage collection, but suffer the drawback of not being linearizable. See Massalin and Pu, A Lock-free Multiprocessor OS Kernel, Tech. Rep. TRCUCS-005-9, Columbia University, New York, N.Y., 1991.
Unfortunately, current multiprocessor architectures do not support powerful DCAS operations. Instead, support for simple CAS operations or LL/SC operation pairs is more typical. Accordingly, current techniques fail to provide a simple, linearizable, non-blocking linked-list implementation suitable for the representation of sets, multi-sets, ordered sets or other collections of elements for which more flexible operations (such as insertion, deletion and find operations) are needed.
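For illustration only, the difference between the CAS and DCAS operations discussed above can be sketched in terms of their sequential semantics. The sketch below is not taken from any of the cited works. The function names cas and dcas, the argument order, and the global mutex are assumptions introduced here: the mutex merely stands in for the atomicity that hardware would provide, so the code models what each operation does and is not itself non-blocking.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lock that stands in for hardware atomicity (an assumption
   of this sketch; a real CAS or DCAS would be a single atomic operation). */
static pthread_mutex_t atomic_lock = PTHREAD_MUTEX_INITIALIZER;

/* CAS: atomically replace *addr with desired only if *addr == expected.
   Returns true if the replacement took place. */
bool cas(intptr_t *addr, intptr_t expected, intptr_t desired)
{
    bool ok = false;
    pthread_mutex_lock(&atomic_lock);
    if (*addr == expected) {
        *addr = desired;
        ok = true;
    }
    pthread_mutex_unlock(&atomic_lock);
    return ok;
}

/* DCAS: atomically replace *a1 and *a2 only if BOTH locations hold their
   expected values. Either both locations are updated or neither is. */
bool dcas(intptr_t *a1, intptr_t *a2,
          intptr_t expected1, intptr_t expected2,
          intptr_t desired1, intptr_t desired2)
{
    bool ok = false;
    pthread_mutex_lock(&atomic_lock);
    if (*a1 == expected1 && *a2 == expected2) {
        *a1 = desired1;
        *a2 = desired2;
        ok = true;
    }
    pthread_mutex_unlock(&atomic_lock);
    return ok;
}
```

In this model, a CAS can validate and update only one memory location per operation, whereas a DCAS validates and updates two independent locations together, which is the additional power that the DCAS-based linked-list algorithms rely upon.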