1. Field of the Invention
The present invention relates to the design of lookup structures within computer systems. More specifically, the present invention relates to a method and apparatus for implementing a fully dynamic lock-free hash table.
2. Related Art
Linear hash tables are commonly used to provide fast lookups for computer systems and computer applications. A linear hash table includes an array of buckets, which is occasionally resized so that each bucket holds an expected constant number of elements. This ensures that common hash table operations, such as insert, delete, and search, require expected constant time. For example, hash table 100 in FIG. 1 includes a bucket array 102, wherein each bucket includes a pointer to a linked list of data nodes. Specifically, bucket array 102 includes pointers 104, 110, and 114, which point to linked lists that include data 106-108, data 112-113, and data 116, respectively. In order to resize hash table 100 when the buckets become too full, each of the data nodes is typically “rehashed” into a larger bucket array.
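The resize-by-rehashing behavior described above can be illustrated with a minimal sequential sketch. The class name, the load threshold max_load, and the use of Python lists in place of linked lists are assumptions made for illustration only and do not appear in FIG. 1.

```python
class ChainedHashTable:
    """Sequential hash table with one chain per bucket and grow-by-rehash."""

    def __init__(self, num_buckets=4, max_load=2.0):
        # Each bucket holds a chain; a Python list stands in for a linked list.
        self.buckets = [[] for _ in range(num_buckets)]
        self.count = 0
        self.max_load = max_load  # hypothetical threshold for "too full"

    def _chain(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key):
        chain = self._chain(key)
        if key in chain:
            return False
        chain.append(key)
        self.count += 1
        # Grow when the average chain length exceeds the threshold.
        if self.count / len(self.buckets) > self.max_load:
            self._resize(2 * len(self.buckets))
        return True

    def search(self, key):
        return key in self._chain(key)

    def delete(self, key):
        chain = self._chain(key)
        if key not in chain:
            return False
        chain.remove(key)
        self.count -= 1
        return True

    def _resize(self, new_size):
        # Every stored key is rehashed into a freshly allocated bucket array.
        new_buckets = [[] for _ in range(new_size)]
        for chain in self.buckets:
            for key in chain:
                new_buckets[hash(key) % new_size].append(key)
        self.buckets = new_buckets
```

Note that the resize step touches every data node, which is the cost that a shared, concurrently accessed table has to coordinate across threads.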
The design of such hash tables becomes more complicated in a multi-threaded environment, because concurrently executing threads can potentially interfere with each other while performing operations on the same hash table. To prevent such interference, some hash table implementations use locks.
However, using locks can create performance problems. Locking an entire hash table can create a performance bottleneck because threads may have to wait for other threads to complete their hash table operations before obtaining access to the hash table. To mitigate this problem, some concurrent hash table implementations make use of multiple locks, which are associated with portions of the hash table. For example, if a hash table has N buckets, a different lock can be associated with each of the N buckets. This allows multiple threads to access different buckets in the hash table at the same time. However, in order to resize the hash table into a different number of buckets, the system first has to acquire multiple locks, which can be an extremely time-consuming process. During the resizing process, all other operations dependent on these locks are prevented from making progress.
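The multiple-lock approach and its resizing cost can be sketched as follows. This is an illustrative sketch only: the class name is hypothetical, and it assumes that the lock array keeps its initial size while the bucket array grows by an integer factor, so that a given key always maps to the same lock.

```python
import threading


class StripedHashTable:
    """Hash table guarded by one lock per initial bucket."""

    def __init__(self, num_locks=4):
        # Assumption: the lock array never changes size, and the bucket count
        # stays a multiple of num_locks, so hash(key) % num_locks always
        # selects the lock that guards the key's bucket.
        self.locks = [threading.Lock() for _ in range(num_locks)]
        self.buckets = [[] for _ in range(num_locks)]

    def _lock_for(self, key):
        return self.locks[hash(key) % len(self.locks)]

    def insert(self, key):
        with self._lock_for(key):
            chain = self.buckets[hash(key) % len(self.buckets)]
            if key in chain:
                return False
            chain.append(key)
            return True

    def search(self, key):
        with self._lock_for(key):
            return key in self.buckets[hash(key) % len(self.buckets)]

    def resize(self, factor=2):
        # Resizing must hold every lock; acquiring them in a fixed order
        # avoids deadlock between two concurrent resizers.
        for lock in self.locks:
            lock.acquire()
        try:
            new_size = factor * len(self.buckets)
            new_buckets = [[] for _ in range(new_size)]
            for chain in self.buckets:
                for key in chain:
                    new_buckets[hash(key) % new_size].append(key)
            self.buckets = new_buckets
        finally:
            for lock in reversed(self.locks):
                lock.release()
```

While resize holds all of the locks, every insert and search blocks, which is the stall described above.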
Because of the performance problems that arise from locks, a number of researchers have been developing lock-free data structures that operate efficiently in a multi-threaded environment.
Harris describes a way to build and modify a linked list that is lock-free and can be constructed using only load-linked (LL)/store-conditional (SC) or compare-and-swap (CAS) instructions (see Timothy L. Harris, “A Pragmatic Implementation of Non-Blocking Linked-Lists,” Proceedings of the 15th International Symposium on Distributed Computing, October 2001, pp. 300-14). The Harris list forms the basis of the two state-of-the-art lock-free hash tables described below.
The dynamic lock-free hash table by Michael is set up with a bucket array of a chosen size and an empty set of data nodes (see Maged M. Michael, “High Performance Dynamic Lock-Free Hash Tables and List-Based Sets,” The 14th Annual ACM Symposium on Parallel Algorithms and Architectures, pages 73-82, August 2002). In the hash table of Michael, data nodes are added to the linked lists associated with each bucket, and can be deleted when they are no longer wanted in the hash table. Unfortunately, Michael describes no way to increase the number of buckets, and thereby reduce the average load, when the hash buckets become “too full.” (Michael uses a slightly simpler variant of the Harris linked list as the underlying structure to store the data nodes for each bucket.)
The split-ordered list hash table by Shalev and Shavit is able to grow by doubling the size of the bucket table up to a pre-allocated limit (see Ori Shalev and Nir Shavit, “Split-Ordered Lists: Lock-Free Extensible Hash Tables,” Proceedings of the Twenty-Second ACM Symposium on Principles of Distributed Computing, pages 102-111, Jul. 13-16, 2003, Boston, Mass.). This doubling involves adding a new “usable” segment that is as large as the part already in use, and filling it with “uninitialized” values so that references to these new buckets will set themselves up properly, as described below. Their key improvement is that the data nodes of the table are maintained in a single long linked list (such as Harris's) and do not need to be moved when the number of hash buckets changes (see FIG. 2A). This innovation requires using a special hash function, similar to a design by Litwin, that orders the buckets to permit recursive splitting of the list (see Witold A. Litwin, “Linear Hashing: A New Tool for File and Table Addressing,” Proceedings of the Sixth Conference on Very Large Data Bases, 1980, pages 212-223). The recursive splitting of the hash buckets means that every bucket (except the 0th one) has a “parent” bucket that is in some sense twice as coarse in dividing up the linked list.
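The parent relationship can be illustrated with a short sketch. It assumes the convention, taken from the cited Shalev-Shavit paper rather than from the text above, that when the table doubles from 2^i to 2^(i+1) buckets, bucket b in the new half is split from bucket b - 2^i, so that the parent is found by clearing the most significant set bit of the bucket index. The function names are hypothetical.

```python
def parent_bucket(bucket):
    """Return the bucket that `bucket` was split from, or None for bucket 0."""
    if bucket == 0:
        return None  # the 0th bucket has no parent
    highest_bit = 1 << (bucket.bit_length() - 1)
    return bucket & ~highest_bit  # clear the most significant set bit


def ancestors(bucket):
    """Return the chain of parents from `bucket` back to bucket 0."""
    chain = []
    parent = parent_bucket(bucket)
    while parent is not None:
        chain.append(parent)
        parent = parent_bucket(parent)
    return chain
```

Under this assumption, bucket 6 is split from bucket 2, which in turn is split from bucket 0, so each step up the chain covers a region of the list that is twice as coarse.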
The proper setup for Shalev-Shavit buckets is to have each bucket that has been referenced during a hash table operation point to a permanent dummy node in the otherwise-dynamic linked list that holds all the data nodes (see FIG. 2A). These dummies are assigned special hash keys, which are designed to fall in between the hash keys that are possible for real data nodes.
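One way to obtain dummy keys that fall between the possible data keys is bit reversal, sketched below. The 32-bit word size, the function names, and the specific rule (data keys have the top bit set before reversal and are therefore odd, while dummy keys are the reversed bucket index and are therefore even) are assumptions drawn from the cited Shalev-Shavit paper, not details stated in the text above.

```python
WORD_BITS = 32  # hypothetical key width


def reverse_bits(value, width=WORD_BITS):
    """Reverse the low `width` bits of `value`."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def regular_key(hash_value):
    """List key for a data node; assumes hash_value < 2**(WORD_BITS - 1)."""
    # Setting the top bit before reversing makes every data key odd.
    return reverse_bits(hash_value | (1 << (WORD_BITS - 1)))


def dummy_key(bucket):
    """List key for a bucket's dummy node; assumes bucket < 2**(WORD_BITS - 1)."""
    # The reversed bucket index has its low bit clear, so dummy keys are even
    # and can never collide with a data key.
    return reverse_bits(bucket)
```

With these keys, a list sorted in ascending key order places each dummy node immediately before the data nodes whose hash values map to its bucket, and that ordering remains valid when the number of buckets doubles.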
FIG. 2A illustrates a split-ordered list hash table with bit-reversed bucket pointers in which bucket array 202 includes pointers 204, 210, and 214. Pointers 204, 210, and 214 point to permanent dummy nodes 205, 211, and 215, respectively. Permanent dummy node 205 points to a linked list including data nodes 206-208, permanent dummy node 211 points to a linked list including data nodes 212-213, and permanent dummy node 215 points to a linked list starting with data node 216. Note that data node 208 points to permanent dummy node 211 and data node 213 points to permanent dummy node 215. In general, the last data node in a given region points to the next permanent dummy node within the linked list.
The dummy nodes serve as place holders so that entering the linked list by way of the hash bucket will always provide a pointer to a dummy node that is at the head of the region associated with that portion of the hash mapping. Once a bucket pointer has been initialized with a pointer to the corresponding dummy node, it does not change.
These dummy nodes are essential to the correctness of the Shalev-Shavit hash table and can never be deleted. Their algorithm also offers no way to reduce the size of the bucket array. Consequently, as a hash table grows it may add many dummy nodes, but as its contents are deleted the dummy nodes must remain, leaving, in some cases, a large structure of buckets and dummies with very little actual data remaining.
Additionally, the sharing of bucket array elements as the bucket table grows dictates that the space for the growing segments of the bucket array must be pre-allocated sequentially with the initial portions, so that the space taken by the bucket array at all times is effectively the space it will take when it has reached the maximum size supported. This is an expensive overhead when the “live” portion of the array space is a small portion of the allocation. More seriously, it imposes a limit, which must be fixed at initialization, on the maximum size the bucket array can ever reach.
A later addendum to the Shalev-Shavit hash table uses additional indirection to ease this problem. It allocates a table of pointers to bucket table segments, and allocates the actual segments upon demand. This scheme reduces the bucket table overhead, but the pointer table has a fixed size and must be pre-allocated, and any segment ever used must be retained.
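The indirection scheme can be sketched sequentially as follows. The class name, the equal-sized segments, and the parameter values are assumptions made for illustration; the addendum's actual segment layout is not described above, and a lock-free version would have to install each new segment atomically.

```python
class SegmentedBucketArray:
    """Bucket array reached through a fixed table of segment pointers."""

    def __init__(self, max_segments=8, segment_size=4):
        self.segment_size = segment_size
        # The pointer table is pre-allocated and never grows.
        self.segments = [None] * max_segments

    def _locate(self, bucket):
        seg, off = divmod(bucket, self.segment_size)
        if not 0 <= seg < len(self.segments):
            raise IndexError("bucket index is outside the pointer table")
        return seg, off

    def get(self, bucket):
        seg, off = self._locate(bucket)
        segment = self.segments[seg]
        # An unallocated segment reads as a run of uninitialized buckets.
        return None if segment is None else segment[off]

    def set(self, bucket, dummy_node):
        seg, off = self._locate(bucket)
        if self.segments[seg] is None:
            # Allocate the segment on first use; it is never freed afterwards.
            self.segments[seg] = [None] * self.segment_size
        self.segments[seg][off] = dummy_node
```

In this sketch the maximum number of buckets is max_segments multiplied by segment_size, which reflects the fixed limit that remains even with the added indirection.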
Hence, what is needed is a method and apparatus for implementing a fully dynamic lock-free hash table without the overhead involved in having to maintain a large number of buckets and dummy nodes. By “fully dynamic,” we mean a data structure in which the space consumed is proportional to the number of items actually in the hash table at any time.