A critical problem in many applications is the ability to locate a specific data item in a stored set of data items in an efficient manner. One well-known and widely used solution is the binary tree. The performance of binary tree routines is, on average, very good, but suffers when data manipulation is not localized. The insertion or removal of data in order, for example, can produce an unbalanced tree structure, thus leading to extremely poor search performance. Consequently, after an insertion or deletion, a tree must be rebalanced if an imbalance has been introduced. Specific balanced tree techniques, such as AVL trees, have been designed to rearrange the tree to correct imbalances which occur as operations are performed, but may incur a compute overhead unacceptable to some applications. In cache controller applications, for example, a cache manager is expected to dynamically delete as well as add cache index entries to the caching data structures. The extra overhead required for such operations may be unacceptable in the cache controllers of embedded control systems, like storage controllers, having real-time performance constraints.
Skip lists, a variant of linked linear lists, provide yet another approach. Specific information on skip lists and related algorithms is presented and discussed at some length in a paper by William Pugh, entitled "Skip Lists: A Probabilistic Alternative to Balanced Trees", Communications of the ACM, June 1990, pp. 668-676. In this paper, Professor Pugh suggests skip lists as a performance-enhanced alternative to balanced trees for situations requiring the frequent insertion or removal of keyed items in an ordered search structure.
Essentially, a skip list is an ordered, linked linear list of list elements, or nodes, in which some nodes contain extra pointers that skip intermediate nodes, thus increasing the speed of searches. A node with i extra forward pointers is known as a level i node. The pointers of a level i node are indexed from 0 to i; the pointer at index i in the node points to the next node of level i or greater in the list, according to the ordering scheme of the list. Thus, the pointer at index 0 of a node always points to the next node, but the pointer at index 2 of a node will point to the next node of level 2 or higher. The level of a node, from 0 to some maximum M which is a property of the skip list, is assigned when it is inserted into the skip list. A random number generator is used, causing the probability of a node being at level L or higher to be 1/b^L, where b is a property of the skip list. Hereafter, a base of b=4 is assumed; however, the choice of b (as well as M) is up to the implementer.
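The level assignment described above can be sketched as follows. This is a minimal illustration only, not a definitive implementation: the function name random_level, its parameter names, and the default M=8 are assumptions introduced here, while b=4 follows the base assumed in the text.

```python
import random

def random_level(b=4, max_level=8, rng=random):
    """Return a level in 0..max_level such that P(level >= L) = 1 / b**L.

    b and max_level correspond to the skip list properties b and M;
    the default max_level=8 is an assumed value for illustration.
    """
    level = 0
    # Each additional level is granted with probability 1/b,
    # so reaching level L or higher has probability (1/b)**L.
    while level < max_level and rng.random() < 1.0 / b:
        level += 1
    return level
```

With b=4, roughly three nodes in four receive level 0, about one in four reaches level 1 or higher, and about one in sixteen reaches level 2 or higher.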
Like other linked list data structures, skip lists generally begin with a "list head" which points to the first node in the list. In the case of skip lists, however, the list head has M pointers, indexed in the same way as for any node in the skip list.
Skip lists have the following properties: search time proportional to log(#entries); insertion time proportional to log(#entries); delete time proportional to log(#entries); and extremely simple code, approximately 1/4 the size of an AVL tree implementation. Each entry of a search structure needs room for a maximum of M pointers; the best performance will be obtained when M >= log_b(maximum number of entries in the list), although a smaller value of M will work with reduced performance. If variable-size entries are allocated, the average number of pointers actually used per entry is less than two, which is less memory overhead than used by AVL trees.
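The two quantitative statements above can be checked with a short calculation. The list size N = 65,536 is an assumed figure chosen only for illustration; b=4 is the base assumed earlier, and counting one pointer per index 0 through the node's level is an assumption of this sketch.

```latex
% Choice of M for an assumed maximum of N = 65{,}536 entries with b = 4:
M \;\ge\; \log_b N \;=\; \log_4 65{,}536 \;=\; 8

% Expected pointers per node, given P(\text{level} \ge L) = 1/b^{L}
% and one pointer for each index 0..level:
E[\text{pointers}] \;=\; \sum_{L=0}^{M} \frac{1}{b^{L}}
  \;<\; \sum_{L=0}^{\infty} \frac{1}{b^{L}}
  \;=\; \frac{b}{b-1} \;=\; \frac{4}{3} \;<\; 2
```

Under these assumptions, the average of about 1.33 pointers per entry is consistent with the stated figure of fewer than two.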
Certain algorithms, specifically insertions and deletions, can present difficulties when performed on linked lists. Generally, deleting a node requires a traversal of the list until the correct node is found. Once located, the pointer from the previous node is made to point to the next node (i.e., the node following the "removed" node). Therefore, there must be some way to keep track of which node precedes the node to be deleted. Similarly, adding a node to a linked list requires that the address of the node examined prior to reaching the insertion point be saved, since the only way to determine the correct insertion point is to pass it. Thus, special accommodations must be made to maintain such information. One technique utilized for a singly linked list, for example, is to maintain a separate pointer to point to the forward pointer of the predecessor node. Another solution involves the use of a doubly linked list, in which each node is linked to both of its neighboring nodes. Further information on linked list data structures and methods for performing basic operations thereon may be had with reference to such text books as "Data Structures: An Advanced Approach Using C" by Jeffrey Esakov and Tom Weiss (Prentice Hall, 1989), and "Algorithms" by Robert Sedgewick (Addison-Wesley, 1988).
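The predecessor-tracking requirement described above can be illustrated for a singly linked list as follows. This is a sketch under stated assumptions: the names ListNode, delete_key, insert_sorted and to_list are hypothetical, and a trailing prev reference stands in for the separate pointer to the predecessor's forward pointer mentioned in the text.

```python
class ListNode:
    def __init__(self, key, next=None):
        self.key = key
        self.next = next

def delete_key(head, key):
    """Remove the first node holding key; return the (possibly new) head."""
    prev, node = None, head
    while node is not None and node.key != key:
        prev, node = node, node.next   # remember the node just examined
    if node is None:
        return head                    # key not present, list unchanged
    if prev is None:
        return node.next               # removed node was the first node
    prev.next = node.next              # predecessor now skips the removed node
    return head

def insert_sorted(head, key):
    """Insert key into an ascending list; return the (possibly new) head."""
    prev, node = None, head
    # The insertion point is only known once it has been passed,
    # so the previously examined node must be saved.
    while node is not None and node.key < key:
        prev, node = node, node.next
    new_node = ListNode(key, node)
    if prev is None:
        return new_node
    prev.next = new_node
    return head

def to_list(head):
    """Collect the keys in list order (helper for inspection only)."""
    out = []
    while head is not None:
        out.append(head.key)
        head = head.next
    return out
```

Both routines walk the list once and keep exactly one saved predecessor, which is the accommodation that becomes more involved when a node may have several predecessors, as in a skip list.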
The problem discussed above becomes significantly more complex when skip lists are used, as there may be more than one node pointing to the node being inserted or deleted. For insert and delete operations in skip lists, then, there is a search cost proportional to the level of the node being inserted or deleted. Typically, a skip list search routine must maintain an array of predecessor pointers, one for each level, in case an insertion is required. If the ratio of insertions to searches is significant (more than a few percent), the extra time maintaining this pointer array on all searches will be comparable to or less than the time needed to specifically construct that array for insertions.
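A conventional skip list search that records one predecessor per level, and the insert and remove operations that consume that array, might look like the following. This is an illustrative sketch in the style of the Pugh algorithms, not the method of any particular implementation. The names SkipList, find_predecessors, MAX_LEVEL and BASE are hypothetical, keys are assumed to be distinct, and the list head is given pointers at indices 0 through MAX_LEVEL for simplicity.

```python
import random

MAX_LEVEL = 8   # M: assumed maximum node level
BASE = 4        # b: base assumed in the text

class Node:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * (level + 1)   # pointers indexed 0..level

class SkipList:
    def __init__(self, seed=None):
        self.head = Node(None, MAX_LEVEL)     # head key is never compared
        self.rng = random.Random(seed)

    def _random_level(self):
        level = 0
        while level < MAX_LEVEL and self.rng.random() < 1.0 / BASE:
            level += 1
        return level

    def find_predecessors(self, key):
        """Return preds, where preds[i] is the last node reachable at
        index i whose key is less than the given key."""
        preds = [self.head] * (MAX_LEVEL + 1)
        node = self.head
        for i in range(MAX_LEVEL, -1, -1):    # descend from the top index
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            preds[i] = node
        return preds

    def search(self, key):
        candidate = self.find_predecessors(key)[0].forward[0]
        if candidate is not None and candidate.key == key:
            return candidate
        return None

    def insert(self, key):
        preds = self.find_predecessors(key)
        node = Node(key, self._random_level())
        for i in range(len(node.forward)):    # splice in at each index
            node.forward[i] = preds[i].forward[i]
            preds[i].forward[i] = node
        return node

    def remove(self, key):
        preds = self.find_predecessors(key)
        target = preds[0].forward[0]
        if target is None or target.key != key:
            return False
        for i in range(len(target.forward)):  # unlink at each index
            preds[i].forward[i] = target.forward[i]
        return True

    def keys(self):
        out, node = [], self.head.forward[0]
        while node is not None:
            out.append(node.key)
            node = node.forward[0]
        return out
```

In this sketch every insert and remove repeats the full descent in find_predecessors, which is the per-operation search cost the passage refers to when the operation does not follow a search.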
There are, however, special cases where insert operations do not follow searches. In a cache control application, for example, a cache manager will perform an insertion which does not follow a search when it needs to split a cache node for transfer alignment purposes. More particularly, node splitting is encountered in RAID controllers, where alignment is needed for reconstruct-write RAID operations. Hence, there is a need for some type of mechanism which will ensure that insertions in such special cases take constant time as well.
There are also applications of skip lists where a node removal is performed without a preceding search. For instance, node removal in a cache control application performed on the basis of a Least Recently Used algorithm does not follow a search. Remove time accounts for approximately 25% of the list maintenance overhead assuming a 50% cache hit ratio. If remove operations frequently followed searches, remove time would be a small constant. However, frequency of removes is almost identical to the frequency of cache misses. Hence, a remove algorithm could greatly benefit from the utilization of a data structure enhancement to reduce the remove time to a small constant plus another small value proportional to log(#entries).