Tree data storage structures such as B-trees and variations of B-trees (e.g., B*trees, B+trees), binary trees, and others are used for storing large files of information on secondary storage and for supporting insertion, lookup, deletion, and enumeration operations. Many tree data storage structures allow operations to be performed concurrently. Some tree data storage structures support concurrent operations by locking only portions of the structure, allowing operations involving other portions of the structure to continue concurrently. Such locking mechanisms are discussed, for example, in P. L. Lehman and S. B. Yao, Efficient Locking for Concurrent Operations on B-Trees, ACM Transactions on Database Systems, vol. 6, no. 4, pp. 650-670 (1981).
Primarily because locking mechanisms may be complicated, other tree data storage structures support concurrent operations by altering the structure of the tree itself. An example of such a structure, proposed by Lehman and Yao, is the B-link tree, which adds a pointer from each node to its right sibling so that more operations can proceed concurrently. Changing the structure of the tree to promote concurrency and reduce reliance on locking is discussed in Y. Sagiv, Concurrent Operations on B*-Trees with Overtaking, Journal of Computer and System Sciences, vol. 33, no. 2, pp. 275-296 (1986).
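By way of illustration only, the role of the additional pointers in a B-link tree can be sketched as follows. The sketch is not taken from Lehman and Yao or from Sagiv; the names Node, high_key, right, and search are hypothetical, and the layout (child i holds keys less than or equal to keys[i]) is an assumption made for the example. It shows a lookup that follows the right-sibling pointer when a concurrent split has moved the sought key out of the node the parent still points to.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical B-link node; field names are assumptions for this sketch."""
    keys: List[int]
    children: List["Node"] = field(default_factory=list)  # empty for a leaf
    high_key: Optional[int] = None  # largest key this node may hold; None = unbounded
    right: Optional["Node"] = None  # link to the right sibling on the same level


def search(root: Node, key: int) -> bool:
    """Look up a key, moving right whenever the key exceeds a node's high key."""
    node = root
    while True:
        # A split may have moved the key into a right sibling that the
        # parent does not reference yet; the extra pointer reaches it.
        while (node.high_key is not None and key > node.high_key
               and node.right is not None):
            node = node.right
        if not node.children:
            return key in node.keys
        # Assumed layout: child i holds keys <= keys[i]; the last child holds the rest.
        i = 0
        while i < len(node.keys) and key > node.keys[i]:
            i += 1
        node = node.children[i]


# A leaf holding 1..4 has just been split, but the parent has not been updated:
# it still points only at the left half.
new_leaf = Node(keys=[3, 4])
old_leaf = Node(keys=[1, 2], high_key=2, right=new_leaf)
root = Node(keys=[], children=[old_leaf])

found_moved_key = search(root, 3)   # reached through the right link
found_missing_key = search(root, 5)
```

In this sketch the reader does not need to lock the parent while the split is in progress, because the right link keeps the moved keys reachable until the parent is updated.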
The increase in concurrency offered by structures such as B-link trees may, however, cause problems when concurrent operations coincide with node deletion. As nodes are deleted, the tree may be compacted and re-balanced to promote, for example, efficient execution of operations. A deletion algorithm such as that discussed in Sagiv may start a background thread to perform compaction and tree rebalancing. Because the background compaction may delete empty nodes, a concurrent traversal operation may encounter a node that has been deleted, resulting in confusion, delay, and inefficiency. For example, unless there is an indication that a node has been deleted, storage allocated to deleted nodes may not be deallocated and reused until it is certain that no concurrent B-tree operation will use a reference to the deleted nodes. Obtaining certainty that no reference to deleted nodes is active is difficult, especially in a distributed B-tree implementation, and the space occupied by deleted nodes therefore may not be reused promptly.
Thus, there is a need for efficient systems and methods for detecting deleted nodes in a tree data storage structure that provides for concurrent operations. The systems and methods should avoid complicated locking schemes, promote concurrency, and detect deleted nodes so that a traversal can be restarted higher up in the tree structure. The systems and methods should include generating an exception when a pointer to a node that has been deleted is encountered.
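As a minimal sketch of the behavior called for above, and not a description of any particular implementation, a traversal might check a per-node indication each time it follows a pointer, generate an exception if the node has been deleted, and restart higher up. The deleted flag, the DeletedNodeError exception, the follow, descend, and lookup functions, and the choice to restart from the root are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class DeletedNodeError(Exception):
    """Hypothetical exception raised when a pointer leads to a deleted node."""


@dataclass
class Node:
    keys: List[int]
    children: List["Node"] = field(default_factory=list)  # empty for a leaf
    deleted: bool = False  # assumed indication that the node has been deleted


def follow(node: Node) -> Node:
    """Dereference a node pointer, raising if the node has been deleted."""
    if node.deleted:
        raise DeletedNodeError("pointer to a deleted node encountered")
    return node


def descend(start: Node, key: int) -> bool:
    """Search downward from start; assumed layout: child i holds keys <= keys[i]."""
    node = follow(start)
    while node.children:
        i = 0
        while i < len(node.keys) and key > node.keys[i]:
            i += 1
        node = follow(node.children[i])
    return key in node.keys


def lookup(root: Node, key: int, start: Optional[Node] = None) -> bool:
    """Search from a possibly stale pointer; on a deleted node, restart at the root."""
    try:
        return descend(start if start is not None else root, key)
    except DeletedNodeError:
        # Restart higher up (here, at the root). A second failure propagates.
        return descend(root, key)


left = Node(keys=[1, 5])
right = Node(keys=[20, 30])
root = Node(keys=[10], children=[left, right])
stale = left  # pointer held by an in-flight traversal

# Simulated background compaction: merge left into right and delete left.
right.keys = left.keys + right.keys
root.keys, root.children = [], [right]
left.keys, left.deleted = [], True

found = lookup(root, 5, start=stale)
```

Without the check in follow, the stale traversal in this example would read the emptied node and report that key 5 is absent, although the key is still present in the tree.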