1. Field of the Invention
The present invention relates to a computer system, and deals more particularly with a method, system, and computer program product for serializing data structure updates and retrievals without requiring searchers to use locks. The disclosed techniques may be used advantageously for, inter alia, optimizing performance of data structures used for network routing.
2. Description of the Related Art
A number of different types of data structures may be used in a computing system for storing information for which access time, including updates and retrievals of the stored information, is critical. These data structures include linked lists, hash tables, and tree structures. Tree structures in particular are often used as a technique for optimizing the number of operations that must be performed to locate a particular item within an ordered file system or database.
A number of different types of tree structures are known in the art, including binary trees, m-way trees, AVL trees (named for Adelson-Velskii and Landis, who introduced them), radix trees, B-trees, B*-trees, B'-trees, tries, and so forth. In binary trees, each node has at most two child nodes. AVL trees are also commonly referred to as height-balanced binary trees, which means that any subtree within the AVL tree is no more than one level deeper on its left (or right) side than it is on the right (or left) side. Radix trees are trees in which a search progresses based on a composite of the information found in the nodes. B-trees are height-balanced m-way trees, where an m-way tree is a search tree that has at most some number “m” entries in each node of the tree. B*-trees, B'-trees, and tries are all variations of B-trees. The particular nuances of these varying types of trees are not critical to an understanding of the present invention, and thus will not be described in further detail. (For a detailed discussion of these types of tree structures, reference may be made to “Fundamentals of Data Structures”, E. Horowitz and S. Sahni, published by Computer Science Press, Inc. (1976), pp. 422-549.)
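By way of illustration only, the height-balance property described above for AVL trees may be checked as in the following minimal sketch. The names `Node`, `height`, and `is_height_balanced` are hypothetical and are not taken from the cited reference; the sketch recomputes heights on every call for clarity rather than efficiency.

```python
class Node:
    """Binary tree node: at most two children, as in any binary tree."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def height(node):
    """Number of levels in the subtree rooted at node (0 if empty)."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


def is_height_balanced(node):
    """True if no subtree is more than one level deeper on one side."""
    if node is None:
        return True
    if abs(height(node.left) - height(node.right)) > 1:
        return False
    return is_height_balanced(node.left) and is_height_balanced(node.right)
```

A three-node tree with one child on each side of the root satisfies the check, whereas a chain of three nodes that all hang to the right does not.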
It is desirable to balance a tree in order to assure an optimal and consistent worst-case cost in terms of the number of tree accesses that are required for locating a particular item (or, conversely, for determining that the particular item does not exist in the tree). As nodes are inserted into a balanced tree and deleted therefrom, it is necessary to re-balance the tree in order that the advantageous properties of the balanced tree are maintained. Algorithms for keeping trees in balance are known in the art. Typically, such algorithms tend to be complex and costly in terms of execution time. Furthermore, a re-balancing operation may result in decreased system performance because the tree cannot be used for productive accesses while the re-balancing is being performed.
When using tree structures on multi-programming operating systems that support concurrent execution by multiple threads, it is quite likely that one or more threads will try to access a particular tree for the purpose of retrieving already-stored data at the same time that one or more other threads try to access the tree for updating (i.e. inserting, deleting, or changing) information. To ensure that the retrieval threads do not collide with the update threads and thereby return invalid or corrupted results to the requesting processes, serialization techniques are typically used to control the order in which the threads access the tree. When running in a multi-processor (MP) environment having a symmetric MP operating system (such as the OS/390® operating system from the International Business Machines Corporation (“IBM”)) wherein the computing task is shared among multiple central processing units, the serialization task becomes especially difficult. (“OS/390” is a registered trademark of IBM.)
One technique commonly used in the prior art for providing serialized access to tree structures is locking. Typically, threads or tasks that need only to retrieve information (referred to herein as “search tasks” or “searchers”) obtain a shared lock before using a tree, where a shared lock enables more than one search task (i.e. all those sharing the lock) to retrieve information at the same time. Tasks that need to update information, on the other hand, typically obtain an exclusive lock. While a task has an exclusive lock on a tree, neither other update tasks nor any search tasks can access the tree. Instead, those tasks are typically suspended while waiting for the currently-active update process to complete and release the exclusive lock, at which time the suspended tasks will be resumed. Thus, while locking provides the necessary serialization, it does so at a very high cost in terms of performance overhead. For very busy systems such as super servers, the expense of this type of locking approach leads to very serious performance degradation.
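The shared and exclusive locking behavior described above may be sketched as follows. This is an illustrative sketch only, built on the standard `threading.Condition` class; the class name `SharedExclusiveLock` and its method names are hypothetical, and the sketch makes no attempt at fairness (a steady stream of searchers could delay an updater indefinitely).

```python
import threading


class SharedExclusiveLock:
    """Toy lock: many searchers may hold it together, or one updater alone."""

    def __init__(self):
        self._cond = threading.Condition()
        self._searchers = 0      # current holders of the shared lock
        self._updating = False   # True while the exclusive lock is held

    def acquire_shared(self, timeout=None):
        """Wait until no updater holds the lock; False if the wait timed out."""
        with self._cond:
            ok = self._cond.wait_for(lambda: not self._updating, timeout)
            if ok:
                self._searchers += 1
            return ok

    def release_shared(self):
        with self._cond:
            self._searchers -= 1
            self._cond.notify_all()  # a waiting updater may now proceed

    def acquire_exclusive(self, timeout=None):
        """Wait until there are no searchers and no other updater."""
        with self._cond:
            ok = self._cond.wait_for(
                lambda: not self._updating and self._searchers == 0, timeout)
            if ok:
                self._updating = True
            return ok

    def release_exclusive(self):
        with self._cond:
            self._updating = False
            self._cond.notify_all()  # resume suspended searchers and updaters
```

In this sketch a task that cannot obtain the lock is suspended inside `wait_for` until a holder releases it, which corresponds to the suspension cost discussed above.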
Tree structures are commonly used in the routing tables of routers and gateways (referred to hereinafter as routers, for ease of reference), as a means for quickly evaluating the Internet Protocol (IP) address in a data packet in order to determine how to route the packet while providing an acceptable level of performance and throughput. As link speeds are increasing, the number of IP packets which a router is required to process per second is becoming very high. If an exclusive lock is held on a routing table implemented using a tree structure, then all data transfers and forwarding must stop until the lock is released. Operations on trees may require a significant amount of programming logic, and expenditure of a significant amount of computing time for rebalancing trees (as well as for traversing the trees to find a particular route). As will be obvious, it is very undesirable for the data transfers and forwarding to be halted even for relatively short periods of time, and thus it is desirable to optimize the tree operations.
Another technique commonly used in the prior art for providing serialized access to tree structures is to minimize the time spent in the locked status by not actually re-structuring or re-balancing the trees each time an update is performed. In this approach, deleted nodes are not completely removed until some predetermined number of deletes have been processed—or perhaps until a predetermined amount of time has elapsed. When this number of deletes occurs or this amount of time elapses, an exclusive lock is obtained, suspending all search tasks as the restructuring occurs. In some extreme cases, the entire tree may need to be rebuilt. In the interim, while it is not yet time to restructure the tree, the deleted nodes are simply marked as deleted or invalid. A serious disadvantage of this approach is that each task using the tree must check each node it accesses to determine whether that node is still valid, which significantly increases the access time of the task.
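The deferred-restructuring approach described above may be sketched as follows. This is a simplified illustration under stated assumptions: an unbalanced binary search tree stands in for whatever tree type is used, the names `LazyDeleteTree` and `rebuild_threshold` are hypothetical, only a count-based trigger is shown, and the locking that would surround `_rebuild` is omitted. Because marked nodes still route the search correctly here, the validity check is applied only to the node that matches the key.

```python
class Node:
    def __init__(self, key, value):
        self.key, self.value = key, value
        self.left = self.right = None
        self.deleted = False  # marked as deleted instead of being unlinked


class LazyDeleteTree:
    def __init__(self, rebuild_threshold=3):
        self.root = None
        self.pending_deletes = 0
        self.rebuild_threshold = rebuild_threshold  # hypothetical trigger

    def _find(self, key):
        node = self.root
        while node is not None and node.key != key:
            node = node.left if key < node.key else node.right
        return node

    def insert(self, key, value):
        if self.root is None:
            self.root = Node(key, value)
            return
        node = self.root
        while True:
            if key == node.key:          # revive or overwrite existing node
                if node.deleted:
                    self.pending_deletes -= 1
                node.value, node.deleted = value, False
                return
            side = "left" if key < node.key else "right"
            child = getattr(node, side)
            if child is None:
                setattr(node, side, Node(key, value))
                return
            node = child

    def search(self, key):
        node = self._find(key)
        # Extra validity check that every searcher must pay for.
        if node is None or node.deleted:
            return None
        return node.value

    def delete(self, key):
        node = self._find(key)
        if node is None or node.deleted:
            return False
        node.deleted = True
        self.pending_deletes += 1
        if self.pending_deletes >= self.rebuild_threshold:
            self._rebuild()  # would run under an exclusive lock
        return True

    def _rebuild(self):
        live = []

        def walk(n):                     # in-order walk keeps keys sorted
            if n is not None:
                walk(n.left)
                if not n.deleted:
                    live.append((n.key, n.value))
                walk(n.right)

        def build(lo, hi):               # balanced tree from sorted items
            if lo >= hi:
                return None
            mid = (lo + hi) // 2
            n = Node(*live[mid])
            n.left, n.right = build(lo, mid), build(mid + 1, hi)
            return n

        walk(self.root)
        self.root = build(0, len(live))
        self.pending_deletes = 0
```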
“Serialization of AVL-Binary Tree Element Retrieval via Duplexed Pointers”, IBM Technical Disclosure Bulletin, No. 10B, pp. 138-139 (March 1992) discloses a technique for serializing AVL trees without requiring locks for retrieval tasks. In this technique, the tree header contains an active tree indicator, a synchronization count, and a duplexed pair of pointers to the first tree element. Each node in the tree contains a key, a user data field or pointer thereto, a duplexed pair of left child pointers and a duplexed pair of right child pointers, and a duplexed pair of balance indicators. Retrieval operations use the active tree indicator to know which of the set of left and right child pointers to use (i.e. the “active” pointers); update operations use the opposite ones of these pointers (i.e. the “inactive” pointers). Each time an update is performed, the synchronization count in the tree header is incremented and the active tree indicator is switched. The values are stored in adjacent storage so that a single atomic action can be used for the increment and switch, ensuring that both are performed simultaneously. Prior to performing a retrieval, these values are saved. After the retrieval operation occurs, the saved values are compared to the values currently stored in the tree header. If they are identical, the retrieval ends normally. Otherwise, when they are different, this is a sign that the retrieval occurred from a now-obsolete version of the tree, and the retrieve operation must be repeated until a retrieval completes with the synchronization count and active tree indicator values unchanged. Updates are made to the inactive tree, without regard to whether searchers are still using the tree. This may have catastrophic results in some cases (e.g. when an update operation deletes a pointer that a search task is looking at).
This disclosure states that the storage for any node that was once part of the tree cannot be freed, as this will cause the retrieve operation to fail; instead, storage that is no longer needed for a node (e.g. because the node has been deleted or has been replaced by another node during an update process) is pooled and may be reused as part of the tree.
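The save, retrieve, compare, and retry sequence described in the cited disclosure may be sketched as follows. This is an illustrative sketch only and does not reproduce the disclosure's implementation: the names are hypothetical, a plain binary search tree without rotations or balance indicators stands in for the AVL tree, keys are assumed to be unique, and a single rebinding of a tuple stands in for the atomic increment-and-switch. Mirroring each update into the other pointer set after the switch is an assumption of the sketch.

```python
class Node:
    def __init__(self, key, value):
        self.key, self.value = key, value
        self.left = [None, None]    # duplexed pair of left child pointers
        self.right = [None, None]   # duplexed pair of right child pointers


class DuplexedTree:
    def __init__(self):
        # (synchronization count, active tree indicator), replaced as a unit
        self.header = (0, 0)
        self.first = [None, None]   # duplexed pointers to the first element

    def _link(self, new, side):
        """Plain BST insertion into pointer set `side`."""
        if self.first[side] is None:
            self.first[side] = new
            return
        node = self.first[side]
        while True:
            links = node.left if new.key < node.key else node.right
            if links[side] is None:
                links[side] = new
                return
            node = links[side]

    def insert(self, key, value):
        new = Node(key, value)
        count, active = self.header
        self._link(new, 1 - active)            # update the inactive pointers
        self.header = (count + 1, 1 - active)  # increment and switch together
        self._link(new, active)                # assumption: mirror afterwards

    def retrieve(self, key, on_attempt=None):
        """Return (value or None, number of attempts that were needed)."""
        attempts = 0
        while True:
            attempts += 1
            saved = self.header                # save count and indicator
            active = saved[1]
            node = self.first[active]
            while node is not None and node.key != key:
                links = node.left if key < node.key else node.right
                node = links[active]
            if on_attempt is not None:
                on_attempt(attempts)           # hook to simulate an update
            if self.header == saved:           # unchanged: result is current
                return (node.value if node else None), attempts
```

The `on_attempt` hook exists only so that an update can be simulated between the search and the comparison; when the hook inserts a key during the first attempt, the comparison fails and the retrieval is repeated against the newly active pointers.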
Commonly-assigned U.S. Pat. No. 5,089,952, which is entitled “Method for Allowing Weak Searchers to Access Pointer-Connected Data Structures Without Locking”, teaches a technique for avoiding use of locks while still ensuring that the content of the tree remains in a correct state. (A “weak searcher”, as defined therein, is an access task that has no intent of updating the stored information.) Update operations first lock the “scope” of a node, using prior art techniques which are not described for determining the scope (where the scope is defined as the path from a “deepest safe node”—i.e. a node that will not overflow or underflow during an update—to a leaf of the tree). The disclosed technique retains deleted nodes and nodes which have become redundant while performing insertions, where these deleted and redundant nodes are referred to as “disconnected nodes”, until one of several defined criteria has been met. In one solution, time stamps are used, where each searcher keeps track of how long it has been using the tree and compares this duration to a predetermined time period. If the search is not completed within this time period, it must be aborted and restarted. In a second solution, either a range and level value are added to each node in the tree and searchers must evaluate this range and level as they traverse each node, or a creation time is added to each node and searchers must check this creation time value. In either case, the search may need to be aborted and restarted, depending on the result of the comparison. In a third solution, a unique object identifier is added to each node, and this identifier must be checked during the search using one of the techniques from the second solution to determine whether the search must be restarted. The patent states that, in each of the three solutions, searchers will occasionally be required to restart their search unnecessarily. 
While the disclosed technique provides advantages over the prior art, the need to repeat searches as well as the need to perform additional operations to check the validity of nodes during the searching process add to the overhead of performing searches.
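The first of the solutions summarized above, in which a searcher aborts and restarts once it has used the tree for longer than a predetermined period, may be sketched as follows. This is a sketch of the general idea as summarized here, not of the implementation in the cited patent: the names `weak_search`, `time_limit`, and `max_restarts` are hypothetical, the restart cap is an addition for safety, and the clock is passed in as a parameter so that the behavior can be exercised deterministically.

```python
import time


class Node:
    def __init__(self, key, value, left=None, right=None):
        self.key, self.value = key, value
        self.left, self.right = left, right


class SearchTimeout(Exception):
    """Raised when one search attempt exceeds the permitted duration."""


def search_once(root, key, time_limit, clock):
    started = clock()
    node = root
    while node is not None:
        # Overhead paid on every node: compare elapsed time to the limit.
        if clock() - started > time_limit:
            raise SearchTimeout
        if key == node.key:
            return node.value
        node = node.left if key < node.key else node.right
    return None


def weak_search(root, key, time_limit, clock=time.monotonic, max_restarts=10):
    """Return (value or None, number of restarts that were needed)."""
    restarts = 0
    while True:
        try:
            return search_once(root, key, time_limit, clock), restarts
        except SearchTimeout:
            restarts += 1              # abort this attempt and start over
            if restarts > max_restarts:
                raise
```

A restart triggered purely by elapsed time may occur even though the tree did not change during the search, which is one way a search can be repeated unnecessarily.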
It would be preferable to use a serialization approach that minimizes use of locks (and thereby minimizes the resulting suspension of tasks) yet still guarantees that search results are valid and does not require searches to be re-started. Furthermore, it is desirable that the serialization approach has good performance characteristics and that it allows storage to be readily freed and re-used. The solution should preferably be extendable to other types of data structures, in addition to trees. The manner in which the present invention satisfies these objectives is described herein.