The present invention relates to an efficient concurrency control method for a high dimensional index structure that employs a time stamp sequence method during insertion, deletion, and searching objects to avoid possible conflicts caused by lock-coupling and to provide efficient ways to search objects being reinserted by introducing a reinsertion table.
Most index structures are formed in a tree shape, which is why the index structure is called the index tree. Each element of an index tree is called a node. The nodes constitute an index tree that is constructed by performing insertion and deletion operations. An entry object is inserted into the node of the tree in accordance with predetermined rules.
For each node of the index tree, a finite number of objects can be inserted. If the number of objects exceeds this predetermined number, this condition is called an overflow. When an overflow occurs, some entries in the node where overflow occurred are selected and then the selected entries are reinserted into the index tree one by one for performance improvement.
Previous studies of currency control methods for convenient index structures are mainly focused on a B-tree structure in which only one-dimensional data is used. Recently, concurrency control methods for an R-tree structure has been started. In an R-tree structure, an index tree for managing a multi-dimensional data is used. In these studies, the concurrency control method with the link technique shows better performance when compared with the concurrency control method with the lock-coupling technique.
However, it is difficult to directly apply a conventional link technique to a high dimensional index structure because of the structural difference and the difference in an insertion and division method.
The background knowledge that is helpful to understand the present invention is explained as follows.
First, a BLINK-tree is an index structure that achieves the serializability by adding a concurrency control technique to the conventional B-tree scheme. One of the conventional concurrency control techniques is the lock-coupling technique, in which a lock on the parent node is to be maintained until a lock on a child node is obtained for consistency of a tree structure.
In other words, this technique employs a top-down lock-coupling scheme. In the top-down lock-coupling scheme, a lock on the parent node can only be released after a lock on a child node is granted. The main drawback of this method is that locks are to be held during I/O operations, deletion operations, and splitting operations.
In order to avoid conflicts possibly caused by the locking-coupling technique, the conventional B-tree structure is slightly modified: a xe2x80x98rightlinkxe2x80x99 is added. Except for the rightmost node, the rightlink is a pointer going from every node to its right sibling on the same level of the tree. When a node is split in an index tree, a new right sibling is created. At this moment the rightlink of the old node is copied to the newly created node and then the rightlink of the old node is changed to point the new right sibling node. This means that all nodes at the same level are chained together through rightlinks.
Like the conventional B-tree, the BLINK-tree is sorted by a key value, that is, a maximum key value. A maximum key value is located in the rightmost position of each level of the index tree. In addition, it is possible to move to the rightward direction through the rightlink pointer. It is possible to freely search the index tree without using the lock-coupling technique in the BLINK-tree.
Let""s suppose that one search process checks the parent node. If a node-splitting operation performed on a child node before this search process reaches child node, the split operation is not yet known to the parent node. To avoid this error, the search process compares the key value of its own with maximum key value of child node.
If the key value of the search process is bigger than a maximum key value of the child node, the search process assumes that node splitting has occurred in the child node and moves to the next rightward node and compares the key value of the search process with the maximum key value of the node. Until the key value of the search process is smaller than or equal to the maximum key value of the node, this operation is performed repeatedly. If a node that has a desired key value is searched, then the search process continues searching to its child node.
The Insertion process can also use the same operation to locate a terminal node in order to add new records.
Since this operation doesn""t use lock-coupling technique for the insertion process and search process, lock conflicts can be efficiently avoided. In addition, the deadlock is not to be allowed.
The RLINK-tree is an index tree structure that is capable of providing an effective concurrency control like the BLINK-tree. By introducing the concurrency control technique of the BLINK-tree to the conventional R-tree, the RLINK-tree scheme was developed.
Unlike the B-tree, nodes of the R-tree are not sorted by the key value. It is problematic to apply the technique of the BLINK-tree directly to R-tree. In the BLINK-tree, key values are simply compared, and therefore it is possible to determine whether the lower level nodes have been split and when the rightlink search is to be finished. However, In R-tree, this operation is not feasible.
In order to apply the link technique used for the BLINK-tree, a LSN (Logical Sequence Number) is introduced. By manipulating LSN, it is possible to determine whether lower level nodes have been split and when the rightlink search is to be finished.
The LSN is a value that increases as time passes.
The structure of the conventional R-tree, the structure of the node, and objects in the node have been modified by the introduction of LSN. The search and insertion algorithms have been modified as well.
In the structure of the RLINK-tree in which LSN is employed, all nodes of the same level form a chain structure connected through rightlinks. The LSN that is the only value in the index structure is added to each node. An object of each node includes MBR (Minimum Bounding Region) of the node, a pointer pointing child node, and an expected LSN that the child node might have. In a high dimensional index structure, each object included in a node is represented as an N dimensional space vector, and the minimum region that is able to hold all objects of the node is called the MBR (Minimum Bounding Region). In addition, a pointer pointing to its rightward sibling node is added to the node. In the present invention, it is assumed that nodes in the CIR-tree node have all the elements described above.
When a node is split, the LSN of the split node is allocated to the newly generated node and a new LSN is allocated to the split node. When an index tree is being searched by a search process, it is possible to determine whether the child node has been split using LSN. That is, it is possible to determine whether the node is split by comparing the LSN of the child node with the expected LSN of the object of the upper node.
If the LSN of the child node is bigger than the expected LSN of the object of the upper node, the node has been split. Then the search is performed in the rightward direction through the rightlink until finding a node whose LSN is the same as the expected LSN of the object of the upper node.
Like this, the high concurrency control method of the BLINK-tree may be applied to the R-tree structure.
However, it is very difficult to avoid the lock-coupling in the RLINK-tree. When an operation that changes the structure of the tree is performed, the lock-coupling should be performed to maintain consistency of the tree due to the structural characteristic of the R-tree group.
The present invention provides an efficient concurrency control method for a high dimensional index structure that overcomes the problems encountered in conventional index structures.
Moreover, the present invention provides an efficient concurrency control method for a high dimensional index structure that avoids possible conflicts caused by lock-coupling by employing the time stamp sequence method during insertion, deletion, and searching objects. It is also possible to search objects being reinserted by employing a reinsertion table.
In order to achieve the objectives mentioned above, a search method is provided. The data structure to support this method is the queue, which is to remember which nodes still need to be examined. This method starts by initially inserting the root on the queue. A node that has not yet been examined is deleted from the queue and all entries in the node that qualify for the search condition are inserted to the queue and the whole process is repeated. In each level of the tree the reinsertion table is also examined and all entries in the table that qualify for the search condition are also inserted to the queue, which guarantees the entries that are being inserted can be searched. The search is terminated when the queue is empty.
In order to achieve the foregoing, an insertion method is provided. An insertion process is composed of three procedures. First, the leaf node should be located to insert a new entry to the node, and it is examined to determine whether the entry can be inserted to the node or not. Second, the new entry is inserted and MBR (Minimum Bounding Region) is adjusted if it is possible to insert the new entry to the node. Third, if it is not possible to insert the new entry to the node, which means the node is full, and the node should be split or some entries of the node should be reinserted.
In order to achieve the foregoing, a deletion method is provided. A deletion method is composed of two procedures: first, the leaf node to delete an entry must be located; and second, the MBR is adjusted.
A media that stores the programs for a search operation is also provided. The program runs the following procedures. The underlying data structure to support a search operation is a queue, which is used to remember which nodes still have to be examined. A search starts by initially inserting the root on the queue. A node that has not yet been examined is deleted from the queue and all entries in the node that qualify for the search condition are inserted to the queue and the whole process is repeated. In each level of the tree the reinsertion table is also examined and all entries in the table that qualify for the search condition are also inserted to the queue, which guarantees the entries that are being reinserted can be searched. The search is terminated when the queue is empty.
A media that stores the programs for an insertion method is also provided. The program runs the following procedures. First, the leaf node is located to insert a new entry to the node, and it is examined to determine whether the entry can be inserted to the node or not. Second, the new entry is inserted and the MBR (Minimum Bounding Region) is adjusted if it is possible to insert the new entry to the node. Third, if it is not possible to insert the new entry to the node, which means the node is full, the node is split or some entries of the node are reinserted.
In order to achieve the objectives mentioned above, a media that stores the programs for a deletion is provided. A deletion proceeds in two procedures: first the leaf node to delete an entry must be located; and second, the MBR is adjusted.