1. Field of the Invention
This invention relates to a method of setting up a multidimensional index in a memory database system, and in particular to a method of implementing the “Grid+T Tree” technology in a memory database.
2. Description of the Related Art
A telecommunications business support system generally has to process millions of data records, while host computer resources such as CPU and memory are limited and valuable. In memory database research, it is therefore an important and difficult problem to complete the relevant database operations with the fewest resources and the highest efficiency. Adopting a good index structure is one effective way to ensure the efficiency of memory database operations.
Over many years of research on index structures, tree structures have been among the effective methods of setting up a multidimensional index for a database system. Among these index structures, “K-D Tree”, “R Tree” and “T Tree” are the most popular.
1.1 “K-D Tree”
“K-D Tree” is a binary search tree for a k-dimensional space. It mainly stores point data. Every internal node divides the k-dimensional space that it represents into two parts by one (k−1)-dimensional hyperplane. These hyperplanes alternate among the k possible directions, and each hyperplane must contain at least one data point. FIG. 1 is a sample of “K-D Tree”.
In terms of operations, search and insertion on a “K-D Tree” are very simple, while deletion is somewhat complicated, since deleting one node may require rebuilding its subtrees. Since “K-D Tree” can only process point data, data of other shapes can only be represented by their central points. It should be emphasised that when the data is inserted in a different order, the structure of the “K-D Tree” is also different, and that the data may appear anywhere in the tree, not only at the leaf nodes.
From the introduction above, it can be seen that “K-D Tree” is a multidimensional binary tree structure, so it has very good index efficiency for a classical disk-based database system. But because each node holds only one data point and two pointers to its left and right children, its storage efficiency is too low for a memory database, whose memory space is extremely valuable.
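The insertion and search behaviour described above can be illustrated with a minimal Python sketch. The names `KDNode`, `kd_insert`, `kd_contains` and `kd_build` are hypothetical and chosen only for illustration. The sketch assumes that the splitting direction simply cycles through the k coordinates by depth, and it omits deletion.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class KDNode:
    point: Point                       # every node, not only a leaf, holds data
    left: Optional["KDNode"] = None
    right: Optional["KDNode"] = None


def kd_insert(node: Optional[KDNode], point: Point, depth: int = 0) -> KDNode:
    """Insert a point, alternating the splitting coordinate with depth."""
    if node is None:
        return KDNode(point)
    axis = depth % len(point)
    if point[axis] < node.point[axis]:
        node.left = kd_insert(node.left, point, depth + 1)
    else:
        node.right = kd_insert(node.right, point, depth + 1)
    return node


def kd_contains(node: Optional[KDNode], point: Point) -> bool:
    """Exact-match search following the same comparisons as insertion."""
    depth = 0
    while node is not None:
        if node.point == point:
            return True
        axis = depth % len(point)
        node = node.left if point[axis] < node.point[axis] else node.right
        depth += 1
    return False


def kd_build(points: Iterable[Point]) -> Optional[KDNode]:
    """Build a tree by inserting the points in the given order."""
    root = None
    for p in points:
        root = kd_insert(root, p)
    return root


# The same three points inserted in two different orders give different trees.
tree_a = kd_build([(5, 5), (2, 8), (7, 1)])
tree_b = kd_build([(2, 8), (5, 5), (7, 1)])
```

In this sketch `tree_a` has (5, 5) at its root with the other two points as its children, while `tree_b` is a chain rooted at (2, 8), which shows how the insertion order determines the shape of the tree.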
1.2 “R Tree”
“R Tree” is a multidimensional index structure similar to “B+ Tree”. Each of its internal nodes stores not the data, but the Minimum Bounding Rectangle (MBR) of all of its sub-nodes. The actual data is stored in the leaf nodes, and all the leaf nodes are at the same level, as can be seen from FIG. 2.
The search operation traverses the tree to find all leaf nodes whose MBRs overlap the query rectangle. On insertion of a new entry, the “R Tree” finds the leaf node whose MBR needs the least area enlargement in order to contain the MBR of the new entry. Deletion begins with an exact search. If the entry is found, it is deleted, and the MBRs of its ancestor nodes are modified successively.
From the introduction above, it can be seen that because the structure of “R Tree” is similar to that of “B+ Tree”, it satisfies the requirements of fewer disk accesses and faster search. However, because all the data is stored in the leaf nodes, and the internal nodes store only information about their sub-nodes, “R Tree” wastes a lot of memory space.
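The two geometric tests used by these operations, MBR overlap for search and least area enlargement for insertion, can be sketched as follows. This is a simplified two-dimensional illustration with hypothetical names (`Rect`, `overlaps`, `enlargement`, `choose_leaf`). It chooses among a flat list of leaf MBRs, whereas a full “R Tree” repeats the choice at each level from the root down, and breaking ties by smaller area is an assumption of the sketch.

```python
from typing import List, Tuple

# A rectangle is (xmin, ymin, xmax, ymax).
Rect = Tuple[float, float, float, float]


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle that bounds both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the two rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def enlargement(mbr: Rect, new: Rect) -> float:
    """Extra area the MBR needs in order to contain the new rectangle."""
    return area(union(mbr, new)) - area(mbr)


def choose_leaf(leaf_mbrs: List[Rect], new: Rect) -> int:
    """Index of the leaf needing the least enlargement (ties: smaller area)."""
    return min(
        range(len(leaf_mbrs)),
        key=lambda i: (enlargement(leaf_mbrs[i], new), area(leaf_mbrs[i])),
    )


leaves = [(0, 0, 2, 2), (5, 5, 9, 9)]
chosen = choose_leaf(leaves, (2, 2, 3, 3))
```

Here the first leaf would grow by 5 area units and the second by 33, so the first leaf is chosen.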
1.3 “AVL Tree”
An “AVL Tree” is the first-invented self-balancing binary search tree. In an “AVL Tree” the heights of the two child subtrees of any node differ by at most one; it is therefore also known as a height-balanced tree. Search, insertion, and deletion are all O(log n) in both the average and worst cases. Insertions and deletions may require the tree to be rebalanced by one or more tree rotations.
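The height-balance property can be stated as a short check. The following Python sketch uses hypothetical names (`Node`, `check_balanced`) and assumes that an empty tree has height 0. It only verifies the property and does not perform any rebalancing.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def check_balanced(node: Optional[Node]) -> Tuple[bool, int]:
    """Return (is_height_balanced, height) for the subtree rooted at node."""
    if node is None:
        return True, 0
    left_ok, left_height = check_balanced(node.left)
    right_ok, right_height = check_balanced(node.right)
    balanced = left_ok and right_ok and abs(left_height - right_height) <= 1
    return balanced, 1 + max(left_height, right_height)


balanced_tree = Node(2, Node(1), Node(3))
chain = Node(1, None, Node(2, None, Node(3)))
```

With these inputs, `balanced_tree` satisfies the property, while `chain` does not, because the subtrees of its root have heights 0 and 2.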
1.4 T Tree
As in “AVL Tree”, the difference between the heights of the left and right sub-trees of a “T Tree” node does not exceed 1. Unlike “AVL Tree”, “T Tree” can store several key values within one storage node. The leftmost and rightmost keys of a node are respectively the minimum and maximum key values of that node. Its left sub-tree contains only records with smaller key values, and its right sub-tree contains only records with larger key values. FIG. 3 is a structure chart of “T Tree”.
From the structure of “T Tree”, it can be seen that “T Tree” has the same time complexity, O(log2 N), as “K-D Tree” and “R Tree”. The largest difference is that each node of “T Tree” holds multiple keys, and the only extra information it carries is the pointers to its left and right sub-nodes, which improves the storage efficiency of the nodes.
The balancing process of “T Tree” is similar to that of “AVL Tree”. Both are implemented by four operations: the single rotations (LL and RR) and the double rotations (LR and RL). The only difference is that an LR or RL operation on a “T Tree” may change a leaf node into an internal node holding only one element. In that case, elements of its child node need to be moved to this node to ensure that the structure remains a “T Tree”. The balance factor of a node of “AVL Tree” is the height of its left child tree minus the height of its right child tree. When the balance factor is 1, 0, or −1, the node is regarded as balanced. When the balance factor is 2 or −2, the tree is regarded as not balanced and needs to be rebalanced. The balance factor can be stored directly in every node, or can be calculated from the heights of the child trees stored in the node.
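The node layout and the search rule described above can be sketched in Python. The names `TNode` and `t_search` are hypothetical, and the sketch assumes that each node keeps its keys in a sorted list. It covers lookup only and leaves out insertion, deletion and rebalancing.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TNode:
    keys: List[int]                    # sorted; keys[0] is min, keys[-1] is max
    left: Optional["TNode"] = None     # subtree with keys smaller than keys[0]
    right: Optional["TNode"] = None    # subtree with keys larger than keys[-1]


def t_search(node: Optional[TNode], key: int) -> bool:
    """Descend by comparing with each node's bounds, then search inside it."""
    while node is not None:
        if key < node.keys[0]:
            node = node.left
        elif key > node.keys[-1]:
            node = node.right
        else:
            # The key lies within this node's bounds: binary search its keys.
            i = bisect_left(node.keys, key)
            return i < len(node.keys) and node.keys[i] == key
    return False


root = TNode([40, 45, 50], TNode([10, 20, 30]), TNode([60, 70, 80]))
```

A lookup of 47 in this example stops at the root, because 47 lies between the root's minimum and maximum keys, and reports that the key is absent without visiting either sub-tree.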
The recursive algorithm for inserting a new data element “e” into a balanced binary sort tree (BBST) can be described as follows:
    1. If the BBST is an empty tree, then a new node with data element “e” is inserted as the root node of the BBST, and the height of the tree is increased by 1.
    2. If the keyword of “e” is equal to the keyword of the root node of the BBST, no insertion is performed.
    3. If the keyword of “e” is less than the keyword of the root node of the BBST, and no node of the BBST's left child tree has a keyword equal to the keyword of “e”, then “e” is inserted into the BBST's left child tree. If the height of the left child tree is increased by 1 after the insertion, the follow-up operation depends on the situation:
        a. If the balance factor of the BBST's root node is −1 (the height of the right child tree is larger than the height of the left child tree), then the balance factor of the root node is changed to 0, and the height of the BBST remains unchanged.
        b. If the balance factor of the BBST's root node is 0 (the height of the right child tree is equal to the height of the left child tree), then the balance factor of the root node is changed to 1, and the height of the BBST is increased by 1.
        c. If the balance factor of the BBST's root node is 1 (the height of the left child tree is larger than the height of the right child tree) and the balance factor of the root node of the BBST's left child tree is 1, then a single right rotation is needed. After the rotation, the balance factors of the root node and of the root node of its right child tree are changed to 0, and the height of the BBST remains unchanged.
    4. If the keyword of “e” is larger than the keyword of the BBST's root node, and no node of the BBST's right child tree has a keyword equal to the keyword of “e”, then “e” is inserted into the BBST's right child tree. If the height of the right child tree is increased by 1 after the insertion, the follow-up operation depends on the situation, symmetrically to step 3.
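The insertion procedure above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: the names `AVLNode`, `insert`, `rotate_left` and `rotate_right` are hypothetical, each node stores its subtree height rather than a balance factor, and the balance factor is taken as the left height minus the right height, matching the parenthetical remarks in cases a to c. The sketch also handles the double-rotation cases, which the steps above do not spell out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AVLNode:
    key: int
    left: Optional["AVLNode"] = None
    right: Optional["AVLNode"] = None
    height: int = 1


def height(n: Optional[AVLNode]) -> int:
    return n.height if n else 0


def update(n: AVLNode) -> None:
    n.height = 1 + max(height(n.left), height(n.right))


def balance_factor(n: AVLNode) -> int:
    # Assumed sign convention: left height minus right height.
    return height(n.left) - height(n.right)


def rotate_right(y: AVLNode) -> AVLNode:
    """Single right rotation: the left child becomes the subtree root."""
    x = y.left
    y.left, x.right = x.right, y
    update(y)
    update(x)
    return x


def rotate_left(x: AVLNode) -> AVLNode:
    """Single left rotation: the right child becomes the subtree root."""
    y = x.right
    x.right, y.left = y.left, x
    update(x)
    update(y)
    return y


def insert(node: Optional[AVLNode], key: int) -> AVLNode:
    if node is None:                     # step 1: empty tree
        return AVLNode(key)
    if key == node.key:                  # step 2: duplicate, nothing to do
        return node
    if key < node.key:                   # step 3: insert on the left
        node.left = insert(node.left, key)
    else:                                # step 4: insert on the right
        node.right = insert(node.right, key)
    update(node)
    bf = balance_factor(node)
    if bf > 1:                           # left side too high
        if balance_factor(node.left) < 0:
            node.left = rotate_left(node.left)    # double rotation
        return rotate_right(node)
    if bf < -1:                          # right side too high
        if balance_factor(node.right) > 0:
            node.right = rotate_right(node.right)  # double rotation
        return rotate_left(node)
    return node


def build(keys) -> Optional[AVLNode]:
    root = None
    for k in keys:
        root = insert(root, k)
    return root
```

For example, `build([3, 2, 1])` triggers the single right rotation of case c and leaves 2 at the root, and `build(range(1, 8))` produces a tree of height 3 rooted at 4.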
“T Tree” can be regarded as a highly efficient in-memory data structure for a main memory database (MMDB). “T Tree” is based on the “AVL Tree” of Adel'son-Vel'skii and Landis [11]. As in “AVL Tree”, the heights of the left child tree and the right child tree of a “T Tree” node differ by at most 1.
“T Tree” uses memory space much more efficiently than “K-D Tree” or “R Tree”, but it still has shortcomings. For a one-dimensional index, “T Tree” performs very well on both exact search and range search, but for a multidimensional index its shortcoming is obvious: it can use only one field of the keyword as the index. For example, suppose the keyword of some table consists of three fields, <key1, key2, key3>, and a “T Tree” is set up using the value of key1 as the index. If at some time the record set in memory consists of all possible combinations C = {<i, j, k> | 0 < i < 10000, 0 < j < 1000, 0 < k < 100}, which contains about 1 billion records, then roughly 100,000 records share each key1 value, and a search for a single record may need to compare up to that many records. Such search efficiency is too low.
From the analysis above, among these index structures, “K-D Tree” and “R Tree” have the shortcoming of wasting memory, which is a serious drawback for a memory database whose memory space is very valuable. Although “T Tree” has the advantages of fast memory access and low memory usage, its efficiency is still not high for a multidimensional index.
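The effect of indexing only one field can be illustrated on a scaled-down record set. In the following Python sketch, a dictionary keyed on key1 stands in for the one-field “T Tree” index, and the names `build_key1_index` and `search` are hypothetical. The field ranges are deliberately small and are assumptions of the sketch. The point illustrated is that the index narrows a search only to the records sharing key1, whose number is the product of the numbers of key2 and key3 values, and that those records then have to be compared one by one.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Tuple

Record = Tuple[int, int, int]  # (key1, key2, key3)


def build_key1_index(records: List[Record]) -> Dict[int, List[Record]]:
    """Index the records on key1 only, as a one-field index would."""
    index: Dict[int, List[Record]] = defaultdict(list)
    for rec in records:
        index[rec[0]].append(rec)
    return index


def search(index: Dict[int, List[Record]], target: Record) -> Tuple[bool, int]:
    """Return (found, number of records compared within the key1 bucket)."""
    comparisons = 0
    for rec in index.get(target[0], []):
        comparisons += 1
        if rec == target:
            return True, comparisons
    return False, comparisons


# Scaled-down data: 9 key1 values, 5 key2 values, 3 key3 values.
records = list(product(range(1, 10), range(1, 6), range(1, 4)))
index = build_key1_index(records)
found, comparisons = search(index, (4, 5, 3))
```

In this example the record set holds 135 records, and finding the last record of a key1 bucket takes 5 × 3 = 15 comparisons even though the index lookup on key1 itself is immediate.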