1. Field of the Invention
The present invention relates to a data search system, and in particular to a data structure for search and a method of constructing the same.
2. Description of the Related Art
As data search methods, there have been known a table search, which uses a table for search, and a radix search, which uses a tree structure for search. A table search such as a linear search or a binary search has the disadvantage that the search time increases as the amount of data to be searched increases.
The radix search treats data as strings of symbols selected from a limited number of kinds of symbols. It uses a tree structure in which a set of the strings of symbols is classified symbol by symbol, starting at the head of each string, and in which each node is linked to other nodes by branches. Each node has zero or more children and stores one pointer per kind of symbol, so that it can point to at most as many children as there are kinds of symbols. Given search data consisting of a plurality of symbols, the radix search is carried out by following the nodes of the tree from a parent to the child selected by each successive symbol of the search data until a target is found.
Taking a longest prefix search method as an example, the radix search method will be described in detail. In the longest prefix search method, each item of data to be searched is represented by a string of bits whose length is a fixed length L or less, and is associated with related information. When search data is provided, all items of data matching the search data, that is, all items that are a prefix of the search data, are found, and the item having the longest match among them is selected as a target.
In the case where each item of data to be searched and its related information are as shown in FIG. 1, the binary tree in which the data is stored is as shown in FIG. 2. For example, when search data “0001111” is provided, the search is carried out by following nodes in the order N1, N2, N3, N4, N6, and N7 while reading the related information from each node having its node information flag set (indicated by a black circle). In this case, the node information flag is set at the nodes N3, N6, and N7. In other words, the item of data stored in each of these nodes N3, N6, and N7 matches the search data. Among them, the related information “D” stored in the node N7, which has the longest match (the lowest level), is obtained as a target. Assuming that the time required to follow one node is T, the maximum search time is 6T in the case of FIG. 2.
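The binary-tree search described above can be sketched as follows. This is a minimal illustration, not the structure of FIG. 2 itself: the class and function names are hypothetical, and the table is assumed for illustration because FIG. 1 is not reproduced here. Only the prefix “00011” with related information “D” is stated in the text; “00” with “A” is inferred from the node descriptions, and “X” is a placeholder.

```python
from typing import Optional, Tuple


class BinaryNode:
    """One node of a binary radix tree; one bit is consumed per level."""

    def __init__(self) -> None:
        self.children = {"0": None, "1": None}  # pointers to the child nodes
        self.flag = False                        # node information flag
        self.info: Optional[str] = None          # related information


def insert_binary(root: BinaryNode, prefix: str, info: str) -> None:
    """Store `info` at the node reached by following `prefix` bit by bit."""
    node = root
    for bit in prefix:
        if node.children[bit] is None:
            node.children[bit] = BinaryNode()
        node = node.children[bit]
    node.flag = True
    node.info = info


def search_binary(root: BinaryNode, key: str) -> Tuple[Optional[str], int]:
    """Return (related info of the longest match, number of nodes followed)."""
    node = root
    followed = 1                                  # the root is the first node
    best = root.info if root.flag else None
    for bit in key:
        node = node.children[bit]
        if node is None:                          # no branch for this bit
            break
        followed += 1
        if node.flag:                             # a deeper match replaces a shallower one
            best = node.info
    return best, followed


# Assumed table (see the note above): "X" is a placeholder value.
binary_root = BinaryNode()
for prefix, info in {"00": "A", "0001": "X", "00011": "D"}.items():
    insert_binary(binary_root, prefix, info)

result = search_binary(binary_root, "0001111")    # ("D", 6)
```

With this assumed table, six nodes are followed before the search stops, which corresponds to the maximum search time of 6T.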
In order to reduce the depth of a tree and thereby shorten the maximum search time, an N-ary tree structure (N>2) is usually employed. In the case of N=4 (quad tree structure), the search data is sequentially read in units of two bits. Accordingly, an item of data consisting of an odd number of bits is expanded to data having a bit length that is an integral multiple of 2 so as to fit the quad tree structure. For example, the 5-bit data “00011” having the related information “D” is expanded to two items of 6-bit data, “000110” and “000111”, each having the same related information “D”, as shown in FIG. 3.
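The expansion step can be illustrated with a short sketch. The function name `expand_prefix` and its `stride` parameter (the number of bits read per level, 2 for a quad tree) are hypothetical names introduced here. The sketch only pads a single prefix and does not address what happens when an expanded item coincides with another stored item.

```python
from itertools import product
from typing import List


def expand_prefix(prefix: str, stride: int) -> List[str]:
    """Expand `prefix` to every bit string whose length is the next multiple of `stride`."""
    pad = (-len(prefix)) % stride                 # number of bits to append
    return [prefix + "".join(tail) for tail in product("01", repeat=pad)]


expanded = expand_prefix("00011", 2)              # ["000110", "000111"]
unchanged = expand_prefix("0001", 2)              # ["0001"], already a multiple of 2
```

Each expanded item carries the related information of the original item, so a single entry may occupy up to 2^(stride-1) nodes.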
In the case where each item of expanded data to be searched and its related information are as shown in FIG. 3, the quad tree in which the data is stored is as shown in FIG. 4. Each node stores data having a format composed of a node information flag field FG, a pointer field having four pointers to its child nodes, one for each of the branches “00”, “01”, “10”, and “11”, and a related information field. The node information flag FG is set to 1 when the node stores data and to 0 when it stores no data. In the node N2, for example, the node information flag FG is set to 1. The four pointers corresponding to “00”, “01”, “10”, and “11” indicate N3, N4, N5, and NULL, respectively, which means that the node N2 has three branches to the child nodes N3, N4, and N5 and has no branch for “11”. The node N2 stores the related information A.
For example, when search data “0001111” is provided, the search is carried out by following nodes in the order N1, N2, N4, and N7 while reading the related information from the nodes N2, N4, and N7, each having the node information flag FG set to 1 (indicated by a black circle). In this case, the item of data stored in each of these nodes N2, N4, and N7 matches the search data “0001111”. Among them, the related information “D” stored in the node N7, which has the longest match (the lowest level), is obtained as a target. Assuming that the time required to follow one node is T, the maximum search time is 4T in the case of FIG. 4. Compared with the binary tree shown in FIG. 2, the quad tree reduces the maximum search time to two-thirds of that of the binary tree (4T versus 6T).
However, if N of the N-ary tree is set to a large number so as to shorten the maximum search time, then the number of nodes having the same related information is increased by expanding the data to fit the N-ary tree. In addition, since the size of one node and the number of pointers included therein also increase in proportion to N, an increasing amount of memory is needed.
For example, consider the case where each node stores a 1-bit node information flag FG, 6-bit pointers (two for the binary tree and four for the quad tree), and 8-bit related information. Each node then needs 21 bits (=1+6×2+8) for the binary tree and 33 bits (=1+6×4+8) for the quad tree. The total number of bits needed in the binary tree shown in FIG. 2 is 21(bits)×15(nodes)=315 bits, and the total number of bits needed in the quad tree shown in FIG. 4 is 33(bits)×14(nodes)=462 bits. Assuming that the length of bits per address in a memory storing the data to be searched is 32 bits, a 21-bit node fits in one address whereas a 33-bit node occupies two addresses. Accordingly, 15 addresses are needed for the binary tree but 28 addresses are needed for the quad tree.
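The arithmetic above can be checked with a short calculation. The function names are hypothetical, and the sketch assumes that each node starts at its own address and is not packed together with a neighboring node, which is consistent with the address counts given in the text.

```python
import math

FLAG_BITS = 1      # node information flag FG
POINTER_BITS = 6   # one pointer to a child node
INFO_BITS = 8      # related information
WORD_BITS = 32     # bits per memory address


def node_bits(n_ary: int) -> int:
    """Bits per node: flag + one pointer per branch + related information."""
    return FLAG_BITS + POINTER_BITS * n_ary + INFO_BITS


def total_bits(n_ary: int, nodes: int) -> int:
    """Bits needed by a tree with the given number of nodes."""
    return node_bits(n_ary) * nodes


def addresses(n_ary: int, nodes: int) -> int:
    """Addresses needed when every node occupies a whole number of words."""
    return math.ceil(node_bits(n_ary) / WORD_BITS) * nodes


binary = (node_bits(2), total_bits(2, 15), addresses(2, 15))  # (21, 315, 15)
quad = (node_bits(4), total_bits(4, 14), addresses(4, 14))    # (33, 462, 28)
```

Because the 33-bit quad-tree node exceeds one 32-bit word by a single bit, the address count nearly doubles even though the quad tree has one node fewer.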
As described above, according to a conventional N-ary tree data structure, the number of nodes having the same related information is increased by expanding the data to meet the N-ary tree. In addition, since the size of one node and the number of pointers included therein are also increased in proportion to N, an increasing amount of memory is needed.