It is known that, in order to search for information efficiently, a plurality of data strings to be searched are stored in a tree-type data structure using a patricia tree.
However, in a conventional tree-type data structure using a patricia tree, for example, in a case where a data string composed of a long character string is searched, there is a problem in that the search speed varies depending on the characteristic of the character string.
Specifically, as shown in FIG. 11, in a conventional data structure using a patricia tree, a tree is generated in which a branch is created for each common part, starting from the beginning of the character string. Therefore, in the case of a data set that includes character strings having many common characters in front portions thereof (that is, a data set that includes many character strings whose front portions resemble each other), the tree has an unbalanced shape, and the search speed becomes unstable and varies depending on the character string. For example, in FIG. 11, a character string ‘aaaaaaa’ is identified through seven branches, whereas a character string ‘bbbbbbb’ is identified through one branch. Therefore, a search for the character string ‘aaaaaaa’ is slow, whereas a search for the character string ‘bbbbbbb’ is fast. In this manner, the search speed is unstable.
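The imbalance described above can be reproduced with a small sketch. The data set below is hypothetical (the contents of FIG. 11 are not reproduced here) and was chosen only so that the branch counts match those stated in the text. The function name branches_front_to_back is likewise an assumption, and the sketch assumes distinct strings of equal length.

```python
def branches_front_to_back(strings, target):
    """Count the branches passed before `target` is isolated when the
    strings are split strictly from the first character onward."""
    group = list(strings)
    branches = 0
    pos = 0
    while len(group) > 1:
        symbols = {s[pos] for s in group}
        if len(symbols) > 1:
            # The strings differ here, so this position is a branch point.
            branches += 1
            group = [s for s in group if s[pos] == target[pos]]
        pos += 1
    return branches


# Hypothetical data set whose front portions resemble each other.
DATA = ["bbbbbbb", "abbbbbb", "aabbbbb", "aaabbbb",
        "aaaabbb", "aaaaabb", "aaaaaab", "aaaaaaa"]

print(branches_front_to_back(DATA, "aaaaaaa"))  # 7
print(branches_front_to_back(DATA, "bbbbbbb"))  # 1
```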
Therefore, a main object of the present embodiment is to provide a data structure generation method and the like that realizes a stable search speed independently of the characteristics of data strings.
In order to solve the above problem, the present embodiment has employed the following configurations.
One aspect of the generation method according to the present embodiment is a method for generating a tree-type data structure composed of a plurality of data strings. The method includes the steps of: summing, with respect to a plurality of data strings classified in a parent node, the numbers of data types of data, respectively, at at least one given string position in each of the plurality of data strings; and classifying, based on the numbers of the data types respectively summed at the at least one given string position in the summing step, the plurality of data strings into a plurality of child nodes, for the respective data types at a given string position.
According to the aspect, the plurality of data strings are classified into a plurality of groups for the respective data types at a given string position determined based on the numbers of the data types respectively summed at the at least one given string position. Therefore, the plurality of data strings are classified at the string position determined in accordance with the characteristics (data types) of the data strings. Accordingly, it is possible to generate a data structure that can realize a stable search speed in accordance with the characteristics of the data strings. It should be noted that in the summing step described above, the numbers of the data types of data may be respectively summed at every string position of each of the plurality of data strings.
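As a minimal sketch of the summing step and the classifying step, the following Python fragment counts the data types at each string position and then groups the data strings by the data type found at one chosen position. The function names sum_data_types and classify, and the use of character strings as the data strings, are assumptions made for illustration and do not appear in the embodiment.

```python
from collections import Counter


def sum_data_types(strings, positions=None):
    """Summing step: for each string position, count how many of the
    strings hold each data type (here, each character) at that position.
    If no positions are given, every position of the shortest string
    is summed."""
    if positions is None:
        positions = range(min(len(s) for s in strings))
    return {p: Counter(s[p] for s in strings) for p in positions}


def classify(strings, position):
    """Classifying step: place the strings into child nodes, one child
    node per data type found at `position`."""
    children = {}
    for s in strings:
        children.setdefault(s[position], []).append(s)
    return children


parent = ["abc", "abd", "bbc"]
counts = sum_data_types(parent)
children = classify(parent, 2)
```

With this input, position 1 holds only one data type and therefore cannot separate the strings, whereas position 2 yields two child nodes.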
In another aspect, in the classifying step, a string position where the numbers of the data types respectively summed in the summing step are equal to each other or close to an equal value is specified, and the plurality of data strings classified in the parent node are classified into the plurality of child nodes, for the respective data types at the specified string position.
According to the aspect, the plurality of data strings are classified equally or substantially equally based on the numbers of the data types respectively summed at the at least one string position. Therefore, the plurality of data strings are classified in a good balance, independently of the characteristics of the data strings. Therefore, it is possible to generate a data structure that can realize a stable search speed, independently of the characteristics of the data strings.
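One way to specify such a string position can be sketched as follows. The criterion used here, namely choosing the position at which the largest group is smallest, is only one possible reading of "equal to each other or close to an equal value" and is an assumption, as are the function name most_balanced_position and the sample data set.

```python
from collections import Counter


def most_balanced_position(strings):
    """Return the string position whose data-type counts are closest to
    equal, measured by the size of the largest group at that position.
    A position at which all strings agree scores worst, because its
    largest group contains every string."""
    length = min(len(s) for s in strings)

    def largest_group(pos):
        return max(Counter(s[pos] for s in strings).values())

    # On a tie, min() keeps the earliest position.
    return min(range(length), key=largest_group)


# Hypothetical data set whose front portions resemble each other.
DATA = ["bbbbbbb", "abbbbbb", "aabbbbb", "aaabbbb",
        "aaaabbb", "aaaaabb", "aaaaaab", "aaaaaaa"]

print(most_balanced_position(DATA))  # 3 (0-based): four 'a' and four 'b'
```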
In another aspect, the method further includes a controlling step of recursively repeating a series of steps consisting of the summing step and the classifying step, using each child node created as a result of the classification in the classifying step as a parent node.
According to the aspect, using a created child node as a parent node, a plurality of data strings classified into the parent node are further classified based on the numbers of the data types respectively summed at the at least one string position. By this step being repeated, it is possible to generate a data structure that can realize a more stable search speed, in accordance with the characteristics of the data strings. In a case where a plurality of data strings are equally or substantially equally classified, the generated data structure tree has a balanced shape as a whole. Accordingly, it is possible to generate a data structure that can realize a more stable search speed, independently of the characteristics of the data strings.
In another aspect, the controlling step recursively repeats the series of steps until each child node includes only one data string.
According to the aspect, a child node (leaf node) includes only one data string. Therefore, by causing a leaf node of the generated data structure to correspond to a data string, one to one, it is possible to generate a data structure that specifies a data string.
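The recursive repetition down to single-string leaf nodes can be sketched as follows. The node representation (a tuple of a string position and a dictionary of child nodes), the names pick_position, build and search, the balance criterion, and the sample data set are all assumptions made for illustration. The sketch assumes distinct data strings of equal length.

```python
from collections import Counter


def pick_position(strings):
    """Choose the position at which the largest group is smallest."""
    length = min(len(s) for s in strings)
    return min(range(length),
               key=lambda p: max(Counter(s[p] for s in strings).values()))


def build(strings):
    """Repeat summing and classifying until each leaf holds one string."""
    if len(strings) == 1:
        return strings[0]                       # leaf node
    pos = pick_position(strings)
    groups = {}
    for s in strings:
        groups.setdefault(s[pos], []).append(s)
    if len(groups) == 1:
        raise ValueError("duplicate strings cannot be separated")
    return (pos, {sym: build(g) for sym, g in groups.items()})


def search(node, key):
    """Return (found, number of branches followed)."""
    branches = 0
    while not isinstance(node, str):
        pos, children = node
        if key[pos] not in children:
            return False, branches
        node = children[key[pos]]
        branches += 1
    return node == key, branches


# Hypothetical data set whose front portions resemble each other.
DATA = ["bbbbbbb", "abbbbbb", "aabbbbb", "aaabbbb",
        "aaaabbb", "aaaaabb", "aaaaaab", "aaaaaaa"]
tree = build(DATA)
print(search(tree, "aaaaaaa"))  # (True, 3)
print(search(tree, "bbbbbbb"))  # (True, 3)
```

For this data set, every string is reached through three branches, whereas classification from the beginning of each string would require between one and seven branches.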
In another aspect, each data string is a bit string including a bit expressed by 0 or 1. In this case, in the summing step, with respect to the plurality of data strings classified in the parent node, the number of bits 0 and the number of bits 1, each bit 0 and each bit 1 corresponding to the respective data types, are respectively summed at the at least one given bit position in each of the plurality of data strings, and in the classifying step, based on the number of bits 0 and the number of bits 1, which are respectively summed in the summing step, the plurality of data strings are classified into two child nodes, in accordance with whether the bit at a given bit position is 0 or 1.
According to the aspect, since each data string is a bit string, the data strings can be classified into two types of groups, depending on whether the bit at a given bit position is 0 or 1. Accordingly, the generated data structure has a binary tree shape, and has a balanced shape. As a result, it is possible to generate a data structure that can realize a stable search speed, independently of the characteristics of the data strings.
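For bit strings, the summing step and the classifying step reduce to counting zeros and ones and splitting in two. The sketch below assumes that each data string is held as a non-negative Python integer of a fixed bit width, and that bit position 0 is the most significant bit; these conventions, the choice of the position at which the two counts differ least, and the names count_bits and split_on_best_bit are assumptions made for illustration.

```python
def count_bits(keys, width):
    """Summing step: return a list of (zeros, ones) per bit position,
    where position 0 is the most significant of `width` bits."""
    counts = []
    for pos in range(width):
        shift = width - 1 - pos
        ones = sum((k >> shift) & 1 for k in keys)
        counts.append((len(keys) - ones, ones))
    return counts


def split_on_best_bit(keys, width):
    """Classifying step: split the keys into two child nodes at the bit
    position where the numbers of zeros and ones differ least."""
    counts = count_bits(keys, width)
    pos = min(range(width), key=lambda p: abs(counts[p][0] - counts[p][1]))
    shift = width - 1 - pos
    zeros = [k for k in keys if not (k >> shift) & 1]
    ones = [k for k in keys if (k >> shift) & 1]
    return pos, zeros, ones


keys = [0b0001, 0b0010, 0b0101, 0b0110]
print(count_bits(keys, 4))         # [(4, 0), (2, 2), (2, 2), (2, 2)]
print(split_on_best_bit(keys, 4))  # (1, [1, 2], [5, 6])
```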
In another aspect, the method for generating a tree-type data structure composed of a plurality of data strings is as follows. That is, the method includes a calculation step of calculating, with respect to a plurality of data strings classified in a parent node, data patterns in a given string range in each of the plurality of data strings, and a classifying step of classifying, based on the data patterns, the plurality of data strings into a plurality of child nodes such that each child node includes an equal or nearly equal number of data strings.
According to the aspect, a plurality of data strings are equally or substantially equally classified, based on the data patterns in a given string range of the data strings. Therefore, the plurality of data strings are classified in a good balance, independently of the characteristics of the data strings. Accordingly, it is possible to generate a data structure that can realize a stable search speed, independently of the characteristics of the data strings.
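The text does not state how the data patterns are assigned to child nodes, so the following is only one conceivable sketch. It counts the patterns found in a given string range and assigns each pattern, most frequent first, to whichever child node currently holds the fewest strings. This greedy assignment, the function name classify_by_patterns, the parameter n_children, and the sample data are all assumptions made for illustration.

```python
from collections import Counter


def classify_by_patterns(strings, start, end, n_children=2):
    """Classify strings by the pattern in s[start:end] so that the child
    nodes receive nearly equal numbers of strings (greedy heuristic)."""
    # Calculation step: count each data pattern in the given range.
    pattern_counts = Counter(s[start:end] for s in strings)

    # Assign each pattern to the child node that is currently smallest.
    assignment = {}
    sizes = [0] * n_children
    for pattern, count in pattern_counts.most_common():
        child = sizes.index(min(sizes))
        assignment[pattern] = child
        sizes[child] += count

    # Classifying step: place each string according to its pattern.
    children = [[] for _ in range(n_children)]
    for s in strings:
        children[assignment[s[start:end]]].append(s)
    return assignment, children


parent = ["aax", "aay", "abx", "acx", "acy", "adx"]
assignment, children = classify_by_patterns(parent, 0, 2)
print([len(c) for c in children])  # [3, 3]
```

A search would then read the same string range from the key and follow the child node recorded for that pattern in the assignment table.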
In another aspect, the method for generating a tree-type data structure composed of a plurality of data strings is as follows. That is, the method includes a string position specifying step of specifying, with respect to a plurality of data strings classified in a parent node, a string position based on a predetermined algorithm, and a classifying step of classifying the plurality of data strings into a plurality of child nodes for respective data types at the string position specified in the string position specifying step.
According to the aspect, the plurality of data strings are not sequentially classified for the respective data types at each string position, starting from a string position at the beginning of each data string, but for the respective data types at the string position specified based on the predetermined algorithm. Accordingly, even when front portions of the plurality of data strings may resemble each other, it is possible to generate a data structure that can realize a stable search speed.
In the above, an exemplary method for generating a data structure has been described as a configuration of the present embodiment. However, the present embodiment may be configured as a data structure generated by the method, or as a library having the above data structure, or as a computer-readable storage medium having stored therein a game program that uses the above library when performing a predetermined game processing. Further, the present embodiment may be configured as an information processing apparatus that generates the data structure, an information processing system that generates the data structure, or a computer-readable storage medium having stored therein an information processing program that generates the data structure.
According to the present embodiment, it is possible to provide a method and the like for generating a data structure that realizes a stable search speed, independently of the characteristics of data strings.
These and other features, aspects and advantages of certain exemplary embodiments will become more apparent from the following detailed description when taken in conjunction with the accompanying drawings.