(1) Field of the Invention
This invention relates to a computer program, method, and apparatus for data sorting, and more particularly, to a computer program, method, and apparatus for sorting large sets of data.
(2) Description of the Related Art
Data sorting is a known technique in which given data is rearranged in a predetermined order.
Conventional data sorting methods include quicksort, bubble sort, and shell sort. These methods require a computing time that grows faster than linearly with the amount of data to be processed. Therefore, processing a large amount of data lengthens the computing time and severely deteriorates the computing performance of a computer, which is a problem.
To solve this problem, there is known a data sorting method using a tree structure (graph) that has a single root and does not have a closed loop (that is, an open-loop tree structure) (for example, refer to Japanese Unexamined Patent Publication No. 2003-44267).
One such tree structure is the TRIE structure, which enables given data to be sorted in linear time (a time proportional to the amount of data).
FIG. 12 shows character strings (data) each having a plurality of characters in a TRIE structure.
Note that, in FIG. 12, the vertical length and the horizontal length are called “depth” and “width”, respectively.
A TRIE 90 has one or more nodes, and the first node in particular is called a “root”. A line connecting two nodes is called a “branch”. When a certain node is regarded as a “parent node”, a node one level below the parent node is called a “child node”. A node that has no child node is called a “leaf”.
To look up a character string, the TRIE 90 is walked down from the root. The root and each node can have as many branches as there are kinds of characters represented in the TRIE 90, and the TRIE 90 is walked down by sequentially selecting the branches corresponding to the characters of the string. In the TRIE 90, the root has three branches B, C, and D. For example, in the case of a character string “BACK”, the branch B is selected first, then the branches A and C are selected in turn, and finally the branch K is selected, thus completing the look-up.
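The walk-down described above can be illustrated with a minimal sketch. The names `TrieNode`, `insert`, `lookup`, and `sorted_strings`, as well as the sample strings other than “BACK”, are assumptions made for illustration only; they are not taken from FIG. 12 or from the cited publication. The sketch also shows why a TRIE yields sorted output: visiting the branches of every node in character order emits the stored strings in order.

```python
class TrieNode:
    """One node of the TRIE; each branch is labeled with a character."""
    def __init__(self):
        self.children = {}  # branch character -> child node
        self.count = 0      # number of inserted strings ending at this node


def insert(root, word):
    """Walk down from the root, creating missing branches on the way."""
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.count += 1


def lookup(root, word):
    """Select one branch per character; fail if a branch is missing."""
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return False
    return node.count > 0


def sorted_strings(root):
    """Depth-first walk that visits branches in character order."""
    out = []

    def walk(node, prefix):
        out.extend([prefix] * node.count)  # keep duplicates
        for ch in sorted(node.children):
            walk(node.children[ch], prefix + ch)

    walk(root, "")
    return out


# Hypothetical sample data; only "BACK" appears in the description.
root = TrieNode()
for w in ["DATA", "BACK", "CAT", "BAD", "BACK"]:
    insert(root, w)
```

With these sample strings the root has the three branches B, C, and D, and looking up “BACK” selects the branches B, A, C, and K in that order. Because the number of character kinds is fixed, ordering the branches of a node costs a bounded amount of work, so the total work is proportional to the number of characters processed.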
A data sorting method using such a TRIE structure, which is disclosed in Japanese Unexamined Patent Publication No. 2003-44267, does not severely deteriorate computing performance in processing a large amount of data (character strings).
However, the TRIE structure has a drawback in that a computational domain (main memory capacity) proportional to the amount of different data (sort items) must be prepared; that is, a large amount of memory is consumed. Therefore, processing a large number of character strings or long character strings causes a memory overflow, which lengthens the processing time.
That is to say, sorting or compiling large sets of data containing few overlapping items with such a TRIE structure causes a shortage of computational domain, and either severely deteriorates computing performance or ends in failure. This is a problem.
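The dependence of memory consumption on the amount of different data can be made concrete by counting TRIE nodes. The following is a minimal sketch; the helper name `count_trie_nodes` and both sample data sets are hypothetical, and the TRIE is represented by nested dictionaries purely for brevity.

```python
def count_trie_nodes(words):
    """Return the number of nodes (root included) needed to store words."""
    root = {}
    nodes = 1  # the root
    for word in words:
        node = root
        for ch in word:
            if ch not in node:
                node[ch] = {}  # a new branch requires a new node
                nodes += 1
            node = node[ch]
    return nodes


# Hypothetical data: fully overlapping items versus items sharing no prefix.
overlapping = ["BACK", "BACK", "BACK"]
distinct = ["ABCD", "EFGH", "IJKL"]
```

Here `count_trie_nodes(overlapping)` needs only the root plus four nodes regardless of how many times “BACK” is repeated, whereas `count_trie_nodes(distinct)` needs the root plus one node per character of every string. Data with few overlapping items therefore drives the node count, and with it the required memory, toward the total number of characters in the input.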
To solve this problem, there is known a method of grouping given data into a plurality of data groups and sorting the data of each group with an existing sorting method (for example, refer to Japanese Patent No. 2959497).
This method, however, has a problem in that, since the data is simply grouped, the order among the data groups is not determined, and therefore a further data process for combining the processed data groups must be performed. That is, the method does not realize efficient processing (that is, it incurs a high processing cost).
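The extra combining step can be seen in a minimal sketch of such simple grouping. The grouping rule used here (dealing items out to groups in turn), the function name `sort_by_groups`, and the use of the standard library's `heapq.merge` for the combining step are all assumptions made for illustration; they are not taken from Japanese Patent No. 2959497.

```python
import heapq


def sort_by_groups(items, num_groups):
    """Sort each group separately, then combine the groups by merging."""
    # Assumed grouping rule: item i goes to group i % num_groups, so the
    # groups carry no ordering relation to one another.
    groups = [items[i::num_groups] for i in range(num_groups)]
    sorted_groups = [sorted(group) for group in groups]
    # Concatenating the sorted groups is not sorted in general, so an
    # additional merge pass over all of the data is required.
    return list(heapq.merge(*sorted_groups))


# Hypothetical input: the two sorted groups are ["A", "D"] and ["B", "C"].
result = sort_by_groups(["D", "B", "A", "C"], 2)
```

In this example, simply joining the sorted groups would give A, D, B, C, which is not in order; only the additional merge pass produces A, B, C, D. That pass is the further data process referred to above.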