The present invention relates in general to a technology for sorting data such as numerical values or text. More particularly, this invention relates to a technology with which such data can be sorted at high speed.
Various modes (i.e., systems or methods) for sorting data are known. For example, the well-known quick sorting mode, a radix sorting mode and a multiple division sorting mode have conventionally been used. These sorting modes are not, however, universal, since each has merits and demerits depending on the quantity and properties of the data to be sorted. Selecting a sorting mode has therefore required a person who is familiar with the sort target data and the sorting modes and who has advanced knowledge of sorting. For this reason, there has long been a demand for means and methods with which even a person without such advanced knowledge can execute sorting appropriately.
The quick sorting mode, radix sorting mode, multiple division sorting mode and the like mentioned above are conventionally employed in various fields. The quick sorting mode executes sorting at high speed by repeatedly dividing the sort target data into smaller groups. The procedure for sorting sort target data by quick sorting will be described with reference to the following terms (a-1) to (e-1).
First, if the sort target data are given in the sequence shown in term (a-1), the forefront value "5" (indicated by the symbol *) is selected as the reference value. Next, as shown in term (b-1), the values lower than the reference value "5" ("3", "2", "4" and "1") are put before it, and the values higher than it ("6", "8" and "7") are put after it. In this way, term (b-1) is divided into two unsorted groups, indicated by brackets: the group ("3", "2", "4", "1") before the reference value "5" and the group ("6", "8", "7") after it.
Next, in term (c-1), "3" and "6" (each indicated by the symbol *), which are at the forefronts of the two bracketed groups, are selected as the reference values of their respective groups. Then, as shown in term (d-1), in one group the values lower than the reference value "3" ("2" and "1") are put before it and the value higher than it ("4") is put after it. In the other group, the values higher than the reference value "6" ("8" and "7") are put after it; no value in this group is lower than "6".
As a result, term (d-1) contains two unsorted groups: the bracketed group ("2", "1") before the reference value "3" and the bracketed group ("8", "7") after the reference value "6". Next, in term (e-1), "2" and "8" (each indicated by the symbol *), which are at the forefronts of these two bracketed groups, are selected as reference values.
Then, as shown in term (e-1), in one group the value lower than the reference value "2" ("1") is put before it; no value in this group is higher than "2". In the other group, the value lower than the reference value "8" ("7") is put before it; no value in this group is higher than "8". Through the quick sorting procedure described above, the sort target data shown in term (a-1) are sorted in ascending order as shown in term (e-1).
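The walkthrough in terms (a-1) to (e-1) can be sketched in Python as below. Because term (a-1) itself is not reproduced in this text, the input order [5, 3, 6, 2, 8, 4, 7, 1] is a hypothetical choice that yields exactly the groups described above. The sketch builds new lists for the lower and higher groups and keeps each group's relative order; it illustrates the partitioning steps and is not an in-place implementation. The names quick_sort and trace are illustrative.

```python
from typing import List, Optional


def quick_sort(data: List[int], trace: Optional[List[str]] = None,
               depth: int = 0) -> List[int]:
    """Quick sort that takes the forefront element as the reference value."""
    if len(data) <= 1:
        return list(data)
    pivot = data[0]                                   # forefront value (*)
    lower = [x for x in data[1:] if x < pivot]        # goes before the reference
    higher = [x for x in data[1:] if x >= pivot]      # goes after the reference
    if trace is not None:
        trace.append(f"depth {depth}: {lower} {pivot}* {higher}")
    return (quick_sort(lower, trace, depth + 1)
            + [pivot]
            + quick_sort(higher, trace, depth + 1))


# Hypothetical input order; term (a-1) is not reproduced here.
steps: List[str] = []
result = quick_sort([5, 3, 6, 2, 8, 4, 7, 1], steps)
for line in steps:
    print(line)
print(result)
```

An in-place partition over a single array, rather than new lists, is what gives quick sorting the local memory access discussed next.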
In quick sorting, as sorting (the division of the sort target data) proceeds, accesses to the memory storing the sort target data become local, which is why quick sorting is characterized by high sorting speed. Assuming that the number of sort target data is n, the sorting time (computational cost) $O_q$ of quick sorting is of the order given by formula (1):
$$O_q = n \cdot \log_2 n \qquad (1)$$
The radix sorting mentioned above, on the other hand, divides each sort target datum into figures (digits) and executes sorting once for each figure. The procedure for sorting sort target data by radix sorting will be described next with reference to the following terms (a-2) to (c-2):
Radix sorting uses two buffers (referred to as "buffer A" and "buffer B"). The sort target data in term (a-2), which are two-figure numerical values, are first sorted in ascending order using their first (lowest-order) figures as keys, as shown in term (b-2), and the result is stored in buffer A. Next, as shown in term (c-2), the contents of buffer A are sorted in ascending order using the second figures as keys, and the result is stored in buffer B. For the final result to be correct, this second pass must preserve the order established by the first pass among data whose second figures are equal. Thus, radix sorting must access the buffers once for each figure of the sort target data.
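The two-pass procedure can be sketched in Python as follows. Since terms (a-2) to (c-2) are not reproduced here, the input values are hypothetical; buffer_a and buffer_b mirror buffers A and B, and stable_sort_by_figure is an illustrative helper that distributes values into ten bins by one decimal figure, keeping input order within each bin.

```python
from typing import List


def stable_sort_by_figure(values: List[int], place: int) -> List[int]:
    """Stable pass keyed on one decimal figure (place = 1 for ones, 10 for tens)."""
    bins: List[List[int]] = [[] for _ in range(10)]
    for v in values:
        bins[(v // place) % 10].append(v)   # input order is kept within each bin
    return [v for b in bins for v in b]


def radix_sort(values: List[int], m: int) -> List[int]:
    """LSD radix sort of non-negative integers having at most m decimal figures."""
    buffer = list(values)
    place = 1
    for _ in range(m):                      # m passes over the n data
        buffer = stable_sort_by_figure(buffer, place)
        place *= 10
    return buffer


# Hypothetical two-figure data; terms (a-2) to (c-2) are not reproduced here.
data = [25, 13, 52, 31, 44, 12]
buffer_a = stable_sort_by_figure(data, 1)        # keyed on the first (ones) figures
buffer_b = stable_sort_by_figure(buffer_a, 10)   # keyed on the second (tens) figures
print(buffer_a)
print(buffer_b)
```

Each pass touches every datum once, so m figures cost m passes over n data, which is the count that the next formula expresses.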
Assuming that the number of figures of the sort target data is m and the number of sort target data is n, the sorting time $O_r$ of radix sorting is of the order given by formula (2):
$$O_r = m \cdot n \qquad (2)$$
Further, the multiple division sorting is a mode devised to accelerate the quick sorting described above. In this mode, the sort target data are divided into segments in advance and quick sorting is executed on each segment separately. Assuming that the number of segments is L and the number of sort target data is n, L segments of n/L data each are sorted, so the sorting time $O_m$ of multiple division sorting is expected to be smaller than the quick sorting time $O_q$ of formula (1), leaving aside the time needed to divide the data into segments. $O_m$ is of the order given by formula (3):
$$O_m = L \cdot \frac{n}{L} \cdot \log_2 \frac{n}{L} \qquad (3)$$
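Expanding formula (3) makes the comparison with formula (1) explicit. This worked derivation uses only the symbols n and L defined above, plus an illustrative choice of n = 2^20 and L = 2^10, and, like formula (3), ignores the cost of the segment division itself.

```latex
\begin{aligned}
O_m &= L \cdot \frac{n}{L} \cdot \log_2 \frac{n}{L}
     = n \,(\log_2 n - \log_2 L)
     = O_q - n \log_2 L \\
\text{e.g. } n = 2^{20},\ L = 2^{10}: \quad
O_q &= 20\,n, \qquad O_m = 10\,n
\end{aligned}
```

The saving is therefore about n·log2 L operations rather than a constant factor, and it only materializes if the segments really hold about n/L data each.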
Meanwhile, as described above, memory accesses become local as quick sorting proceeds, so quick sorting is normally fast. However, comparing formulas (1) and (2) shows that once the quantity of sort target data is large enough that $\log_2 n$ exceeds the number of figures m, quick sorting becomes disadvantageously slower than radix sorting.
The radix sorting, by contrast, is advantageously fast when the number of sort target data is large. In practice, however, if the capacities of the buffers are insufficient relative to the quantity of sort target data, a sorting region has to be secured in the main storage region, which is slow to access. In this case, the access time of the main storage region degrades the sorting speed of radix sorting, which eventually becomes disadvantageously low.
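The comparison of formulas (1) and (2) can be checked numerically. The following is a minimal Python sketch, assuming keys of m = 4 figures (a hypothetical value) and ignoring the constant factors that the formulas omit. Since m·n < n·log2 n holds exactly when n > 2^m, radix sorting's cost drops below quick sorting's once n exceeds 16 in this setting.

```python
import math


def quick_cost(n: int) -> float:
    """Formula (1): O_q = n * log2(n), constant factors ignored."""
    return n * math.log2(n)


def radix_cost(n: int, m: int) -> float:
    """Formula (2): O_r = m * n for keys of m figures, constant factors ignored."""
    return float(m * n)


def radix_is_cheaper(n: int, m: int) -> bool:
    # m * n < n * log2(n)  <=>  m < log2(n)  <=>  n > 2**m   (for n >= 1)
    return radix_cost(n, m) < quick_cost(n)


m = 4  # hypothetical key length in figures
for n in (4, 16, 17, 1_000, 1_000_000):
    print(f"n={n:>9}  O_q={quick_cost(n):>12.0f}  "
          f"O_r={radix_cost(n, m):>10.0f}  radix cheaper: {radix_is_cheaper(n, m)}")
```

The crossover says nothing about the buffer-capacity problem above: for large n the m·n estimate only holds while the buffers fit in fast storage.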
Further, in theory, multiple division sorting can execute sorting faster than quick sorting if the sort target data are optimally divided into segments. In practice, however, if the division into segments does not take the distribution of the data into account, it often fails, for example by placing most of the data in only a few segments. As a result, the time spent on segment division is wasted, and sorting becomes disadvantageously slower than quick sorting.
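The segment-division problem can be illustrated with a minimal multiple division sort. This Python sketch assumes a hypothetical division scheme, L equal-width key ranges between the minimum and maximum keys, so that concatenating the sorted segments gives a sorted result; the built-in sorted() stands in for per-segment quick sorting. The names split_into_segments and multiple_division_sort are illustrative only.

```python
from typing import List


def split_into_segments(values: List[int], num_segments: int) -> List[List[int]]:
    """Hypothetical division: num_segments equal-width key ranges from min to max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_segments or 1     # width 1 if all keys are equal
    segments: List[List[int]] = [[] for _ in range(num_segments)]
    for v in values:
        index = min(int((v - lo) / width), num_segments - 1)  # clamp the max key
        segments[index].append(v)
    return segments


def multiple_division_sort(values: List[int], num_segments: int) -> List[int]:
    """Sort each segment separately; segments are ascending key ranges."""
    if not values:
        return []
    segments = split_into_segments(values, num_segments)
    # sorted() stands in for the per-segment quick sorting.
    return [v for seg in segments for v in sorted(seg)]


uniform = [(37 * i) % 100 for i in range(100)]              # keys 0..99, evenly spread
skewed = [i % 10 for i in range(97)] + [1000, 2000, 4000]   # keys bunched near 0
for name, data in (("uniform", uniform), ("skewed", skewed)):
    sizes = [len(s) for s in split_into_segments(data, 4)]
    print(name, sizes)
```

With evenly spread keys each of the four segments receives 25 values, but with the skewed keys 97 of the 100 values land in the first segment, so that segment costs nearly as much to sort as the whole input and the division work is largely wasted.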
As can be seen from the above, the quick sorting, radix sorting and multiple division sorting algorithms each have merits and demerits depending on the quantity and properties of the sort target data, and none of these modes is therefore universal. For that reason, selecting a sorting mode with high sorting speed requires a person familiar with the sort target data and the sorting modes.
It is an object of the present invention to provide a sorting device, a sorting method and a computer readable recording medium recording a sorting program that are capable of automatically selecting a sorting mode suited to the properties and quantity of the sort target data and thereby increasing the sorting speed.
The sorting device according to one aspect of this invention comprises: a distribution analyzing unit which analyzes the distribution of a sort target data group consisting of a plurality of sort target data; a setting unit which sets sorting segments for dividing the sort target data group into segments based on the analysis result of the distribution analyzing unit; an appearance frequency calculation unit which calculates, based on a sorting key, the appearance frequency of the sort target data corresponding to each sorting segment; a sorting mode selection unit which, for each sorting segment, selects a first sorting mode if the appearance frequency is equal to or higher than a preset threshold value and selects a second sorting mode if the appearance frequency is lower than the threshold value; and a sorting execution unit which executes sorting for each of the sorting segments based on the sorting mode selected by the sorting mode selection unit.
The sorting method according to another aspect of this invention comprises: a distribution analyzing step of analyzing the distribution of a sort target data group consisting of sort target data; a setting step of setting sorting segments for dividing the sort target data group into segments based on the analysis result of the distribution analyzing step; an appearance frequency calculation step of obtaining, based on a sorting key, the appearance frequency of the sort target data corresponding to each sorting segment; a sorting mode selection step of selecting, for each sorting segment, a first sorting mode if the appearance frequency is equal to or higher than a preset threshold value and a second sorting mode if the appearance frequency is lower than the threshold value; and a sorting execution step of executing sorting for each of the sorting segments based on the sorting mode selected in the sorting mode selection step.
According to the above aspects of this invention, sorting segments are set based on the result of analyzing the distribution of the sort target data group, the appearance frequency of the sort target data is obtained for each sorting segment, and either the first sorting mode or the second sorting mode is selected for each sorting segment depending on how the appearance frequency compares with the threshold value. Thus, this invention can automatically select a sorting mode suited to the distribution (properties and quantity) of the sort target data group and sort faster than conventional approaches.
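The flow of units described above can be sketched as follows. Which concrete modes serve as the first and second sorting modes, how the distribution is analyzed, and how the threshold is chosen are not stated in this section. This Python sketch therefore assumes, purely for illustration, that distribution analysis records only the minimum and maximum keys, that the sorting segments are equal-width key ranges, that the first sorting mode is an LSD radix sort over non-negative integer keys, and that the second is quick sort. The function names and the default threshold of 8 are hypothetical.

```python
from typing import List, Tuple


def quick_sort(values: List[int]) -> List[int]:
    """Assumed second sorting mode: quick sort with the forefront value as reference."""
    if len(values) <= 1:
        return list(values)
    pivot = values[0]
    lower = [v for v in values[1:] if v < pivot]
    higher = [v for v in values[1:] if v >= pivot]
    return quick_sort(lower) + [pivot] + quick_sort(higher)


def radix_sort(values: List[int]) -> List[int]:
    """Assumed first sorting mode: LSD decimal radix sort (non-negative ints only)."""
    current, place = list(values), 1
    while current and place <= max(current):
        bins: List[List[int]] = [[] for _ in range(10)]
        for v in current:
            bins[(v // place) % 10].append(v)   # stable distribution by one figure
        current = [v for b in bins for v in b]
        place *= 10
    return current


def hybrid_sort(data: List[int], num_segments: int = 4,
                threshold: int = 8) -> Tuple[List[int], List[str]]:
    """Return the sorted data and the mode chosen for each sorting segment."""
    if not data:
        return [], []
    # Distribution analyzing unit (assumed): examine only the key range.
    lo, hi = min(data), max(data)
    # Setting unit (assumed): equal-width key ranges in ascending key order.
    width = (hi - lo) / num_segments or 1
    segments: List[List[int]] = [[] for _ in range(num_segments)]
    for v in data:
        segments[min(int((v - lo) / width), num_segments - 1)].append(v)

    result: List[int] = []
    modes: List[str] = []
    for segment in segments:
        frequency = len(segment)                 # appearance frequency unit
        if frequency >= threshold:               # sorting mode selection unit
            modes.append("first")
            result.extend(radix_sort(segment))   # sorting execution unit
        else:
            modes.append("second")
            result.extend(quick_sort(segment))
    return result, modes


data = [i % 10 for i in range(40)] + [55, 70, 90]   # hypothetical keys
sorted_data, modes = hybrid_sort(data)
print(modes)
print(sorted_data == sorted(data))
```

Here the densely populated first segment is routed to the first mode and the sparse segments to the second, mirroring the per-segment threshold comparison; a real device would also need a rule for choosing the threshold and the number of segments.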
Other objects and features of this invention will become apparent from the following description with reference to the accompanying drawings.