1. Field of the Invention
This invention relates to an apparatus, a method, and a program for sorting bit string data.
2. Description of Related Art
In recent years, with the advance of the information-based society, large-scale databases have come into use in many settings. A desired record in such a database is usually retrieved by using, as an index, an item within the record that is associated with the address at which the record is stored. Character strings in full-text searches can also be treated as index keys.
Because the index keys can be expressed as bit strings, the searching of a database can be reduced to searching for bit string data in the database.
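As an illustration of how an index key can be expressed as a bit string, the following minimal sketch encodes a character key as a string of 0s and 1s. The function name to_bit_string and the choice of UTF-8 with 8 bits per byte are assumptions made for this example only; the description above does not prescribe any particular encoding.

```python
def to_bit_string(key: str) -> str:
    """Encode a character key as '0'/'1' characters, 8 bits per byte.

    Assumption: the key is encoded as UTF-8; any fixed byte encoding
    would serve equally well for the illustration.
    """
    return "".join(format(byte, "08b") for byte in key.encode("utf-8"))


# Comparing the resulting bit strings gives the same order as comparing
# the encoded bytes, so a search or sort on the keys can be carried out
# on the bit strings instead.
words = ["b", "ab", "a"]
bit_keys = sorted(to_bit_string(w) for w in words)
```

With this encoding, the key "ab" becomes the 16-bit string 0110000101100010.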
Sorting records by their index keys is another operation commonly performed on a database. This sort processing can likewise be reduced to the sorting of keys composed of bit string data. Hereinbelow, bit string data may also be referred to simply as a key.
Various sort methods have been developed; patent document 1 below introduces quick sort, radix sort, and so forth. Patent document 2 also teaches a radix sort.
FIG. 1 is a drawing describing the concepts of a previous radix sort. In a radix sort, keys such as the 4-bit bit strings to be sorted in the example in FIG. 1 are sorted by repeatedly classifying them according to the bit value at each bit position, from the 0th bit (the leftmost bit) to the 3rd bit. Hereinbelow, the concepts of a radix sort are described using the example shown in FIG. 1.
FIG. 1 shows key string 100, which comprises the keys to be sorted. In the example shown in FIG. 1, key “0001” is stored in the storage area whose key position 101, which is the position of a key within key string 100, is 100a (hereinafter this may be called key position 100a). Likewise, keys “1111”, “0011”, “1010”, and “1110” are stored in key positions 100b, 100c, 100d, and 100e respectively.
As shown in FIG. 1, first, the bit strings are classified at the 0th bit of the keys included in key string 100 using classification for each bit position (0th bit) 110a. As a result, group 120a for keys with the value 0 at the 0th bit, consisting of key “0001” and key “0011”, and group 121a for keys with the value 1 at the 0th bit, consisting of key “1111”, key “1010”, and key “1110”, are obtained. Next the bit strings are classified at the 1st bit using classification for each bit position (1st bit) 110b for the value 0 group 120a and using classification for each bit position (1st bit) 110e for the value 1 group 121a respectively.
On one hand, in the classification for each bit position (1st bit) 110b, the 1st bits of key “0001” and key “0011” are both 0, so only the value-0-at-bit-1 group 120b is obtained. This group contains the same keys as value-0-at-bit-0 group 120a, and those keys are next classified at the 2nd bit using classification for each bit position (2nd bit) 110c. Because classification for each bit position (2nd bit) 110c yields the single key “0001” as the value-0-at-bit-2 group 120d, that key is stored in the storage area whose key position 131, which is the position of a key within sorted key string 130, is 130a (hereinafter this may be called key position 130a), where the key with the smallest value is stored. In the same way, because the single key “0011” is obtained as the value-1-at-bit-2 group 121d, it is stored in key position 130b as the key with the second smallest value in sorted key string 130. The keys in the sorted key string are thus stored in ascending order, following the sequence of the key position labels.
On the other hand, because the single key “1010” is obtained as the value-0-at-bit-1 group 120e in the classification for each bit position (1st bit) 110e, it is stored in key position 130c, which is the next storage position in sorted key string 130. The group consisting of key “1111” and key “1110” is obtained as the value-1-at-bit-1 group 121e, and this group is classified by the value at the 2nd bit in the classification for each bit position (2nd bit) 110f.
Because the 2nd bits of key “1111” and key “1110” are both 1, the classification for each bit position (2nd bit) 110f yields only the value-1-at-bit-2 group 121f. This group contains the same keys as value-1-at-bit-1 group 121e, and those keys are next classified at the 3rd bit using classification for each bit position (3rd bit) 110g. Because the single key “1110” is obtained as the value-0-at-bit-3 group 120g in the classification for each bit position (3rd bit) 110g, it is stored in key position 130d, which is the next storage position in sorted key string 130. In the same way, because the single key “1111” is obtained as the value-1-at-bit-3 group 121g, it is stored in key position 130e, which is the next storage position in sorted key string 130.
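The classification steps walked through for FIG. 1 can be sketched in code as follows. This is only an illustrative sketch of a radix sort that classifies from the 0th (leftmost) bit, written under the assumption that keys are equal-length strings of the characters 0 and 1; the names radix_sort, bit_pos, and single_group are hypothetical and do not appear in FIG. 1. The optional single_group list records every classification that places all of its keys into one group.

```python
def radix_sort(keys, bit_pos=0, single_group=None):
    """Return the keys in ascending order.

    keys         -- equal-length strings of '0'/'1' (assumed representation)
    bit_pos      -- bit position to classify on; 0 is the leftmost bit
    single_group -- optional list (hypothetical helper) that receives
                    (bit_pos, keys) for each classification in which
                    every key fell into the same group
    """
    # A group of zero or one key, or one with no bits left, needs no work.
    if len(keys) <= 1 or bit_pos >= len(keys[0]):
        return list(keys)

    # Classify the keys by the bit value at the current bit position.
    zeros = [k for k in keys if k[bit_pos] == "0"]
    ones = [k for k in keys if k[bit_pos] == "1"]

    if single_group is not None and (not zeros or not ones):
        single_group.append((bit_pos, list(keys)))

    # The value-0 group precedes the value-1 group in the sorted result.
    return (radix_sort(zeros, bit_pos + 1, single_group)
            + radix_sort(ones, bit_pos + 1, single_group))


fig1_keys = ["0001", "1111", "0011", "1010", "1110"]  # key string 100
recorded = []
sorted_keys = radix_sort(fig1_keys, single_group=recorded)
```

For the keys of FIG. 1, sorted_keys holds “0001”, “0011”, “1010”, “1110”, “1111” in that order, and recorded holds two entries: the classification of “0001” and “0011” at the 1st bit and the classification of “1111” and “1110” at the 2nd bit, which correspond to classifications 110b and 110f.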
By means of the above processing, the keys in key string 100 are sorted and stored in key positions 130a to 130e of sorted key string 130. However, when the sort is executed by the radix sort method described above, ineffective processing that does not result in any classification occurs, as can be seen in classification for each bit position (1st bit) 110b and classification for each bit position (2nd bit) 110f shown in FIG. 1.
Patent document 1: JP 2002-116907 A
Patent document 2: JP 2005-316663 A