Geohashing is a conventionally known technology for converting coordinate data into codes (text data). In geohashing, values indicating latitude and longitude are first input, and bit strings corresponding to the input values of latitude and longitude are obtained. Next, bits are extracted alternately, 1 bit at a time, from the bit string corresponding to longitude and the bit string corresponding to latitude, and the extracted bits are combined in that alternating order to obtain a composite bit string. Codes are then derived by applying a predetermined table to the composite bit string. By allocating the derived codes to location points, geohash technology is suitably used for, for example, processing location point information by computer or storing location point information in a database.
Next, an example of a process performed in each stage of a geohash operation is described. Bit strings are obtained from latitude and longitude by using the following method. It is to be noted that, in the following, parentheses “(” and “)” represent an open end of a section, indicating that the boundary value is not included, and brackets “[” and “]” represent a closed end of a section, indicating that the boundary value is included. In this example, it is a premise that the input value of latitude assumes a value of [−90 to 90], and the input value of longitude assumes a value of [−180 to 180). Accordingly, in this example, the range of the input value of longitude includes −180 but does not include 180. In the following, in a case where a section is subdivided into upper and lower sections, one of the two sections is treated as open at the median value (in the example below, the lower section excludes the median value), so that the median value is located within exactly one of the sections.
For example, in a case where the input value of latitude is located on the right side with respect to a median value of 0 in a section of [−90 to 90] (i.e. located in a section [0 to 90]), 1 is input as the highest bit to a storage device. In a case where the input value of latitude is located on the left side with respect to the median value in the section of [−90 to 90] (i.e. located in a section [−90 to 0)), 0 is input as the highest bit to the storage device. Then, in the case where the input value of latitude is located on the right side with respect to the median value of 0, the section on the right side (i.e. section [0 to 90]) is further subdivided. Accordingly, in a case where the input value of latitude is located on the right side with respect to a median value of 45 in the section [0 to 90] (i.e. located in section [45 to 90]), 1 is input as the second bit to the storage device. In a case where the input value of latitude is located on the left side with respect to the median value of 45 (i.e. located in section [0 to 45)), 0 is input as the second bit to the storage device. These processes are repeated until sufficient accuracy is attained.
FIG. 1 is a schematic diagram for describing an example of generating a bit string pertaining to latitude (hereinafter also referred to as “latitude bit string”) by using geohash technology. In the example of FIG. 1, it is assumed that the input value of latitude is 42.6. In the first determination of the location of the input value with respect to the median value, 1 is input to the storage device because the input value is located on the right side of the median value of 0 in the section of [−90 to 90] as illustrated in FIG. 1. In the second determination, 0 is input to the storage device because the input value is located on the left side of the median value of 45 in the section of [0 to 90]. Likewise, in the third determination, 1 is input to the storage device because the input value is located on the right side of the median value of 22.5 in the section of [0 to 45). In the fourth determination, 1 is input to the storage device because the input value is located on the right side of the median value of 33.75 in the section of [22.5 to 45). In the fifth determination, 1 is input to the storage device because the input value is located on the right side of the median value of 39.375 in the section of [33.75 to 45). By recursively dividing the sections (i.e. repeating subdivision of sections), a latitude bit string of “10111” is obtained.
Similarly, in a case where the input value of longitude is −5.6, a bit string pertaining to longitude (hereinafter also referred to as “longitude bit string”) of “01111” is obtained.
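The subdivision described above can be sketched as follows. This is a minimal illustration and not a definitive implementation; the function name to_bits and its parameters are hypothetical, and the sketch assumes that a value equal to the median is assigned to the upper section, as in the example of latitude described above.

```python
def to_bits(value, low, high, n_bits):
    """Return n_bits subdivision bits for value within the section from low to high.

    Hypothetical helper: a value equal to the median is assumed to fall in the
    upper section, so the lower section is open at the median.
    """
    bits = []
    for _ in range(n_bits):
        median = (low + high) / 2
        if value >= median:
            bits.append("1")   # right side of the median: keep the upper section
            low = median
        else:
            bits.append("0")   # left side of the median: keep the lower section
            high = median
    return "".join(bits)


lat_bits = to_bits(42.6, -90.0, 90.0, 5)    # latitude of the FIG. 1 example
lon_bits = to_bits(-5.6, -180.0, 180.0, 5)  # longitude of the same example
print(lat_bits, lon_bits)                   # 10111 01111
```

Increasing n_bits continues the subdivision and narrows the final section, which corresponds to repeating the processes until sufficient accuracy is attained.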
After obtaining the latitude bit string and the longitude bit string, a composite bit string is generated by alternately arranging the bits of the latitude bit string and the bits of the longitude bit string one after another.
In this example, the bits of the latitude bit string and the bits of the longitude bit string are stored in an order starting from the highest bit of the longitude bit string “0” and followed by the highest bit of the latitude bit string “1”, the second highest bit of the longitude bit string “1”, the second highest bit of the latitude bit string “0”, the third highest bit of the longitude bit string “1”, . . . the fifth highest bit of the latitude bit string “1”. As a result, the composite bit string in this example is “0110111111”. FIG. 2 is a schematic diagram illustrating an example of obtaining a composite bit string by combining a longitude bit string and a latitude bit string.
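The alternate arrangement can be sketched as follows for the five-bit longitude bit string “01111” and the five-bit latitude bit string “10111”. The function name interleave is hypothetical, and the sketch assumes that the longitude bit string supplies the first bit and is either the same length as the latitude bit string or one bit longer.

```python
def interleave(lon_bits, lat_bits):
    """Arrange bits alternately, starting from the highest bit of longitude.

    Hypothetical helper: assumes len(lon_bits) equals len(lat_bits) or
    exceeds it by exactly one bit.
    """
    out = []
    for i in range(len(lon_bits)):
        out.append(lon_bits[i])
        if i < len(lat_bits):
            out.append(lat_bits[i])
    return "".join(out)


composite = interleave("01111", "10111")
print(composite)  # 0110111111
```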
After obtaining a composite bit string, the composite bit string is converted into text data (codes) by using a conversion method such as base 32. FIG. 3 is a schematic diagram illustrating a base 32 conversion table. With base 32, each value of 5 bits is associated with a numeral or a letter of the alphabet as illustrated in FIG. 3. With the above-described composite bit string “0110111111”, “01101” corresponds to “e” and “11111” corresponds to “z”. Accordingly, the composite bit string “0110111111” is converted into code “ez”. A Geohash code can be expressed with a given accuracy. A Geohash code has a characteristic in which its accuracy increases as the code becomes longer, and a characteristic in which its accuracy gradually decreases as more of the tail end part of the code is omitted.
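The conversion into a code can be sketched as follows. The table of FIG. 3 is not reproduced here; the sketch assumes the commonly used geohash base 32 character set (numerals 0 to 9 followed by lowercase letters excluding a, i, l and o), which agrees with the correspondences “01101” to “e” and “11111” to “z” given above. The names GEOHASH_ALPHABET and bits_to_code are hypothetical.

```python
# Assumed base 32 character set; index 13 is "e" and index 31 is "z".
GEOHASH_ALPHABET = "0123456789bcdefghjkmnpqrstuvwxyz"


def bits_to_code(composite):
    """Convert a composite bit string into characters, 5 bits per character.

    Hypothetical helper: trailing bits that do not fill a complete
    5-bit group are ignored.
    """
    usable = len(composite) - len(composite) % 5
    return "".join(
        GEOHASH_ALPHABET[int(composite[i:i + 5], 2)]
        for i in range(0, usable, 5)
    )


code = bits_to_code("0110111111")
print(code)      # ez
print(code[:1])  # e  (omitting the tail end yields a coarser code)
```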
For example, a code “u4pruydqqvj” can be derived from a set of coordinates {latitude: 57.64911, longitude: 10.40744}. A service is currently available on the Internet that displays the corresponding location point on a map when a URL consisting of “http://geohash.org/” followed by a code is accessed. By converting latitude coordinates and longitude coordinates into codes, the length of data can be shortened. Thereby, transmission/reception of data or storage of data can be easily performed.
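The stages described above can be combined into a single sketch that derives a code of a given length directly from a set of coordinates. This is an illustration under stated assumptions rather than a definitive implementation: the function name encode is hypothetical, the character set is the commonly used geohash base 32 set, and a value equal to a median is assumed to fall in the upper section.

```python
# Assumed base 32 character set (numerals, then letters excluding a, i, l, o).
GEOHASH_ALPHABET = "0123456789bcdefghjkmnpqrstuvwxyz"


def encode(lat, lon, length):
    """Derive a code of `length` characters (5 bits each), longitude bit first."""
    lat_low, lat_high = -90.0, 90.0
    lon_low, lon_high = -180.0, 180.0
    chars = []
    value = 0
    use_lon = True                      # the composite string starts with longitude
    for n in range(1, length * 5 + 1):
        if use_lon:
            median = (lon_low + lon_high) / 2
            bit = 1 if lon >= median else 0
            if bit:
                lon_low = median
            else:
                lon_high = median
        else:
            median = (lat_low + lat_high) / 2
            bit = 1 if lat >= median else 0
            if bit:
                lat_low = median
            else:
                lat_high = median
        value = (value << 1) | bit
        use_lon = not use_lon
        if n % 5 == 0:                  # every 5 bits yields one character
            chars.append(GEOHASH_ALPHABET[value])
            value = 0
    return "".join(chars)


code = encode(57.64911, 10.40744, 11)
print(code)                           # u4pruydqqvj
print("http://geohash.org/" + code)   # URL form mentioned above
```

Calling encode(42.6, -5.6, 2) under the same assumptions yields the two-character code “ez” of the earlier example.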
In order to obtain coordinates from a code, a process can be performed in reverse order relative to the process described with FIG. 1. For example, first, data “0110111111” is obtained by performing inverse transformation on the above-described code “ez”. Then, by reading the obtained data bit-by-bit and assigning the bits alternately, “01111” is recognized as a longitude bit string and “10111” is recognized as a latitude bit string.
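The inverse process can be sketched as follows. The names code_to_bits, split_bits and bits_to_section are hypothetical, and the character set is again assumed to be the commonly used geohash base 32 set. Because a code identifies a section rather than a single point, the last helper returns the lower and upper bounds of the section indicated by a bit string.

```python
# Assumed base 32 character set (numerals, then letters excluding a, i, l, o).
GEOHASH_ALPHABET = "0123456789bcdefghjkmnpqrstuvwxyz"


def code_to_bits(code):
    """Inverse transformation of the table: each character becomes 5 bits."""
    return "".join(format(GEOHASH_ALPHABET.index(ch), "05b") for ch in code)


def split_bits(composite):
    """Read the composite bit string alternately: longitude first, then latitude."""
    return composite[0::2], composite[1::2]


def bits_to_section(bits, low, high):
    """Replay the subdivision and return the bounds of the final section."""
    for b in bits:
        median = (low + high) / 2
        if b == "1":
            low = median
        else:
            high = median
    return low, high


composite = code_to_bits("ez")
lon_bits, lat_bits = split_bits(composite)
print(composite)                                # 0110111111
print(lon_bits, lat_bits)                       # 01111 10111
print(bits_to_section(lat_bits, -90.0, 90.0))   # (39.375, 45.0)
print(bits_to_section(lon_bits, -180.0, 180.0)) # (-11.25, 0.0)
```

The input values 42.6 and −5.6 of the earlier example both lie within the recovered sections.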
It is to be noted that there are also other technologies besides geohashing that can convert coordinates into codes (see, for example, U.S. Publication No. 2005/0023524).
Although geohashing technology satisfies an aspect of uniqueness in which codes and bit strings correspond to coordinates on a one-to-one basis, geohashing technology has a problem of not satisfying an aspect of distance retention. The aspect of distance retention is a property in which points (codes) that are located near each other are always indicated with bit strings that are close (similar) to each other. Thus, not satisfying distance retention means that the distance between two points cannot be consistently retained, regardless of how the distance between the codes or bit strings allocated to the two points is defined. In a case where a target region is divided into blocks in correspondence with the accuracy of the code, a map or a set of coordinates can be expressed by identifying the block to which the coordinates belong. Here, the term “block distance” indicates by how many blocks two points are separated. With this kind of coding technology, a block can be regarded substantially as an equivalent of a set of coordinates.
FIG. 4 is a schematic diagram for describing a block distance RD in a case where geohashing technology is used. FIG. 4 is also for describing that geohashing technology is unable to satisfy the aspect of distance retention. In the example illustrated in FIG. 4, a code is expressed with an accuracy of 3 bits in the longitude direction and 2 bits in the latitude direction. In order to perform inverse transformation, a composite bit string is expressed in two rows having an upper row to which longitude data is assigned and a lower row to which latitude data is assigned. As illustrated in FIG. 4, the number of bit positions at which the composite bit strings assigned to two blocks differ (indicated with Hamming distance (HD)) does not always match the block distance between the two blocks. As a result, the pseudo-distance of the codes between the two blocks (e.g., “1” in a case of r and s, “3” in a case of r and u) does not indicate the block distance RD. Particularly, in a case where geohashing technology is used on the earth's surface, it is known that the relationship between the Hamming distance HD and the block distance RD changes significantly in the vicinity of, for example, the equator, the prime meridian, and the date line.
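The mismatch between the Hamming distance HD and the block distance RD can be reproduced with a small sketch that uses the same accuracy of 3 bits in the longitude direction and 2 bits in the latitude direction. The blocks chosen below are illustrative and are not the blocks r, s and u of FIG. 4. The names block_bits, hamming and block_distance are hypothetical, and the block distance is assumed here to be the number of block steps counted along the two grid axes.

```python
def block_bits(lon_idx, lat_idx, lon_n=3, lat_n=2):
    """Composite bit string of the block at longitude index lon_idx and
    latitude index lat_idx, counted from the lower end of each range."""
    lon = format(lon_idx, "0{}b".format(lon_n))
    lat = format(lat_idx, "0{}b".format(lat_n))
    out = []
    for i in range(lon_n):
        out.append(lon[i])
        if i < lat_n:
            out.append(lat[i])
    return "".join(out)


def hamming(a, b):
    """Number of bit positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def block_distance(p, q):
    """Assumed definition: block steps along the longitude and latitude axes."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


west, east, far_west = (3, 1), (4, 1), (0, 1)
# Adjacent blocks on either side of the middle of the longitude range.
print(hamming(block_bits(*west), block_bits(*east)), block_distance(west, east))          # 3 1
# Blocks four steps apart whose bit strings differ in a single bit.
print(hamming(block_bits(*far_west), block_bits(*east)), block_distance(far_west, east))  # 1 4
```

In this sketch the pair with the smaller block distance has the larger Hamming distance, which is the behavior described above for the vicinity of the prime meridian.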
Without the aspect of distance retention, a block distance cannot be immediately derived even if codes or composite bit strings of two points (equivalent to two blocks) are provided. As a result, it becomes necessary to perform inverse calculation of coordinates by using the provided codes or composite bit strings and then calculate the distance by using another arithmetic expression. Such calculation increases workload and lengthens processing time.
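The two-step calculation described above can be sketched as follows. The arithmetic expression used for the distance is not specified above; the haversine formula with a mean earth radius of 6371 km is used here purely as an assumption. The names decode_center and haversine_km are hypothetical, each code is represented by the center of its block, and the character set is assumed to be the commonly used geohash base 32 set.

```python
import math

# Assumed base 32 character set (numerals, then letters excluding a, i, l, o).
GEOHASH_ALPHABET = "0123456789bcdefghjkmnpqrstuvwxyz"


def decode_center(code):
    """Step 1: inverse calculation of coordinates (center of the block)."""
    lat_rng = [-90.0, 90.0]
    lon_rng = [-180.0, 180.0]
    use_lon = True                       # composite bits start with longitude
    for ch in code:
        idx = GEOHASH_ALPHABET.index(ch)
        for shift in range(4, -1, -1):   # highest of the 5 bits first
            rng = lon_rng if use_lon else lat_rng
            median = (rng[0] + rng[1]) / 2
            if (idx >> shift) & 1:
                rng[0] = median
            else:
                rng[1] = median
            use_lon = not use_lon
    return (lat_rng[0] + lat_rng[1]) / 2, (lon_rng[0] + lon_rng[1]) / 2


def haversine_km(p, q, radius_km=6371.0):
    """Step 2: great-circle distance (assumed expression and earth radius)."""
    lat1, lon1 = math.radians(p[0]), math.radians(p[1])
    lat2, lon2 = math.radians(q[0]), math.radians(q[1])
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


a = decode_center("ez")
b = decode_center("u4pruydqqvj")
print(a)                   # (42.1875, -5.625)
print(haversine_km(a, b))  # distance in km between the two block centers
```

Both steps are required for every pair of codes, which is the additional workload referred to above.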