1. Field of the Invention
This invention relates to a data management system, a data management method, and a computer-readable medium having stored therein a data management program suitable for managing data such as coordinate data representing points in two-or-more-dimensional spaces.
More particularly, this invention relates to a data management system, a data management method, and a computer-readable medium having stored therein a data management program capable of performing one or more operations, such as registration, deletion, change, and search (range search, nearest data search, and so forth), on coordinate data on a plane or in a space.
2. Description of the Related Art
Many methods for managing multiple-attribute data with the use of computers have been proposed. For example, image processing, computer vision, or drawing management requires computers to manage a huge amount of drawings represented by a large volume of vectors, points, symbols, and so forth. Computers are used to add or delete data, search for desired drawings, and change data.
One known data management method is the quad-tree method.
The quad-tree method divides a two-dimensional area into four sub-areas with lines parallel to the coordinate axes. The method repeats the division recursively until the number of data pieces included in each sub-area is P or less. It then stores the data as a quad-tree. The quad-tree is described, for example, in ACM Computing Surveys, Vol. 20, No. 4, December 1988.
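The recursive division described above can be illustrated with a minimal Python sketch. It is not taken from the cited document: the names `QuadNode`, `build_quadtree`, `leaves`, and `depth`, the square root area, the half-open sub-area boundaries, and the `max_depth` guard against coincident points are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class QuadNode:
    """One square sub-area; a leaf has no children (hypothetical layout)."""
    x0: float
    y0: float
    size: float
    points: list = field(default_factory=list)
    children: list = field(default_factory=list)


def build_quadtree(points, x0, y0, size, p, max_depth=16):
    """Split the square [x0, x0+size) x [y0, y0+size) into four equal
    sub-areas until each sub-area holds p points or fewer."""
    node = QuadNode(x0, y0, size)
    if len(points) <= p or max_depth == 0:
        node.points = list(points)  # leaf: store the points here
        return node
    half = size / 2
    for dy in (0, 1):
        for dx in (0, 1):
            cx, cy = x0 + dx * half, y0 + dy * half
            inside = [(x, y) for (x, y) in points
                      if cx <= x < cx + half and cy <= y < cy + half]
            node.children.append(
                build_quadtree(inside, cx, cy, half, p, max_depth - 1))
    return node


def leaves(node):
    """Return all leaf nodes below (and including) node."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def depth(node):
    """Number of edges on the longest path from node to a leaf."""
    if not node.children:
        return 0
    return 1 + max(depth(child) for child in node.children)


pts = [(1, 1), (2, 2), (9, 9), (12, 3)]
root = build_quadtree(pts, 0, 0, 16, p=1)
print(len(leaves(root)), depth(root))
```

In this example two of the four points lie close together, so only the sub-area containing them is divided further, while the other sub-areas stay shallow.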
A tree generated by this division method is easy to manage because its operation is simple; that is, the method only generates child nodes each time it divides a leaf. Another advantage of this method is that, because the area is divided into equal-sized sub-areas, the user can identify each sub-area easily. This eliminates the need for the user to memorize each division axis.
On the other hand, this method does not take into account the distribution of data while dividing the area into equal-sized sub-areas. This means that, when data is distributed unevenly, this method generates an unbalanced tree and wastes memory.
The following describes how the quad-tree method stores data. Each sub-area shown in FIG. 13(A) corresponds to a node with the corresponding number shown in FIG. 13(B). Thus, the shaded sub-areas in FIG. 13(A) correspond to the black nodes in FIG. 13(B). FIG. 13 was reproduced from FIG. 3 on page 274 of the above-described document.
However, the enumeration of all the shaded areas (or data included in those areas) represented by a data structure, such as that shown in FIG. 13(A), requires the user to traverse all the sub-trees.
A tree-structured index, used to store data in a conventional data management method, such as the quad-tree method, allows the user to identify point data in a sub-area as an internal node and/or an external node in a sub-tree whose root is a node. However, enumeration of point data in a node requires the user to traverse all sub-trees.
In this type of tree-structured index, a sub-area containing many point data pieces usually corresponds to a large sub-tree. Therefore, even when a large sub-tree contains many point data pieces, it is difficult to reduce the traversal time required to enumerate each point data piece. Rather, efficient traversal of a large sub-tree requires more memory areas for stacks and flags (provided for each node to represent the status of the traverse operation on that node).
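The traversal cost described above can be made concrete with a short Python sketch. The nested-dictionary tree layout and the names `leaf`, `internal`, and `enumerate_points` are hypothetical and chosen only for illustration. The function enumerates the point data under a node with an explicit stack and reports how many nodes were visited and how large the stack grew.

```python
def leaf(points):
    """Hypothetical leaf node holding point data."""
    return {"points": list(points), "children": []}


def internal(children):
    """Hypothetical internal node with four child nodes."""
    return {"points": [], "children": list(children)}


def enumerate_points(root):
    """Enumerate all points under root by depth-first traversal.

    Returns (points, visited_nodes, max_stack_size). Every node of the
    sub-tree is visited, including leaves that hold no data.
    """
    stack = [root]
    found = []
    visited = 0
    max_stack = 0
    while stack:
        max_stack = max(max_stack, len(stack))
        node = stack.pop()
        visited += 1
        if node["children"]:
            # Push in reverse so the first child is processed first.
            stack.extend(reversed(node["children"]))
        else:
            found.extend(node["points"])
    return found, visited, max_stack


tree = internal([
    internal([leaf([(1, 1)]), leaf([]), leaf([]), leaf([(3, 3)])]),
    leaf([(12, 3)]),
    leaf([]),
    leaf([(9, 9)]),
])
points, visited, max_stack = enumerate_points(tree)
print(points, visited, max_stack)
```

Here four points are enumerated only after visiting nine nodes, and the stack is memory used in addition to the point data itself.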
As described above, the conventional data structures require extra memory areas besides the memory occupied by the coordinate data to be retrieved. In addition, enumeration of the data found to be included in a sub-range requires the user to traverse sub-trees until all the desired data are reached. This is inefficient.
Although the conventional data structures are generally efficient for processing data that is changed frequently, they are inefficient for processing data that is never or rarely changed. In addition, although the conventional data structures are generally suitable for processing data limited in range and distribution, they cannot cope with non-limited, dynamic range (with distributed precision) data. For example, when the x-axis scale unit differs from the y-axis scale unit, normalization is required. However, the conventional data structures have no specific normalization rule.
From the viewpoint described above, the conventional data management methods disclose neither any efficient access means for accessing data determined to be in a specific range, nor any efficient nearest-coordinate search means.
This invention seeks to solve the problems associated with the conventional data management methods described above. It is an object of this invention to provide a data management system, a data management method, and a computer-readable medium having stored therein a data management program which efficiently use memory areas and significantly increase search efficiency.
The above object has been achieved by the present invention. One aspect of the present invention is a data management system. The system comprises an index containing n-dimensional coordinate data and a search means for searching the index for the coordinate data. The coordinate data in the index is sorted into lexicographic order of area codes, each code generated by taking a prefix of a predetermined length from a coordinate of each dimension in turn and by concatenating the resulting prefixes in a predetermined order of dimensions.
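One possible reading of this area code generation is sketched below in Python. The sketch assumes, beyond what is stated above, that each coordinate is a non-negative integer written as a fixed-width binary string and that the prefix is taken from its most significant bits. The function name `area_code` and the parameters `bits`, `prefix_len`, and `order` are hypothetical.

```python
def area_code(coord, bits=8, prefix_len=2, order=None):
    """Build an area code for an n-dimensional coordinate.

    Assumption: each coordinate is an integer in [0, 2**bits) and is
    written as a bits-wide binary string. The leading prefix_len bits of
    each dimension are taken in the given order and concatenated.
    """
    if order is None:
        order = range(len(coord))
    parts = []
    for d in order:
        value = coord[d]
        if not 0 <= value < 2 ** bits:
            raise ValueError("coordinate out of range")
        parts.append(format(value, f"0{bits}b")[:prefix_len])
    return "".join(parts)


points = [(200, 17), (10, 250), (130, 130), (60, 70)]
# Sorting by area code (ties broken by the coordinate itself) gives an
# index in lexicographic order of area codes.
index = sorted(points, key=lambda pt: (area_code(pt), pt))
for pt in index:
    print(area_code(pt), pt)
```

Under these assumptions, points that fall in the same sub-area share an area code and therefore occupy adjacent positions in the sorted index.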
Another aspect of the present invention is a data management method corresponding to the data management system described above. The method comprises the step of searching an index for n-dimensional coordinate data. The coordinate data in the index is the same as that in the system described above.
Further, another aspect of the present invention is a computer-readable medium having stored therein a data management program for managing n-dimensional data stored in an index, the program searching the index for the n-dimensional coordinate data. The program comprises a means for causing a computer to search the index for the n-dimensional coordinate data. The coordinate data in the index is the same as that in the system described above.
Such a structure of the coordinate data allows the coordinate data to be searched easily based on the area codes. This substantially improves the search efficiency.
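As a hedged illustration of why a sorted index helps, the following Python sketch locates all coordinate data sharing one area code by binary search and returns it as a contiguous slice, without any tree traversal. It is a sketch under stated assumptions, not the claimed search means: the fixed-width binary encoding and the names `area_code`, `build_index`, and `points_in_area` are hypothetical.

```python
import bisect


def area_code(coord, bits=8, prefix_len=2):
    """Concatenate the leading prefix_len bits of each coordinate
    (assumed to be integers in [0, 2**bits)), dimension by dimension."""
    return "".join(format(v, f"0{bits}b")[:prefix_len] for v in coord)


def build_index(points):
    """Return parallel lists (codes, coords) sorted by area code."""
    entries = sorted((area_code(pt), pt) for pt in points)
    codes = [code for code, _ in entries]
    coords = [pt for _, pt in entries]
    return codes, coords


def points_in_area(codes, coords, code):
    """Return every point whose area code equals code.

    Two binary searches find the contiguous run of matching entries.
    """
    lo = bisect.bisect_left(codes, code)
    hi = bisect.bisect_right(codes, code)
    return coords[lo:hi]


codes, coords = build_index(
    [(200, 17), (10, 250), (130, 130), (60, 70), (210, 30)])
print(points_in_area(codes, coords, "1100"))
```

The two binary searches take time logarithmic in the index size, and the matching data is read sequentially from one slice, so no stack or per-node flag is needed.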