1. Field of the Invention
The present invention relates to a multidimensional data management method, a multidimensional data management apparatus and a medium onto which is stored a multidimensional data management program, that are suited for the management of sets of multidimensional data whose positions are determined in a multidimensional space with two dimensions or more. To be more specific, it relates to a multidimensional data management apparatus and a multidimensional data management method, and a medium onto which is stored a multidimensional data management program with which various types of processing, such as registration, deletion, modification, search (for instance, range search, nearest neighbor value search) can be executed individually or in appropriate combinations by making the individual attributes of graphic data on a planar surface or in a three-dimensional space or statistical data with a plurality of attributes correspond to individual dimensions.
2. Description of the Related Art
A great number of methods have been proposed for managing sets of data with a plurality of attributes by employing a computer. For instance, in areas such as image processing, computer vision and drawing management, an enormous number of drawings, each represented by a great number of graphic elements such as vectors, points and symbols, must be managed with a computer, which involves tasks such as registering and deleting data and searching for a desired drawing in order to modify it. Methods for such data management in the known art include the fixed grid method. In the fixed grid method, space is divided into fixed areas and sets of data are managed by individual areas. As long as the data handled is unchanging, the fixed grid method has advantages in that space search can be performed at high speed and in that a high degree of memory efficiency can be achieved. However, with data in which additions and deletions are made frequently, such as facility data, the number of sets of data in each of the divided areas varies, since the division of the areas is fixed. Because of this, areas containing no sets of data must be searched in the same way as the other areas, and buffer areas must be prepared for them even though no sets of data are present, which results in a reduced search speed and poor memory efficiency.
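The behavior of the fixed grid method described above can be illustrated with a minimal sketch. The class name `FixedGrid`, its parameters and the sample coordinates are hypothetical and are introduced only for illustration; they do not appear in any cited publication.

```python
from collections import defaultdict


class FixedGrid:
    """Fixed grid over the square [0, size) x [0, size), cut into cells x cells equal areas."""

    def __init__(self, size, cells):
        self.size = size
        self.cells = cells
        self.buckets = defaultdict(list)  # (col, row) -> points stored in that area

    def _cell(self, x, y):
        # The division is fixed, so the area holding a point follows from arithmetic alone.
        step = self.size / self.cells
        col = min(int(x / step), self.cells - 1)
        row = min(int(y / step), self.cells - 1)
        return col, row

    def insert(self, x, y):
        self.buckets[self._cell(x, y)].append((x, y))

    def range_search(self, x0, y0, x1, y1):
        """Return the points inside the rectangle and the number of areas visited."""
        c0, r0 = self._cell(x0, y0)
        c1, r1 = self._cell(x1, y1)
        visited, hits = 0, []
        for col in range(c0, c1 + 1):
            for row in range(r0, r1 + 1):
                visited += 1  # empty areas are visited like any other area
                for (x, y) in self.buckets.get((col, row), []):
                    if x0 <= x <= x1 and y0 <= y <= y1:
                        hits.append((x, y))
        return hits, visited


grid = FixedGrid(size=100.0, cells=10)
points = [(1.0, 1.0), (2.0, 3.0), (3.0, 2.0), (95.0, 95.0)]
for p in points:
    grid.insert(*p)
hits, visited = grid.range_search(0.0, 0.0, 99.0, 99.0)
print(len(hits), visited, len(grid.buckets))
```

In this sample a search over the whole plane visits every area, although only two of the areas hold any data, which corresponds to the reduced search speed noted above when the data distribution is uneven.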
In order to solve the problems of the fixed grid method discussed above and to achieve efficient management and high speed search of multidimensional data, a hierarchical data structure based upon space division has been attracting much interest and a number of proposals have been put forward. For instance, the graphic data management apparatus disclosed in Japanese Unexamined Patent Publication No. 3-48974 achieves an improvement in data search speed when the display area is changed, by dividing and structuring great quantities of graphic positional information and graphic attributes in a multidimensional space through adoption of the theory of the MD tree with tri-tree structure. In addition, as a method comparable to this, the quad-tree method employing a 2^n-divided tree for two dimensional data is also known.
(1) MD Tree Method
In the MD tree method, the tree comprises two types of nodes, i.e., leaves L at the extremities and internal nodes I, as well as pointers. The space where the sets of data are placed is divided recursively and each node is made to correspond to an individual divided space. In other words, a parent node corresponds to the smallest (n-dimensional) rectangle that contains all the spaces corresponding to its child nodes. In addition, all the sets of data are stored in the leaves L at the extremities. Except in the initial state, each internal node has two or three child nodes, and if the number of child nodes exceeds three, the internal node is divided so that this requirement is satisfied at all times.
The number of sets of data that can be stored in each leaf is referred to as the data capacity P, and when the (P+1)th set of data is assigned to a given leaf, the space corresponding to this node is divided. For instance, as shown in FIG. 43, assume that P=3 and that sets of data No. 1 to No. 3 are initially arranged within the space as shown in FIG. 43(A). If a new set of data No. 4 is added in this state, the space is divided into two spaces, i.e., a node A and a node B, by the first dividing axis X1. The space divided in this manner corresponds to the tree structure shown in FIG. 43(B), and the individual leaves corresponding to the spaces resulting from the division each store two sets of data, i.e., data sets 1 and 2, and data sets 3 and 4, respectively.
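The leaf division in this example can be sketched as follows. The coordinates are invented for illustration, and the rule of cutting halfway between the two middle sets of data is an assumption; the description above states only that the division leaves two sets of data in each leaf. The function name `split_leaf` is likewise hypothetical.

```python
def split_leaf(points, axis):
    """Split an overflowing leaf into two halves along `axis` (0 = X, 1 = Y).

    Returns the position of the dividing axis and the two new leaves.
    """
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    # Assumed rule: place the dividing axis halfway between the two middle points.
    cut = (ordered[mid - 1][axis] + ordered[mid][axis]) / 2
    return cut, ordered[:mid], ordered[mid:]


P = 3                                          # data capacity of a leaf
leaf = [(1.0, 4.0), (2.0, 1.0), (6.0, 3.0)]    # sets of data No. 1 to No. 3
leaf.append((7.0, 5.0))                        # set of data No. 4 overflows the leaf
if len(leaf) > P:
    cut, left, right = split_leaf(leaf, axis=0)
    print(cut, left, right)
```

Because the dividing position depends on the data, it has to be computed at division time and kept for later searches.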
If new sets of data 5, 6 and 7 are added in this state, the node A comes to include five sets of data, i.e., data sets 1, 2, 5, 6 and 7, and it is therefore further divided into a node C and a node D by the second dividing axis Y1. The spaces thus divided correspond to the tree structure shown in FIG. 43(C), and the leaves corresponding to the individual spaces store data sets (1, 5), (2, 6, 7) and (3, 4) respectively. When an area is divided in this manner, new leaves are created in the tree structure by dividing the leaf which corresponds to the divided area, and child nodes corresponding to these leaves are added.
Because the number of leaves increases, it becomes possible to add new sets of data, and a data set 8 can be assigned as well. However, since three sets of data are already contained in the leaf (node D) which corresponds to the area where the set of data 8 is to be put, there is no room for further data. Thus, the area in which the set of data 8 is to be put is divided into a node E and a node F by the third dividing axis X2, and new leaves corresponding to them are created. However, since one parent node is allowed at most three child nodes in division using the MD tree, the internal node itself must be divided, as shown in FIG. 43(D), if new leaves are to be created by dividing the leaf corresponding to the node D in FIG. 43(C).
The MD tree with such a tree structure has an advantage in that, since internal nodes are divided so that the distances between the root of the tree (ROOT) and all the leaves are equal regardless of the distribution of the data or the order in which the sets of data are put, a well balanced tree structure is achieved, resulting in a high degree of search efficiency. In addition, since at least (P+1)/2 sets of data are stored in each leaf, an improvement in memory utilization efficiency is achieved in comparison to the fixed grid method.
(2) Quad-tree Method
In the quad-tree method, an area is divided into four equal portions by two axes that are parallel to the individual axes of a two dimensional plane. Division is performed recursively until the number of sets of data included in every divided area is equal to or smaller than P, and this process is stored in memory as a quad-tree. The quad-tree method is described in, for instance, ACM Computing Surveys, Vol. 20, No. 4, December 1988.
In this division method, since the only new nodes created are the child nodes corresponding to the new leaves into which a leaf is divided, it is not necessary to create a new ROOT, as must be done in the MD tree method when an internal node is divided, which facilitates management of the tree structure. In addition, since an area is divided into equal portions, it is easy to specify the division positions, and there is no need to store the individual dividing axes in memory, as is required in the MD tree method.
In the quad-tree method, since every area is divided into equal areas, disregarding data distribution, if the data distribution is not uniform, the balance of the tree and the memory utilization efficiency become extremely poor.
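The quad-tree division and its weakness with non-uniform data can be sketched as below. This is a simplified point quad-tree; the class name `QuadNode`, the `min_size` guard and the sample points are assumptions introduced for illustration only.

```python
class QuadNode:
    """Point quad-tree node covering the square [x, x+size) x [y, y+size)."""

    def __init__(self, x, y, size, capacity, min_size=1e-6):
        self.x, self.y, self.size = x, y, size
        self.capacity = capacity      # data capacity P of a leaf
        self.min_size = min_size      # guard against endless division (an added assumption)
        self.points = []              # filled only while the node is a leaf
        self.children = None          # four equal quadrants once divided

    def _child_for(self, px, py):
        # Division positions are always the midpoints, so none are stored.
        half = self.size / 2
        col = 1 if px >= self.x + half else 0
        row = 1 if py >= self.y + half else 0
        return self.children[2 * row + col]

    def insert(self, px, py):
        if self.children is not None:
            self._child_for(px, py).insert(px, py)
            return
        self.points.append((px, py))
        if len(self.points) > self.capacity and self.size > self.min_size:
            self._divide()

    def _divide(self):
        half = self.size / 2
        self.children = [
            QuadNode(self.x + dx * half, self.y + dy * half, half,
                     self.capacity, self.min_size)
            for dy in (0, 1) for dx in (0, 1)
        ]
        pending, self.points = self.points, []
        for px, py in pending:
            self._child_for(px, py).insert(px, py)

    def leaves(self):
        if self.children is None:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]

    def depth(self):
        if self.children is None:
            return 0
        return 1 + max(child.depth() for child in self.children)


root = QuadNode(0.0, 0.0, 16.0, capacity=2)
for point in [(1.0, 1.0), (1.5, 1.5), (2.5, 2.5)]:   # three clustered points
    root.insert(*point)

all_leaves = root.leaves()
empty_leaves = [leaf for leaf in all_leaves if not leaf.points]
print(root.depth(), len(all_leaves), len(empty_leaves))
```

With these three clustered points the tree is divided three times along a single branch, and most of the resulting leaves hold no data, which is the imbalance and poor memory utilization described above.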
(3) Range Search, Nearest Neighbor Data Search
Range search and nearest neighbor data search must be performed in data management methods such as the MD tree method and the quad-tree method, whereby sets of data are managed by dividing space hierarchically, just as in conventional data management systems. In range search, sets of data belonging to a specific value range are searched for. In nearest neighbor value search, on the other hand, the set of data whose value is nearest to a specific value, such as the central point of a graphic, the cursor position, or an average value or a central value of different sets of data, is searched for.
FIG. 44 shows an example of nearest neighbor search. In this example, on an apparatus that manages sets of two dimensional graphic data, the graphic element that is nearest to the cursor is displayed in a different color at all times to facilitate the specification of graphic elements which are registered in the apparatus. In this example, the leaf corresponding to the area containing the point Q which, in turn, corresponds to the cursor position, is determined, as shown in FIG. 44, and the set of data 1 that is nearest to the point Q in the leaf is determined. Next, the data contained within the circle whose center is the point Q and whose radius is the distance D1 between the point Q and the set of data 1 are searched in another leaf by performing a range search to obtain a data set 3, whose distance D2 from the point Q is smaller. Subsequently, the radius of the specified circle is changed to D2 to continue with the search.
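The search procedure in this example can be sketched as follows. The leaf rectangles, the point coordinates and the function names are hypothetical; the sketch also assumes point data and a flat list of leaves rather than a tree.

```python
import math


def point_distance(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def rect_distance(q, rect):
    """Distance from point q to the axis-aligned rectangle (x0, y0, x1, y1); 0 if inside."""
    x0, y0, x1, y1 = rect
    dx = max(x0 - q[0], 0.0, q[0] - x1)
    dy = max(y0 - q[1], 0.0, q[1] - y1)
    return math.hypot(dx, dy)


def nearest(q, leaves):
    """leaves: list of (rect, points). Assumes the leaf containing q holds some data."""
    # Step 1: find the leaf containing Q and the nearest set of data inside it.
    home = next(i for i, (rect, _) in enumerate(leaves) if rect_distance(q, rect) == 0.0)
    best = min(leaves[home][1], key=lambda p: point_distance(p, q))
    radius = point_distance(best, q)
    # Step 2: range search in the other leaves that the circle reaches,
    # shrinking the radius whenever a nearer set of data is found.
    for i, (rect, points) in enumerate(leaves):
        if i == home or rect_distance(q, rect) >= radius:
            continue
        for p in points:
            d = point_distance(p, q)
            if d < radius:
                best, radius = p, d
    return best, radius


leaves = [
    ((0.0, 0.0, 5.0, 10.0), [(1.0, 5.0)]),
    ((5.0, 0.0, 10.0, 10.0), [(6.0, 5.0), (9.0, 9.0)]),
]
best, radius = nearest((4.0, 5.0), leaves)
print(best, radius)
```

Here the first candidate lies at distance 3 in the leaf containing Q, and the range search in the neighboring leaf then finds a nearer set of data at distance 2. If the leaf containing Q held no data, step 1 would have no candidate from which to derive an initial radius.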
3. Problems the Present Invention Addresses
(1) Problems of the Division Methods
In the MD tree method described above, the space may be divided through leaf division by selecting the dividing axes cyclically, for instance in an X-Y-X-Y . . . cycle, or by selecting a dividing axis in correspondence to the length of a side of the rectangle. In either case, the division is a variable area (volume) division, in which division is performed in conformance to the quantities of data. However, since the positions of the dividing axes are not fixed in such variable area division, a great deal of time is required to calculate the division position during division, and it is also necessary to store in memory the coordinates of the individual dividing axes, causing an increase in required storage area. Moreover, in order to achieve a completely balanced tree, when an internal node is divided it is necessary to create, in addition to the nodes corresponding to the leaves being divided, a new node to constitute the ROOT by working backward from the new node. Thus, compared to other methods in which only the nodes corresponding to the leaves being divided must be created, the management of the tree structure becomes more complex.
(2) Problems Caused by Data Registration Only at Leaves at the Extremities
Furthermore, in the past technologies such as the MD tree method and the quad-tree method, sets of data are stored only in the individual leaves present at the extremities of the tree structure. Consequently, if a set of data that is put into an area lies astride other areas as well, it is registered in each of those areas, so that the number of sets of data registered more than once increases. This increases the number of sets of data that are judged to be present within the individual areas and ultimately increases the number of divisions that must be performed in those areas. For instance, when a line segment L is present over areas B, C and D, as shown in FIG. 45, this line segment L is registered in all the leaves that correspond to the individual areas B, C and D. However, the number of sets of data that can be registered in a given leaf is set at P in advance. The number of divisions of those areas therefore increases with the number of excess sets of data put into the areas because of these overlaps, resulting in an increase in the number of internal nodes and leaves corresponding to the divided areas. As a result, problems arise in that the tree structure becomes more complex, the memory requirements increase and the search efficiency is reduced. In particular, when a great number of sets of graphic data with various lengths, areas, volumes and the like are to be managed, as in construction design, facility management and drawing management, if all the sets of data which belong to the respective areas are registered in the leaves at the extremities as in the past and the areas are divided based upon the number of sets of data, almost all the graphic data are registered more than once, reducing the memory utilization efficiency to an extreme degree.
Furthermore, in the past technology described above, the criterion for dividing a leaf at the extremities is whether or not the number of sets of data to be registered in the leaf exceeds the preset value P. Because of this, if there are many multiple registrations, then even when a leaf at the extremities is divided, the number of sets of data registered in each new leaf is reduced very little in comparison to the number registered in the leaf before the division. As a result, division must be repeated, posing a problem in that division may continue without end until the storage area is exhausted.
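The effect of multiple registration on leaf division can be illustrated with a small sketch. The three long horizontal line segments and the function names are invented for illustration, and each segment is represented by its bounding box, which is a simplifying assumption (for horizontal segments the box coincides with the segment).

```python
def overlaps(box, cell):
    """True if two axis-aligned boxes (x0, y0, x1, y1) share any area or edge."""
    return (box[0] <= cell[2] and cell[0] <= box[2]
            and box[1] <= cell[3] and cell[1] <= box[3])


def quadrants(cell):
    """Divide a cell into four equal areas."""
    x0, y0, x1, y1 = cell
    mx, my = (x0 + x1) / 2, (y0 + y1) / 2
    return [(x0, y0, mx, my), (mx, y0, x1, my), (x0, my, mx, y1), (mx, my, x1, y1)]


def register(boxes, cell):
    """Sets of data that would be registered in the leaf for `cell`."""
    return [b for b in boxes if overlaps(b, cell)]


P = 2
world = (0.0, 0.0, 16.0, 16.0)
segments = [
    (0.5, 1.0, 15.5, 1.0),
    (0.5, 1.2, 15.5, 1.2),
    (0.5, 1.4, 15.5, 1.4),
]

# First division: the three segments exceed P, so the world is divided.
first = [len(register(segments, q)) for q in quadrants(world)]
# Second division of the lower-left quadrant, which still exceeds P.
second = [len(register(segments, q)) for q in quadrants(quadrants(world)[0])]
print(first, second)
```

After each division the leaves crossed by the segments still hold all three of them, so the count never falls to P and the criterion keeps demanding further division, while the total number of registrations grows.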
In addition, in the MD tree method, since division of areas is performed to ensure that the number of sets of data in each area is equal, division of the graphic data by the dividing axis may sometimes be avoided. In the quad-tree method, however, since division is performed into equal areas disregarding the data distribution, the dividing axes cannot be positioned to avoid unnecessary division of the graphic data. Because of this, graphic data of a certain size will be divided more frequently, further increasing the likelihood of multiple registrations.
(3) Registration into Intermediate Nodes
In order to alleviate the problem discussed above, the following solutions have been adopted in the quad-tree method. First, in the solution detailed in FIG. 19 on page 299 of the publication mentioned earlier, i.e., ACM Computing Surveys, Vol. 20, No. 4, December 1988, multiple registrations are avoided by registering graphics that lie astride two or more areas at an internal node. This is explained for the case in which the graphic data sets A-G shown in FIG. 46 are managed with a quad-tree. Since the graphic data sets A and E lie astride a plurality of areas among the areas achieved by dividing the entire area (the entire world) into quadrants, they are registered at a parent node 1, which corresponds to the entire area containing all these areas. Likewise, the graphic data sets B, C and D lying over a plurality of areas correspond to a node 3 which contains all those areas and the graphic data set G corresponds to a node 5. A graphic data set F, which does not lie astride a plurality of areas, is registered at a node 15, which corresponds to one of the areas that are achieved by division into the smallest areas.
It is true that this past technology has an advantage in that multiple registration of data lying astride a plurality of areas is completely eliminated. However, in this method, even data such as the graphic data set H, only a very small portion of which lies astride a plurality of areas, are registered at a node corresponding to a higher-order area containing the plurality of areas. As a result, the number of sets of data that are registered in higher-order areas increases, resulting in a larger range for data search and reduced search efficiency. Searching a plurality of small areas, even when multiple registrations make the number of areas to be searched large, can be performed at a higher speed than searching the entire higher-order area over a vast range.
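Registration at the smallest area that wholly contains a graphic can be sketched as follows. The function name `home_node`, the depth limit and the sample boxes are hypothetical, and each graphic is represented by its bounding box; the sketch is not taken from the cited publication.

```python
def home_node(box, cell, max_depth):
    """Return (depth, cell) of the smallest quadrant that wholly contains `box`.

    `box` and `cell` are axis-aligned rectangles (x0, y0, x1, y1). Descent stops
    as soon as the box lies astride a dividing line, or at `max_depth`.
    """
    depth = 0
    while depth < max_depth:
        x0, y0, x1, y1 = cell
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        if box[2] < mx:
            xs = (x0, mx)
        elif box[0] >= mx:
            xs = (mx, x1)
        else:
            break                      # astride the vertical dividing line
        if box[3] < my:
            ys = (y0, my)
        elif box[1] >= my:
            ys = (my, y1)
        else:
            break                      # astride the horizontal dividing line
        cell = (xs[0], ys[0], xs[1], ys[1])
        depth += 1
    return depth, cell


world = (0.0, 0.0, 16.0, 16.0)
small_inside = home_node((1.0, 1.0, 1.5, 1.5), world, max_depth=3)
small_astride = home_node((7.9, 7.9, 8.1, 8.1), world, max_depth=3)
print(small_inside, small_astride)
```

The first box descends to one of the smallest areas, whereas the second, although tiny, lies astride the first dividing lines and is therefore registered at the node for the entire world, which is the drawback noted above for data such as the graphic data set H.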
In order to solve this problem posed by the solution for completely avoiding multiple registrations, a solution whereby registrations are made at internal nodes while multiple registrations are also possible is proposed in FIG. 21 on page 301 of the publication mentioned earlier. This solution is explained in reference to FIG. 47.
For instance, since the smallest area containing the graphic data set G belongs to an area 5# in the lower right corner achieved by dividing the entire area into four areas, dividing this area into four portions divides the graphic data set G into two graphic data sets G1 and G2 at the boundary X1. Then, the smallest areas 10# and 11# containing the divided graphic data sets G1 and G2 respectively are determined. In this case, if the area division were carried out down to a size equaling the size of the area 14# containing the graphic data set F, the divided portions G1 and G2 of the graphic data set G would be further divided, which would not satisfy the requirement that "after determining the smallest area containing graphic data, division of the graphic data is performed only once." Because of this, the graphic data set G is registered at nodes 10 and 11 corresponding to the areas 10# and 11#.
The node 11 is an internal node and by registering the set of data at this internal node 11, the search speed can be improved compared to a case in which the set of data is registered so as to overlap at the nodes 15 and 16 at the extremities, which are child nodes of the internal node 11. At the same time, by registering the graphic data set G at both of the two nodes where the graphic data set G is present, i.e., at the nodes 10 and 11, it becomes possible to limit the search range to the nodes 10 and 11 corresponding to the areas 10# and 11#, which are smaller than the smallest area 5#, which includes the graphic data set G.
However, since the requirement that the division of a graphic be performed only once is imposed in this past technology, while there is an advantage in that there can be only one multiple registration for each set of data, it is not suited for processing sets of data that should be described and registered, even with a large number of overlaps, such as graphics that span a great length, including pipelines, power transmission lines, traffic paths and so forth.
(4) Problems During Nearest Neighbor Data Search
In the past technologies described above, when performing nearest neighbor data search, the leaf containing a specified point Q is determined, the set of data that is nearest to the point Q within this leaf is determined and a range search is performed based upon the distance between that set of data and the specified point Q. This method can be employed when some sets of data are always present inside each leaf, as in the MD tree method. However, when areas are divided equally, as in the quad-tree method, some of the divided areas may contain no sets of data, and in such a case nearest neighbor data search cannot be performed using the system described above.
(5) Objects of the Invention
An object of the present invention, which has been completed in order to solve the problems of the past technologies described above, is to achieve efficient utilization of memory and higher speed search processing by deciding whether or not a leaf at the extremity is to be divided, so that division is not performed if further division would not achieve any advantage in data management.
Another object of the present invention is to improve the speed at which sets of data lying astride a plurality of divided areas are searched by registering them at internal nodes as well and also by making multiple registrations under specific conditions.
Yet another object of the present invention is to enable fast search of sets of data belonging to a specified range that contains one or more divided areas.
Yet another object of the present invention is to perform fast search of nearest neighbor data inside other areas when searching a set of data positioned nearest to a specified point even if no sets of data are present within the divided area that contains the specified point.
Yet another object of the present invention is to facilitate area division and area merging by managing area divisions with a tree that is based upon division by 2^n and by dividing so that each area will be of equal size (equal volume), and to reduce the required memory area by dispensing with storage in memory of the division positions.
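Why equal division by 2^n dispenses with stored division positions can be seen from a minimal sketch. The functions `child_index` and `child_origin` are hypothetical names, and the bit-per-dimension numbering of the sub-areas is an assumption made for illustration; it is not presented as the numbering used by the invention.

```python
def child_index(point, lower, size):
    """Index (0 .. 2**n - 1) of the equal sub-area of an n-dimensional cube holding `point`.

    `lower` is the lower corner of the cube and `size` its side length. The division
    position in every dimension is the midpoint, so nothing has to be stored.
    """
    half = size / 2
    index = 0
    for axis, (p, lo) in enumerate(zip(point, lower)):
        if p >= lo + half:
            index |= 1 << axis         # one bit per dimension (assumed numbering)
    return index


def child_origin(index, lower, size):
    """Lower corner of sub-area `index`, recomputed from the parent alone."""
    half = size / 2
    return tuple(lo + half * ((index >> axis) & 1) for axis, lo in enumerate(lower))


# Three dimensions: 2**3 = 8 equal sub-areas of a cube with side 8.
idx3 = child_index((5.0, 1.0, 6.0), (0.0, 0.0, 0.0), 8.0)
origin3 = child_origin(idx3, (0.0, 0.0, 0.0), 8.0)
# Two dimensions: 2**2 = 4 equal sub-areas of a square with side 8.
idx2 = child_index((1.0, 5.0), (0.0, 0.0), 8.0)
print(idx3, origin3, idx2)
```

The same two functions serve any number of dimensions, and the extent of every sub-area can be recomputed from its parent whenever it is needed.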
Yet another object of the present invention is to facilitate area divisions by managing area divisions with a tree based upon division by 2^n and by dividing the individual areas in conformance to specific division rules (including random division within a limited range) regardless of the number of sets of data in the individual areas.