1. Field of the Invention
The present invention relates to a spatial data analysis apparatus that analyzes spatial data of two or more dimensions, and particularly to a spatial data analysis apparatus, and a method therefor, that analyze a place where events are spatially concentrated and a condition for finding that place.
2. Description of the Related Art
For data that have spatial coordinates of two or more dimensions as attributes, such as GIS (Geographic Information System) data and map data, an important application of spatial data analysis is to find a condition (other than a condition on the spatial coordinates) under which the records selected by that condition are spatially concentrated.
FIG. 1 shows taxi riding data as an example of a database in which each record includes two-dimensional position information. Each record represents the place and time at which a taxi picked up a passenger and the weather at that time. The place at which the passenger was picked up is given by X and Y coordinates.
When the places represented by these data are plotted in an XY coordinate space, the distribution shown in FIG. 2 is obtained. No remarkable trend can be found from this chart. However, when only the data satisfying a condition other than spatial coordinates, such as “before 12:00 on a fine day,” are plotted in the XY coordinate space, the data are distributed as shown in FIG. 3. FIG. 3 shows that the data tend to concentrate toward the upper part of the figure. This trend indicates that passengers are easy to pick up on the morning of a fine day in the area shown in the upper part of FIG. 3. Accordingly, effective taxi allocation, such as concentrating empty taxis in that area, can be achieved.
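The selection described above can be illustrated by the following minimal Python sketch. The record layout, the sample values, the threshold Y >= 6.0 standing in for "the upper part", and the names Pickup, select and fraction_in_region are all assumptions made for illustration; they are not taken from FIG. 1 to FIG. 3.

```python
from dataclasses import dataclass


@dataclass
class Pickup:
    """One hypothetical taxi pickup record (layout assumed for illustration)."""
    x: float
    y: float
    hour: int      # hour of day, 0-23
    weather: str   # e.g. "fine" or "rain"


def select(records, predicate):
    """Keep only the records that satisfy a non-spatial condition."""
    return [r for r in records if predicate(r)]


def fraction_in_region(records, y_min):
    """Fraction of records whose Y coordinate is at least y_min."""
    if not records:
        return 0.0
    return sum(1 for r in records if r.y >= y_min) / len(records)


# Invented sample data, not the data of FIG. 1.
RECORDS = [
    Pickup(1.0, 9.0, 9, "fine"),
    Pickup(2.0, 8.5, 10, "fine"),
    Pickup(3.0, 9.5, 11, "fine"),
    Pickup(4.0, 2.0, 15, "fine"),
    Pickup(5.0, 3.0, 9, "rain"),
    Pickup(6.0, 7.0, 18, "rain"),
    Pickup(7.0, 1.0, 20, "fine"),
    Pickup(8.0, 4.0, 8, "rain"),
]


def fine_morning(r):
    """Non-spatial condition: before 12:00 on a fine day."""
    return r.weather == "fine" and r.hour < 12


all_frac = fraction_in_region(RECORDS, 6.0)
sel_frac = fraction_in_region(select(RECORDS, fine_morning), 6.0)
print(all_frac, sel_frac)
```

In this invented sample, half of all records fall in the upper region, whereas every record selected by the condition does, which is the kind of concentration that the comparison of FIG. 2 and FIG. 3 illustrates.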
On the other hand, a technique for extracting knowledge hidden in a large amount of data by analysis is known as data mining. Decision tree generation is known as a representative data mining technique. A tree is created whose nodes are conditions for classifying records in a database, and a new record is classified by applying these conditions starting from the root of the tree. The tree structure is created on the basis of data in a table format (called a training set). A plurality of attributes and one class are assigned to each record of the data in the table format. The attributes are used to classify each record into one of the classes. Each attribute may take a categorical value or a continuous value.
According to the method of creating a decision tree, a node is generated at the root of the tree so as to optimally divide the training set, and the training set is divided in accordance with this division. Nodes are then repeatedly generated to optimally divide each of the divided training sets further.
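The recursive division described above can be sketched in Python as follows. The description does not state which criterion defines an "optimal" division; the Gini impurity used here is one common choice and is an assumption, as are the function names, the restriction to categorical attributes, and the sample training set.

```python
from collections import Counter


def gini(rows):
    """Gini impurity of the class labels in rows (assumed split criterion)."""
    n = len(rows)
    if n == 0:
        return 0.0
    counts = Counter(r["class"] for r in rows)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(rows, attributes):
    """Return (score, attr, value, left, right) with the lowest weighted impurity."""
    best = None
    for attr in attributes:
        for value in sorted({r[attr] for r in rows}):
            left = [r for r in rows if r[attr] == value]
            right = [r for r in rows if r[attr] != value]
            if not left or not right:
                continue  # this test does not divide the set
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, attr, value, left, right)
    return best


def build_tree(rows, attributes):
    """Generate nodes recursively, dividing the training set at each node."""
    majority = Counter(r["class"] for r in rows).most_common(1)[0][0]
    if gini(rows) == 0.0:
        return {"leaf": majority}
    split = best_split(rows, attributes)
    if split is None:
        return {"leaf": majority}
    _, attr, value, left, right = split
    return {
        "attr": attr,
        "value": value,
        "yes": build_tree(left, attributes),
        "no": build_tree(right, attributes),
    }


def classify(tree, record):
    """Apply a new record from the root of the tree down to a leaf."""
    while "leaf" not in tree:
        tree = tree["yes"] if record[tree["attr"]] == tree["value"] else tree["no"]
    return tree["leaf"]


# Invented training set: two categorical attributes and one class per record.
TRAINING_SET = [
    {"weather": "fine", "period": "am", "class": "north"},
    {"weather": "fine", "period": "am", "class": "north"},
    {"weather": "fine", "period": "pm", "class": "south"},
    {"weather": "rain", "period": "am", "class": "south"},
    {"weather": "rain", "period": "pm", "class": "south"},
]

TREE = build_tree(TRAINING_SET, ["weather", "period"])
print(classify(TREE, {"weather": "fine", "period": "am"}))
```

Continuous attributes would need threshold tests instead of equality tests; they are omitted to keep the sketch short.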
When data mining is performed on information including spatial data by a decision tree generation technique based on class classification, in other words, when information representing “which spatial area the corresponding data belong to when a certain condition (other than a condition on the spatial coordinates) is designated” is subjected to data mining by the decision tree generation method, the spatial area of two or more dimensions must be preprocessed into a one-dimensional class.
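One way such preprocessing might be carried out is sketched below in Python, under the assumption that the space is divided into a fixed grid of square cells and that each cell index serves as the class. The description does not specify the encoding, so the grid scheme, the function name cell_class and the numeric values are hypothetical.

```python
def cell_class(x, y, x_min, y_min, cell_size, cols):
    """Map a two-dimensional point to a one-dimensional class label.

    The space is assumed to be divided into square cells of side cell_size,
    numbered row by row, with cols cells per row.
    """
    col = int((x - x_min) // cell_size)
    row = int((y - y_min) // cell_size)
    return row * cols + col


# A 2 x 2 grid of cells with side 5.0 covering the square [0, 10) x [0, 10).
far_apart_same_class = (
    cell_class(0.5, 0.5, 0.0, 0.0, 5.0, 2),
    cell_class(4.9, 4.9, 0.0, 0.0, 5.0, 2),
)
close_but_different_class = (
    cell_class(4.9, 4.9, 0.0, 0.0, 5.0, 2),
    cell_class(5.1, 4.9, 0.0, 0.0, 5.0, 2),
)
print(far_apart_same_class, close_but_different_class)
```

With such a fixed grid, two distant points inside one cell receive the same class, while two neighboring points on either side of a cell boundary receive different classes, so the class label retains less information than the original coordinates.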
When spatial data are preprocessed and then analyzed by the decision tree generation method based on class classification, there is a problem that the precision of the conditions obtained as a result becomes poor.
As described above, when spatial information is encoded by preprocessing and then analyzed, the quantity of information is reduced at the stage of encoding the spatial information. For this reason, the precision of the data mining result is degraded. The reason is thought to be that the preprocessing fixes the range of each place in which data may be concentrated. Because a place where data are dense is sought, the classes can be subdivided only to the extent that concentration can still be determined, and the fineness of the class segmentation is therefore limited.