1. Field of the Invention
The present invention relates to a data division apparatus, data division method and program used to conduct data division (clustering) on a point set in an n-dimensional space.
2. Related Art
In recent years, plant systems have in some cases been constructed so that an abnormality in the plant is detected by monitoring whether the values of sensors attached to the individual apparatuses (objects to be measured) included in the plant system stay within proper ranges. A proper range that the sensor value should assume is set in advance, and an abnormality alarm is issued when the sensor value falls outside the proper range. As the number of sensors increases, automation of the proper range setting is desired. To set a proper range for a certain sensor (hereafter referred to as the target sensor), at least one other sensor (hereafter referred to as an explanatory sensor) can be used. A model for predicting the value of the target sensor on the basis of the explanatory sensor is constructed. If the predicted value differs greatly from the actual value, there is a high possibility that the target sensor indicates an abnormal value.
The prediction model can be created by using time series data (multi-dimensional data) of the target sensor and the explanatory sensor collected in the past. In general, however, construction of this prediction model is not easy. This is because the value assumed by the target sensor is not determined uniquely by the value of the explanatory sensor; it also depends on the running situation of the plant. This situation will now be described by using an example of a sensor in a power plant.
It is now supposed that there is plot data (running history data) with its ordinate indicating the pressure of a pump, which is output from the target sensor, and its abscissa indicating the generated power output, which is output from the explanatory sensor. The pump has an operation state and a non-operation state. It is supposed that in the operation state the pressure of the pump is proportional to the generated power output, and in the non-operation state the pressure of the pump assumes a low constant value. If a model for predicting the value of the target sensor on the basis of the explanatory sensor is generated by using, for example, regression analysis, without separating these two operation situations, the error of the model becomes large. It is desirable to generate a separate model for each operation situation of the pump. To do so, it is necessary to separate the set of points in the running history data into a plurality of groups and generate a model for each group.
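The effect described above can be illustrated with a minimal sketch. All numeric values here are hypothetical and chosen only for illustration (a proportionality constant of 0.5 in the operation state and a constant pressure of 1.0 in the non-operation state); the helper names `fit_line` and `rmse` are likewise assumptions and do not appear in the cited art. The sketch fits one regression line to the mixed data and one line to each state separately, then compares the errors.

```python
def fit_line(points):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx if sxx else 0.0
    b = mean_y - a * mean_x
    return a, b


def rmse(points, model):
    """Root-mean-square error of the line model (a, b) on the points."""
    a, b = model
    return (sum((y - (a * x + b)) ** 2 for x, y in points) / len(points)) ** 0.5


# Hypothetical running history data: (generated power output, pump pressure).
# Operation state: pressure proportional to power (assumed factor 0.5).
operating = [(float(p), 0.5 * p) for p in range(10, 101, 10)]
# Non-operation state: low constant pressure (assumed value 1.0).
stopped = [(float(p), 1.0) for p in range(10, 101, 10)]
all_points = operating + stopped

# One model for the mixed data versus one model per operation situation.
single_error = rmse(all_points, fit_line(all_points))
operating_error = rmse(operating, fit_line(operating))
stopped_error = rmse(stopped, fit_line(stopped))

print(f"single model RMSE:    {single_error:.3f}")
print(f"operating model RMSE: {operating_error:.3f}")
print(f"stopped model RMSE:   {stopped_error:.3f}")
```

In this idealized, noise-free data each per-state model fits its own group essentially exactly, while the single model is left with a large residual. The sketch presupposes that the grouping is already known, which is precisely the step that requires data division.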
As techniques for grouping points on a plane or in a space, there are the k-means method and the agglomerative method. These techniques are described in Michael J. A. Berry and Gordon Linoff, “Data Mining Techniques”, Wiley Computer Publishing, pp. 187-215.
In the k-means method, k initial points are selected in advance, and each of the remaining points is regarded as belonging to the same group as whichever of the k points is closest to it. A centroid is calculated for every group, and the grouping is repeated with the centroids regarded as the new k initial points. On the other hand, in the agglomerative method, the combination having the shortest distance among all combinations of points is regarded as one group. The centroid of the grouped points is regarded as one point, and similar processing is repeated until all points belong to one group. Incidentally, as other ways of measuring the distance between groups, there are a method using the distance between the closest points in the groups and a method using the distance between the farthest points.
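The two procedures described above can be sketched as follows. This is an illustrative sketch, not the implementation given in the cited reference. The function names, the `max_iter` limit, the stopping test (centroids no longer change), and the `target_groups` parameter that stops the agglomeration early are all assumptions added for illustration; the agglomerative variant shown uses the centroid distance.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def k_means(points, initial_centers, max_iter=100):
    """Assign each point to its closest center, recompute centroids, repeat."""
    centers = [tuple(c) for c in initial_centers]
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        # An empty group keeps its previous center (an assumption of this sketch).
        new_centers = [centroid(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:  # assumed stopping test: nothing moved
            break
        centers = new_centers
    return groups, centers


def agglomerate(points, target_groups=1):
    """Repeatedly merge the two groups whose centroids are closest."""
    groups = [[p] for p in points]
    while len(groups) > target_groups:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                d = math.dist(centroid(groups[i]), centroid(groups[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        groups[i] = groups[i] + groups[j]
        del groups[j]
    return groups


# Hypothetical data: two well-separated clumps of points on a plane.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

km_groups, km_centers = k_means(points, [(0.0, 0.0), (10.0, 10.0)])
ag_groups = agglomerate(points, target_groups=2)
print(km_groups)
print(ag_groups)
```

Both functions rely only on distances between points or centroids, so on data such as this, where the groups are separated by distance alone, they recover the two clumps.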
In these techniques, points that are close to each other are basically grouped together, and only the distance between points is considered. With these techniques, therefore, it is not possible to conduct grouping that properly reflects the above-described state of the measurement subject, i.e., grouping that reflects a tendency, other than the distance between points, inherent in the multi-dimensional data, such as grouping close to human intuition.