Processing for analyzing a large amount of data of a plurality of types is performed on a wide variety of systems today. For example, combinations of data of types that are highly related to one another are extracted from among data of a plurality of types, and the extracted combinations of data are used to perform statistical processing or prediction processing. If the data to be analyzed contains data having different characteristics, the accuracy of the analytical processing decreases, or the analysis becomes impossible.
Consider, for example, analysis of the relationship between an input packet rate and a central processing unit (CPU) utilization rate in a computer system by using an approximate line obtained by a least-squares method or the like. If the computer system performs different operations by day and by night, such as business processing during the daytime and batch processing during the nighttime, the CPU utilization rate with respect to the input packet rate will differ significantly between day and night. In this case, an approximate line obtained from a mixture of daytime data and nighttime data is unlikely to fit the actual operation of the system in either period.
Such analytical processing therefore requires that the data to be analyzed be classified (clustered) in advance, in consideration of the characteristics of the data, into clusters each of which includes data having the same characteristic.
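The day and night example above can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not a method taken from any of the cited literature: the packet rates, the CPU utilization values, and the helper names `fit_line` and `max_abs_error` are all hypothetical, and the classification into daytime and nighttime data is assumed to be known in advance. The sketch fits one least-squares line to the mixed data and one line to each class, and compares the worst-case error of each.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def max_abs_error(xs, ys, a, b):
    """Largest absolute deviation of the points from the line y = a*x + b."""
    return max(abs(y - (a * x + b)) for x, y in zip(xs, ys))


# Hypothetical measurements: input packet rate versus CPU utilization (%).
rates = [100, 200, 300, 400, 500]
day_cpu = [0.10 * r + 5 for r in rates]     # assumed daytime behavior
night_cpu = [0.02 * r + 60 for r in rates]  # assumed nighttime behavior

# One line fitted to the mixture of daytime and nighttime data.
mixed_x = rates + rates
mixed_y = day_cpu + night_cpu
a_mix, b_mix = fit_line(mixed_x, mixed_y)
err_mix = max_abs_error(mixed_x, mixed_y, a_mix, b_mix)

# One line per class, with the data classified in advance.
a_day, b_day = fit_line(rates, day_cpu)
a_night, b_night = fit_line(rates, night_cpu)
err_day = max_abs_error(rates, day_cpu, a_day, b_day)
err_night = max_abs_error(rates, night_cpu, a_night, b_night)

print(f"mixed: slope={a_mix:.3f} intercept={b_mix:.2f} max error={err_mix:.2f}")
print(f"day:   slope={a_day:.3f} intercept={b_day:.2f} max error={err_day:.2f}")
print(f"night: slope={a_night:.3f} intercept={b_night:.2f} max error={err_night:.2f}")
```

With these assumed values, the line fitted to the mixed data falls between the two behaviors and misses individual measurements by more than 20 percentage points of CPU utilization, whereas each per-class line reproduces its own data almost exactly.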
A technique relating to such clustering in analytical processing is disclosed in, for example, PTL1, which describes a capacity management support apparatus that calculates a distribution density function for combinations of data of particular types in order to classify the data to be analyzed. NPL1 also discloses a technique that uses cross validation or Bayesian estimation to extract closely related combinations from among data of a plurality of types and to classify the data.
A related technique is disclosed in PTL2, which describes an operation management apparatus that predicts an item of performance information concerning a system from another item of performance information on the basis of a correlation model of the system. PTL3 discloses another related technique, namely an image data classifying apparatus that classifies image data on the basis of a plurality of types of distance definitions.