Data mining is an art for finding out useful knowledge out of a large amount of data by analyzing the data. Business operators intend to further enhance added value of their businesses by using the knowledge which is acquired by data mining.
Some of typical examples of data mining application will be explained in the following. There is a case that data mining is used for predicting an unknown phenomenon based on known data. For example, weather forecast is one of the typical examples of applying data mining. In the case of the weather forecast, a data mining system predicts a future weather based on data on the past weather. Moreover, a data mining system predicts, for example, a medicinal effect of a compound based on data of the compound.
Here, ‘predictor’ will be explained in the following. According to a data mining system, ‘predictor’ is used, for example, when predicting an unknown phenomenon based on known data. The predictor is a function which takes a value of an explanation variable as input and outputs a prediction result. According to the data mining system, known data is inputted into the predictor as a value of the explanation variable. Consequently, the predictor outputs the prediction result. The prediction result which is outputted by the predictor is hereinafter expressed as ‘predicted value’ unless otherwise noted.
Whether the data mining system may appropriately predict the unknown phenomenon or not strongly depends on whether an appropriate predictor is used or not.
A business operator considers to apply the knowledge, which is acquired by data mining, to its business. In this case, the business operator is eager to confirm a reliability of the prediction result which is outputted by the data mining system. Then, to ‘evaluate’ the predictor is carried out by use of a computer.
In the case that the computer carries out ‘evaluation’ on the predictor, it is evaluated how appropriate prediction results (that is, predicted values) the predictor outputs for the value of the explanation variable inputted.
In the following explanation, a module used for the purpose of evaluating the predictor when the computer carries out a process of evaluating the predictor is called ‘evaluation module’.
One of method for evaluating whether the predictor is appropriate or not is to compare the predicted value which is outputted by the predictor and an observed value which corresponds to the predicted result. For example, a case when a data mining system carries out a weather forecast is considered. For example, it is assumed that, at a time of Jan. 1, 2014, the data mining system predicts by use of a certain predictor that the highest temperature of tomorrow (Jan. 2, 2014) is 10° C. (° C. represents temperature in Celsius). It is also assumed that the actual highest temperature of Jan. 2, 2014 is 11° C. The evaluation module evaluates the predictor, for example, at a time of Jan. 2, 2014, by comparing the predicted result (that is, 10° C.) and the observed value of the highest temperature (that is, 11° C.) of Jan. 2, 2014.
Here, it is assumed that the data mining system predicts, using a certain predictor, the highest temperature of each day for one year. The evaluation module compares, for example, the predicted value and the observed value corresponding to the predicted value for each of the data accumulated for one year. By comparing as mentioned above, the evaluation module may statistically analyze a degree of difference between the predicted value and the observed value. For example, the evaluation module calculates a mean value or a variance value of differences between the predicted values and the observed values which are accumulated for one year. Based on the statistical analysis mentioned above, the evaluation module evaluates the predictor.
NPL 1 discloses a programming language for statistical analysis, and a development and execution environment thereof. An art which NPL 1 discloses includes various functions which are used for evaluating the predictor.