Conventionally, input data is analyzed by approximation using, for example, a regression equation. For example, when data analysis is carried out to identify the cause of a failure of a device, explanatory variables are given as candidates for the cause of the failure, and an objective variable is given as a variable indicating whether the device is operating normally or is failing. In this data analysis, a multi-explanatory-variable regression equation is created for calculating a predicted value of the objective variable from the explanatory variables. Based on the magnitude of the coefficients of the multi-explanatory-variable regression equation, the multiple candidates for the cause of the failure are analyzed to determine which candidate is related to the normal operation or failure of the device. For examples, refer to Japanese Laid-Open Patent Publication Nos. 2004-152205, 2004-234302, and 2007-293889.
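The conventional analysis described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the cited publications: the candidate names `overheating` and `high_vibration`, the four sample rows, and the helper `fit_linear_regression` are all hypothetical, and ordinary least squares is assumed as the fitting method.

```python
def fit_linear_regression(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y.

    Returns [intercept, coefficient_1, coefficient_2, ...].
    """
    # Prepend a constant 1.0 to each row so that w[0] is the intercept.
    X = [[1.0] + list(r) for r in rows]
    n = len(X[0])
    # Build the augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


# Hypothetical explanatory variables (candidate causes), one column each.
candidates = ["overheating", "high_vibration"]
rows = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
# Objective variable: 1.0 = device failing, 0.0 = device normal.
targets = [0.0, 0.0, 1.0, 1.0]

weights = fit_linear_regression(rows, targets)
coefficients = dict(zip(candidates, weights[1:]))
# Rank the candidates by the magnitude of their regression coefficients.
ranked = sorted(candidates, key=lambda name: abs(coefficients[name]), reverse=True)
print(coefficients, ranked)
```

In this made-up data set the failure tracks `overheating` exactly, so that candidate receives the larger coefficient and is ranked first.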
In the analysis of some subjects, values measured for positive cases (or negative cases) are greater in number than values measured for negative cases (or positive cases). For example, in the above data analysis of the cause of failure of a device, the positive cases (normal operation) are greater in number than the negative cases (failure).
According to the above conventional technique, however, a prediction equation, such as a regression equation, is generated on the assumption that values measured for positive cases (normal cases) and values measured for negative cases (failure cases) are approximately the same in number. This poses a problem in that if values measured for positive cases are overwhelmingly greater in number than values measured for negative cases, the prediction accuracy of the prediction equation drops. The same problem occurs when values measured for negative cases are overwhelmingly greater in number than values measured for positive cases.
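The effect of such an imbalance can be illustrated with a small sketch. Everything in it is an assumption made for illustration: a single explanatory variable, made-up measurements (normal devices read 0 or 1, failing devices read 2), a least-squares line as the prediction equation, and a fixed decision threshold of 0.5, which is the choice that implicitly treats both classes as equally numerous.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one explanatory variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def failure_recall(n_normal, n_failure):
    """Fit on a data set with the given class sizes; return the fraction of failures detected.

    n_normal is assumed to be even so the normal readings split evenly between 0 and 1.
    """
    # Hypothetical measurements: normal devices read 0 or 1, failing devices read 2.
    xs = [0.0, 1.0] * (n_normal // 2) + [2.0] * n_failure
    # Objective variable: 0.0 = normal, 1.0 = failing.
    ys = [0.0] * n_normal + [1.0] * n_failure
    slope, intercept = fit_line(xs, ys)
    # A sample is predicted to be failing when the fitted value exceeds 0.5.
    detected = sum(1 for x, y in zip(xs, ys)
                   if y == 1.0 and slope * x + intercept > 0.5)
    return detected / n_failure


balanced_recall = failure_recall(20, 20)    # 20 normal cases, 20 failure cases
imbalanced_recall = failure_recall(38, 2)   # 38 normal cases, 2 failure cases
print(balanced_recall, imbalanced_recall)
```

With equal class sizes the fitted line detects every failure. With 38 normal cases and 2 failure cases, the same procedure detects none of them, even though the failing readings are cleanly separated from the normal ones.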
According to the conventional technique, the prediction equation is generated using all of the data subject to analysis, and that same data, from which the prediction equation was generated, is then evaluated using the prediction equation. As a result, a predicted value is determined for evaluation subject data whose value is already known, and the predicted value tends to reproduce that known data. This poses a problem in that the accuracy of the predicted value is evaluated to be excessively high.
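The optimistic bias described above can be seen in a deliberately extreme sketch. Note the substitution: instead of a regression equation, it uses a nearest-neighbor predictor, chosen only because it reproduces its known data exactly and so makes the effect obvious. The sample values and the names `known` and `unseen` are hypothetical.

```python
def nearest_neighbor_predict(train, x):
    """Return the label of the training sample whose measurement is closest to x."""
    nearest = min(train, key=lambda sample: abs(sample[0] - x))
    return nearest[1]


def accuracy(train, samples):
    """Fraction of samples whose label is predicted correctly from the training data."""
    hits = sum(1 for x, label in samples
               if nearest_neighbor_predict(train, x) == label)
    return hits / len(samples)


# Hypothetical (measurement, label) pairs; label 1 = failing, 0 = normal.
known = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 0), (5.0, 1), (6.0, 1)]
# Hypothetical data that was not used to build the predictor.
unseen = [(1.4, 0), (2.6, 0), (3.6, 1), (4.4, 1), (5.6, 1)]

# Evaluating on the same data the predictor was built from.
resubstitution_accuracy = accuracy(known, known)
# Evaluating on data the predictor has not seen.
held_out_accuracy = accuracy(known, unseen)
print(resubstitution_accuracy, held_out_accuracy)
```

Evaluated on the known data, every prediction simply returns the known label, so the measured accuracy is 1.0. On the unseen samples the same predictor is correct for only 2 of 5, which the first evaluation gives no hint of.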