In various business fields, predicting the future from obtainable data is effective for business improvement. For example, in a store, when future sales can be predicted from sales data during the most recent two weeks, inventory control can be appropriately carried out. When it can be predicted whether complaints come from customers and which manner of reception operations causes such complaints from records of reception operations at a call center, complaints can be reduced.
In the description of the present invention, a type of data used as a clue for prediction, such as sales data during the most recent two weeks and records of reception operations at a call center, is referred to as “explanatory variable”, and a variable to be predicted, such as future sales and occurrence/non-occurrence of incoming complaints, is referred to as “objective variable”. It is assumed that “prediction” is to create a function of explanatory variables and obtain a predicted value of the objective variable.
It is also assumed that past data are available as a clue for the prediction. Past data are a set of samples each of which is a tuple of explanatory variables and an objective variable. Hereinafter, the set of samples is referred to as “training data”.
Methods to carry out prediction by use of training data include a method using machine learning. Machine learning is to create a function to output a predicted value of the objective variable by using explanatory variables as input on the basis of training data.
However, there is a problem in applying machine learning. The problem is that machine learning is not applicable when an explanatory variable in the training data has a missing value. For example, when a specific item is out of stock during a certain period of time, the sale of the specific item becomes missing value, which makes machine learning inapplicable. When a portion of records of reception operations has missing values because an operator who has answered a call has missed recording his/her operation, machine learning also becomes inapplicable. That is, many methods using machine learning have a problem in that the methods are not applicable to data including missing values.
On the other hand, a method to impute missing values by a mean value and a method to impute missing values by predicting the missing value from other explanatory variables have been known. However, when a large error occurs in the imputation, these methods cause an unnecessary error in the prediction of the objective variable.
To solve such a problem, NPL 1, for example, discloses a prediction system that can carry out prediction even when a portion of training data, which are used as input, includes missing values for the explanatory variables. FIG. 11 is a block diagram illustrating an example of a conventional prediction system.
As illustrated in FIG. 11, a prediction system 20 includes a data partitioning means 21 and a prediction function learning means 22. When training data is input, the data partitioning means 21 partitions the input training data, and outputs the partitioned training data. When the partitioned training data is input, the prediction function learning means 22 carries out learning to create a prediction function for each partition of training data, and outputs the created prediction functions.
The conventional prediction system illustrated in FIG. 11 operates in the following manner. First, for respective samples in the input training data, the data partitioning means 21 refers to which explanatory variable has a missing value (hereinafter, referred to as “missing manner”), and gives the same label to samples that have the same missing manner.
Next, the prediction function learning means 22 is inputted the labeled training data output by the data partitioning means 21, carries out machine learning with respect to each label, using only a set of samples to which the same label is given as training data, and, consequently, outputs prediction functions.