Statistical machine learning techniques are used in various fields. In statistical machine learning, training data including a plurality of samples, each of which is assigned a correct class indicating the contents of the sample, are used to learn, based on a model, the statistical characteristics relating the training data and the classes. The training data are collected in advance of the learning. The model is then applied to test data including a plurality of samples to which no correct class is assigned, whereby prediction results, recognition results, or the like are obtained for the test data.
Pattern recognition is one of the fields in which machine learning is used. In pattern recognition, a class to which an input pattern belongs is estimated. Examples of pattern recognition include object recognition, which is a technique for estimating an object included in an image, and voice recognition, which is a technique for estimating the contents of an utterance.
Most machine learning methods assume that the statistical characteristics of the training data coincide with those of the test data. Accordingly, when the two sets of characteristics differ from each other, the precision of the machine learning may deteriorate. For this reason, a technique called domain adaptation has been proposed for cancelling the difference between the two sets of characteristics.
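The deterioration caused by such a difference can be illustrated with a minimal sketch. The nearest-centroid classifier, the one-dimensional Gaussian data, and the shift of 3.0 are all assumptions chosen for illustration and do not come from the cited literature.

```python
import numpy as np

def fit_centroids(x, y):
    """Return the per-class mean of 1-D features x for labels 0 and 1."""
    return np.array([x[y == c].mean() for c in (0, 1)])

def predict(centroids, x):
    """Assign each sample to the class whose centroid is nearest."""
    return np.argmin(np.abs(x[:, None] - centroids[None, :]), axis=1)

def make_data(rng, n, shift=0.0):
    """Draw n samples per class around -2 and +2, then add a constant shift."""
    y = np.repeat([0, 1], n)
    x = rng.normal(loc=np.where(y == 0, -2.0, 2.0), scale=1.0) + shift
    return x, y

rng = np.random.default_rng(0)

# Learn the model from training data collected in advance.
x_train, y_train = make_data(rng, 500)
centroids = fit_centroids(x_train, y_train)

# Test data with the same statistics, and test data whose statistics differ.
x_same, y_same = make_data(rng, 500)
x_shifted, y_shifted = make_data(rng, 500, shift=3.0)

acc_same = np.mean(predict(centroids, x_same) == y_same)
acc_shifted = np.mean(predict(centroids, x_shifted) == y_shifted)
print(f"matched test accuracy: {acc_same:.3f}")
print(f"shifted test accuracy: {acc_shifted:.3f}")
```

In this toy setting the model itself is unchanged between the two evaluations; only the statistics of the test data differ, which is the situation domain adaptation is intended to address.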
Patent literature (PTL) 1 describes a learning device and the like. The learning device described in PTL 1 learns a prediction model, which is used for predicting an output for test data, based on an importance that is a ratio between the generation probability of training data, which are the input data of training sample data, and the generation probability of the test data.
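The general idea of weighting training samples by such an importance can be sketched as follows. This is not the method of PTL 1; it is a generic importance-weighted least-squares example under stated assumptions: both generation probabilities are known Gaussians, the importance is taken as the test density divided by the training density, and the target function, the linear model, and the helper names `gaussian_pdf`, `fit_line`, and `mse` are hypothetical.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian (assumed known here for illustration)."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def fit_line(x, y, weights):
    """Weighted least squares for y ~ a*x + b; returns [a, b]."""
    design = np.column_stack([x, np.ones_like(x)])
    sw = np.sqrt(weights)  # scaling rows by sqrt(w) weights squared errors by w
    coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return coef

def mse(coef, x, y):
    """Mean squared error of the fitted line on (x, y)."""
    return np.mean((coef[0] * x + coef[1] - y) ** 2)

rng = np.random.default_rng(0)
target = np.sinc  # assumed true function; a line cannot fit it everywhere

# Training inputs and test inputs follow different distributions.
x_train = rng.normal(1.0, 0.5, size=1000)
y_train = target(x_train) + rng.normal(0.0, 0.1, size=1000)
x_test = rng.normal(2.0, 0.25, size=1000)
y_test = target(x_test) + rng.normal(0.0, 0.1, size=1000)

# Importance: ratio of the test density to the training density.
importance = gaussian_pdf(x_train, 2.0, 0.25) / gaussian_pdf(x_train, 1.0, 0.5)

coef_plain = fit_line(x_train, y_train, np.ones_like(x_train))
coef_weighted = fit_line(x_train, y_train, importance)

mse_plain = mse(coef_plain, x_test, y_test)
mse_weighted = mse(coef_weighted, x_test, y_test)
print(f"test MSE without importance: {mse_plain:.4f}")
print(f"test MSE with importance:    {mse_weighted:.4f}")
```

In practice the two densities are not known and the ratio has to be estimated from samples; the sketch sidesteps that step by assuming the densities are given.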
Moreover, non-patent literature (NPL) 1 describes a technique of performing feature transformation so that training data and test data have similar distributions. According to the technique described in NPL 1, projection onto a group of subspaces, which is formed by interpolation between the subspace in which the training data are distributed and the subspace in which the test data are distributed, is used as the feature transformation.
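One possible way to form such a group of subspaces can be sketched as follows. The exact procedure of NPL 1 is not given here, so this sketch makes its own assumptions: each subspace is obtained by principal component analysis, the interpolation follows the geodesic between the two subspaces computed from their principal angles, and the helper names `pca_basis` and `interpolate_subspaces`, the synthetic data, and the three interpolation steps are hypothetical.

```python
import numpy as np

def pca_basis(x, dim):
    """Orthonormal basis (features x dim) of the top principal directions."""
    centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:dim].T

def interpolate_subspaces(basis_src, basis_tgt, t):
    """Orthonormal basis of the subspace at fraction t from source to target."""
    u, cos_theta, vt = np.linalg.svd(basis_src.T @ basis_tgt)
    cos_theta = np.clip(cos_theta, -1.0, 1.0)
    theta = np.arccos(cos_theta)          # principal angles
    a = basis_src @ u                     # principal vectors, source side
    b = basis_tgt @ vt.T                  # matching principal vectors, target side
    sin_theta = np.sin(theta)
    nonzero = sin_theta > 1e-8
    # Unit directions orthogonal to the source subspace, one per principal angle.
    q = (b - a * cos_theta) / np.where(nonzero, sin_theta, 1.0)
    q = q * nonzero                       # directions already shared need no rotation
    return a * np.cos(t * theta) + q * np.sin(t * theta)

rng = np.random.default_rng(0)
# Synthetic training and test data whose dominant directions differ.
x_train = rng.normal(size=(200, 5)) * np.array([3.0, 2.0, 1.0, 0.5, 0.1])
x_test = rng.normal(size=(200, 5)) * np.array([0.1, 0.5, 3.0, 1.0, 2.0])

basis_src = pca_basis(x_train, dim=2)
basis_tgt = pca_basis(x_test, dim=2)

# Project onto each interpolated subspace and concatenate the projections.
steps = (0.0, 0.5, 1.0)
bases = [interpolate_subspaces(basis_src, basis_tgt, t) for t in steps]
train_features = np.hstack([x_train @ basis for basis in bases])
test_features = np.hstack([x_test @ basis for basis in bases])
print(train_features.shape, test_features.shape)
```

At t = 0 the interpolated basis spans the training subspace and at t = 1 it spans the test subspace, so the concatenated features cover both ends as well as the intermediate subspace.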