In recent years, research on estimating attributes (sex, age, expression, etc.) of a person based on a face image of the person has advanced significantly. In particular, estimation of sex and age is used in applications such as marketing strategies, security, and amusement, and has been commercialized.
For example, “supervised learning” is known as a technique of machine learning (for example, refer to Patent Literature 1). In supervised learning, a case data set containing pairs, each consisting of input data (observed data) and output data (implication, attribute, or result of the observed data), is regarded as “advice from a supervisor,” and a machine (computer) learns based on the case data set. The term “learning” in this context means creating a function model for predicting or estimating the output for input data whose output is unknown.
Next, a specific description is given taking face image recognition as an example, for a case in which sex (one of the human attributes) is estimated based on a face image.
At the time of learning, a computer constructs a function model based on a case data set containing face images of females and males. At the time of evaluation, when a face image whose sex is unknown (for example, a female face image) is supplied, the computer outputs “female” as the estimated sex based on the input data and the function model.
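The learning and evaluation phases described above can be sketched as follows. This is only a minimal illustration and not the method of any cited literature: the two-dimensional feature vectors stand in for features extracted from face images, and the nearest-centroid rule and the names `learn` and `estimate` are assumptions chosen for brevity.

```python
from math import dist

def learn(case_data_set):
    """Learning: build a function model (here, one mean feature vector per label)."""
    sums, counts = {}, {}
    for features, label in case_data_set:
        acc = sums.setdefault(label, [0.0] * len(features))
        for k, value in enumerate(features):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def estimate(model, features):
    """Evaluation: return the label whose mean feature vector is closest to the input."""
    return min(model, key=lambda label: dist(model[label], features))

# Hypothetical feature vectors standing in for face images (assumption).
case_data_set = [
    ([0.9, 0.2], "female"), ([0.8, 0.3], "female"),
    ([0.2, 0.9], "male"), ([0.1, 0.7], "male"),
]
model = learn(case_data_set)
estimated_sex = estimate(model, [0.85, 0.25])  # input whose output is unknown
```

Here `case_data_set` plays the role of the “advice from a supervisor,” and `model` is the function model applied to input data whose output is unknown.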
Further, in Patent Literature 2, there is disclosed an “age estimation device, method, and program,” which are capable of obtaining a recognition result close to the result that a human being perceives. In the age estimation device disclosed in Patent Literature 2, when a model for age estimation is created by regression analysis, the learning weight for young adults is increased to improve estimation accuracy for young adults. Specifically, in Patent Literature 2, in a supervised regression problem of predicting the true age of test data from which a feature vector is extracted, kernel regularized weighted least squares (KRWLS) is used to model an age estimation function as a linear combination of positive definite kernels.
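A weighted, regularized kernel least-squares fit of the kind described above can be sketched as follows. This is a hedged illustration rather than the formulation of Patent Literature 2: the Gaussian kernel, the L2 penalty on the coefficients, the weight value 3.0, the age threshold of 30, and the synthetic one-dimensional data are all assumptions.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Gaussian (positive definite) kernel matrix between the rows of A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def fit_krwls(X, y, weights, sigma=1.0, lam=0.1):
    """Minimize sum_i w_i (y_i - f(x_i))^2 + lam * ||alpha||^2,
    where f(x) = sum_j alpha_j k(x, x_j)."""
    K = gaussian_kernel(X, X, sigma)
    W = np.diag(weights)
    # Normal equations of the weighted, regularized least-squares problem.
    return np.linalg.solve(K.T @ W @ K + lam * np.eye(len(X)), K.T @ W @ y)

def predict_age(X_train, alpha, X_new, sigma=1.0):
    """Evaluate the linear combination of kernels at new feature vectors."""
    return gaussian_kernel(X_new, X_train, sigma) @ alpha

# Synthetic 1-D "feature" and age data (assumption, for illustration only).
X = np.linspace(0.0, 1.0, 20)[:, None]
y = 15.0 + 50.0 * X[:, 0] + 5.0 * np.sin(12.0 * X[:, 0])
young = y < 30.0                               # hypothetical "young adult" subset
weights = np.where(young, 3.0, 1.0)            # larger learning weight for that subset
alpha = fit_krwls(X, y, weights, sigma=0.3, lam=0.1)
predicted = predict_age(X, alpha, X, sigma=0.3)
```

Increasing the weight of a subset can only keep or lower the squared error of the fit on that subset, which is the mechanism by which accuracy for the emphasized age group is improved.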
There is also known a classifier with high learning efficiency called a least-squares probabilistic classifier (LSPC) (for example, refer to Non Patent Literature 1 and Non Patent Literature 2). The LSPC is a classification technique in which a posterior probability model of classes is learned under a squared loss. The greatest feature of the LSPC is that its solution can be calculated analytically. Further, in the LSPC, the posterior probability is directly estimated in the form of a density ratio for each class, and hence the LSPC is also robust to imbalance in the number of pieces of learning data among the classes, that is, it is less susceptible to a skew in the number of pieces of data of a particular class. Because the posterior probability is learned using the squared loss, the LSPC can reduce learning time by a factor of several hundred while maintaining pattern recognition accuracy at the same level as related-art techniques.
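The analytic solution mentioned above can be sketched as follows. This is a simplified variant written under stated assumptions, not the exact procedure of the cited literature: Gaussian basis functions are centered at all training points, negative outputs are clipped to zero before normalization, and the data, `sigma`, and `lam` values are hypothetical.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def fit_lspc(X, labels, n_classes, sigma=1.0, lam=0.1):
    """Solve (H + lam * I) alpha_y = h_y for every class y in one linear solve."""
    n = len(X)
    Phi = gaussian_kernel(X, X, sigma)        # basis functions centered at training points
    H = Phi.T @ Phi / n
    onehot = np.eye(n_classes)[labels]        # shape (n, n_classes)
    h = Phi.T @ onehot / n                    # column y is h_y
    return np.linalg.solve(H + lam * np.eye(n), h)  # column y is alpha_y

def posterior(X_train, alpha, X_new, sigma=1.0):
    """Clip negative outputs to zero and normalize so each row sums to one."""
    scores = np.maximum(gaussian_kernel(X_new, X_train, sigma) @ alpha, 0.0)
    return scores / np.maximum(scores.sum(axis=1, keepdims=True), 1e-12)

# Imbalanced synthetic two-class data (assumption): 40 samples versus 10 samples.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([-2.0, 0.0], 0.5, size=(40, 2)),
               rng.normal([2.0, 0.0], 0.5, size=(10, 2))])
labels = np.array([0] * 40 + [1] * 10)
alpha = fit_lspc(X, labels, n_classes=2)
P = posterior(X, alpha, np.array([[-2.0, 0.0], [2.0, 0.0]]))
```

No iterative optimization is involved: learning consists of a single call to a linear solver, which is the source of the short learning time.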
Further, ranking learning is also known. In this context, “ranking learning” is an optimization technique within the framework of supervised learning in which a score function is learned so that data are given higher scores according to their degree and order of relevance. For example, a ranking support vector machine (ranking SVM) is known as a representative example of ranking learning based on a pairwise approach (for example, refer to Non Patent Literature 3). In the ranking SVM, a loss is defined for each pair of learning data, so that the problem is reduced to a two-class classification problem of the SVM, and the score function is thereby optimized.
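The reduction from ranking to two-class classification described above can be sketched as follows. This is a minimal illustration under assumptions: a linear score function, plain subgradient descent on the pairwise hinge loss in place of a dedicated SVM solver, and hypothetical data and hyperparameters.

```python
import numpy as np

def pairwise_differences(X, relevance):
    """Turn ranked items into two-class data: x_i - x_j is labeled +1
    if item i is more relevant than item j, and -1 otherwise."""
    diffs, signs = [], []
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            if relevance[i] != relevance[j]:
                diffs.append(X[i] - X[j])
                signs.append(1.0 if relevance[i] > relevance[j] else -1.0)
    return np.array(diffs), np.array(signs)

def fit_ranking_svm(X, relevance, C=1.0, lr=0.01, epochs=500):
    """Minimize 0.5 * ||w||^2 + C * sum of hinge losses over pairs
    by subgradient descent; the score function is w . x."""
    D, s = pairwise_differences(X, relevance)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        violated = s * (D @ w) < 1.0          # pairs inside the margin
        grad = w - C * (s[violated, None] * D[violated]).sum(axis=0)
        w -= lr * grad
    return w

# Hypothetical items: the first feature carries the relevance order (assumption).
X = np.array([[3.0, 1.0], [2.0, 0.5], [1.0, 1.0], [0.0, 0.2]])
relevance = [3, 2, 1, 0]
w = fit_ranking_svm(X, relevance)
scores = X @ w
ranking = list(np.argsort(-scores))           # item indices from highest to lowest score
```

Each difference vector is treated as one two-class sample, so any learner for a linear SVM could be substituted for the subgradient loop.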
As methods of calculating the magnitude of a correlation between an explanatory variable representing a feature quantity of an object and an objective variable representing an attribute or a result, the following are known, for example: a method of calculating a correlation value in a one-dimensional sub-space of canonical correlation analysis (CCA); maximum likelihood mutual information (MLMI), which is a method of calculating mutual information (MI) (for example, refer to Non Patent Literature 4); and least-squares mutual information (LSMI), which is a method of calculating squared-loss mutual information (SMI) (for example, refer to Non Patent Literature 5).
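Of the methods listed above, only the CCA-based correlation value is sketched here; MLMI and LSMI are not shown. The following is an illustrative computation of the first canonical correlation under assumptions: the small ridge term `reg` is added for numerical stability, and the synthetic data are hypothetical.

```python
import numpy as np

def first_canonical_correlation(X, Y, reg=1e-8):
    """Largest canonical correlation between explanatory X (n, p) and objective Y (n, q)."""
    n = len(X)
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    # Whiten both sides: M = Lx^{-1} Cxy Ly^{-T}; its singular values
    # are the canonical correlations.
    M = np.linalg.solve(Lx, Cxy)
    M = np.linalg.solve(Ly, M.T).T
    return float(np.linalg.svd(M, compute_uv=False)[0])

# Synthetic data (assumption): one objective variable depends on X, the other does not.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y_related = (X @ np.array([1.0, -2.0, 0.5]))[:, None] + 0.1 * rng.normal(size=(200, 1))
y_unrelated = rng.normal(size=(200, 1))
rho_related = first_canonical_correlation(X, y_related)
rho_unrelated = first_canonical_correlation(X, y_unrelated)
```

A value near one indicates a strong linear relationship between the explanatory and objective variables; because CCA captures only linear dependence, information-based measures such as MI and SMI are used when nonlinear dependence matters.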
There is also known a technique of optimizing a sparse regression model (for example, refer to Non Patent Literature 6 and Non Patent Literature 7).