In many application domains of computer vision involving facial image analysis, such as pain detection and affective computing, it is useful to determine action units (AUs).
The emergence of robust face detection algorithms in the early 2000s accelerated research on the automatic analysis of faces recorded in images and videos. Automatic analysis of facial expressions is one of the research fields that has received increased attention since then. Research in this field is pursued mainly in two directions. One focuses on an objective analysis of basic facial movements based on the Facial Action Coding System (FACS) [1]. The other focuses on detecting a set of prototypical facial expressions of emotions.
For more than ten years, systems such as the Sophisticated High-speed Object Recognition Engine (SHORE™) have been under development. SHORE™ [2] is a general framework for various object detection tasks in images and videos, with a focus on face detection and analysis. SHORE™ detects and tracks faces in real time, estimates age and gender, and identifies four basic expressions of emotions, namely happiness, sadness, anger and surprise.
Several attempts have been made in the field of AU detection and emotion recognition. Geometric features have been computed using the location of facial landmarks defined according to a deformable face model. Typical approaches for landmark localization include Active Appearance Models (AAM) and Constrained Local Model Fitting (CLM). Texture features encode visual texture information using, for example, histograms of oriented gradients (HOG), histograms of local binary patterns (LBP) or histograms of local Gabor binary patterns (LGBP).
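As an illustration of the texture features mentioned above, the following is a minimal sketch of a histogram of local binary patterns, assuming the basic 3x3 LBP operator on a grayscale image given as a list of rows. The function name `lbp_histogram`, the neighbor ordering, and the normalization are assumptions made for this example and are not taken from the cited systems.

```python
def lbp_histogram(image):
    """Return a normalized 256-bin histogram of basic 3x3 LBP codes.

    `image` is a grayscale image as a list of equally long rows of numbers.
    Border pixels are skipped because they lack a full 3x3 neighborhood.
    """
    # Neighbor offsets (row, col), clockwise starting at the top-left pixel.
    # The ordering is an arbitrary choice for this sketch.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    rows, cols = len(image), len(image[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # Set the bit when the neighbor is at least as bright as the center.
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist)
    # Normalize so that histograms of differently sized regions are comparable.
    return [h / total for h in hist] if total else hist
```

In practice such histograms are typically computed per image cell and concatenated into one feature vector; the sketch computes a single histogram for the whole input.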
According to an embodiment of conventional technology, a variant of artificial neural networks is used [3]. According to another embodiment, an SVM (support vector machine) fuses scores from classifiers [4]. Multiple kernel learning based fusion approaches use separate kernels for geometric and texture features. According to an embodiment of conventional technology [5], a multi-kernel SVM is used for feature fusion. A Gaussian kernel is used for geometric and gradient based texture features, and an intersection kernel is used for higher dimensional Gabor-based texture features. A multi-kernel SVM is also used in another teaching for feature fusion [6], where kernels of the same type are applied to geometric and texture features. For SVM, see also https://www.csie.ntu.edu.tw/~cjlin/libsvm/.
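The idea of using separate kernels for geometric and texture features can be sketched as follows. This is only an illustration under stated assumptions: the two kernels are combined as a weighted sum with fixed weights `w_geo` and `w_tex`, whereas multiple kernel learning would learn such weights from data. All function names, the parameter `gamma`, and the weights are hypothetical and are not taken from [5] or [6].

```python
import math


def gaussian_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def intersection_kernel(x, y):
    """Histogram intersection kernel: sum of element-wise minima."""
    return sum(min(a, b) for a, b in zip(x, y))


def combined_kernel(sample_a, sample_b, w_geo=0.5, w_tex=0.5, gamma=0.5):
    """Weighted sum of a Gaussian kernel on geometric features and an
    intersection kernel on texture histograms.

    Each sample is a (geometric_features, texture_histogram) pair.
    The fixed weights are an assumption of this sketch.
    """
    geo_a, tex_a = sample_a
    geo_b, tex_b = sample_b
    return (w_geo * gaussian_kernel(geo_a, geo_b, gamma)
            + w_tex * intersection_kernel(tex_a, tex_b))


def gram_matrix(samples, **kwargs):
    """Pairwise kernel values for all samples, as a list of rows."""
    return [[combined_kernel(a, b, **kwargs) for b in samples] for a in samples]
```

A Gram matrix built this way could be handed to an SVM implementation that accepts precomputed kernels.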
It is intended to provide a technique which permits more accurate information on facial expressions to be obtained.