Human interaction depends critically on the human ability to process faces in terms of sex, age, emotion, ethnic origin, identity, and so on. Many applications exist, and many new ones can be conceived, that need to process, interpret, monitor, and react to an observable facial trait. Areas of interest are as diverse as entertainment, marketing, law enforcement, health, and security. Hence, it is no wonder that automated methods for the detection, recognition, and description of facial images have long been studied in computer science.
Face detection techniques aim to identify all image regions that contain a face, regardless of the face's three-dimensional position and orientation and of the lighting conditions, and, if a face is present, to return its image location and extent. Over the years, various robust methods for detecting (frontal) faces in images and video have been reported; see [YANG02] for a comprehensive and critical survey of face detection methods. Once a (frontal) face has been detected in an image, face recognition techniques aim to identify the person [ZHAO03]. Early face recognition methods emphasized matching face images by means of subspace methods, such as principal component analysis and linear discriminant analysis, and by elastic graph matching [CHEL10]. For non-frontal faces, approaches based on 3D morphable models have been proposed, which consistently outperform subspace methods on controlled datasets. Yet the recognition of faces in unconstrained environments remains a research challenge for the years to come.
Compared to the vast literature on face detection and recognition, research aimed at describing face images in terms of their visual appearance, e.g., whether the person is of Asian origin, is female, wears glasses, smiles, or is a teenager, is modest. Some methods categorize face images by gender [MOGH02, MAKI08, TOEW09], others by age [PARK10] or ethnic origin [GUTT00], and yet another paper considers facial expression [PANT00]. The general approach in these methods is to rescale a detected face to a thumbnail image, to describe the thumbnail in terms of visual features, such as pixel intensity, texture, or a tuned 3D face model, and to learn the visual trait of interest from labeled examples with machine learning software such as support vector machines or AdaBoost. A clear limitation of all these approaches is their lack of generalization: for every visual trait one can think of, a separate visual feature tuned to that trait must be carefully crafted.
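The general thumbnail-plus-classifier approach described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any cited paper: it assumes the face has already been detected and cropped to a grayscale image (a list of rows with values in [0, 1]), uses raw thumbnail intensities as the visual feature, and swaps in a linear support vector machine trained with Pegasos-style stochastic subgradient descent in place of a library SVM. The helper names (`resize_nearest`, `thumbnail_features`, `train_linear_svm`, `toy_face`) are hypothetical, and the training "faces" are synthetic stand-ins in which the trait is simply encoded as a bright upper or lower half.

```python
import random

def resize_nearest(image, size):
    """Rescale a grayscale image (list of rows, values in [0, 1]) to a
    size x size thumbnail by nearest-neighbour sampling."""
    h, w = len(image), len(image[0])
    return [[image[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]

def thumbnail_features(image, size=8):
    """Describe a face crop by the raw pixel intensities of its thumbnail,
    plus a constant 1.0 so the linear model can learn a bias term."""
    thumb = resize_nearest(image, size)
    return [p for row in thumb for p in row] + [1.0]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_linear_svm(xs, ys, lam=0.01, epochs=20, seed=0):
    """Linear SVM trained with Pegasos-style stochastic subgradient descent
    on the regularized hinge loss; labels must be +1 or -1."""
    rng = random.Random(seed)
    w = [0.0] * len(xs[0])
    order = list(range(len(xs)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            # Check the margin with the current weights before updating.
            violated = ys[i] * dot(w, xs[i]) < 1
            w = [(1 - eta * lam) * wj for wj in w]  # regularization shrink
            if violated:  # hinge-loss subgradient step
                w = [wj + eta * ys[i] * xj for wj, xj in zip(w, xs[i])]
    return w

def predict(w, features):
    return 1 if dot(w, features) >= 0 else -1

def toy_face(label, rng, size=16):
    """Hypothetical stand-in for an aligned face crop: the 'trait' is a
    bright upper half (+1) or a bright lower half (-1), plus small noise."""
    top, bottom = (0.8, 0.2) if label == 1 else (0.2, 0.8)
    return [[(top if r < size // 2 else bottom) + rng.uniform(-0.05, 0.05)
             for _ in range(size)] for r in range(size)]

rng = random.Random(42)
labels = [1, -1] * 10
xs = [thumbnail_features(toy_face(y, rng)) for y in labels]
w = train_linear_svm(xs, labels)
accuracy = sum(predict(w, x) == y for x, y in zip(xs, labels)) / len(labels)
print(accuracy)
```

The sketch also makes the stated limitation concrete: the classifier only sees whatever `thumbnail_features` encodes, so a different trait would, under this scheme, call for a differently crafted feature function.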
Some researchers have extended this approach by defining a mixture of different visual features as input to a support vector machine, which learns which features to select for assigning specific visual traits to face images. A good example is [KUM08], where the authors divide the face into a number of regions corresponding to the hair area, forehead, nose, eyes, etc. Each region is described by a mixture of color, intensity, and edge features, which can be normalized and aggregated in several ways, with the support vector machine deciding which combinations to use. Naturally, this bottom-up approach depends on careful alignment of the face images, so that the nose region of one person is not compared with the forehead of another.
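The region-based feature pool described above can be illustrated with the sketch below. It is only loosely modeled on the idea of combining regions, pixel types, normalizations, and aggregations; it is not the implementation of [KUM08]. The region boundaries in `REGIONS` are hypothetical fractions of the height of an already aligned, grayscale face crop (so color features are omitted), and the two pixel types, two normalizations, and two aggregations are illustrative choices. The function only enumerates candidate features; the selection step performed by the learner is not shown.

```python
# Hypothetical region layout for an aligned face crop, given as
# (start, end) fractions of the image height.
REGIONS = {
    "hair": (0.0, 0.15),
    "forehead": (0.15, 0.35),
    "eyes": (0.35, 0.5),
    "nose": (0.5, 0.7),
    "mouth": (0.7, 0.9),
}

def region_rows(image, frac):
    """Return the rows of the image that fall inside one region band."""
    h = len(image)
    return image[int(frac[0] * h):int(frac[1] * h)]

def pixel_values(rows, kind):
    """Per-pixel values of one type: raw intensity or horizontal edge magnitude."""
    if kind == "intensity":
        return [p for row in rows for p in row]
    if kind == "edge":
        return [abs(row[c + 1] - row[c]) for row in rows for c in range(len(row) - 1)]
    raise ValueError(kind)

def normalize(values, mode):
    if mode == "none":
        return values
    if mode == "mean":  # subtract the region mean
        m = sum(values) / len(values)
        return [v - m for v in values]
    raise ValueError(mode)

def aggregate(values, mode, bins=4):
    if mode == "mean":
        return [sum(values) / len(values)]
    if mode == "histogram":  # assumes values lie in [-1, 1]
        counts = [0] * bins
        for v in values:
            counts[min(bins - 1, max(0, int((v + 1) / 2 * bins)))] += 1
        return [c / len(values) for c in counts]
    raise ValueError(mode)

def candidate_features(image):
    """Enumerate every (region, pixel type, normalization, aggregation)
    combination; a learner would then select among these candidates."""
    feats = {}
    for name, frac in REGIONS.items():
        rows = region_rows(image, frac)
        for kind in ("intensity", "edge"):
            vals = pixel_values(rows, kind)
            for norm in ("none", "mean"):
                for agg in ("mean", "histogram"):
                    feats[(name, kind, norm, agg)] = aggregate(normalize(vals, norm), agg)
    return feats

face = [[0.5] * 20 for _ in range(20)]  # toy uniform 20 x 20 "face"
feats = candidate_features(face)
print(len(feats), feats[("eyes", "intensity", "none", "mean")])
```

Because each feature is tied to a fixed band of rows, shifting the face by a few pixels moves, for example, nose pixels into the band labeled "eyes", which is why such a scheme depends on careful alignment.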
The present invention proposes a new process that is able to categorize a human face image according to observable visual traits in a generic fashion. Examples of such traits include, but are not limited to, gender, race, age, emotion, facial (hair) properties, abnormalities, and the presence of religious, medical, or fashion elements such as hats, scarves, glasses, piercings, and tattoos.