Computer vision techniques are used to recognize human attributes, e.g., gender, age, race, hairstyle, and clothing style. These techniques have many applications, including facial recognition/verification, visual search, and automatic tagging of people.
Computer vision techniques sometimes employ attributes as an intermediate representation for knowledge transfer in object recognition tasks. Representing an image as a list of human-selected attributes can help to recognize previously unseen objects from few or zero examples. Furthermore, the relative strength of each attribute, as determined by a ranking function for that attribute, can be applied to rich textual descriptions associated with the images. Attribute vocabularies can be discovered automatically, e.g., by mining unlabeled text and image data sampled from the web. As a particular example, attribute recognition can be combined with an interactive crowdsourcing technique to discover attributes that are both localized and discriminative, in order to differentiate people in photographs. Facial attributes and expressions can be learned for face verification and image search tasks. However, traditional techniques recognize facial attributes only when the subjects in the images are frontal-facing.
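The use of attributes as an intermediate representation for recognizing unseen categories can be illustrated with a minimal sketch. It assumes that per-attribute classifiers have already produced a score between 0 and 1 for each attribute, and that each category is described only by a binary attribute signature rather than by training images. The attribute names, category names, signatures, and the `nearest_class` helper are all hypothetical and chosen for illustration; nearest-signature matching is one simple option and is not taken from any particular prior technique.

```python
from math import sqrt

# Hypothetical attribute vocabulary; position i of every vector refers to ATTRIBUTES[i].
ATTRIBUTES = ["wears_glasses", "long_hair", "wears_hat", "wears_shorts"]

# Hypothetical categories described only by attribute signatures (1 = present, 0 = absent).
# No training images of these categories are needed to define them.
CLASS_SIGNATURES = {
    "hiker": [0, 0, 1, 1],
    "librarian": [1, 1, 0, 0],
    "cyclist": [1, 0, 1, 1],
}


def nearest_class(attribute_scores, signatures):
    """Return the category whose signature is closest (Euclidean) to the scores."""
    def distance(signature):
        return sqrt(sum((s - a) ** 2 for s, a in zip(signature, attribute_scores)))

    return min(signatures, key=lambda name: distance(signatures[name]))


# Scores assumed to come from per-attribute classifiers run on one image.
scores = [0.9, 0.8, 0.1, 0.2]
print(nearest_class(scores, CLASS_SIGNATURES))
```

In this sketch the image is never compared against examples of a category; it is matched only through its attribute scores, which is what allows a category with zero training images to be recognized.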
Attribute recognition can be a very challenging task when dealing with non-frontal-facing subjects, low image quality, occlusion (e.g., hidden features), and pose variations. The signals associated with some attributes can be subtle, and the images can be dominated by the effects of pose and viewpoint. For example, when detecting from an image whether a person wears glasses, the signal from the wire frame of the glasses is weak relative to the scale of the full person, and its appearance can vary significantly depending on the head pose, the frame design, and occlusion by the subject's hair. Therefore, the underlying attributes can be hard to predict from the image because of the relatively weak signal and the pose variations.