Notwithstanding several decades of extensive research, the problem of automatic face recognition in a truly unconstrained setting has not yet been solved. As an identifiable biometric modality, faces provide several advantages over other physiological traits such as fingerprints and iris patterns. They are easily acquired by imaging devices at a distance with minimal cooperation by the individual, and therefore lend themselves well to security and surveillance purposes. Usually they can be captured without any voluntary action by the individual, hence offering perhaps the least intrusive form of recognition compared to other biometric modalities. Being relatively large objects, they can be captured at considerable distances without significant loss of detail. The acquisition devices themselves (such as handheld or mounted cameras) are not only inexpensive, but pervasive in society. The prevalence of these imaging devices and the ubiquity of digital communication technology have led to the availability of massive amounts of facial image data. Moreover, faces are an intuitive form of person identification, being the primary method employed by the human visual system. The creation of an automated face recognition system that can equal, and eventually surpass, human performance at large-scale tests has unsurprisingly been a fundamental objective of computer vision researchers since the field's inception in the 1960s.
Despite the success of automatic face recognition techniques in many practical applications, face recognition systems still have difficulty performing consistently well in uncontrolled operating environments. Evaluations of state-of-the-art recognition techniques conducted during the past several years have confirmed that pose variations, external occlusions, and low-resolution acquisition are the major problems that plague current systems. This work aims to solve or provide tolerance to these challenging problems through a unified objective that uses our proposed compact facial representation model to recover from and deal with a variety of unconstrained facial degradations.
Face recognition addresses the problem of identifying or verifying one or more persons of interest in an image by comparing processed query faces with a corpus of target face images stored in an enrolled database. Identification (also known as 1:N matching) is the task of determining a person's identity by comparing the observation to a database of previously enrolled individuals, whereas verification (also known as 1:1 matching) is the task of comparing two observations and determining whether they belong to the same individual. There are numerous applications in which face recognition can be exploited for one or the other of these purposes. Security is a primary application: this includes access control to physical areas and authentication of individuals at banks, border crossings, and airports. Digital security is of particular concern and is becoming increasingly prevalent, using face recognition to restrict access to secure digital information on personal computers, workstations, and mobile devices by replacing traditional forms of authentication such as passwords. Continuous authentication of users enables secure access without interfering with other activities. Law enforcement agencies exploit face recognition for matching photographs against driver's licenses or passports, and for authenticating personnel at checkpoints. Surveillance CCTV cameras employed in restricted areas, at public events, and on private property capture large amounts of footage, which often goes unprocessed due to the low-resolution, unconstrained nature of the faces captured. The faces in this video can be used for lead generation during criminal investigations, missing children can be identified, and post-event analysis can be expedited. Forensic identification of faces is particularly important during large-scale calamities, and electoral registration systems take advantage of face recognition to establish identity.
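The distinction between the two matching modes can be illustrated with a minimal sketch. It assumes, purely for illustration, that each face has already been reduced to a fixed-length feature vector and that cosine similarity with a fixed threshold is used for comparison; the function names, the threshold value, and the toy vectors are hypothetical and are not part of the system described here.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(probe, reference, threshold=0.8):
    """1:1 matching: do the two observations belong to the same person?

    The threshold of 0.8 is an illustrative assumption, not a tuned value.
    """
    return cosine_similarity(probe, reference) >= threshold

def identify(probe, gallery, threshold=0.8):
    """1:N matching: return the best-matching enrolled identity, or None.

    `gallery` maps an identity label to its enrolled feature vector.
    """
    best_id, best_score = None, -1.0
    for identity, template in gallery.items():
        score = cosine_similarity(probe, template)
        if score > best_score:
            best_id, best_score = identity, score
    return best_id if best_score >= threshold else None

# Toy enrolled database with hypothetical three-dimensional features.
gallery = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
probe = [0.9, 0.1, 0.0]

matched_identity = identify(probe, gallery)       # searches every enrolled entry
same_as_bob = verify(probe, gallery["bob"])       # compares against one claimed identity
unknown_identity = identify([0.0, 0.0, 1.0], gallery)
```

Verification needs a single comparison against a claimed identity, whereas identification must search the whole gallery and also decide whether the probe is enrolled at all, which is why the threshold appears in both functions.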
The understanding, modeling, analysis, and synthesis of faces are important in several other ways as well. The biological structure of faces is intensely studied for several medical applications. Graphics designers, movie animation studios, and game studios need to model and synthesize myriad facial structures, deformations, and appearances. Gender and ethnicity classification, and age estimation, can substantially reduce matching complexity when using large databases. Facial feature analysis and tracking find several applications in human-computer interaction and vehicle driver monitoring.
A reliable biometric demonstrates two important (and occasionally competing) attributes: reliability of features and discriminability between individuals. The primary impediment to automatic face recognition is that the human face is not a unique, rigid object; its appearance is subject to vast variation under numerous intrinsic and extrinsic factors, which affects both of these attributes. Intrinsic factors are caused by the physical nature of the face and the individual to which it belongs, and are therefore independent of the observer. These include the age of the person, changes in cranial structure, facial hair growth, facial expressions, cosmetics, and wearables such as eyeglasses and scarves. Extrinsic factors alter the appearance of the face via the imaging environment; these include the scene illumination, the relative orientation and positioning of the imaging device with respect to the face, physical obstructions between the face and the sensor, and imaging parameters such as lens zoom, focus, sensor resolution, and sensor noise.
The design of an automated face recognition system must necessarily include the ability to identify, model, and account for these intrinsic and extrinsic variations. However, this is not always possible in the general sense: each variation often introduces irrecoverable information loss, some variations are hard to detect and measure accurately, and several of them are often observed simultaneously. The human visual system often compensates for such situations by using external inputs, such as holistic scene understanding, contextual information, spatial organization, and consequential reasoning. Automated systems do not have these benefits, principally due to physical and technological restrictions. However, they can outperform humans in other ways, such as efficiency in large-scale recognition problems and continuous or long-term monitoring. Semi-automated “human-in-the-loop” systems can thus incorporate the best features of these setups, outperforming both humans and fully-automated systems. The proposed system in this application is amenable to both fully- and semi-automated configurations.
To benefit from the advantages that face recognition offers, a system should be designed to analyze an uncooperative face in an uncontrolled environment and an arbitrary situation, overcoming both intrinsic and extrinsic variations. We particularly focus on the challenging, prevalent problems introduced by 3D pose variations, extraneous and self-occlusions, low-resolution acquisition, and combinations of these degradations. Each of these degradations has spurred an entire research community; in this section we provide a high-level overview outlining a few landmark works. For a more thorough review, the reader is directed towards recent survey papers on face recognition in general and on specific degradations.
Pose-Invariant Face Recognition
Faces are inherently three-dimensional objects, but are often captured as two-dimensional projections in images or video, leading to severe pose-induced challenges for recognition; for example, off-angle viewpoints cause self-occlusions of facial textures. This is particularly problematic in uncooperative applications such as surveillance. The adverse effects of non-frontal poses on face recognition have been quantitatively and qualitatively assessed over several evaluations. A recent comprehensive review of these techniques concluded that no technique existing at the time was free from limitations or able to fully solve the pose problem in face recognition.
Recognizing Low-Resolution Faces
Low resolution is a major impediment for face recognition systems, primarily caused by the distance of the sensor from the subject and by sensor size and cost limitations. A landmark paper in the field outlined the detrimental impact of resolution on different components of face recognition. In a recent review of the subject, the performance drop of face recognition systems when provided with low-resolution images was attributed to four distinct factors: misalignment issues, acquisition noise, feature loss, and dimensionality mismatch (in certain cases which cannot be addressed by simple resizing).
There are two schools of thought regarding techniques to overcome the problems of low resolution: construction of resolution-tolerant recognition techniques, and image-based super-resolution algorithms. Of the two, the former is generally considered to be the harder problem to solve. Face recognition performance degrades gracefully as resolution decreases, but exhibits a dramatic drop at approximately 12 pixels of inter-pupillary distance (IPD). This has been reconfirmed many times, most recently in 2011. Resolution-robust feature representations for both global and local features have shown certain improvements over the years, although they are still restricted by the irretrievable information loss incurred by long-range acquisition.
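As a worked illustration of the IPD figure quoted above, the following minimal sketch measures the inter-pupillary distance from two eye-center coordinates and flags faces that fall below the approximate 12-pixel mark. The eye coordinates are assumed to come from some upstream landmark detector, which is not shown; the function names and the sample coordinates are hypothetical.

```python
import math

# Approximate IPD (in pixels) at which performance is reported to drop sharply.
MIN_IPD_PIXELS = 12.0

def interpupillary_distance(left_eye, right_eye):
    """Euclidean distance in pixels between two (x, y) eye centers."""
    return math.hypot(right_eye[0] - left_eye[0], right_eye[1] - left_eye[1])

def is_low_resolution(left_eye, right_eye, min_ipd=MIN_IPD_PIXELS):
    """True when the face is smaller than the chosen IPD threshold."""
    return interpupillary_distance(left_eye, right_eye) < min_ipd

# Hypothetical eye centers for two detected faces.
ipd_far = interpupillary_distance((10, 20), (16, 28))    # offsets (6, 8)
ipd_near = interpupillary_distance((10, 20), (19, 32))   # offsets (9, 12)
far_face_is_low_res = is_low_resolution((10, 20), (16, 28))
near_face_is_low_res = is_low_resolution((10, 20), (19, 32))
```

A check of this kind could be used to route small faces to a resolution-tolerant matcher or a super-resolution stage, though the exact threshold would need to be validated for any particular system.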
Occlusion-Tolerant Face Recognition
Occlusions are a consistent cause of hardship in real-world face recognition systems. Sources of occlusion include apparel such as eyeglasses, sunglasses, hats, or scarves; objects such as cell phones placed in front of the face; and facial or head hair. Moreover, even in the absence of an occluding object, violations of an assumed model for face appearance may act like occlusions, e.g., shadows due to extreme illumination.
Compared to the other degradations, research in the field of face recognition under facial occlusions has been rewarding. Simple techniques using pixel correlation and in-painting have achieved moderate prominence in visual reconstruction, but are ineffectual for large-scale recognition tasks. Morphable models have been adapted to handle occlusions in a global framework with moderate success. Sparse coding techniques have recently demonstrated impressive results when compared to other techniques in dealing with recovery of occluded data. Other families of techniques have also been successful by incorporating occlusion masks, which are either manually provided or automatically detected.
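The role of an occlusion mask can be sketched in a few lines. The example below assumes, for illustration only, that faces are compared by mean absolute pixel difference over flattened, aligned grayscale images, and that a boolean visibility mask has been provided manually or by some detector; it is not the matching scheme of any work surveyed here, and all names and values are hypothetical.

```python
def masked_mean_abs_diff(probe, reference, visible):
    """Mean absolute pixel difference computed only over visible pixels.

    `probe` and `reference` are flattened, aligned grayscale images;
    `visible` holds True where the probe pixel is not occluded.
    """
    pairs = [(p, r) for p, r, v in zip(probe, reference, visible) if v]
    if not pairs:
        raise ValueError("no visible pixels to compare")
    return sum(abs(p - r) for p, r in pairs) / len(pairs)

# Hypothetical 4-pixel images: the last two probe pixels are covered
# by a bright occluder (value 255).
probe = [10, 20, 255, 255]
reference = [12, 18, 30, 40]
mask = [True, True, False, False]

distance_with_mask = masked_mean_abs_diff(probe, reference, mask)
distance_without_mask = masked_mean_abs_diff(probe, reference, [True] * 4)
```

Ignoring the occluded pixels keeps the occluder from dominating the distance, at the cost of comparing fewer pixels; the quality of the result therefore depends directly on how accurate the mask is.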
Handling Simultaneous Degradations
Very few works have tried to address the problems introduced by simultaneous sources of acquisition degradation to faces, despite the construction of several databases which capture the appearance variations concurrently. No known previous work has explicitly attempted face recognition under the simultaneous conditions of 3D pose, occlusions, low resolution, and other real-world degradations.