Face verification, the task of determining whether a pair of face images belongs to the same person, has been an active research topic in computer vision for decades. It has many important applications, including surveillance, access control, image retrieval, and automatic log-on for personal computers or mobile devices. However, various visual complications degrade the performance of face verification, as shown in particular by numerous studies on real-world face images captured in the wild.
Modern face verification methods fall mainly into two categories: those that extract low-level features and those that build classification models. Although these methods have made great progress in face verification, most of them lack flexibility when dealing with complex data distributions. In the first category, for example, low-level features are handcrafted. Even when features are learned from data, the algorithm parameters (such as the depth of the random projection tree or the number of centers in k-means) still need to be specified by users. Similarly, in the second category, the architectures of neural networks (for example, the number of layers and the number of nodes in each layer) and the parameters of the models (for example, the number of Gaussians or the number of classifiers) must also be determined in advance. Since most existing methods require assumptions about the structure of the data, they cannot work well when those assumptions are not valid. Moreover, these assumptions make it hard to capture the intrinsic structure of the data.
Most existing face verification methods rely on the underlying assumption that the training data and the test data are drawn from the same feature space and follow the same distribution. When the distribution changes, these methods may suffer a large performance drop. However, many practical scenarios involve cross-domain data drawn from different facial appearance distributions, and it is difficult to re-collect the necessary training data and rebuild the models for each new scenario. Moreover, a specified target domain usually lacks enough training data to train a sufficiently good model for high-accuracy face verification, since the weak diversity of the source data often leads to over-fitting. In such cases, it becomes especially important to exploit more data from multiple source domains to improve the performance of face verification methods in the target domain.