Many face verification methods represent faces by high-dimensional over-complete face descriptors like LBP or SIFT, followed by shallow face verification models.
Some previous studies have further learned identity-related features on top of low-level features. In these approaches, attribute and simile classifiers are trained to detect facial attributes and to measure face similarity to a set of reference people, or classifiers are trained to distinguish the faces of two different people. The outputs of the learned classifiers serve as the features. However, these studies used SVM (Support Vector Machine) classifiers, which are shallow structures, so the learned features remain relatively low-level.
A few deep models have been used for face verification. Chopra et al. used a Siamese architecture, which extracts features separately from the two compared inputs with two identical sub-networks and takes the distance between the outputs of the two sub-networks as the dissimilarity. Their feature extraction and recognition are learned jointly with the face verification target.
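The Siamese idea described above can be illustrated with a minimal sketch. This is not Chopra et al.'s implementation: the one-layer `extract` function is a hypothetical stand-in for a ConvNet sub-network, the weights are random rather than learned, and the Euclidean distance is an assumed choice of distance. The only point shown is that both inputs pass through the same weights and the distance between the two outputs is used as the dissimilarity.

```python
import math
import random


def make_subnetwork(in_dim, out_dim, seed=0):
    """Create weights for a toy one-layer feature extractor.

    Hypothetical stand-in for a ConvNet sub-network; weights are random, not trained.
    """
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]


def extract(weights, x):
    """Apply the shared sub-network: a linear map followed by tanh."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]


def siamese_dissimilarity(weights, x1, x2):
    """Pass both inputs through the SAME weights and return the distance between outputs.

    Euclidean distance is an assumption made for this sketch.
    """
    f1 = extract(weights, x1)
    f2 = extract(weights, x2)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))


# Toy 4-dimensional "face" inputs (illustrative values only).
weights = make_subnetwork(in_dim=4, out_dim=3)
face_a = [0.2, 0.5, 0.1, 0.9]
face_b = [0.8, 0.1, 0.7, 0.3]

same = siamese_dissimilarity(weights, face_a, face_a)  # identical inputs
diff = siamese_dissimilarity(weights, face_a, face_b)  # different inputs
```

Because the two sub-networks share their weights, the dissimilarity is symmetric in its inputs and is exactly zero for identical inputs. In a trained system the weights would be learned so that pairs from the same person yield small distances.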
Some prior-art solutions used multiple deep ConvNets to learn high-level face similarity features and trained classifiers for face verification, but their features are extracted jointly from a pair of faces rather than from a single face. Though highly discriminative, these face similarity features are too short, and some useful information may have been lost before the final verification.
Some previous studies have also used the last hidden layer features of ConvNets for other tasks. Krizhevsky et al. illustrated that Euclidean distances between the last hidden layer features of ConvNets, when learned with the target of image classification, approximate distances in the semantic space, but gave no quantitative results showing how well these features perform for image retrieval. Farabet et al. concatenated the last hidden layer features extracted from scale-invariant ConvNets applied to inputs at multiple scales for scene labeling. These methods have not tackled the face verification problem. Moreover, it remains unclear how to learn features that are sufficiently discriminative for the fine-grained classes of face identities.
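To make concrete what using last hidden layer features for retrieval means, the sketch below ranks gallery items by Euclidean distance to a query feature vector. The three-dimensional vectors and the names `gallery`, `query`, and `retrieve` are hypothetical. In practice the vectors would be the activations of a trained ConvNet's last hidden layer, which this sketch does not compute.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def retrieve(query, gallery, k=2):
    """Return the names of the k gallery items closest to the query in feature space."""
    ranked = sorted(gallery, key=lambda name: euclidean(query, gallery[name]))
    return ranked[:k]


# Hypothetical last-hidden-layer features (made-up values for illustration).
gallery = {
    "img_a": [0.9, 0.1, 0.0],
    "img_b": [0.8, 0.2, 0.1],
    "img_c": [0.0, 0.9, 0.8],
}
query = [1.0, 0.0, 0.0]

nearest = retrieve(query, gallery, k=2)
print(nearest)  # ['img_a', 'img_b']
```

Whether such a ranking is good enough for the fine-grained classes of face identities depends entirely on how discriminative the learned features are, which is the open question raised above.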