Facial recognition systems are finding increasing use in a wide variety of applications, ranging from law enforcement and security or access control to organizing photographs or videos and online dating services. Facial recognition differs from facial detection in that the latter aims to simply detect the presence of a face in an image or video frame, whereas the former aims to recognize a unique individual in an image or video frame from among a potentially large set of identified individuals.
A number of different computational methods have been employed for facial recognition, including nearest-neighbor classifiers, support vector machines, and artificial neural networks (ANNs), among many others. Of the various approaches, convolutional ANNs have demonstrated particularly good performance for this task.
Convolutional ANNs (hereinafter, CNNs) have a trainable architecture that can learn invariant features for a number of applications. In general, CNNs contain alternating convolutional layers, non-linearity layers and feature pooling layers. Each layer is composed of elements, or “neurons,” that have learnable weights and biases. When used for image recognition in particular, CNNs are built with multiple layers of small neuron collections that process portions of the input image. The outputs of these collections are then tiled so that their input regions overlap, thereby obtaining a better representation of the original image.
In operation, CNNs extract local features of each image at a high resolution and successively combine them into more complex features at lower resolutions. The loss of spatial information is compensated by an increasing number of feature maps in the higher layers.
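The trade-off between spatial resolution and the number of feature maps can be illustrated with a short shape calculation. The following is a minimal sketch only: the function name feature_map_shapes, the 32x32 input size, and the kernel, pooling, and feature-map counts are hypothetical values chosen for illustration, and the calculation assumes unpadded convolution followed by non-overlapping pooling.

```python
def feature_map_shapes(input_size, stages):
    """Return (num_maps, height, width) after each convolution + pooling stage.

    Assumes square inputs, unpadded ("valid") convolution with stride 1,
    and non-overlapping pooling. All sizes are illustrative assumptions.
    """
    shapes = []
    size = input_size
    for kernel, pool, maps in stages:
        size = (size - kernel + 1) // pool  # convolution shrinks, pooling downsamples
        shapes.append((maps, size, size))
    return shapes


# Hypothetical two-stage network: (kernel size, pooling size, feature maps).
stages = [(5, 2, 6), (5, 2, 16)]
for maps, height, width in feature_map_shapes(32, stages):
    print(f"{maps} feature maps of {height}x{width}")
```

In this sketch the resolution falls from 32x32 to 14x14 and then to 5x5, while the number of feature maps grows from 6 to 16.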
The convolutional layer computes an inner product of the linear filter and the underlying receptive field followed by a nonlinear activation function at every local portion of the input. Then, the non-linear transformation layer performs normalization among nearby feature maps. The feature-pooling layer combines local neighborhoods using an average or maximum operation, aiming to achieve invariance to small distortions.
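The convolution and pooling operations described above can be sketched in a few lines of plain Python. This is an illustrative sketch rather than any particular system's implementation: the function names conv2d_relu and max_pool are hypothetical, the rectified linear unit is assumed as the nonlinear activation, maximum pooling is chosen over average pooling, and the normalization step among nearby feature maps is omitted for brevity.

```python
def conv2d_relu(image, kernel, bias=0.0):
    """Slide the linear filter over the image; at each position take the inner
    product with the receptive field, add the bias, and apply ReLU (assumed)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = bias
            for di in range(kh):
                for dj in range(kw):
                    total += kernel[di][dj] * image[i + di][j + dj]
            row.append(max(0.0, total))  # nonlinear activation
        out.append(row)
    return out


def max_pool(feature_map, size=2):
    """Combine each non-overlapping size x size neighborhood by its maximum."""
    out = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            row.append(max(feature_map[i + di][j + dj]
                           for di in range(size) for dj in range(size)))
        out.append(row)
    return out


# Two 4x4 inputs whose single bright pixel differs by a one-pixel diagonal shift.
a = [[0.0] * 4 for _ in range(4)]
b = [[0.0] * 4 for _ in range(4)]
a[0][0] = 1.0
b[1][1] = 1.0
identity = [[1.0]]  # trivial 1x1 filter, used only to isolate the pooling effect
print(max_pool(conv2d_relu(a, identity)) == max_pool(conv2d_relu(b, identity)))
```

Because both bright pixels fall within the same pooling neighborhood, the pooled outputs are identical, which is the kind of invariance to small distortions that the feature-pooling layer aims for.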
An ongoing challenge for designers of facial recognition systems is achieving high accuracy on the one hand and computational efficiency (e.g., processing speed) on the other. These are generally countervailing performance attributes of CNN-based facial recognition systems. A practical solution is needed to further advance both of these performance attributes.