Deep learning (DL) is a branch of machine learning and artificial neural networks based on a set of algorithms that attempt to model high-level abstractions in data by using a deep graph with multiple processing layers. A typical DL architecture can include many layers of neurons and millions of parameters. These parameters can be trained from large amounts of data on fast GPU-equipped computers, guided by novel training techniques that can work with many layers, such as rectified linear units (ReLU), dropout, data augmentation, and stochastic gradient descent (SGD).
Among the existing DL architectures, the convolutional neural network (CNN) is one of the most popular. Although the idea behind the CNN has been known for more than 20 years, its true power has only been recognized after the recent development of deep learning theory. To date, CNNs have achieved numerous successes in many artificial intelligence and machine learning applications, such as face recognition, image classification, image caption generation, visual question answering, and self-driving cars.
Face detection is an important process in many face recognition applications. A large number of face detection techniques can easily detect near-frontal faces.
In face recognition itself, a feature extraction network extracts features from an inputted face image, and the face is recognized by using the extracted features.
In particular, conventional face recognition devices use input augmentation to improve facial recognition performance.
That is, referring to FIG. 1, when the face image is inputted, a patch generation 11 processes the face image by using a method such as translation or flipping to generate a plurality of patches corresponding to the face image. A feature extraction network 12 then extracts features from each of the generated patches and outputs the features corresponding to the face image by averaging the extracted features, and these averaged features are used to perform the face recognition of the face image.
However, these conventional face recognition devices take a long time and consume substantial computing resources, since the feature extraction network must perform forward computation as many times as there are generated patches.
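By way of illustration only, the conventional input augmentation described with reference to FIG. 1 may be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of any particular device: the function names, the choice of four patches (original, horizontal flip, and two one-pixel translations), and the toy_feature_extractor, which merely stands in for the feature extraction network 12, are all hypothetical. The sketch returns the number of forward computations alongside the averaged features to show that the cost grows with the number of patches.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]  # grayscale image stored as rows of pixel values
Feature = List[float]


def flip_horizontal(image: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def translate(image: Image, dx: int, dy: int) -> Image:
    """Shift the image by (dx, dy) pixels; exposed pixels are filled with 0."""
    height, width = len(image), len(image[0])
    shifted = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            src_y, src_x = y - dy, x - dx
            if 0 <= src_y < height and 0 <= src_x < width:
                shifted[y][x] = image[src_y][src_x]
    return shifted


def generate_patches(image: Image) -> List[Image]:
    """Hypothetical patch set: original, flip, and two one-pixel shifts."""
    return [
        image,
        flip_horizontal(image),
        translate(image, 1, 0),
        translate(image, 0, 1),
    ]


def toy_feature_extractor(patch: Image) -> Feature:
    """Placeholder for the feature extraction network: one mean per row."""
    return [sum(row) / len(row) for row in patch]


def extract_averaged_features(
    image: Image, extractor: Callable[[Image], Feature]
) -> Tuple[Feature, int]:
    """Average per-patch features; also return the forward-pass count."""
    patches = generate_patches(image)
    # One forward computation per patch: this loop is the cost bottleneck.
    features = [extractor(patch) for patch in patches]
    averaged = [sum(values) / len(features) for values in zip(*features)]
    return averaged, len(features)


if __name__ == "__main__":
    face = [[1.0, 2.0], [3.0, 4.0]]
    averaged, forward_passes = extract_averaged_features(face, toy_feature_extractor)
    print(averaged, forward_passes)
```

In this sketch, a single input image triggers four calls to the extractor, one per patch, and adding further translations or flips increases the number of forward computations by the same amount.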
In addition, with the conventional face recognition devices, it is difficult to ensure the reliability of face recognition results because there is no guarantee that the averaged features are the optimal features corresponding to the face image.