The intrinsic properties of 3D faces provide an ideal representation which is immune to variations in face appearance introduced by the imaging process such as viewpoint, lighting and occlusion. These invariant facial properties would be useful in a variety of applications in computer graphics and vision. However, it is challenging to recover the 3D face and scene properties (viewpoint and illumination) from the appearance conveyed by a single 2D image. Specifically, it is impossible to distinguish between texture and illumination effects unless some assumptions are made to constrain them both.
The 3D morphable model (3DMM) encapsulates prior knowledge about human faces that can be used for this purpose, and therefore it is a good tool for 3D face reconstruction. The reconstruction is conducted by a fitting process, by which a 3DMM estimates the 3D shape, texture, pose and illumination from a single image. Much research has been conducted to achieve efficient and accurate fitting, and the resulting methods can be classified into two categories: 1) Simultaneous Optimisation (SimOpt): all the parameters (shape, texture, pose and illumination) are optimised simultaneously; and 2) Sequential Optimisation (SeqOpt): these parameters are optimised sequentially.
The SimOpt methods use gradient-based optimisers, which are slow and tend to get trapped in local minima. On the other hand, SeqOpt methods can obtain closed-form solutions for some of the parameters; therefore, SeqOpt is more efficient. However, the existing SeqOpt methods make strong assumptions, detailed below, and do not generalise well.
In the SimOpt category, the fitting algorithm minimises the sum of squared differences over all colour channels and all pixels between the input and reconstructed images. A Stochastic Newton Optimisation (SNO) technique is used to optimise a nonconvex cost function. The SNO performance is poor in terms of both efficiency and accuracy because SNO is an iterative gradient-based optimiser which can end up in a local minimum.
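The cost described above can be written down in a few lines. The following is only a minimal sketch of a sum-of-squared-differences cost, not the SNO optimiser itself; the function name `ssd_cost`, the nested-list image layout and the sample pixel values are assumptions made for illustration.

```python
def ssd_cost(input_image, rendered_image):
    """Sum of squared differences over all pixels and all colour channels.

    Both images are nested sequences indexed as image[row][col][channel]
    (an assumed layout chosen for this illustration).
    """
    total = 0.0
    for row_in, row_re in zip(input_image, rendered_image):
        for px_in, px_re in zip(row_in, row_re):
            for c_in, c_re in zip(px_in, px_re):
                diff = c_in - c_re
                total += diff * diff
    return total


# Hypothetical 1x2 RGB images with channel values in [0, 1].
observed = [[(0.5, 0.25, 0.0), (1.0, 1.0, 1.0)]]
rendered = [[(0.25, 0.25, 0.0), (1.0, 0.5, 1.0)]]
cost = ssd_cost(observed, rendered)
```

In a fitting algorithm the rendered image depends on the shape, texture, pose and illumination parameters, which is what makes the cost nonconvex in those parameters.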
The efficiency of optimisation is the driver behind some exemplary methods, where an Inverse Compositional Image Alignment (ICIA) algorithm is introduced for fitting. The fitting is conducted by modifying the cost function so that its Jacobian matrix can be regarded as constant. In this way, the Jacobian matrix is not updated in every iteration, greatly reducing the computational costs. However, ICIA cannot model illumination effects.
The Multi-Feature Fitting (MFF) strategy is known to achieve the best fitting performance among all the SimOpt methods. MFF makes use of many complementary features from the input image, such as edges and specular highlights, to constrain the fitting process. The use of these features results in a better solution as demonstrated in “Estimating 3D Shape and Texture Using Pixel Intensity, Edges, Specular Highlights, Texture Constraints and a prior,” in Computer Vision and Pattern Recognition, 2005. IEEE, 2005, pp. 986-99. Based on the MFF framework, two works improve the fitting robustness. In exemplary prior art, a resolution-aware 3DMM is proposed to improve the robustness to resolution variations, and the use of facial symmetry is advocated to improve the illumination fitting. However, all the MFF-based fitting methods are rather slow.
In the SeqOpt category, the ‘linear shape and texture fitting algorithm’ (LiST) was proposed for improving fitting efficiency. The idea is to update the shape and texture parameters by solving linear systems. The illumination and camera parameters, on the other hand, are optimised by the gradient-based Levenberg-Marquardt method, whose cost function exhibits many local minima. The experiments reported in LiST show that the fitting is faster than the SNO algorithm, but with similar accuracy. However, in this approach it is assumed that the light direction is known before fitting, which is not realistic for automatic analysis. Also, the shape is recovered using an optical flow algorithm, which is relatively slow.
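To illustrate why a linear model admits a closed-form parameter update, the sketch below fits two coefficients of a linear model by solving the 2x2 normal equations directly. It is not the LiST algorithm; the function name `fit_two_coefficients`, the two-vector basis and the sample data are assumptions made for illustration.

```python
def _dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def fit_two_coefficients(mean, basis_1, basis_2, observed):
    """Least-squares fit of observed ~ mean + a1 * basis_1 + a2 * basis_2.

    The 2x2 normal equations are solved in closed form (Cramer's rule),
    so no iterative optimiser is needed.
    """
    residual = [o - m for o, m in zip(observed, mean)]
    g11 = _dot(basis_1, basis_1)
    g12 = _dot(basis_1, basis_2)
    g22 = _dot(basis_2, basis_2)
    r1 = _dot(basis_1, residual)
    r2 = _dot(basis_2, residual)
    det = g11 * g22 - g12 * g12
    if det == 0:
        raise ValueError("basis vectors are linearly dependent")
    a1 = (r1 * g22 - g12 * r2) / det
    a2 = (g11 * r2 - g12 * r1) / det
    return a1, a2


# Hypothetical 4-dimensional model; observed was generated with a1=2, a2=-3.
mean = [1.0, 1.0, 1.0, 1.0]
basis_1 = [1.0, 0.0, 1.0, 0.0]
basis_2 = [0.0, 1.0, 0.0, -1.0]
observed = [3.0, -2.0, 3.0, 4.0]
a1, a2 = fit_two_coefficients(mean, basis_1, basis_2, observed)
```

With more coefficients the same idea applies, with a general linear solver replacing Cramer's rule.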
Another SeqOpt method decomposes the fitting process into geometric and photometric parts. The camera model is optimised by the Levenberg-Marquardt method, and shape parameters are estimated by a closed-form solution. In contrast to the previous work, this method recovers 3D shape using only facial feature landmarks, and models illumination using spherical harmonics. The least squares method is used to optimise illumination and albedo. Some prior art work improved the fitting performance by segmenting the 3D face model into different subregions. In addition, a Markov Random Field is used to model the spatial coherence of the face texture. However, the illumination models cannot deal with specular reflectance because only 9 low-frequency spherical harmonics bases are used. In addition, these prior art methods use an affine camera, which cannot model perspective effects.
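The limitation of a nine-term spherical harmonic illumination model can be seen from the form of the bases themselves. The sketch below evaluates the first nine real spherical harmonic bases for a unit surface normal; the function names `sh9_basis` and `shade`, and the assumption that the lighting coefficients already absorb any reflectance-kernel scaling, are illustrative choices and are not taken from the cited methods.

```python
def sh9_basis(normal):
    """First nine real spherical harmonic basis values for a unit normal."""
    x, y, z = normal
    return [
        0.282095,                        # order 0 (constant)
        0.488603 * y,                    # order 1
        0.488603 * z,
        0.488603 * x,
        1.092548 * x * y,                # order 2
        1.092548 * y * z,
        0.315392 * (3.0 * z * z - 1.0),
        1.092548 * x * z,
        0.546274 * (x * x - y * y),
    ]


def shade(light_coeffs, normal, albedo):
    """Diffuse intensity: albedo times a linear combination of the nine bases.

    light_coeffs is assumed to already include any reflectance-kernel scaling.
    """
    basis = sh9_basis(normal)
    return albedo * sum(l * b for l, b in zip(light_coeffs, basis))


# Hypothetical ambient-only lighting: only the constant term is non-zero.
ambient = [1.0] + [0.0] * 8
front_facing = sh9_basis((0.0, 0.0, 1.0))
intensity = shade(ambient, (0.0, 0.0, 1.0), 0.5)
```

Because every basis is at most quadratic in the components of the normal, the shading varies smoothly across the surface, so a sharp specular highlight cannot be represented.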
The most recent SeqOpt work (Aldrian et al, “Inverse Rendering of Faces with a 3D Morphable Model”, Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 35, no. 5, pp. 1080-1093, 2013) also sequentially fits geometric and photometric models using least squares. However, different methods are used to estimate shape and texture in this work. Specifically, a probabilistic approach incorporating model generalisation error is used to recover the 3D shape. The reflectance estimation decouples the diffuse and specular reflection estimation. Two reflectance optimisation methods are proposed: (i) specular invariant model fitting and (ii) unconstrained illumination fitting. For (i), first, the RGB values of the model and input images are projected to a specularity-free space for diffuse light and texture estimations. Then the specularity is estimated in the original RGB colour space. For (ii), the diffuse light, texture and specularity are all estimated in the original RGB space. Both (i) and (ii) can achieve closed-form solutions for texture and illumination parameters. Aldrian achieves state-of-the-art face reconstruction performance. Its face recognition performance is comparable to that of MFF, but it is much faster. However, Aldrian also uses an affine camera, which cannot model perspective effects. In addition, method (i) assumes that the colour of the lighting is known, which limits the model generalisation capacity. Method (ii) relaxes the lighting assumption of (i) and allows any combination of ambient and directed light; however, (ii) estimates the face texture coefficients considering only diffuse light.
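The inability of an affine camera to model perspective effects, noted above, can be illustrated numerically. The sketch below compares a pinhole perspective projection with a scaled orthographic projection, which is one special case of an affine camera. The function names, the focal lengths and the point coordinates are assumptions made for illustration and are not taken from the cited work.

```python
def project_perspective(point, focal_length):
    """Pinhole projection: image coordinates scale with 1 / depth."""
    x, y, z = point
    return (focal_length * x / z, focal_length * y / z)


def project_scaled_orthographic(point, focal_length, mean_depth):
    """Scaled orthographic projection (a special case of an affine camera).

    Every point shares one scale, so per-point depth is ignored.
    """
    x, y, _ = point
    scale = focal_length / mean_depth
    return (scale * x, scale * y)


def perspective_gap(front, back, focal_length):
    """Horizontal image distance between two points under perspective."""
    return abs(project_perspective(front, focal_length)[0]
               - project_perspective(back, focal_length)[0])


# Two hypothetical points with the same (x, y) but different depth,
# viewed close to the camera and then from ten times further away.
near_gap = perspective_gap((10.0, 0.0, 90.0), (10.0, 0.0, 110.0), 100.0)
far_gap = perspective_gap((10.0, 0.0, 990.0), (10.0, 0.0, 1010.0), 1000.0)

# The affine camera maps both near points to the same image location.
affine_front = project_scaled_orthographic((10.0, 0.0, 90.0), 100.0, 100.0)
affine_back = project_scaled_orthographic((10.0, 0.0, 110.0), 100.0, 100.0)
```

Under these assumed values the perspective gap between the two points is much larger when the object is close to the camera, whereas the affine model reports no gap at all, which is why the affine approximation degrades for close-range images.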
Face recognition is an important application of the 3DMM. The face recognition performance is heavily influenced by the fitting accuracy. Most existing 3DMM methods assume that accurate facial landmarks are known. Z.-H. Feng, et al, “Random cascaded-regression copse for robust facial landmark detection,” Signal Processing Letters, IEEE, vol. 22, no. 1, 2013, p. 2 is an example of prior art that proposes to use automatically detected landmarks. Here, the automatic landmark detection and 3DMM fitting are combined by a data-driven Markov chain Monte Carlo method. This method is robust to errors in the automatically detected landmarks, but it is rather slow.
Therefore, the present invention provides a method of modelling an object that reduces the number of required iterations in order to decrease the time taken to model the object, yet provides a better fit than existing SeqOpt methods. Furthermore, the present invention provides a method of modelling an object positioned close to a camera without distortion occurring. Furthermore, the present invention provides a fast and accurate method of performing facial recognition. Further advantages are set out in the detailed description.