First introduced by Blanz & Vetter in 1999 (“A morphable model for the synthesis of 3d faces” by Volker Blanz and Thomas Vetter, in Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH '99, pages 187-194, New York, N.Y., USA, 1999, ACM Press/Addison-Wesley Publishing Co.), morphable models have steadily grown in popularity ever since.
These morphable models are used both in three-dimensional (3D) animation (“Mpeg-4 compatible 3D facial animation based on morphable model” by Bao-Cai Yin, Cheng-Zhang Wang, Qin Shi, and Yan-Feng Sun, in Machine Learning and Cybernetics, 2005, Proceedings of 2005 International Conference on, volume 8, pages 4936-4941 Vol. 8, August 2005; and “Statistical generation of 3D facial animation models” by Rudomin, A. Bojorquez, and H. Cuevas, in Shape Modeling International, 2002, Proceedings, pages 219-226, 2002.) and for identity verification or recognition (“Face recognition based on fitting a 3D morphable model” by Volker Blanz and Thomas Vetter, Pattern Analysis and Machine Intelligence, IEEE Transactions on, 25(9):1063-1074, 2003; “Automatic 3D face verification from range data” by Gang Pan, Zhaohui Wu, and Yunhe Pan in Acoustics, Speech, and Signal Processing, 2003, Proceedings. (ICASSP '03). 2003 IEEE International Conference on, volume 3, pages III-193-6 vol. 3, April 2003; “Audio- and Video-Based Biometric Person Authentication” by Alexander M. Bronstein, Michael M. Bronstein, and Ron Kimmel, 4th International Conference, AVBPA 2003 Guildford, UK, Jun. 9-11, 2003 Proceedings, chapter Expression-Invariant 3D Face Recognition, pages 62-70, Springer Berlin Heidelberg, Berlin, Heidelberg 2003; “3d shape-based face recognition using automatically registered facial surfaces” by M. O. Irfanoglu, B. Gokberk, and L. Akarun, in Pattern Recognition, 2004, ICPR 2004, Proceedings of the 17th International Conference on, volume 4, pages 183-186 Vol. 4, August 2004.).
Initially applied to the modeling of faces, these models have gradually been transposed to many other elements such as ears (“A novel 3D ear reconstruction method using a single image” by Chen Li, Zhichun Mu, Feng Zhang, and Shuai Wang, in Intelligent Control and Automation (WCICA), 2012 10th World Congress on, pages 4891-4896, IEEE, 2012; “3D morphable model construction for robust ear and face recognition” by John D Bustard and Mark S Nixon, in Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pages 2582-2589. IEEE, 2010.), the human body in its entirety (“The space of human body shapes: reconstruction and parameterization from range scans” by Brett Allen, Brian Curless, and Zoran Popovic, in ACM transactions on graphics (TOG), volume 22, pages 587-594. ACM, 2003.) or even to animal skeletons (“Morphable model of quadrupeds skeletons for animating 3D animals” by Lionel Reveret, Laurent Favreau, Christine Depraz, and Marie-Paule Cani, in Proceedings of the 2005 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, SCA '05, pages 135-142, New York, N.Y., USA, 2005. ACM.).
Nevertheless, whatever the studied subjects, the steps of construction remain substantially identical, namely:
1) Acquisition of 3D data serving as statistical training examples.
2) Dense registration of said training examples.
3) Creation of a vector space specific to the studied subject using a statistical analysis method such as principal component analysis (PCA), independent component analysis (ICA) or derivatives thereof.
The last step of this process, step 3), yields in particular what is called an average vector together with a set of deformation modes. Linear combinations of these modes, added to the average vector, subsequently allow not only the training examples to be reconstructed but also new elements (new faces in the case of a morphable model of faces for example) to be generated.
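By way of illustration only, step 3) may be sketched as follows in the case of PCA, assuming the training examples have already been densely registered and flattened into vectors of equal length. The function names (build_morphable_model, project, synthesize) are hypothetical, and computing the PCA through a singular value decomposition is merely one possible choice.

```python
import numpy as np

def build_morphable_model(examples, n_modes):
    """examples: (n_examples, 3 * n_points) array of registered shapes."""
    mean = examples.mean(axis=0)                 # the average vector
    centered = examples - mean
    # SVD of the centered data: the rows of vt are the deformation modes,
    # ordered by decreasing explained variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    modes = vt[:n_modes]
    # Standard deviation of the training data along each retained mode.
    std = singular_values[:n_modes] / np.sqrt(max(len(examples) - 1, 1))
    return mean, modes, std

def project(mean, modes, shape):
    """Coefficients of a registered shape in the basis of deformation modes."""
    return (shape - mean) @ modes.T

def synthesize(mean, modes, coeffs):
    """Average vector plus a linear combination of deformation modes."""
    return mean + coeffs @ modes
```

Under these assumptions, a training example is reformed by projecting it and synthesizing from the resulting coefficients, whereas a new element may be generated by drawing coefficients at random, for example scaled by the per-mode standard deviations.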
However, despite the apparent simplicity of such a method, its application must solve two major problems: that of determining which points can be registered in each training example and that of carrying out this association on a sufficient number of points (conventionally several thousand).
To this end, Blanz and Vetter proposed to use an optical flow algorithm (“Hierarchical motion-based frame rate conversion” by James R Bergen and R Hingorani, Technical report, David Sarnoff Research Center, 1990).
At this stage, it will be noted that the laser used for the scanning-acquisitions delivered a cylindrical representation (also called a 2.5D representation). Thus, a two-dimensional or 2D image of the texture was immediately available and capitalized upon to implement the aforementioned algorithm.
However, in addition to being very sensitive to its initialization, this algorithm requires the deformations from one example to the next to be small (in the manner of successive images of a video), there being no reason for this to be so in the general case. Moreover, cylindrical representations have the major drawback of generating occlusions. Although the latter are relatively rare in the case of faces, making the method of Blanz and Vetter usable, the same does not apply in the case of more complex shapes, such as those of ears, for which the loss of information may prove to be unacceptable.
Chen Li et al. for their part took advantage of the particular shape of the subject they studied, namely the ear, and of position data, namely a photo and a depth map of the ear seen in profile, to construct a hierarchical growth algorithm for a triangle mesh (“A novel 3D ear reconstruction method using a single image” by Chen Li, Zhichun Mu, Feng Zhang, and Shuai Wang, in Intelligent Control and Automation (WCICA), 2012 10th World Congress on, pages 4891-4896, IEEE, 2012). A depth map, also called a 2.5D image or “z map”, is a pixel-based image of z-coordinates that is in general created using a 3D camera. The grayscale levels in the depth map represent height values.
Contour detection was carried out on the photo and two initial markers were placed by the operator. The intersection of the perpendicular bisector of the segment connecting these two points with the exterior contour of the ear created a third point. By iterating this method with the new point and the preceding points, the authors created 17 points that were descriptive of the exterior contour of the ear. Via an analogous process, they also created other series of points that were descriptive of interior contours.
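The iterative construction described above may be sketched as follows, under simplifying assumptions that are not taken from the cited paper: the contour is given as a densely sampled list of (x, y) points, the markers are designated by their indices in that list, and the intersection with the perpendicular bisector is approximated by the contour sample lying closest to it. The function names are hypothetical.

```python
import math

def bisector_index(contour, i, j):
    """Index k (i < k < j) of the contour sample closest to the
    perpendicular bisector of the segment contour[i]-contour[j]."""
    (ax, ay), (bx, by) = contour[i], contour[j]
    mx, my = (ax + bx) / 2, (ay + by) / 2      # midpoint of the segment
    dx, dy = bx - ax, by - ay                  # direction of the segment

    def offset(k):
        # Signed offset along the segment direction; zero on the bisector.
        px, py = contour[k]
        return (px - mx) * dx + (py - my) * dy

    return min(range(i + 1, j), key=lambda k: abs(offset(k)))

def subdivide(contour, i, j, levels):
    """Insert one bisector point between each pair of neighboring markers,
    repeating the operation 'levels' times."""
    indices = [i, j]
    for _ in range(levels):
        refined = [indices[0]]
        for a, b in zip(indices, indices[1:]):
            refined += [bisector_index(contour, a, b), b]
        indices = refined
    return indices

# Hypothetical contour: a half circle sampled at 257 points.
contour = [(math.cos(math.pi * k / 256), math.sin(math.pi * k / 256))
           for k in range(257)]
markers = subdivide(contour, 0, 256, levels=4)
print(len(markers))  # 2 -> 3 -> 5 -> 9 -> 17 points
```

Starting from two markers, four rounds of subdivision give 17 points, which is consistent with the number reported above. On a non-convex contour the bisector may cross the curve several times, and the sample selected in this sketch is then somewhat arbitrary.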
Next, a series of triangulations allowed them to obtain a deterministic segmentation of the ear into 23552 triangles and 13601 points. Assuming the still camera used to take the photo and 3D camera used to produce the depth map were positioned in the same location, 3D coordinates could be associated with the segmentation performed, thus achieving the registration.
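The association of 3D coordinates with the 2D segmentation may be illustrated by the following minimal sketch. It assumes, as stated above, that the photo and the depth map are aligned pixel for pixel, and it further assumes an orthographic view with a uniform pixel size; the function name lift_to_3d and the parameter pixel_size are hypothetical.

```python
def lift_to_3d(points_2d, depth_map, pixel_size=1.0):
    """Attach a z-coordinate, read from the depth map, to each 2D point.

    points_2d: iterable of (u, v) pixel coordinates (column, row).
    depth_map: 2D sequence indexed as depth_map[row][column].
    """
    lifted = []
    for u, v in points_2d:
        # Nearest-pixel lookup of the height value (assumed in-bounds).
        col, row = int(round(u)), int(round(v))
        z = depth_map[row][col]
        lifted.append((u * pixel_size, v * pixel_size, z))
    return lifted
```

Because the segmentation is deterministic, applying such a lookup to every example yields point sets in which points of equal index correspond to one another.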
Nevertheless, the very nature of the data makes the convolutions of the ear inaccessible and, in the end, allows only a simplified model to be obtained, thus limiting the potential range of applications of this approach.
Furthermore, since the matching method is based on global and not local geometric considerations, such as the intersection of a straight line starting from one end of the image with a curve present at the other end, it causes dilution or even complete loss of the semantic information conveyed by the image.
Thus, characteristic points of the ear, such as the tragus or anti-tragus, cannot be reliably associated with one or more of the constructed descriptive points.
Lastly, as its authors themselves mention, this method has the major drawback of giving correct results only for convex shapes; in contrast, it returns chaotic results for simple star- or crescent-shaped geometries, for example.
Kaneko et al. (“Ear shape modeling for 3D audio and acoustic virtual reality: The shape-based average hrtf” by Shoken Kaneko, Tsukasa Suenaga, Mai Fujiwara, Kazuya Kumehara, Futoshi Shirakihara, and Satoshi Sekine, Audio Engineering Society Conference: 61st International Conference: Audio for Games, Audio Engineering Society, 2016.) for their part used X-ray scans of molds of volunteers' ears and favored non-rigid 3D registration methods (“A new point matching algorithm for non-rigid registration” by Haili Chui and Anand Rangarajan, Computer Vision and Image Understanding, 89(2):114-141, 2003; “Robust point set registration using gaussian mixture models” by Bing Jian and Baba C Vemuri, Pattern Analysis and Machine Intelligence, IEEE Transactions on, 33(8):1633-1645, 2011.). The meshes consisted of about 3000 vertices, and the deformation vectors transforming a reference mesh into each of the other meshes of the database were sought using mixtures of Gaussians.