Recently there has been an increasing demand for three-dimensional (3D) face models. The movie industry relies more and more on computer graphics (CG) to place human actors in situations that are physically infeasible. In some situations, the actor is completely replaced by a virtual counterpart because the required shots would endanger the actor.
To integrate the actors or their CG representations seamlessly, the light and shadows cast by other objects must be matched. Conventional approaches that use coarse facial models are insufficient because the human eye is highly trained at reading faces, so even subtle imperfections are noticed immediately. Secondary effects, such as wrinkle formation, are also especially difficult and tedious to create, whether by an animator or by physical simulation, yet these effects are essential for a natural facial appearance.
Currently, the only practical option is to acquire a model of the face using 3D capture. The acquired models can either be integrated directly into a movie or be used to control other faces. The movie industry is not the only one that demands realistic face models: computer games require virtual characters, and medical science also has an interest in such models.
Conventional approaches to 3D capture may be classified as either depth estimation techniques or normal estimation techniques. The depth variation of mesoscopic skin details, such as pores and wrinkles, is in the micrometer range, and most depth estimation techniques simply cannot achieve that level of detail with current hardware. Laser scanning can recover depth variations at these scales, but it produces insufficient results because of the translucency of skin and/or the time required for acquisition. As a workaround, a plaster mold of the face is scanned instead. These depth estimation techniques therefore suffer from various drawbacks, including the cumbersome process of obtaining a plaster mold of the actor's face.
Normal estimation techniques distinguish between the diffuse normals and the specular normals of an object's surface. The specular normals encode much finer detail than the diffuse normals, because specular light is reflected directly at the surface, whereas in skin the diffuse light is blurred by subsurface scattering. Both kinds of normals can be estimated from the light reflected at the surface of the subject: surface points with different normals reflect light arriving from different directions, so if the direction of the incident light is known, the normal may be estimated. Two opposing lines of research exist, distinguished by how the incident light is arranged. The first line of research uses a single light source at a known position and direction. However, to sample the whole space of possible normals, the light has to be moved, so such systems are suited only to static scenes. The second line of research places light sources all around the subject. Here, the challenge is to determine from which light source the reflected light originates.
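To make the idea of recovering a normal from a known light direction concrete, here is a minimal sketch. It is not the method of this document. It assumes an idealized Lambertian (diffuse) surface lit in turn by three distant lights with known, linearly independent directions (classic photometric stereo). For the specular case it assumes a perfect mirror, where the normal is the half-vector between the light and view directions. All function names are hypothetical, and direction vectors point from the surface toward the light or camera.

```python
import math


def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a 3-vector to unit length."""
    length = math.sqrt(dot(v, v))
    return tuple(c / length for c in v)


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, b):
    """Solve m @ x = b for a 3x3 system using Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("light directions must be linearly independent")
    x = []
    for col in range(3):
        # Replace column `col` of m with the right-hand side b.
        mc = [list(row) for row in m]
        for r in range(3):
            mc[r][col] = b[r]
        x.append(det3(mc) / d)
    return tuple(x)


def diffuse_normal(light_dirs, intensities):
    """Lambertian photometric stereo with exactly three lights (assumed model).

    I_k = albedo * dot(l_k, n) for each unit light l_k, i.e. L @ g = I with
    g = albedo * n. Solving for g yields the albedo |g| and the normal g/|g|.
    """
    rows = [normalize(l) for l in light_dirs]
    g = solve3(rows, intensities)
    albedo = math.sqrt(dot(g, g))
    return normalize(g), albedo


def specular_normal(light_dir, view_dir):
    """Ideal mirror reflection: the normal bisects the light and view directions."""
    l = normalize(light_dir)
    v = normalize(view_dir)
    return normalize(tuple(a + b for a, b in zip(l, v)))


# Usage: synthesize intensities for a known normal, then recover it.
lights = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
true_n = normalize((0.2, 0.1, 1.0))
intensities = [0.8 * dot(normalize(l), true_n) for l in lights]
n, albedo = diffuse_normal(lights, intensities)
```

The diffuse case needs several images under different light directions, one per unknown. The specular case needs only the geometry of a single light and the camera, but only at the point where the highlight is observed. This mirrors why a single moving light must sweep many positions to cover all normals.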
Conventional normal estimation techniques rely on polarization to separate the diffuse and specular components and, as a result, suffer from several shortcomings. First, state-of-the-art implementations require up to thirteen frames for one scan. To capture performances, these conventional techniques employ very expensive high-speed cameras. Even so, the subject is likely to move slightly during capture, so sophisticated image registration techniques must be applied to re-align the captured frames. Furthermore, the short exposure time and the use of polarization significantly increase the amount of illumination required, leading to very high energy consumption and heat problems. Finally, polarizing the light restricts conventional approaches to capturing a high-resolution specular normal map from only a limited set of viewpoints.
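The polarization-based separation mentioned above can be illustrated with a minimal sketch under an idealized model, which is assumed here and not taken from this document. The light is linearly polarized. Specular reflection preserves that polarization, so a camera polarizer crossed with it blocks the specular light, while a parallel polarizer passes it fully. Diffuse light is depolarized by subsurface scattering, so an ideal polarizer passes half of it in either orientation. The function name is hypothetical, and the input images are flattened lists of pixel intensities.

```python
def separate_diffuse_specular(cross_polarized, parallel_polarized):
    """Split per-pixel intensities into diffuse and specular parts.

    Assumed ideal model, with D = diffuse and S = specular intensity:
        cross    = D / 2
        parallel = D / 2 + S
    """
    if len(cross_polarized) != len(parallel_polarized):
        raise ValueError("images must have the same number of pixels")
    # An ideal polarizer passes half of the depolarized diffuse light.
    diffuse = [2.0 * c for c in cross_polarized]
    # Clamp at zero: noise can make parallel - cross slightly negative.
    specular = [max(p - c, 0.0)
                for c, p in zip(cross_polarized, parallel_polarized)]
    return diffuse, specular


# Usage: two pixels, one with a highlight and one purely diffuse.
cross = [0.20, 0.30]
parallel = [0.50, 0.30]
diffuse, specular = separate_diffuse_specular(cross, parallel)
```

Even in this idealized form, every lighting condition needs two exposures, and each polarizer discards part of the light. This is consistent with the frame-count and illumination costs described above.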
As the foregoing illustrates, there is a need in the art for an improved technique for capturing high-resolution models, such as high-resolution face models.