Realism in computer-generated images requires accurate input models. One way of obtaining high-quality data is to measure scene attributes from real photographs using inverse rendering. Inverse rendering is the estimation of reflectance and illumination properties from real photographs in order to synthesize realistic images. It makes it possible to synthesize images under lighting and viewing conditions different from those of the original photograph.
Inverse rendering is an active research area with wide applications in both computer vision and computer graphics. One typical application is to generate photo-realistic images of human faces under arbitrary lighting conditions. Despite the complexity and challenging nature of this type of problem, great progress has been made both in generating photo-realistic images of objects, including human faces, and in recognizing faces under different lighting conditions.
One area where inverse rendering can be used is the face re-lighting problem. For example, face recognition seeks to recognize faces under a variety of lighting conditions. When comparing face images that were taken under two different lighting conditions, face re-lighting must be performed. By way of example, assume that a first face image was taken under regular, uniform lighting conditions and a second face image was taken in lighting where one side of the face is dark and the other side is bright. It is desired to compare the two faces to determine whether they belong to the same person. The first step is to re-light the second image so that its lighting conditions are normalized. The two images can then be compared, which allows the face recognition application to recognize faces under a variety of lighting conditions.
The face re-lighting problem, however, is particularly difficult when only a single image of the human face is available and it was taken under a harsh or sub-optimal lighting condition. Lighting (or illumination) coefficients for an image are modeled using a spherical harmonic representation. It has been shown that a set of images of a convex Lambertian object obtained under a wide variety of lighting conditions can be approximated by a low-dimensional linear subspace. The problem with this technique, however, is that under harsh lighting conditions the approximation error can be large. Thus, handling harsh lighting remains an unsolved problem for both graphics and vision applications such as face re-lighting and face recognition. Furthermore, the problem becomes even more challenging when there are cast shadows, saturated areas, and partial occlusions.
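The low-dimensional model described above can be illustrated with a minimal sketch. It assumes a nine-term real spherical harmonic basis (orders 0 through 2) with the usual normalization constants, with the Lambertian attenuation factors absorbed into the fitted coefficients. The function names, the synthetic normals and albedo, and the choice of nine terms are assumptions made for illustration and are not taken from the text. The sketch fits lighting coefficients by least squares for two synthetic images: one generated by the model itself, and one lit by a single directional light with attached shadows, which leaves a residual that the nine-term model cannot absorb.

```python
import numpy as np

def sh_basis(normals):
    """First nine real spherical harmonics at unit normals; returns shape (N, 9)."""
    x, y, z = normals[:, 0], normals[:, 1], normals[:, 2]
    return np.stack([
        np.full_like(x, 0.282095),       # order 0
        0.488603 * y,                    # order 1
        0.488603 * z,
        0.488603 * x,
        1.092548 * x * y,                # order 2
        1.092548 * y * z,
        0.315392 * (3.0 * z**2 - 1.0),
        1.092548 * x * z,
        0.546274 * (x**2 - y**2),
    ], axis=1)

def fit_lighting(intensity, albedo, normals):
    """Least-squares fit of nine coefficients l so that intensity ~ albedo * (H @ l)."""
    design = albedo[:, None] * sh_basis(normals)
    coeffs, *_ = np.linalg.lstsq(design, intensity, rcond=None)
    return coeffs

def render(albedo, normals, coeffs):
    """Predict per-pixel intensity from albedo, normals, and lighting coefficients."""
    return albedo * (sh_basis(normals) @ coeffs)

def relative_error(target, prediction):
    return np.linalg.norm(target - prediction) / np.linalg.norm(target)

# Synthetic stand-in for a face: random unit normals and albedo (hypothetical data).
rng = np.random.default_rng(0)
normals = rng.normal(size=(2000, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
albedo = rng.uniform(0.3, 0.9, size=2000)

# Case 1: image produced by the nine-term model itself (coefficients are arbitrary
# and not constrained to be physically valid).
true_coeffs = rng.normal(size=9)
smooth_image = render(albedo, normals, true_coeffs)
recovered = fit_lighting(smooth_image, albedo, normals)
smooth_rel_error = relative_error(smooth_image, render(albedo, normals, recovered))

# Case 2: one directional light with attached shadows (clamped cosine shading).
light_dir = np.array([1.0, 0.0, 0.0])
harsh_image = albedo * np.clip(normals @ light_dir, 0.0, None)
harsh_coeffs = fit_lighting(harsh_image, albedo, normals)
harsh_rel_error = relative_error(harsh_image, render(albedo, normals, harsh_coeffs))

print("model-generated image, relative error:", smooth_rel_error)
print("directional light, relative error:   ", harsh_rel_error)
```

In this sketch the albedo and normals are given. In the single-image setting discussed above they are unknown as well, and cast shadows and saturation are not modeled here at all, so the sketch understates the difficulty of the real problem.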
Some current techniques use a region-based approach. Since lighting in smaller image regions is more homogeneous than in larger regions, an image containing a face is divided into smaller regions and a different set of face model parameters is used for each region. In this situation the overall estimation error is smaller than in a single holistic approximation. However, there are two main problems with this region-based approach. First, if the majority of the pixels in a region are problematic (for example, because the pixels are in cast shadows or saturated, or because there are large lighting estimation errors), then the texture (or albedo) information in that region cannot be correctly recovered. The albedo is a material property and is essentially the reflectance of the skin. It is sometimes called the reflection coefficient. The albedo is independent of the illumination, but in an image it is intertwined with the illumination such that the two cannot be easily decoupled. The second problem is that the estimated albedo may be inconsistent across regions.
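The first problem can be made concrete with a small sketch that divides a grayscale image into a grid of regions and measures, for each region, the fraction of pixels that are saturated or nearly black. The grid size, the intensity thresholds, the majority cutoff of 0.5, and the function name are all hypothetical choices for illustration. Treating very dark pixels as a stand-in for cast shadows is also an assumption, and the sketch does not represent the method of any particular region-based technique.

```python
import numpy as np

def unreliable_fraction(image, grid=(4, 4), low=0.05, high=0.95):
    """Fraction of very dark or saturated pixels in each grid region.

    image is a 2-D array with intensities in [0, 1]; low and high are
    hypothetical thresholds for "in shadow" and "saturated".
    """
    height, width = image.shape
    bad = (image <= low) | (image >= high)
    row_groups = np.array_split(np.arange(height), grid[0])
    col_groups = np.array_split(np.arange(width), grid[1])
    fraction = np.empty(grid)
    for i, rows in enumerate(row_groups):
        for j, cols in enumerate(col_groups):
            fraction[i, j] = bad[np.ix_(rows, cols)].mean()
    return fraction

# Toy image: mid-gray everywhere except a saturated top-left quadrant.
image = np.full((8, 8), 0.5)
image[:4, :4] = 1.0

fraction = unreliable_fraction(image, grid=(2, 2))
# Regions where most pixels are unreliable; texture there is hard to recover.
flagged = fraction > 0.5
print(fraction)
print(flagged)
```

A region flagged this way contains too few usable pixels for an independent per-region estimate, which is the situation the first problem describes.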
Another current technique uses a three-dimensional (3D) spherical harmonic basis morphable model (SHBMM), which incorporates the spherical harmonic illumination representation into a morphable model method. This technique produces photo-realistic rendering results under regular lighting conditions, but produces poor results in saturated face image areas. Furthermore, because the texture is not separated from the spherical harmonic bases in SHBMM, this technique cannot handle harsh lighting conditions due to the large approximation errors in the spherical harmonic representation.
Another approach uses an image subdivision technique whereby a face is subdivided along feature boundaries (such as eyes, nose, mouth, and so forth) to increase the expressiveness of the morphable models. This approach estimates morphable model parameters independently over each region and performs smoothing along region boundaries to avoid visual discontinuity. However, this approach cannot be applied to images taken under harsh lighting conditions because of the inconsistency of the estimated textures in different regions. Moreover, if most pixels in a region are in cast shadows or saturated areas, there often is not enough information to recover the texture within the region itself.