Due to hardware constraints, teleconferencing videos captured by web cameras typically contain significant imaging noise. Illumination conditions pose an additional, severe problem. Under typical office lighting, a single ceiling light source may cast strong shadows and highlights on a speaker's face, as shown in the example frame in FIG. 1A. These problems produce unsatisfactory images of the speaker, which has kept many people from using teleconferencing systems. Because imaging hardware and lighting conditions are not easy to improve, appropriate post-processing of teleconferencing videos to improve the quality of the faces in the image frames becomes a practical means of solving these problems.
Previous approaches to the post-processing of teleconferencing videos have tried to apply cosmetic effects to the face in a single photograph or image frame. To achieve good results, these earlier systems required training examples as well as manual registration between the target face being enhanced and those training examples.
The reproduction of human skin color and texture is an important function of various imaging systems. Some “E-make” functions have been proposed that change the appearance of skin by producing aging or reverse-aging effects. Furthermore, hemoglobin and melanin information in the skin has been extracted to design E-cosmetic functions that realistically change skin color and texture in images. These techniques generally require high-quality face images as input and aim only at simulating subtle or natural changes in skin color; they are therefore not capable of eliminating the significant noise and strong shadows present in teleconferencing videos.
Example-based approaches have been proposed to transfer special effects between faces. In one case, the illumination change produced by one person's facial expression is captured in an expression ratio image (ERI), which is then mapped onto any other person's face image to generate more expressive facial expressions. An image-based surface detail transfer technique has been used to produce aging effects, or as a touch-up tool to remove wrinkles and color spots from face images. However, these techniques require manual registration between a target face image and training images.
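To make the ratio-image idea concrete, the following is a minimal sketch, assuming grayscale images stored as nested lists with intensities in [0, 1] that have already been registered pixel-to-pixel (the manual step noted above). The function names `expression_ratio_image` and `apply_ratio` are hypothetical, and the sketch omits any geometric warping of the faces; it shows only the per-pixel ratio and its transfer.

```python
from typing import List

Image = List[List[float]]  # grayscale intensities, assumed in [0, 1] and pre-registered


def expression_ratio_image(neutral: Image, expressive: Image, eps: float = 1e-6) -> Image:
    """Per-pixel ratio expressive / neutral for the source person (hypothetical helper)."""
    # eps guards against division by zero in completely dark pixels.
    return [[e / max(n, eps) for n, e in zip(row_n, row_e)]
            for row_n, row_e in zip(neutral, expressive)]


def apply_ratio(target: Image, ratio: Image) -> Image:
    """Multiply the target face by the ratio image, clamping results to [0, 1]."""
    return [[min(1.0, max(0.0, t * r)) for t, r in zip(row_t, row_r)]
            for row_t, row_r in zip(target, ratio)]


neutral_a = [[0.5, 0.4], [0.6, 0.8]]
smile_a = [[0.25, 0.4], [0.6, 0.4]]   # darker where a crease appears
face_b = [[0.8, 0.7], [0.3, 0.6]]     # another person's face, same pixel grid

ratio = expression_ratio_image(neutral_a, smile_a)
print(apply_ratio(face_b, ratio))     # → [[0.4, 0.7], [0.3, 0.3]]
```

Because the transfer is a per-pixel multiplication, registration is essential: if the two faces are misaligned, the shading change for one facial feature is multiplied onto a different feature of the target face.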
Applying cosmetic effects to each frame independently may result in temporally incoherent enhanced video sequences. To avoid this problem, a robust facial component classification and tracking system is essential. Recent advances in face detection and tracking make it possible to track the face robustly in video sequences in real time. For facial components, generic statistical skin and non-skin color models have been proposed to detect skin regions in images, and neural networks, active wavelet networks, active appearance models (AAM) and a variety of other techniques have been applied to automatic face alignment. However, none of these techniques can generate the pixel-level classification results that are necessary for robust, realistic-looking enhancement of video frames.
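A generic statistical skin and non-skin color model can be sketched as a histogram-based likelihood-ratio test on quantized RGB values. The sketch below is illustrative only: the class name `SkinColorModel`, the 32 bins per channel, the add-one smoothing and the decision threshold of 1.0 are assumptions, not details of any particular published model. It labels a pixel from its color alone; it does not distinguish facial components such as the eyes or mouth and carries no temporal context between frames.

```python
from collections import Counter
from typing import Iterable, Tuple

RGB = Tuple[int, int, int]  # 8-bit red, green, blue values


class SkinColorModel:
    """Hypothetical histogram-based skin/non-skin color classifier."""

    def __init__(self, bins_per_channel: int = 32) -> None:
        self.step = 256 // bins_per_channel        # width of one color bin
        self.num_bins = bins_per_channel ** 3      # cells in the 3-D histogram
        self.skin_counts: Counter = Counter()
        self.non_skin_counts: Counter = Counter()
        self.skin_total = 0
        self.non_skin_total = 0

    def _bin(self, rgb: RGB) -> RGB:
        # Quantize each channel so that nearby colors share a histogram cell.
        r, g, b = rgb
        return (r // self.step, g // self.step, b // self.step)

    def fit(self, skin: Iterable[RGB], non_skin: Iterable[RGB]) -> None:
        # Accumulate color histograms from labeled training pixels.
        for p in skin:
            self.skin_counts[self._bin(p)] += 1
            self.skin_total += 1
        for p in non_skin:
            self.non_skin_counts[self._bin(p)] += 1
            self.non_skin_total += 1

    def is_skin(self, rgb: RGB, threshold: float = 1.0) -> bool:
        # Compare P(color | skin) with P(color | non-skin); add-one smoothing
        # keeps colors unseen in training from causing division by zero.
        key = self._bin(rgb)
        p_skin = (self.skin_counts[key] + 1) / (self.skin_total + self.num_bins)
        p_non = (self.non_skin_counts[key] + 1) / (self.non_skin_total + self.num_bins)
        return p_skin / p_non > threshold


model = SkinColorModel()
model.fit(
    skin=[(220, 170, 140), (210, 160, 130), (225, 175, 145)],
    non_skin=[(30, 60, 200), (20, 120, 40), (240, 240, 240)],
)
print(model.is_skin((215, 165, 135)))  # True
print(model.is_skin((25, 60, 200)))    # False
```

Raising the threshold makes the classifier more conservative, trading missed skin pixels for fewer false detections. Since each pixel is judged in isolation, such a model by itself offers no way to keep labels consistent from frame to frame.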