Video conference sessions, such as those held within immersive Telepresence (TP) environments, are carefully designed to provide clear, maximum eye contact between local and remote participants. In certain TP systems, a lighting fixture is provided behind one or more display screens within a video conferencing room to ensure that there is sufficient ambient lighting for participants in the front row of the conference room. While the lighting fixture provides a suitable level of lighting for certain scenarios, it can also be desirable at times to remove the lighting fixture from the room while maintaining the same level of perceptual quality and eye contact for a video conference session. This may be achieved by automatic scene relighting, where the term “relighting” refers to the transformation of pixels in images through digital signal processing techniques.
Automatic scene relighting for improving image quality can be accomplished utilizing certain known techniques. However, challenges remain in applying scene relighting to video in real time. For example, one technique generates a skin color model and then uses this model to apply a global exposure correction to detected skin tone areas within each video frame of a video conference session. Because the correction is global and is derived solely from the skin color model, this technique can also result in exposure correction being applied to non-skin tone areas within frames.
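The skin-model technique and its drawback can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from any particular system: the RGB rule in `is_skin`, the target skin luminance of 140, the BT.601 luma weights, and the representation of a frame as nested lists of `(r, g, b)` tuples.

```python
def is_skin(pixel):
    """Hypothetical skin test: a crude RGB rule, used here only as a stand-in
    for a learned skin color model."""
    r, g, b = pixel
    return (r > 95 and g > 40 and b > 20
            and r > g and r > b and (r - min(g, b)) > 15)


def luminance(pixel):
    """Approximate luma of an (r, g, b) pixel using BT.601 weights."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def global_exposure_gain(frame, target_skin_luma=140.0):
    """Derive one gain for the whole frame from the detected skin pixels only.

    target_skin_luma is an assumed value, not a recommended setting.
    """
    skin_lumas = [luminance(p) for row in frame for p in row if is_skin(p)]
    if not skin_lumas:
        return 1.0  # no skin detected: leave the frame unchanged
    mean_skin_luma = sum(skin_lumas) / len(skin_lumas)
    return target_skin_luma / mean_skin_luma


def apply_gain(frame, gain):
    """Scale every pixel by the same gain, clipping each channel to 255."""
    return [[tuple(min(255, round(c * gain)) for c in p) for p in row]
            for row in frame]


# One skin-toned pixel next to one dark background pixel.
frame = [[(120, 80, 60), (30, 30, 30)]]
gain = global_exposure_gain(frame)
corrected = apply_gain(frame, gain)
```

In this sketch the gain is computed from the skin-toned pixel alone, yet the dark background pixel is brightened by the same factor, which is the side effect on non-skin tone areas noted above.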
A better approach utilizes a photometric mapping that is learned offline (e.g., utilizing snapshots of the video frames at different exposures) and that transforms a low exposure image toward a high dynamic range (HDR) tone-mapped image. The learned photometric mapping is thereafter applied to every video frame to transform each video image into a new image with higher perceptual quality. However, this approach implicitly assumes that the lighting in a scene is fixed and that object motion does not cause photometric variation. This assumption does not hold well in a dynamic environment such as a video conference session in a TP room, where the actual lighting in the room can vary over time and object motion can cause photometric variations. As a result, a photometric mapping learned from calibration with still images may not yield good image quality as the scene changes.
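One simple way to picture an offline-learned photometric mapping is as a per-intensity lookup table built from paired calibration images. The following is a minimal sketch under stated assumptions, not the method of any particular system: frames are taken to be 8-bit grayscale images stored as nested lists, the mapping is taken to depend only on pixel intensity, and the names `learn_mapping` and `apply_mapping` are hypothetical.

```python
def learn_mapping(low_images, ref_images, levels=256):
    """Learn a lookup table from low exposure images to reference images.

    For each input intensity, average the reference intensities observed at
    the same pixel positions. Intensities never seen during calibration are
    filled by linear interpolation between the nearest observed intensities
    (or held constant beyond the observed range).
    """
    sums = [0.0] * levels
    counts = [0] * levels
    for low, ref in zip(low_images, ref_images):
        for low_row, ref_row in zip(low, ref):
            for v, target in zip(low_row, ref_row):
                sums[v] += target
                counts[v] += 1

    known = [v for v in range(levels) if counts[v]]
    if not known:
        return list(range(levels))  # no calibration data: identity mapping

    lut = [0] * levels
    for v in range(levels):
        lo = max((k for k in known if k <= v), default=known[0])
        hi = min((k for k in known if k >= v), default=known[-1])
        lo_val = sums[lo] / counts[lo]
        hi_val = sums[hi] / counts[hi]
        t = 0.0 if hi == lo else (v - lo) / (hi - lo)
        lut[v] = min(levels - 1, max(0, round(lo_val + t * (hi_val - lo_val))))
    return lut


def apply_mapping(frame, lut):
    """Apply the fixed, previously learned table to every pixel of a frame."""
    return [[lut[v] for v in row] for row in frame]


# Calibration: one low exposure image and its brighter reference image.
low_images = [[[10, 20], [10, 30]]]
ref_images = [[[40, 80], [60, 120]]]
lut = learn_mapping(low_images, ref_images)
mapped = apply_mapping([[10, 15, 30]], lut)
```

Because the table is fixed once calibration ends, `apply_mapping` produces the same output for a given input intensity regardless of how the room lighting has changed since calibration, which is the limitation described above.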
In addition, the use of only an HDR-based correction for lighting, whether by photometric mapping or by other techniques, may be insufficient to provide high perceptual quality and maximum eye contact within TP video conferencing sessions, due to the directional lighting associated with a scene. For example, overhead lighting, which is typical in a conference room, can create shadows under the eyes, nose, and cheeks of participants within the room. These shadows would remain in the HDR-processed image and would degrade the immersive experience that can otherwise be provided today in TP rooms that utilize a lighting fixture.