The present invention, in some embodiments thereof, relates to image processing and, more particularly, but not exclusively, to a method and system for processing an image to correct gaze offset.
Videoconferencing systems hold the promise of allowing natural interpersonal communication over a distance. Recent advances in video quality and the adoption of large high-definition screens are contributing to a more impressive user experience.
Along with gestures and facial expression, gaze is one of the most important aspects of a person's non-verbal behavior, and many socio-psychological studies have attested to the importance of gaze direction and visual contact as communicative signals. In conventional videoconferencing systems, the arrangement of camera, monitor and user causes an angular error between the gaze direction towards the monitor and the camera optical axis. As a result, when the remote party is looking straight into the image of the local party, the gazes, as perceived by the two parties, are not collinear. For example, when the camera is located on top of the screen, the effect is interpreted as looking down.
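The size of this angular error can be estimated from simple geometry. The following is a minimal sketch, assuming the user looks at a point on the screen straight ahead and the camera is displaced from that point within the screen plane. The function name and the example dimensions (a 0.20 m camera offset and a 0.60 m viewing distance) are illustrative assumptions and are not taken from any of the cited systems.

```python
import math


def gaze_offset_degrees(camera_offset_m, viewing_distance_m):
    """Angle at the eye between the on-screen gaze point and the camera.

    Assumes the gaze point lies straight ahead of the eye and the camera
    is displaced from it by camera_offset_m within the screen plane.
    """
    return math.degrees(math.atan2(camera_offset_m, viewing_distance_m))


# Hypothetical setup: camera 0.20 m above the gaze point, user 0.60 m away.
angle = gaze_offset_degrees(0.20, 0.60)
print(f"{angle:.1f} degrees")  # prints 18.4 degrees
```

Under these assumptions the error grows as the user moves closer to the screen or as the screen, and hence the camera offset, becomes larger.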
Several solutions have been proposed to the problem of offset gaze in videoconferencing. One such solution employs a beam splitter between the camera and the monitor. A partially transparent mirror is fitted in front of the monitor in such a way that the viewer can see the monitor image through the mirror. By means of this mirror, the camera captures the viewer from the direction of the monitor, permitting recording and reproduction over one axis.
Some techniques reproduce the image of the other party by a video projector on a projection screen. The camera can be located behind the projection screen and the party's image can be captured through the screen by means of a window provided with a light valve or a partially transparent projection screen.
Another technique employs view synthesis from multiple cameras. In one such technique, disparity estimation based on dynamic programming is used to generate a middle view from two cameras positioned on the left and right sides of the screen [A. Criminisi, J. Shotton, A. Blake, and P. H. S. Torr. Gaze manipulation for one-to-one teleconferencing. In ICCV, 2003].
In an additional technique, the images of participants are rendered digitally in a virtual three-dimensional space, and a head-pose orientation and eye-gaze direction are digitally corrected as internal mathematical computations without the need for a display device. The digitally corrected data are transmitted to a display screen so that a particular participant's image in the three-dimensional space appears to other participants viewing the screen as if the particular participant were looking at them [see, e.g., Gemmell et al., “Gaze awareness for video-conferencing: A software approach,” IEEE MultiMedia, 2000, and U.S. Pat. No. 6,806,898].
In another technique, a gaze deviation value is determined and used for calculating a corresponding point of an input image corresponding to a particular position in a corrected image. A pixel value at the corresponding point is calculated using the input image from the camera. The gaze corrected image is transmitted by using the calculated pixel value as the pixel value of the particular position of the corrected image [U.S. Pat. No. 6,677,980].
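The general idea of computing, for each position of the corrected image, a corresponding point in the input image and then a pixel value at that point can be illustrated with a generic inverse-mapping sketch. It is not the method of the cited patent. The deviation is modeled here, purely as an assumption, as a uniform vertical shift in pixels, and the pixel value is obtained by bilinear interpolation. All function names are hypothetical.

```python
def bilinear_sample(image, x, y):
    """Sample a grayscale image (list of rows) at fractional (x, y).

    Coordinates outside the image are clamped to the border.
    """
    h, w = len(image), len(image[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def correct_by_inverse_mapping(image, deviation_px):
    """Build the corrected image position by position.

    Assumption: the corresponding input point of output position (x, y)
    is (x, y + deviation_px), i.e. a uniform vertical shift.
    """
    h, w = len(image), len(image[0])
    return [
        [bilinear_sample(image, x, y + deviation_px) for x in range(w)]
        for y in range(h)
    ]


# Tiny synthetic input: brightness increases by 10 per row.
input_image = [
    [0, 0, 0],
    [10, 10, 10],
    [20, 20, 20],
    [30, 30, 30],
]
corrected = correct_by_inverse_mapping(input_image, 0.5)
print(corrected[0])  # prints [5.0, 5.0, 5.0]
```

Iterating over output positions rather than input positions guarantees that every pixel of the corrected image receives a value, which is the usual reason for preferring an inverse mapping.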
Also known is a region filling technique in which the pupils of the other party are segmented and displaced in the image plane, wherein areas which become free as a result of this displacement are filled in using the color of the eyeballs [U.S. Pat. No. 5,499,303].
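A toy illustration of the segment, displace and fill sequence is given below. It is a simplified sketch and not the procedure of the cited patent. It assumes a small grayscale eye patch, segments the pupil with a fixed darkness threshold, and uses the mean of the non-pupil pixels as a stand-in for the eyeball color. The function name, the threshold and the sample patch are hypothetical.

```python
def shift_pupil(eye, dx, pupil_max=50):
    """Displace dark (pupil) pixels by dx columns and fill the freed area.

    Assumptions: eye is a grayscale patch (list of rows), pupil pixels
    have values <= pupil_max, and the fill color is the rounded mean of
    all non-pupil pixels.
    """
    h, w = len(eye), len(eye[0])
    pupil = [(r, c) for r in range(h) for c in range(w) if eye[r][c] <= pupil_max]
    others = [eye[r][c] for r in range(h) for c in range(w) if eye[r][c] > pupil_max]
    fill = round(sum(others) / len(others)) if others else 0

    out = [row[:] for row in eye]
    for r, c in pupil:
        # Vacate the original pupil area first.
        out[r][c] = fill
    for r, c in pupil:
        # Then paint each pupil pixel at its displaced position, if in bounds.
        if 0 <= c + dx < w:
            out[r][c + dx] = eye[r][c]
    return out


# Synthetic patch: bright eyeball (200) with a two-pixel dark pupil (20).
eye = [
    [200, 200, 200, 200, 200],
    [200, 20, 20, 200, 200],
    [200, 200, 200, 200, 200],
]
shifted = shift_pupil(eye, 1)
print(shifted[1])  # prints [200, 200, 20, 20, 200]
```

Filling the vacated area before painting the displaced pupil keeps the two regions from overwriting each other when they overlap.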
Additional background art includes U.S. Pat. Nos. 6,433,759 and 6,771,303.