The use of video conferencing, or video telephony, which allows remote parties to both see and hear one another, is becoming increasingly popular. As used herein, “video telephony” refers to communications using both video and audio transmitted over a communications network. Such applications facilitate remote communications by providing a visual image of each conference participant. Video conferencing thus allows parties to communicate both audibly and visually without requiring lengthy and expensive travel.
In a typical video telephony application, a camera is positioned to obtain an image of each of the participants at each endpoint of the communication. The image of a participant at one endpoint is then provided to a participant at another endpoint, so that each participant views the other participants during the communication session. A video telephony session can include two or more endpoints, and each endpoint can include more than one participant.
The image that is transmitted during a video conference is often of inferior quality. A number of factors can contribute to the inferior quality of transmitted images. For example, contrast and color saturation levels may be incorrectly set at the transmitting end. In addition, the amount of data that is used to describe an image is often limited, for example due to transmission bandwidth constraints. Furthermore, these limitations on image quality are often exacerbated by poor lighting conditions.
Video cameras that support backlight compensation can remove peaks in image intensity. Such backlight compensation may operate by equalizing the overall image histogram. Although such techniques can improve the overall quality of an image, they do not specifically act to improve the quality of the portions of the image that correspond to the face of an imaged participant. Accordingly, the area of the image corresponding to a participant's face may remain of relatively low quality.
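To make the limitation of whole-image equalization concrete, the following is a minimal sketch of global histogram equalization, one common form of histogram-based adjustment. The source does not specify any camera's actual backlight compensation algorithm, so this is an illustration only. The function name `equalize_histogram` and the synthetic 100-pixel "backlit" frame (20 dim face pixels against 80 bright background pixels) are hypothetical.

```python
from collections import Counter

def equalize_histogram(pixels, levels=256):
    """Global histogram equalization over a flat list of integer intensities.

    Hypothetical helper: every pixel, face or background, is remapped
    through one lookup table built from the whole-image histogram.
    """
    n = len(pixels)
    counts = Counter(pixels)
    # Cumulative distribution function over all intensity levels.
    cdf, running = [], 0
    for v in range(levels):
        running += counts.get(v, 0)
        cdf.append(running)
    cdf_min = min(c for c in cdf if c > 0)  # CDF at the darkest occupied level
    if n == cdf_min:  # single-valued image: nothing to spread
        return list(pixels)
    # Output levels are handed out in proportion to pixel counts.
    lut = [round((c - cdf_min) / (n - cdf_min) * (levels - 1)) for c in cdf]
    return [lut[p] for p in pixels]

# Synthetic backlit frame: a dim face (levels 40..59, one pixel each)
# against a bright background (levels 200..239, two pixels each).
face = list(range(40, 60))
background = [v for v in range(200, 240) for _ in range(2)]
out = equalize_histogram(face + background)
print(min(out[:20]), max(out[:20]))   # face pixels after equalization
print(min(out[20:]), max(out[20:]))   # background pixels after equalization
```

In this toy frame the face moves from levels 40..59 to 0..49, but because the mapping is driven by the whole-image histogram, the face (20% of the pixels) still receives only about a fifth of the output range, while the background occupies 54..255.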
Face tracking capabilities are available to allow a camera to provide an image that is centered on the face of a participant. In a system that provides automatic face tracking, the camera zooms in on a detected face and attempts to make the face dominate the image, thus reducing the effect of the surrounding environment. Although such systems can be effective at following a participant moving around a scene, background objects are described using the same image parameters that are available for the portions of the image comprising the tracked face. As a result, the portion of the image corresponding to the tracked face can be of lower quality than desired. In particular, because only a fixed amount of image detail can be encoded, and because the same range of available image parameters is devoted to background information as to the face of the participant, a portion of that finite image information is consumed describing relatively unimportant background portions of the image.
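The budget argument above can be illustrated with simple arithmetic. The sketch below assumes, purely as a simplification, that a fixed per-frame bit budget is spread uniformly by pixel area. Real encoders allocate bits according to content complexity, so the function `split_budget` and the example frame and face sizes are hypothetical.

```python
def split_budget(total_bits, frame_pixels, face_pixels):
    """Split a fixed per-frame bit budget between face and background.

    Simplifying assumption (hypothetical): bits are allocated in direct
    proportion to pixel area, with no preference given to the face.
    Integer arithmetic keeps the result exact.
    """
    face_bits = total_bits * face_pixels // frame_pixels
    return face_bits, total_bits - face_bits

# A 640x480 frame in which the tracked face occupies a 192x160 region (10%).
face_bits, background_bits = split_budget(100_000, 640 * 480, 192 * 160)
print(face_bits, background_bits)
```

Under this assumption, 90% of the budget describes the background even though the face is the region of interest; zooming so that the face dominates the frame raises its share but does not change the fact that whatever background remains draws on the same finite budget.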