This invention relates generally to video telephony, and more particularly to a method, apparatus and software for video image processing and display.
A video telephony application typically involves a pair of subjects whose busts are captured as images and transmitted to each other, along with audio of their voices, so that both can carry on a conversation as if they were actually face to face. Video telephony applications are many, and include business, academic, and personal communications.
With the advent of pervasive digital communications infrastructure, and less expensive imaging and image processing systems, video telephony is possible in a variety of settings, including office desktops, home telephone sites, phone booths, and other places. However, in most settings for which video telephony is desired, the physical environment or surroundings contain visible features that will appear behind the head of a subject whose likeness is being transmitted. These visible features will often degrade the video telephony experience in a number of ways.
For instance, these features are usually not material to the communication underway, and therefore are visually distracting. Also, they constitute spatial information, and therefore add work to image processing and compression systems involved in transmission or recording. When the subject's head moves, any visible features of the environment behind the subject's head are obscured and/or revealed, which constitutes temporal and spatial information that adds work to image processing and compression systems involved in transmission or recording.
Moreover, video telephone systems must be convenient and pleasing to use if they are to attain their full potential. Video acquisition, however, can be inconvenient and unduly restrictive, in that the videophone user must either physically position his head approximately within a video camera's field of view, or point the video camera precisely in order to place its field of view around the user's head. In most of these settings, a subject is required to physically move his head or the camera itself in order to ensure that the transmitted image contains a visually well-framed likeness of himself. However, this manual process is burdensome and prone to error, as people often move both intentionally and unintentionally, resulting in the need to correct either one's position or the position of the camera. Video telephony is especially prone to this problem, as a subject is likely to pay more attention to the image of the person he is talking to than to the visual suitability of the image in the "self-view" feedback display.
Additionally, the quality of a video image in a videophone system is largely a function of the quantity of video data that can be transmitted per unit time from one videophone to another. Therefore, the success and popularity of video telephone technology are largely dependent on the ability to compress, transport and decompress image data quickly and efficiently.
According to one aspect of the invention, there is provided a method of processing video input data originating from a camera in a videophone system. An initial step of the process acquires a frame of the video input data, wherein the frame of input data includes data depicting the head of a user of the videophone system, and data depicting a background setting. The background setting data is eliminated from the frame of video input data to produce a frame of video data without the background setting data. The method further includes the step of transmitting data representing the frame of video data without the background setting data to a remote end of a video telephone link.
According to another aspect of the invention, the background setting data is replaced with monotonous data. Furthermore, the monotonous data may be encoded with a transparency value.
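By way of illustration only, the replacement of background setting data with monotonous data carrying a transparency value might be sketched as follows. The sketch assumes that a per-pixel mask marking the user's head is already available; how that mask is obtained is not addressed here. The names `replace_background`, `head_mask` and `MONOTONE_FILL`, the RGBA pixel layout, and the choice of an alpha of 0 for the fill are hypothetical and are not taken from the description above.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]

# Hypothetical fill value: a single constant colour whose alpha of 0
# marks the pixel as fully transparent.
MONOTONE_FILL: RGBA = (0, 0, 0, 0)


def replace_background(frame: List[List[RGB]],
                       head_mask: List[List[bool]],
                       fill: RGBA = MONOTONE_FILL) -> List[List[RGBA]]:
    """Return an RGBA frame in which every non-head pixel equals `fill`.

    Head pixels keep their colour and receive an opaque alpha of 255.
    """
    out = []
    for row, mask_row in zip(frame, head_mask):
        out.append([
            (r, g, b, 255) if is_head else fill
            for (r, g, b), is_head in zip(row, mask_row)
        ])
    return out


# Tiny 2x2 frame: the left column is background, the right column is head.
frame = [[(10, 20, 30), (200, 180, 160)],
         [(40, 50, 60), (210, 190, 170)]]
head_mask = [[False, True],
             [False, True]]
processed = replace_background(frame, head_mask)
```

Because every background pixel takes the same value, the processed frame contains less spatial information than the acquired frame, which is the property relied upon above to reduce the work of compression.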
According to yet another aspect, the method further includes the step of receiving the transmitted frame of data at the remote end of the link and displaying the data depicting the user's head with a replacement image substituted for the eliminated background setting. Furthermore, according to an additional aspect, the replacement image is stored at the remote end of the link. Moreover, according to another aspect, a frame for display at the remote end is formed as a product of the transmitted data and the replacement image for the background setting.
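One possible reading of forming the display frame from the transmitted data and a locally stored replacement image is a per-pixel alpha blend, sketched below. This is an assumption made for illustration: the text above does not state which operation is used, and the function name `composite` and the RGBA layout of the received frame are hypothetical.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]


def composite(received: List[List[RGBA]],
              replacement: List[List[RGB]]) -> List[List[RGB]]:
    """Blend a received RGBA frame over a replacement background image.

    Each output channel is (fg * a + bg * (255 - a)) // 255, so a pixel
    with alpha 255 shows the received colour and a pixel with alpha 0
    shows the replacement image.
    """
    out = []
    for rx_row, bg_row in zip(received, replacement):
        out_row = []
        for (r, g, b, a), (pr, pg, pb) in zip(rx_row, bg_row):
            out_row.append((
                (r * a + pr * (255 - a)) // 255,
                (g * a + pg * (255 - a)) // 255,
                (b * a + pb * (255 - a)) // 255,
            ))
        out.append(out_row)
    return out


# One transparent (background) pixel and one opaque (head) pixel.
received = [[(0, 0, 0, 0), (200, 180, 160, 255)]]
# Replacement image held at the remote end: here, plain blue.
replacement = [[(0, 0, 255), (0, 0, 255)]]
display_frame = composite(received, replacement)
```

In this sketch the replacement image never crosses the link; only the head pixels and the transparency information are needed from the transmitting end.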
According to still another aspect of the invention, a frame of the video input data is acquired, wherein the frame of input data includes data depicting the head of a user of the videophone system. A viewport framing all or a substantial portion of the user's head is identified within the acquired frame, wherein the viewport defines a subset of the acquired frame, and the user's head is substantially centered within the viewport. The method further comprises transmitting data representing all or a portion of the image within the viewport to a remote end of the videophone link.
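The identification of a viewport within the acquired frame can be illustrated with a minimal sketch. It assumes that a bounding box around the user's head has already been located by some means not described here, and that the viewport has a fixed size no larger than the frame. The names `Rect`, `viewport_for_head` and `crop`, and the clamping of the viewport to the frame edges, are hypothetical choices made for the example.

```python
from typing import List, NamedTuple


class Rect(NamedTuple):
    left: int
    top: int
    width: int
    height: int


def viewport_for_head(head: Rect, frame_w: int, frame_h: int,
                      view_w: int, view_h: int) -> Rect:
    """Place a view_w x view_h viewport so the head box is centred in it.

    Assumes view_w <= frame_w and view_h <= frame_h. Near a frame edge
    the viewport is shifted to stay inside the frame, so the head is
    then only approximately centred.
    """
    cx = head.left + head.width // 2
    cy = head.top + head.height // 2
    left = max(0, min(cx - view_w // 2, frame_w - view_w))
    top = max(0, min(cy - view_h // 2, frame_h - view_h))
    return Rect(left, top, view_w, view_h)


def crop(frame: List[list], view: Rect) -> List[list]:
    """Return the subset of the frame (a list of pixel rows) in `view`."""
    return [row[view.left:view.left + view.width]
            for row in frame[view.top:view.top + view.height]]


# Head located away from the edges of a 640x480 frame.
centred = viewport_for_head(Rect(400, 100, 120, 160), 640, 480, 320, 240)
# Head in the top-right corner: the viewport is clamped to the frame.
clamped = viewport_for_head(Rect(600, 0, 40, 40), 640, 480, 320, 240)
```

Recomputing the viewport for each acquired frame lets the transmitted region follow the user's head, so that neither the user nor the camera has to be repositioned by hand.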
According to still other aspects of the invention, the acquired frame of video input data includes data depicting a background setting, and the method includes the step of eliminating the background setting data from the data in the viewport region of the acquired frame.
Furthermore, according to another aspect of the invention, the background setting data is replaced with monotonous data prior to the step of transmitting all or a portion of the data within the viewport. On the remote end, a replacement image is substituted for the background setting eliminated from the frame.
These and other aspects of the invention, including implementation in hardware and software, are described in more detail herein below.