1. Field of the Invention
Example embodiments relate generally to wireless communication, and more particularly to a system and/or a method for reducing latency in two-way video conversations over a wireless (and/or wire-line) network. Jitter caused by latency due to network delay and video encoding/decoding may be reduced by modeling portions of the video image into a low-latency version of the video, and morphing this low-latency version with a conventional (large-latency) video.
2. Related Art
During two-way video conversations, network delays and the time required for video encoding/decoding may result in latency and jitter. Discernible pauses due to significant round-trip delay may also occur, making video conferencing unpleasant or confusing.
Video transmission delay is caused by a combination of: a) pre-coding scene analysis, b) coding time, c) large first-in, first-out (FIFO) buffers (VBV) designed to smooth transmission of variable sizes of compressed frames, and d) decoding time, along with inherent delays caused by camera acquisition and display time. Together, these components may produce a total delay of a large fraction of a second (up to half a second) in video that is being both transmitted and received on both sides of a video conference. While some of the components of this delay may be engineered to be somewhat smaller, a trade-off exists between factors including image quality, system complexity, processing power and fragility to input signal changes.
Network transmission time is another delay that compounds the video transmission delay. Network transmission time issues may include a combination of transmission latency and jitter. Because video is coded differentially, at a fixed frame rate, each frame must conventionally be received and decoded before starting on a next frame (otherwise errors in the final image may result). For this reason, an additional level of buffering delay is introduced prior to packets reaching a decoder. If the amount of buffering is reduced, the frequency of discernible errors in the video due to jitter may increase. A conventional approach to reducing network latency and jitter is to use a higher quality of service (QoS) network path (if one exists), which may be offered for instance in a 4G network. However, such high-QoS paths are generally relatively limited and costly in terms of network resources and management configurations.
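The trade-off between buffering delay and jitter-induced errors described above may be illustrated with a minimal simulation sketch. The function name `late_frame_fraction`, the network model (a fixed delay plus exponentially distributed jitter) and all parameter values are assumptions chosen for illustration only; they are not taken from any particular network or codec. The sketch counts only frames that miss their playout deadline and does not model how a late frame may corrupt subsequent differentially coded frames.

```python
import random


def late_frame_fraction(buffer_ms, n_frames=10_000, frame_interval_ms=33.3,
                        base_delay_ms=50.0, mean_jitter_ms=30.0, seed=0):
    """Return the fraction of frames arriving after their playout deadline.

    All parameters are hypothetical; the jitter model is an assumption.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    late = 0
    for i in range(n_frames):
        send_time = i * frame_interval_ms
        # Assumed network model: fixed delay plus exponential jitter.
        jitter = rng.expovariate(1.0 / mean_jitter_ms)
        arrival = send_time + base_delay_ms + jitter
        # The decoder waits buffer_ms beyond the fixed delay before playout.
        playout = send_time + base_delay_ms + buffer_ms
        if arrival > playout:
            late += 1
    return late / n_frames


if __name__ == "__main__":
    for buffer_ms in (10.0, 30.0, 60.0, 120.0):
        frac = late_frame_fraction(buffer_ms)
        print(f"buffer {buffer_ms:6.1f} ms -> late frames {frac:.3f}")
```

Under this assumed model, the expected fraction of late frames is exp(-buffer_ms / mean_jitter_ms), so a smaller buffer lowers latency but may raise the rate of frames that miss their deadline.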
While an audio stream generally does not suffer from the same high-latency effects that video streams experience, a received audio-video stream may suffer from “lip-synchronization” issues where the image of a person speaking does not precisely match the audio channel.
In recent years, great strides have been made in computer analysis of the human body. For instance, well-known 3-D cameras, or 2-D image-plus-depth cameras may generate detailed models of a subject's face (using over 100 facial “landmarks”) and skeletal body position in less than a frame of time. FIG. 1 shows an example of this conventional technology, where a raw image 100 of a person's face is assigned landmarks 102 (indicated by the labeled numbers 1 through 86). Model information may also be gleaned from the raw image 100 to produce a model of the person's face 104 using the model information in accordance with conventional methods, as shown in FIG. 2. As shown in FIG. 3, a person's body position may also be modeled 106 by assigning landmarks to the person's skeletal joints using conventional methods.
FIG. 6 shows an example of a conventional method of morphing and texture mapping a two-dimensional object. Specifically, a two-dimensional object 500 may be extracted from an original image, and the object 500 may then be distorted into another shape (i.e., a morphed object 500a) that may fit onto a background image 502. A texture of the morphed object 500a may also be adjusted and/or blended with the background 502 (thus producing a morphed/texture mapped image 500a). A morphed/texture mapped image 500a may also be referred to as a ‘warped’ image.
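The morphing and blending steps described above may be sketched as follows, under simplifying assumptions. The function name `warp_and_blend` and its parameters are hypothetical. The distortion is assumed to be a single affine transform with nearest-neighbor sampling on grayscale images, which stands in for whatever mesh-based or landmark-driven morphing a conventional system may use, and the blending is assumed to be a uniform alpha mix.

```python
import numpy as np


def warp_and_blend(obj, background, affine, alpha=1.0):
    """Warp a 2-D grayscale `obj` onto `background` and alpha-blend it.

    `affine` is a 2x3 matrix mapping object (x, y) coordinates to background
    coordinates. This is an illustrative sketch, not a production warper.
    """
    h_bg, w_bg = background.shape
    h_obj, w_obj = obj.shape
    # Extend to 3x3 and invert, so each background pixel can look up
    # the object pixel it comes from (backward mapping).
    forward = np.vstack([np.asarray(affine, dtype=float), [0.0, 0.0, 1.0]])
    inverse = np.linalg.inv(forward)
    ys, xs = np.mgrid[0:h_bg, 0:w_bg]
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = inverse @ coords
    # Nearest-neighbor sampling: round source coordinates to integer pixels.
    sx = np.rint(src[0]).astype(int)
    sy = np.rint(src[1]).astype(int)
    valid = (sx >= 0) & (sx < w_obj) & (sy >= 0) & (sy < h_obj)
    flat = background.astype(float).ravel()
    sampled = obj[sy[valid], sx[valid]].astype(float)
    # Blend the warped object's texture with the background where it lands.
    flat[valid] = (1.0 - alpha) * flat[valid] + alpha * sampled
    return flat.reshape(h_bg, w_bg)


if __name__ == "__main__":
    obj = np.full((2, 2), 10.0)          # stand-in for extracted object 500
    background = np.zeros((4, 4))        # stand-in for background 502
    shift = [[1.0, 0.0, 1.0],            # translate by one pixel in x and y
             [0.0, 1.0, 1.0]]
    print(warp_and_blend(obj, background, shift, alpha=0.5))
```

Backward mapping is used here so that every background pixel is assigned at most one source pixel, which avoids the holes that forward mapping may leave when the object is stretched.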