1. Field of the Invention
The present invention relates generally to video communication, and more particularly to video error concealment.
2. Description of Related Art
Video images have become an increasingly important part of global communication. In particular, video conferencing and video telephony have a wide range of applications such as desktop and room-based conferencing, video over the Internet and over telephone lines, surveillance and monitoring, telemedicine, and computer-based training and education. In each of these applications, video and accompanying audio information is transmitted across telecommunication links, including telephone lines, ISDN, DSL, and radio frequencies.
A standard video format used in video conferencing is Common Intermediate Format (CIF), which is part of the International Telecommunication Union (ITU) H.261 videoconferencing standard. Additional formats with resolutions higher and lower than CIF have also been established. FIG. 1 is a table of the resolution and bit rate requirements for various video formats under an assumption that 12 bits are, on average, required to represent one pixel. The bit rates (in megabits per second, Mbps) shown are for uncompressed color video frames.
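The relationship between resolution and uncompressed bit rate can be sketched as follows. This is an illustrative calculation only: the function name is hypothetical, the 352×288 CIF and 176×144 QCIF resolutions are the standard ones, and the frame rate of 30 frames per second is an assumption that the text above does not state.

```python
def uncompressed_bit_rate_mbps(width, height, bits_per_pixel=12, frames_per_second=30):
    """Return the uncompressed bit rate in megabits per second.

    bits_per_pixel=12 follows the average stated for FIG. 1;
    frames_per_second=30 is an assumed value, not taken from the text.
    """
    bits_per_frame = width * height * bits_per_pixel
    return bits_per_frame * frames_per_second / 1_000_000


# CIF (352x288) and QCIF (176x144) at the assumed 30 frames per second.
cif_rate = uncompressed_bit_rate_mbps(352, 288)
qcif_rate = uncompressed_bit_rate_mbps(176, 144)
print(f"CIF:  {cif_rate:.1f} Mbps")
print(f"QCIF: {qcif_rate:.1f} Mbps")
```

Under these assumptions, a CIF sequence requires roughly 36.5 Mbps before compression, which illustrates why compression is needed on the telecommunication links described above.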
Presently, efficient transmission and reception of video signals may require encoding and compression of video and accompanying audio data. Video compression coding is a method of encoding digital video data such that less memory is required to store the video data and the required transmission bandwidth is reduced. Certain compression/decompression (CODEC) schemes are frequently used to compress video frames to reduce required transmission bit rates. Thus, CODEC hardware and software allow digital video data to be compressed into a more compact binary format than that required by the original (i.e., uncompressed) digital video format.
Several conventional approaches and standards for encoding and compressing source video signals exist. Some standards are designed for a particular application, such as JPEG (Joint Photographic Experts Group) for still images and H.261, H.263, MPEG (Moving Picture Experts Group), MPEG-2, and MPEG-4 for moving images. For moving images, these coding standards typically use block-based motion-compensated prediction on blocks of 16×16 pixels, commonly referred to as macroblocks. In one embodiment, a macroblock is a unit of information containing four 8×8 blocks of luminance data and two corresponding 8×8 blocks of chrominance data in accordance with a 4:2:0 sampling structure, where the chrominance data is subsampled 2:1 in both vertical and horizontal directions.
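The block counts of a 4:2:0 macroblock can be checked with a short calculation. The sketch below is illustrative only: the names are hypothetical, and the 8 bits per sample is an assumption not stated in the text. Under that assumption the result is consistent with the 12-bit-per-pixel average assumed for FIG. 1.

```python
MACROBLOCK_SIZE = 16  # luminance samples per side of a macroblock
BLOCK_SIZE = 8        # samples per side of one block


def macroblock_layout(bits_per_sample=8):
    """Return (luminance blocks, chrominance blocks, average bits per pixel).

    bits_per_sample=8 is an assumed sample depth, not taken from the text.
    """
    # The 16x16 luminance area is split into 8x8 blocks.
    luma_blocks = (MACROBLOCK_SIZE // BLOCK_SIZE) ** 2
    # Each of the two chrominance planes is subsampled 2:1 in both directions.
    chroma_side = MACROBLOCK_SIZE // 2
    chroma_blocks = 2 * (chroma_side // BLOCK_SIZE) ** 2
    total_samples = (luma_blocks + chroma_blocks) * BLOCK_SIZE ** 2
    bits_per_pixel = total_samples * bits_per_sample / MACROBLOCK_SIZE ** 2
    return luma_blocks, chroma_blocks, bits_per_pixel


luma, chroma, bpp = macroblock_layout()
print(luma, chroma, bpp)
```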
For applications in which audio accompanies video, the audio data must, as a practical matter, also be compressed, transmitted, and synchronized along with the video data. Synchronization, multiplexing, and protocol issues are covered by standards such as H.320 (ISDN-based video conferencing), H.324 (POTS-based video telephony), and H.323 (LAN or IP-based video conferencing). H.263 (or its predecessor, H.261) provides the video coding part of these groups of standards.
A motion estimation and compensation scheme is one conventional method typically used for reducing transmission bandwidth requirements for a video signal. Because the macroblock is the basic data unit, the motion estimation and compensation scheme may compare a given macroblock in a current video frame with the given macroblock's surrounding area in previously transmitted reference video frames, and attempt to find a close data match. If a close data match is found, the scheme subtracts the given macroblock in the current video frame from a closely matched, offset macroblock in a previously transmitted reference video frame so that only a difference (i.e., residual) and the spatial offset need to be encoded and transmitted. The spatial offset is commonly referred to as a motion vector. If the motion estimation and compensation process is efficient, the remaining residual macroblock should contain a small amount of information, thereby leading to efficient compression.
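A minimal sketch of such a search is given below. It is not the method of any particular standard: it assumes an exhaustive search over a small window, a sum-of-absolute-differences (SAD) matching criterion, frames stored as lists of rows of integer samples, and a residual taken as current minus reference. All function and parameter names are hypothetical.

```python
def sad(cur, ref, cy, cx, ry, rx, n):
    """Sum of absolute differences between the n x n block of cur at
    (cy, cx) and the n x n block of ref at (ry, rx)."""
    return sum(abs(cur[cy + i][cx + j] - ref[ry + i][rx + j])
               for i in range(n) for j in range(n))


def estimate_motion(cur, ref, top, left, n=16, search=7):
    """Exhaustively search ref within +/- search samples of (top, left).

    Returns the motion vector (dy, dx) of the lowest-SAD candidate and the
    residual block (current minus matched reference).
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = top + dy, left + dx
            # Skip candidates that fall partly outside the reference frame.
            if 0 <= ry <= height - n and 0 <= rx <= width - n:
                cost = sad(cur, ref, top, left, ry, rx, n)
                if best is None or cost < best[0]:
                    best = (cost, dy, dx)
    _, dy, dx = best
    residual = [[cur[top + i][left + j] - ref[top + dy + i][left + dx + j]
                 for j in range(n)] for i in range(n)]
    return (dy, dx), residual


# Example: the current frame is the reference shifted by one row and two columns.
SIZE = 12
ref = [[y * 100 + x for x in range(SIZE)] for y in range(SIZE)]
cur = [[ref[y + 1][x + 2] if y + 1 < SIZE and x + 2 < SIZE else 0
        for x in range(SIZE)] for y in range(SIZE)]
vector, residual = estimate_motion(cur, ref, top=4, left=4, n=4, search=3)
print(vector)
```

In this example the search recovers the shift exactly, so the residual block is all zeros and only the motion vector carries information. The 4×4 block size is chosen only to keep the example small.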
Video data may be transmitted over packet-switched communication networks or over heterogeneous communication networks in which one of the endpoints is associated with a circuit-switched network and a gateway or other packet-switched to circuit-switched bridging device is used. When preparing video frame information for transmission over a packet-switched communication network, encoding schemes transform the video frame information, compressed by motion estimation and compensation techniques or other compression schemes, into data packets for transmission across the communication network. Data packets are sometimes lost, duplicated, or delayed, which can introduce errors resulting in video quality degradation.
For example, if one or more data packets of a previously transmitted reference frame are lost upon transmission from a source encoding unit to a target decoding unit, then a mismatch between encoder and decoder reference frames typically results. When the encoder and decoder reference frames match, a residual computed and transmitted by the encoder is decoded and added to a motion compensated video frame derived from the decoder's reference frame. Roughly speaking, in the absence of transmission errors, the resulting decoded video frame exactly matches the encoder's reference frame. When a reference frame mismatch occurs, the sum of the decoded residual and the decoder's motion compensated video frame results in a decoded video frame that differs further from the encoder's reference frame. Without correction, these differences, called prediction drift, increase until the decoded video becomes unintelligible, even if subsequent encoded video is received error free.
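The persistence of a reference frame mismatch can be illustrated with a deliberately simplified model. The sketch below assumes one-dimensional frames, zero motion, lossless residuals, and a decoder that simply keeps its previous frame when a residual is lost. All names are hypothetical. In this simplified model the mismatch persists at a constant level rather than growing, because the growth described above depends on motion compensation and other effects that the model omits. It shows only that error-free subsequent data does not by itself repair the decoder's reference.

```python
def reference_mismatch(frames, lost=frozenset()):
    """Return, per frame, the total absolute difference between the
    encoder's and the decoder's reference frames.

    frames is a list of equal-length lists of samples; lost holds the
    indices of frames whose residual never reaches the decoder.
    """
    encoder_ref = [0] * len(frames[0])
    decoder_ref = [0] * len(frames[0])
    mismatch = []
    for index, frame in enumerate(frames):
        # The encoder predicts from its own reference and sends the difference.
        residual = [c - r for c, r in zip(frame, encoder_ref)]
        encoder_ref = [r + d for r, d in zip(encoder_ref, residual)]
        if index not in lost:
            # The decoder adds the residual to whatever reference it holds.
            decoder_ref = [r + d for r, d in zip(decoder_ref, residual)]
        mismatch.append(sum(abs(e - d) for e, d in zip(encoder_ref, decoder_ref)))
    return mismatch


frames = [[10, 10, 10, 10], [12, 11, 10, 9], [14, 12, 10, 8], [16, 13, 10, 7]]
print(reference_mismatch(frames))            # no loss
print(reference_mismatch(frames, lost={1}))  # residual of frame 1 lost
```

With no loss the mismatch is zero for every frame. When the residual of frame 1 is lost, the mismatch appears at frame 1 and remains in frames 2 and 3 even though their residuals arrive intact.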
Therefore, there is a need for a system and a method to conceal errors caused by data packet loss and reference frame mismatches, thereby improving video quality.