1. Field of the Invention
The present invention relates generally to the field of digital signal processing, particularly video processing, and even more particularly to video coding/decoding for transmission over a data communications/telecommunications network. More specifically, the invention relates to methods and systems for increasing the robustness of a video transmission over lossy communications channels.
2. Description of the Related Art
The amount of information involved in a video sequence is so large that compression is required to efficiently transmit the video sequence over a data communications/telecommunications network.
Video compression is accomplished by properly coding the captured video sequence.
Various standards or specifications for video processing have been developed over the years to standardize and facilitate various coding schemes relating to multimedia signal processing.
In particular, the Moving Picture Experts Group (MPEG) developed the standards known as ISO/IEC 14496-2 (Part 2, Visual) “Coding of audio-visual objects”, referred to for short as the MPEG-4 standard, and ISO/IEC 14496-10 (Part 10, Advanced Video Coding), which standardize various coding schemes for visual objects or video signals (the acronym ISO stands for International Organization for Standardization, whereas IEC stands for International Electrotechnical Commission). Generally, the MPEG specification standardizes the type of information that an encoder needs to produce and write to an MPEG-compliant bit-stream, as well as the way in which a decoder needs to parse, decompress and re-synthesize this information to recover the encoded signals.
Other coding standards include, for example, the so-called H.26x standards, promulgated by the ITU-T Video Coding Experts Group (VCEG); the acronym ITU-T stands for International Telecommunication Union, Telecommunication Standardization Sector.
The MPEG-4 and the H.26x standards belong to the class of the so-called “predictive” video coding schemes. Generally, in a predictive video coding scheme the difference between the value (e.g., the luminance) of a generic pixel of the current video frame and a predicted value of that pixel is coded and transmitted to the receiver; the encoded difference is decoded at the receiver side, and the value obtained is added to the predicted value of the pixel, so as to obtain a reconstructed pixel value. The prediction is based on previously transmitted and decoded spatial and/or temporal information; for example, the predictors can include pixels from the present frame (“intra” prediction) as well as pixels from previously decoded frames in the video sequence (“inter” prediction); the inter prediction is motion-compensated, taking into account frame-to-frame displacement of moving objects in the sequence.
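By way of illustration only, the predictive principle described above can be sketched as follows. The sketch assumes the simplest possible inter predictor (the co-located pixel of the previously reconstructed frame, with no motion compensation) and a uniform quantizer; the function names, the quantization step and the initial reference value are hypothetical and are not taken from any of the cited standards.

```python
def quantize(value, step):
    """Uniform quantizer: round to the nearest multiple of `step`."""
    return step * round(value / step)


def encode_frame(frame, reference, step):
    """Code the difference between each pixel and its predicted value."""
    return [quantize(pixel - pred, step) for pixel, pred in zip(frame, reference)]


def decode_frame(residuals, reference):
    """Add each decoded difference to the predicted value of the pixel."""
    return [pred + res for pred, res in zip(reference, residuals)]


def run_codec(frames, step=4, initial=128):
    """Encode and decode a sequence of frames over an error-free channel.

    Assumption: the predictor of a pixel is the co-located pixel of the
    previously reconstructed frame (zero-motion inter prediction).
    """
    encoder_reference = [initial] * len(frames[0])
    decoder_reference = list(encoder_reference)
    decoded = []
    for frame in frames:
        residuals = encode_frame(frame, encoder_reference, step)
        # The encoder keeps a locally decoded copy, so that it predicts
        # from the same data the decoder will have.
        encoder_reference = decode_frame(residuals, encoder_reference)
        decoder_reference = decode_frame(residuals, decoder_reference)
        decoded.append(decoder_reference)
    return decoded
```

In this sketch the encoder predicts from its own locally decoded frame rather than from the original frame, so that, as long as no data are lost, the reconstruction error at the decoder is bounded by half the quantization step and does not accumulate from frame to frame.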
Predictive coders/decoders (“codecs”) are intrinsically very susceptible to prediction mismatch between the encoder, where the source video data are encoded and then transmitted, and the decoder, where the encoded video data are received and decoded to reconstruct the original video data. The encoded video data are transmitted in packets; during the transmission, packets may get lost, for example because the transmission channel is noisy (i.e., it is a “lossy” channel). If this occurs, a locally decoded copy of a reconstructed video frame at the encoder may not match the corresponding reconstructed video frame at the decoder. This effect is known as “drift”, and leads to a significant reduction in the quality of the decoded video data. Drift occurs because, due to the noise of the transmission channel, the encoder and the decoder lose synchronization, as they work on different copies of the reconstructed video frame.
In J. Wang et al., “Robust video transmission over a lossy network using a distributed source coded auxiliary channel”, Picture Coding Symposium, San Francisco (Calif.), December 2004, and US-A-2005/0268200, a method is disclosed to improve robustness of predictive video codecs, which is inspired by the principles of Distributed Source Coding (DSC). Errors in data reconstruction are reduced, and the drift effect mitigated, by sending extra information over a lower-rate auxiliary channel (or secondary channel).
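The drift effect can be illustrated with a minimal simulation. The sketch below is based on stated assumptions only: residuals are sent without quantization (so that any mismatch is due to loss alone), prediction uses the co-located pixel of the previous reconstructed frame, and a lost frame is concealed at the decoder by repeating the previously decoded frame. The function name and its parameters are hypothetical.

```python
def simulate_drift(frames, lost_index=None, initial=128):
    """Return, for each frame, the largest encoder/decoder pixel mismatch.

    Assumptions: lossless residuals, zero-motion inter prediction, and
    concealment of a lost frame by repeating the last decoded frame.
    """
    encoder_reference = [initial] * len(frames[0])
    decoder_reference = list(encoder_reference)
    mismatch = []
    for index, frame in enumerate(frames):
        residuals = [pixel - pred for pixel, pred in zip(frame, encoder_reference)]
        # The encoder always updates its locally decoded copy.
        encoder_reference = [pred + res for pred, res in zip(encoder_reference, residuals)]
        if index != lost_index:
            decoder_reference = [pred + res for pred, res in zip(decoder_reference, residuals)]
        # Otherwise the packet is lost and the decoder keeps its old frame.
        mismatch.append(
            max(abs(e - d) for e, d in zip(encoder_reference, decoder_reference))
        )
    return mismatch
```

For example, with the four frames [100, 100], [110, 105], [120, 110] and [130, 115], losing the second frame (`lost_index=1`) yields the mismatch sequence [0, 10, 10, 10]: the frames following the loss are received correctly, yet the error persists because the decoder adds correct residuals to a wrong reference.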
The Applicant has observed that the method disclosed in the cited references uses a modification of an algorithm known in the art as the ROPE (Recursive Optimal Per-pixel Estimate) algorithm; such an algorithm is for example described in Zhang et al., “Optimal intra/inter mode switching for robust video communication over the Internet”, Proc. 33rd Ann. Asilomar Conf. on Sig. Syst. Comp., 1999. The encoder estimates, on a pixel basis, the expected distortion of the decoded video sequence due to channel loss. The algorithm requires as input an estimate of the packet loss rate and the knowledge of the error concealment technique used at the decoder, with no need to perform any comparison between the original and the decoded video frames. The algorithm is applied directly in the DCT (Discrete Cosine Transform) domain: the recursive algorithm keeps track of the variance of each DCT coefficient, treated as a random variable, which can be seen as an estimate of the drift observed at the decoder.
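As a rough illustration of this kind of recursion, the following sketch tracks the first and second moments of each decoder-side pixel value, treated as a random variable, and derives from them the expected distortion and the variance. It is a deliberately simplified pixel-domain sketch and not the DCT-domain modification used in the cited references: it assumes zero-motion inter prediction, an independent loss probability `p` per pixel, and concealment by copying the co-located pixel of the previous decoded frame. All function and parameter names are hypothetical.

```python
def rope_update(recon, residual, intra, prev_m1, prev_m2, p):
    """One recursion step over a frame (simplified pixel-domain sketch).

    recon    -- encoder-side reconstructed pixel values
    residual -- quantized prediction residuals (used for inter pixels)
    intra    -- per-pixel flags, True if the pixel is intra-coded
    prev_m1  -- first moments of the decoder's previous-frame pixels
    prev_m2  -- second moments of the decoder's previous-frame pixels
    p        -- estimated packet loss rate
    """
    m1, m2 = [], []
    for f_hat, e_hat, is_intra, a, b in zip(recon, residual, intra, prev_m1, prev_m2):
        if is_intra:
            # Received intra pixel: the decoder obtains f_hat exactly.
            received_m1 = f_hat
            received_m2 = f_hat ** 2
        else:
            # Received inter pixel: residual added to a random reference.
            received_m1 = e_hat + a
            received_m2 = e_hat ** 2 + 2 * e_hat * a + b
        # Lost pixel: assumed concealed by the co-located previous pixel.
        m1.append((1 - p) * received_m1 + p * a)
        m2.append((1 - p) * received_m2 + p * b)
    return m1, m2


def expected_distortion(original, m1, m2):
    """Expected squared error between original and decoder-side pixels."""
    return [f * f - 2 * f * a + b for f, a, b in zip(original, m1, m2)]


def variance(m1, m2):
    """Per-pixel variance of the decoder-side reconstruction."""
    return [b - a * a for a, b in zip(m1, m2)]
```

Note that the encoder evaluates these quantities from its own data, the assumed loss rate and the assumed concealment rule alone; no decoded frame from the receiver is needed.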