1. Field of the Invention
The present invention relates to coding techniques, for instance for images and video signals.
However, reference to images and video signals must not be construed in a limiting sense of the scope of the invention. The invention applies in an undifferentiated manner to any kind of digital signals, irrespective of their nature (audio, video, data).
2. Description of the Related Art
The goal of Multiple Description Coding (MDC), as described, e.g., in V. K. Goyal “Multiple Description Coding: Compression Meets the Network” IEEE Signal Proc. Mag. September 2001 pp. 74-93, is to create several independent bit-streams using an existing video codec (i.e., coder-decoder). Bit-streams can be decoded independently or jointly. The larger the number of bit-streams decoded, the higher the quality of the output video signal.
Multiple Description Coding employs a pre-processing stage upstream of the encoder, in order to split the video sequence and control redundancy among subsequences. It also employs a post-processing stage downstream of the decoder, in order to merge the received and successfully decoded sub-streams.
Multiple Description Coding greatly improves error resiliency, because each bit-stream can be decoded independently. Also, variable bandwidth/throughput requirements can be managed by transmitting a suitable number of descriptions. However, coding efficiency is somewhat reduced depending on the amount of redundancy left among subsequences.
Multiple Description Coding is essentially analogous to Scalable Coding (also known as Layered Coding). The difference lies in the dependency among bit-streams. The simplest case is when two bit-streams are created. In the case of scalable coding, they are referred to as “base layer” and “enhancement layer”, respectively. The latter layer depends on the former layer and cannot be decoded independently therefrom. On the other hand, in the case of Multiple Description Coding, each description can be individually decoded to get a base quality video. As for Scalable Coding, there can be spatial, temporal or SNR (Signal-to-Noise Ratio) Multiple Descriptions (MD).
Replicated headers/syntax and replicated motion vectors among bit-streams greatly impede coding efficiency in SNR MD. Replicated headers/syntax also hinder temporal MD, and motion compensation is less effective because of the increased temporal distance between frames. Spatial MD is similarly hindered by headers/syntax. However, contrary to temporal MD, motion compensation is not affected, particularly when 8×8 blocks are split into smaller blocks, as in the latest H.264 codec (coder/decoder). Because of this, spatial MD Coding is usually regarded as the best choice for video coding.
The underlying video codec can be either one of the traditional solutions based on the DCT (Discrete Cosine Transform) and motion compensation (e.g., MPEG-x, H.26x), or one of the more recent codecs based on the 3D wavelet transform (e.g., SPIHT). The H.264 codec is particularly promising because of its increased coding efficiency, which helps in compensating for the losses due to replicated headers/syntax overhead.
Additionally, multimode prediction (up to four motion vectors per 8×8 block) is expected to assist with Spatial MD. Several schemes exist: overlapping quantization (MDSQ or MDVQ), correlated predictors, overlapped orthogonal transforms, correlating linear transforms (MDTC, e.g., PCT or pair-wise correlating transform for 2 MD), correlating filter banks, interleaved spatial-temporal sampling (e.g., video redundancy coding in H.263/H.263+), spatial-temporal polyphase down-sampling (PDMD), domain based partitioning (in the signal domain or in a transform domain), and FEC based MDC (e.g., using Reed-Solomon codes).
A simple scheme for Signal-to-Noise Ratio MD is coding of independent video flows created by means of MD quantizers, either scalar or vector (MDSQ, MDVQ). The structure of the MD quantizer controls redundancy.
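As an illustration of the MD quantizer idea, the following is a minimal sketch of one of the simplest possible MD scalar quantizers: two uniform quantizers whose cells are offset by half a step. It is not the index-assignment design of the MDSQ literature, and the function names and the half-step offset are assumptions made for this example. Each index alone yields a coarse reconstruction; both together narrow the sample down to a cell half as wide. In this sketch the redundancy is fixed by the chosen offset, so it does not show how redundancy could be tuned.

```python
import math

def encode(x, step):
    """Return the two side indices (one per description) for sample x."""
    i1 = math.floor(x / step)        # cells [i1*step, (i1+1)*step)
    i2 = math.floor(x / step + 0.5)  # same cells shifted by half a step
    return i1, i2

def decode_side1(i1, step):
    """Reconstruct from description 1 only (centre of its cell)."""
    return (i1 + 0.5) * step

def decode_side2(i2, step):
    """Reconstruct from description 2 only (centre of its cell)."""
    return i2 * step

def decode_central(i1, i2, step):
    """Reconstruct from both descriptions: centre of the cell intersection."""
    lo = max(i1 * step, (i2 - 0.5) * step)
    hi = min((i1 + 1) * step, (i2 + 0.5) * step)
    return (lo + hi) / 2

if __name__ == "__main__":
    step = 1.0
    i1, i2 = encode(3.3, step)
    print(decode_side1(i1, step), decode_side2(i2, step),
          decode_central(i1, i2, step))
```

With either description alone the reconstruction error is at most half a step; with both it is at most a quarter of a step.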
A simple scheme for Spatial/Temporal MD is coding of independent video flows created by means of Spatial or Temporal Polyphase Down-sampling (PDMD). A programmable Spatial or Temporal low-pass filter controls redundancy.
As an example, Temporal MD can be achieved by separating odd and even frames, creating two subsequences. Alternatively, odd and even fields can be separated. Spatial MD is achieved by separating the pixels of 2×1 blocks, so that two subsequences are created. Alternatively, four subsequences can be created by separating the pixels of 2×2 blocks. The two techniques can be combined. Unlike temporal MD, spatial MD requires careful processing to avoid color artifacts caused by down-sampled chroma formats and field interlacing. Each subsequence is then fed into a standard video encoder.
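The frame and pixel separation just described can be sketched as follows. This is a minimal illustration, assuming frames are plain lists of pixel rows with even dimensions. The function names are hypothetical, and the redundancy-controlling low-pass filter and the chroma/interlace handling are deliberately left out.

```python
def temporal_split(frames):
    """Separate a frame sequence into two subsequences of alternate frames."""
    return frames[0::2], frames[1::2]

def spatial_split_2x2(frame):
    """Separate the pixels of each 2x2 block into four sub-frames.

    Sub-frame k holds the pixel at offset (dy, dx) = divmod(k, 2)
    of every 2x2 block.
    """
    return [
        [row[dx::2] for row in frame[dy::2]]
        for dy in (0, 1)
        for dx in (0, 1)
    ]

def spatial_merge_2x2(subframes, height, width):
    """Re-interleave the four sub-frames into a full-resolution frame."""
    frame = [[None] * width for _ in range(height)]
    for k, sub in enumerate(subframes):
        dy, dx = divmod(k, 2)
        for y, row in enumerate(sub):
            for x, pixel in enumerate(row):
                frame[2 * y + dy][2 * x + dx] = pixel
    return frame

if __name__ == "__main__":
    frame = [[0, 1, 2, 3],
             [4, 5, 6, 7],
             [8, 9, 10, 11],
             [12, 13, 14, 15]]
    subs = spatial_split_2x2(frame)
    print(subs[0])                                # top-left pixel of each block
    print(spatial_merge_2x2(subs, 4, 4) == frame)
```

In a real system each sub-frame sequence would be passed to its own encoder instance. If a description is lost, the missing positions in the merged frame would have to be interpolated from the received ones, which this sketch does not attempt.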
A technique known as “Multiple Description Coding by means of FEC” (MD by FEC) is disclosed in R. Puri, K. W. Lee, K. Ramchandran and V. Bharghavan, “Forward Error Correction (FEC) Codes Based Multiple Description Coding for Internet Video Streaming and Multicast”, Signal Processing: Image Communication, Vol. 16, No. 8, pp. 745-762, May 2001, as well as R. Puri and K. Ramchandran, “Multiple Description Source Coding Through Forward Error Correction Codes”, Proceedings of the 33rd Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, Calif., October 1999, and R. Puri, K. W. Lee, K. Ramchandran and V. Bharghavan, “Application of FEC based Multiple Description Coding to Internet Video Streaming and Multicast”, Proceedings of the Packet Video 2000 Workshop, Forte Village Resort, Sardinia, Italy, May 2000.
This scheme provides an effective way to build Multiple Descriptions (i.e., descriptions that are independently decodable) from “layered” bit-streams (i.e., layers that are dependent and prioritized, from the base layer to the enhancement layers) by using Forward Error Correction codes such as Reed-Solomon codes. This technique can be referred to briefly as “LC2MD by FEC”, i.e., Layered Coding to Multiple Description by Forward Error Correction.
The LC2MD by FEC scheme is not very flexible and suffers from a structural inefficiency.
In order to generate N descriptions, the LC2MD by FEC scheme needs an encoder able to generate N layers.
Alternatively, the encoder should be able to generate a “progressive” bit-stream (in the sense that it can be truncated to any point), which can then be split into N parts.
Unfortunately, not all state-of-the-art encoders are progressive or able to perform layered coding.
The n-th layer (from 1, the base, to N, the last enhancement) is split into n data packets, and N-n parity packets are added, so that any n of the N total packets allow the decoder to reconstruct the n data packets and hence to decode the n-th layer.
Each packet is sent over a different description. In this way, if n descriptions are received, the layers from first up to n-th will be decoded. The higher the number of descriptions received, the higher the decoded quality as happens for MD coding.
For example, three layers (A=base, B=enhancement1, C=enhancement2) are needed to generate three descriptions. The first layer (A) can be simply copied into all the descriptions. The second layer (B) is split into two parts (B1, B2) that are sent in descriptions 1 and 2; the last description will contain the result of the logic XOR operation of B1 and B2, B*=B1 xor B2. The third layer (C) is split into three parts (C1, C2 and C3), which are sent in descriptions 1, 2 and 3.
It is clear that, if only one description is received, only the first layer can be decoded. If two descriptions are received, the second layer can also be decoded. Finally, if all three descriptions are received, all layers can be decoded.
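The three-description arrangement above can be sketched in a few lines. This is an illustrative toy, assuming layers are byte strings whose lengths divide evenly (layer B by 2, layer C by 3). The dictionary layout and function names are hypothetical and are not taken from the cited papers.

```python
def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def split(data, parts):
    """Split data into equal parts (length assumed divisible by parts)."""
    size = len(data) // parts
    return [data[i * size:(i + 1) * size] for i in range(parts)]

def build_descriptions(layer_a, layer_b, layer_c):
    """Build three descriptions from layers A, B and C."""
    b1, b2 = split(layer_b, 2)
    c1, c2, c3 = split(layer_c, 3)
    return [
        {"A": layer_a, "B": b1, "C": c1},
        {"A": layer_a, "B": b2, "C": c2},
        {"A": layer_a, "B": xor_bytes(b1, b2), "C": c3},  # B* = B1 xor B2
    ]

def decode(received):
    """Decode layers from a dict {description index (0..2): description}."""
    layers = {}
    if not received:
        return layers
    layers["A"] = next(iter(received.values()))["A"]
    if len(received) >= 2:
        b = {i: d["B"] for i, d in received.items()}
        # A missing half of B is rebuilt by XOR-ing the other half with B*.
        b1 = b[0] if 0 in b else xor_bytes(b[1], b[2])
        b2 = b[1] if 1 in b else xor_bytes(b[0], b[2])
        layers["B"] = b1 + b2
    if len(received) == 3:
        layers["C"] = b"".join(received[i]["C"] for i in range(3))
    return layers

if __name__ == "__main__":
    d = build_descriptions(b"base", b"enh1", b"enh2!!")
    print(decode({1: d[1], 2: d[2]}))  # layers A and B only
```

Any single description yields layer A, any two yield A and B, and all three yield A, B and C, matching the behavior described in the text.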
Such an arrangement exhibits a marked structural inefficiency.
Firstly, there is an overhead: the overhead is minimal when each layer n is smaller than the higher layers m>n, and maximal when the layers have comparable sizes. Moreover, layered coding is inefficient with respect to standard single-layer coding and therefore adds its own overhead.
Layers can be obtained by data partitioning. Data partitioning does not add overhead by itself. Unfortunately it generates layers that have comparable size.
For this reason, with data partitioning the “LC2MD by FEC” scheme introduces a large overhead. Alternatively, spatial scalability can be used. This adds overhead of its own, but each layer is larger than the previous one, so the overhead introduced by LC2MD by FEC is reduced (though never eliminated).
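The dependence of the parity overhead on the layer sizes can be checked with a short calculation. The sketch below assumes that every parity packet of layer n has the same size as one of its n data packets (as with the XOR and Reed-Solomon cases mentioned earlier), so layer n contributes (N-n)/n times its own size in parity. The function name and the sample sizes are made up for illustration, and the inefficiency of layered coding itself is not modeled.

```python
from fractions import Fraction

def fec_overhead(layer_sizes):
    """Parity bytes added by the scheme, as a fraction of the source bytes.

    layer_sizes[n-1] is the size of layer n; the number of descriptions N
    equals the number of layers.
    """
    n_desc = len(layer_sizes)
    parity = sum(
        Fraction(size * (n_desc - n), n)  # (N-n) parity packets of size/n each
        for n, size in enumerate(layer_sizes, start=1)
    )
    return parity / sum(layer_sizes)

if __name__ == "__main__":
    print(fec_overhead([100, 100, 100]))  # comparable sizes
    print(fec_overhead([100, 200, 400]))  # each layer larger than the last
```

Under these assumptions, three equal layers give an overhead of 5/6 of the source size, while layers of 100, 200 and 400 bytes give 3/7. Growing layers reduce the overhead but do not remove it.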
Additionally, not all received bits can be used, so some successfully received data is wasted.
If only one description is received, layer A is decoded, but the received half of layer B (B1, B2 or B*) and the received third of layer C (C1, C2 or C3) are wasted. If two descriptions are received, layers A and B are decoded, but the two successfully received thirds of layer C are wasted.
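The amount of wasted data can be estimated in the same spirit. The sketch below counts, for k received descriptions, the packets of every layer n>k that arrive but cannot be decoded. It assumes each packet of layer n carries 1/n of that layer. The function name and layer sizes are illustrative only.

```python
from fractions import Fraction

def wasted_bytes(layer_sizes, received):
    """Bytes received for layers that cannot be decoded.

    With `received` descriptions, layers 1..received are decodable; for each
    higher layer n, `received` packets of size (layer size)/n arrive unused.
    """
    return sum(
        Fraction(received * size, n)
        for n, size in enumerate(layer_sizes, start=1)
        if n > received
    )

if __name__ == "__main__":
    sizes = [60, 60, 60]  # layers A, B, C
    for k in (1, 2, 3):
        print(k, wasted_bytes(sizes, k))
```

For three 60-byte layers, one received description wastes half of B plus a third of C, two descriptions waste two thirds of C, and nothing is wasted when all three arrive.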
The topics considered in the foregoing are covered by extensive technical literature, as witnessed, e.g., by:
P. C. Cosman, R. M. Gray, M. Vetterli, “Vector Quantization of Image Subbands: a Survey”, September 1995;
Robert Swann, “MPEG-2 Video Coding over Noisy Channels”, Signal Processing and Communication Lab, University of Cambridge, March 1998;
Robert M. Gray, “Quantization”, IEEE Transactions on Information Theory, vol. 44, n. 6, October 1998;
Vivek K. Goyal, “Beyond Traditional Transform Coding”, University of California, Berkeley, Fall 1998;
Jelena Kovacevic, Vivek K. Goyal, “Multiple Descriptions—Source-Channel Coding Methods for Communications”, Bell Labs, Innovation for Lucent Technologies, 1998;
Jelena Kovacevic, Vivek K. Goyal, Ramon Arean, Martin Vetterli, “Multiple Description Transform Coding of Images”, Proceedings of IEEE Conf. on Image Proc., Chicago, October 1998;
Sergio Daniel Servetto, “Compression and Reliable Transmission of Digital Image and Video Signals”, University of Illinois at Urbana-Champaign, 1999;
Benjamin W. Wah, Xiao Su, Dong Lin, “A survey of error-concealment schemes for real-time audio and video transmission over internet”, Proceedings of IEEE International Symposium on Multimedia Software Engineering, December 2000;
John Apostolopoulos, Susie Wee, “Unbalanced Multiple Description Video Communication using Path Diversity”, IEEE International Conference on Image Processing (ICIP), Thessaloniki, Greece, October 2001;
John Apostolopoulos, Wai-Tian Tan, Susie Wee, Gregory W. Wornell, “Modeling Path Diversity for Multiple Description Video Communication”, ICASSP, May 2002;
John Apostolopoulos, Tina Wong, Wai-Tian Tan, Susie Wee, “On Multiple Description Streaming with Content Delivery Networks”, HP Labs, Palo Alto, February 2002;
John Apostolopoulos, Wai-Tian Tan, Susie J. Wee, “Video Streaming: Concepts, Algorithms and Systems”, HP Labs, Palo Alto, September 2002;
Rohit Puri, Kang-Won Lee, Kannan Ramchandran and Vaduvur Bharghavan, “Forward Error Correction (FEC) Codes Based Multiple Description Coding for Internet Video Streaming and Multicast”, Signal Processing: Image Communication, Vol. 16, No. 8, pp. 745-762, May 2001;
Rohit Puri and Kannan Ramchandran, “Multiple Description Source Coding Through Forward Error Correction Codes”, Proceedings of the 33rd Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, Calif., October 1999;
Rohit Puri, Kang-Won Lee, Kannan Ramchandran and Vaduvur Bharghavan, “Application of FEC based Multiple Description Coding to Internet Video Streaming and Multicast”, Proceedings of the Packet Video 2000 Workshop, Forte Village Resort, Sardinia, Italy, May 2000.