1. Technical Field
The present disclosure relates to encoding and decoding techniques.
This disclosure was developed with specific attention paid to its possible use in encoding and/or decoding a video sequence comprised of digital samples.
2. Description of the Related Art
A well established paradigm in encoding a digital video signal is based on the layout illustrated in FIG. 1.
There, an input (“original”) video signal Video In is encoded in an encoder and then transmitted over a channel or stored in a storage medium, to be eventually decoded in a decoder and reproduced as a Video Out signal.
FIG. 2 is representative of a Layered Coding (LC) arrangement wherein an original input video signal (Original Video) is additionally subjected to spatial and/or time subsampling in a downsample filter to produce a number N of downsampled video sequences. These downsampled sequences represent multiple descriptions (MD) of the original video signal. These multiple descriptions are then encoded in a bitstream including a plurality of layers each containing one of the representations of the original signal. The various layers are ordered and encoded in such a way that the layers 0 to i−1 are used as a prediction for encoding the i-th layer.
Specifically, FIG. 2 emphasizes how in the encoding process the original video signal and the downsampled version thereof may be subjected to two parallel encoding processes one of which is hierarchically dependent on the other e.g. via inter-layer prediction. The block diagram of FIG. 2 is exemplary of a “higher” representation being encoded having reference to the “lower” representation. Consequently, the two signals are encoded via two encoders that are similar to each other, and additionally, encoding the “dependent” representation may re-use certain encoding elements from the “independent” representation.
The two encoder blocks of FIG. 2 plus the inter-layer prediction block and the MUX element may in fact be the building blocks of a single “layered” encoder.
FIG. 2 refers to two representations, but a hierarchy including a generic number of representations may be considered, where the lowest hierarchical level (level n=0) is designated the “base” layer (BL) and each upper layer (level n>0) represents an “enhancement” layer (EL) with respect to the preceding layers in the hierarchy on which it depends.
During the decoding process, the i-th layer of the bitstream can be decoded starting from the results of decoding the previous layers. Increasing the number of layers in the bitstream increases the fidelity in reproducing the original signal from the signals being decoded.
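By way of illustration only, the layered dependency described above may be sketched as follows. The sketch assumes a one-dimensional row of samples, plain 2:1 subsampling without filtering, sample-and-hold upsampling as the inter-layer prediction, and lossless residuals. The function names (downsample, upsample, encode_layers, decode_layers) are hypothetical and are not taken from any standard or from the figures.

```python
def downsample(samples, factor=2):
    """Keep every `factor`-th sample (no anti-alias filter, for brevity)."""
    return samples[::factor]


def upsample(samples, length, factor=2):
    """Sample-and-hold upsampling back to `length` samples."""
    return [samples[i // factor] for i in range(length)]


def encode_layers(samples, num_layers):
    """Return [base, enhancement_1, ...]; layer i is a residual
    with respect to a prediction built from layer i-1."""
    # Build the representations, then order them from coarsest to finest.
    reps = [list(samples)]
    for _ in range(num_layers - 1):
        reps.append(downsample(reps[-1]))
    reps.reverse()

    layers = [reps[0]]  # base layer (level n=0) is coded on its own
    for i in range(1, num_layers):
        prediction = upsample(reps[i - 1], len(reps[i]))
        layers.append([x - p for x, p in zip(reps[i], prediction)])
    return layers


def decode_layers(layers, upto):
    """Decode layers 0..upto; each layer needs the result of the previous ones."""
    recon = list(layers[0])
    for i in range(1, upto + 1):
        prediction = upsample(recon, len(layers[i]))
        recon = [p + r for p, r in zip(prediction, layers[i])]
    return recon


samples = [10, 12, 14, 16, 18, 20, 22, 24]
layers = encode_layers(samples, 3)
print(decode_layers(layers, 0))  # base layer only: coarsest representation
print(decode_layers(layers, 2))  # all layers: full-resolution signal
```

In this sketch an enhancement layer cannot be decoded without the layers below it, which mirrors the hierarchical dependency of the layers described above.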
Scalable Video Coding (SVC) as provided by the ITU-T/MPEG standards (ITU-T Rec. H.264/ISO/IEC 14496-10 AVC, Annex G “Scalable Video Coding”) is exemplary of layered coding which extends the H.264/AVC standard by means of a layered encoding process which enables spatial, temporal and quality scaling of the decoded signal.
FIG. 3 herein is representative of an SVC processing layout including one base layer and two enhancement layers. Each layer includes a temporal decomposition step followed by motion/texture coding and entropy coding. The original video sequence is fed directly to the input of the Enhancement Layer 2 and via cascaded 2D decimation operations to the Enhancement Layer 1 and the Base Layer. The outputs of the various layers (Base Layer, Enhancement Layers 1 and 2) are multiplexed to generate the output SVC encoded bitstream.
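The cascaded 2D decimation feeding the three layers may be sketched as below. This is a simplified illustration that assumes a frame stored as a list of rows and a plain 2:1 decimation in each dimension with no low-pass filtering; a practical SVC encoder would filter before decimating. The names decimate_2d and layer_inputs are hypothetical.

```python
def decimate_2d(frame):
    """Keep every other row and every other column of the frame."""
    return [row[::2] for row in frame[::2]]


def layer_inputs(frame):
    """Derive the input picture of each layer by cascaded decimation."""
    enhancement_1 = decimate_2d(frame)      # one decimation stage
    base = decimate_2d(enhancement_1)       # two cascaded stages
    return {
        "base": base,
        "enhancement_1": enhancement_1,
        "enhancement_2": frame,             # original, fed directly
    }


# A 4x4 test frame whose sample values encode their position.
frame = [[r * 4 + c for c in range(4)] for r in range(4)]
inputs = layer_inputs(frame)
for name in ("base", "enhancement_1", "enhancement_2"):
    picture = inputs[name]
    print(name, len(picture), "x", len(picture[0]))
```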
In another encoding/decoding paradigm, known as Multiple Description Coding (MDC) as schematically represented in FIG. 4, the original video signal representing the input of the encoding process is subsampled in a multiple description (MD) filter to produce N different multiple descriptions. Each of these descriptions is then independently encoded in an encoder. The encoded descriptions are multiplexed to generate a bitstream to be transmitted and/or stored.
In the decoding process, the fidelity of the signal decoded (i.e. reproduced) to the original signal increases with an increasing number of descriptions that are received and decoded. The block diagram of FIG. 4 represents MDC encoding to descriptions MD1 and MD2.
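A minimal sketch of this behaviour is given below, assuming that the MD filter is a simple polyphase subsampler acting on a one-dimensional row of samples and that missing samples are concealed by copying the nearest received sample. Both choices are assumptions made for illustration; the names split_descriptions and merge_descriptions are hypothetical.

```python
def split_descriptions(samples, n):
    """Polyphase split: description k holds samples k, k+n, k+2n, ..."""
    return [samples[k::n] for k in range(n)]


def merge_descriptions(received, n, length):
    """Rebuild `length` samples from the received descriptions.

    `received` maps a description index k to its samples. Any non-empty
    subset of descriptions can be decoded, since none depends on another.
    """
    out = [None] * length
    for k, description in received.items():
        for j, value in enumerate(description):
            out[k + j * n] = value

    known = [i for i, value in enumerate(out) if value is not None]
    if not known:
        raise ValueError("at least one description is required")

    # Conceal each missing sample with the nearest received sample.
    result = []
    for i, value in enumerate(out):
        if value is None:
            nearest = min(known, key=lambda pos: abs(pos - i))
            value = out[nearest]
        result.append(value)
    return result


samples = [0, 1, 2, 3, 4, 5, 6, 7]
md1, md2 = split_descriptions(samples, 2)
print(merge_descriptions({0: md1}, 2, len(samples)))          # one description
print(merge_descriptions({0: md1, 1: md2}, 2, len(samples)))  # both descriptions
```

With one description the sketch returns an approximation of the row; with both descriptions it returns the original samples, which corresponds to fidelity increasing with the number of descriptions received.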
Advantages of layered coding (LC) over multiple description coding (MDC) are: a greater efficiency in signal compression; and a higher flexibility in adapting the decoded signal (scalability).
Advantages of MDC over LC are: a higher “robustness” with respect to errors, in case of transmission over a noisy channel; and a higher transmission efficiency in case of peer-to-peer (P2P) applications.
The article by A. Vitali et al. “Video Over IP Using Standard-Compatible Multiple Description Coding: an EPF Proposal”—Proceedings of the 2006 Packet Video Workshop, Hangzhou, China, provides a detailed review of LC and MDC.
Internet Protocol TeleVision (IPTV) is a digital TV service provided using the IP protocol over a wideband network infrastructure such as the Internet.
IPTV is becoming one of the most significant applications within the framework of digital video technology. Projects aiming at producing IPTV set-top boxes for receiving High Definition TV (HDTV) over IP and using the 802.11n standard are currently in progress.
FIG. 5 is a schematic representation of an exemplary IPTV scenario including one or more video servers distributing their programs to final users (home users) via hubs receiving the programs from one or more head end hubs.
A feature of IPTV is the Video On Demand (VOD) capability, which permits any user in the system to access a given TV content at any time. At a given time instant, each user may notionally access a different content, whereby conventional point-to-point multicast transmission of encoded contents (left-hand side of FIG. 6) would require a very large bandwidth. For that reason, IPTV may resort to peer-to-peer (P2P) transmission protocols (right-hand side of FIG. 6) in order to permit users to exchange their contents, thus relieving the provider from the task of individually sending a given content to each and every user that has requested it.
Recent research in the area of P2P protocols demonstrates that MDC encoding can greatly improve the efficiency of such a distribution system for multimedia contents. By resorting to MDC, users may exchange different alternative representations of the original signal, thus increasing the efficiency of connections between peers within the P2P network. The various representations received may eventually be re-composed to reconstruct the original signal with a quality that increases with the number of descriptions received.
FIG. 7 herein schematically shows how a “fast” peer (i.e. a peer having a bandwidth available which is larger than the bandwidth available to other peers) may connect to various “slow” peers to download therefrom alternative multiple descriptions which are then re-composed. Specifically, FIG. 7 refers to a sequence of five images (P0, . . . , P4), which is decomposed into four different descriptions (D0, . . . , D3), each of which is represented with a different degree of shading.
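The gathering of descriptions by a fast peer may be pictured with the toy sketch below. It assumes that each slow peer advertises the set of (image, description) pairs it holds and that the fast peer takes each pair from the first peer offering it. The peer names, the data layout and the function plan_downloads are hypothetical and do not correspond to any particular P2P protocol.

```python
def plan_downloads(peers, num_images, num_descriptions):
    """Decide from which slow peer each (image, description) pair is fetched.

    `peers` maps a peer name to the set of (image, description) pairs it holds.
    Returns the download plan and the set of pairs nobody can supply.
    """
    wanted = {(p, d) for p in range(num_images) for d in range(num_descriptions)}
    plan = {}
    for peer, chunks in peers.items():
        for chunk in sorted(chunks & wanted):
            plan[chunk] = peer   # first peer offering the pair serves it
        wanted -= chunks
    return plan, wanted


# Five images P0..P4 and four descriptions D0..D3, as in the example above;
# here only three slow peers are reachable, so D3 is unavailable.
peers = {
    "slow_a": {(p, 0) for p in range(5)},
    "slow_b": {(p, 1) for p in range(5)},
    "slow_c": {(p, 2) for p in range(5)},
}
plan, missing = plan_downloads(peers, num_images=5, num_descriptions=4)
print(len(plan), "pairs scheduled,", len(missing), "pairs missing")
```

In this example every image can still be reconstructed from three of its four descriptions, at a reduced quality, because the descriptions are independently decodable.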
Another useful feature for IPTV is adaptability of the content to the terminal, so that the digital video signal received can be effectively reproduced on different types of terminals adapted to be connected to an IPTV system, such as High Definition TV (HDTV) receivers, conventional Standard Definition TV (SDTV) receivers, PC desktops, PC laptops, PDAs, smart phones, iPods, and so on.
FIG. 8 herein is representative of the scalability concept based on which an original video sequence may be converted into an encoded bitstream which is “scalable”, i.e. may be reproduced after possible scaling in terms of “spatial” scalability (that is with images reproduced on a wider or smaller scale in terms of size/number of pixels reproduced), “quality” scalability (important with a change of resolution) and/or “temporal” scalability (e.g. as “slow” video).