Video coding encompasses techniques where a series of uncompressed pictures is converted into a compressed video bitstream. Video decoding refers to the inverse process. Standards exist that specify certain techniques for image and video decoding operations, such as ITU-T Rec. H.264 “Advanced video coding for generic audiovisual services”, March 2010, and ITU-T Rec. H.265 “High Efficiency Video Coding”, April 2013, both available from the International Telecommunication Union (“ITU”), Place des Nations, CH-1211 Geneva 20, Switzerland, or from http://www.itu.int/rec/T-REC-H.264 and http://www.itu.int/rec/T-REC-H.265, respectively, and both of which are incorporated herein by reference in their entirety. H.265 is also known as HEVC.
Layered video coding, also known as scalable video coding, refers to video coding techniques in which the video bitstream can be separated into two or more sub-bitstreams, called layers. Layers can form a hierarchy, where a base layer can be decoded independently, and enhancement layers can be decoded in conjunction with the base layer and/or lower enhancement layers. HEVC is planned to include a scalable variant, informally known as Scalable High Efficiency Video Coding or SHVC, a draft of which (abbreviated: SHVC-WD1) can be found as JCT-VC-L1008, available from http://phenix.it-sudparis.eu/jct/doc_end_user/current_document.php?id=7279, which is incorporated herein by reference in its entirety.
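The separation of a layered bitstream into sub-bitstreams can be illustrated with a minimal sketch. The `Packet` type, its `layer_id` field, and the function `extract_sub_bitstream` are hypothetical names introduced for illustration only; they do not reflect the actual syntax of H.265 or SHVC-WD1. The sketch further assumes that each layer depends on all lower-numbered layers.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Packet:
    """Hypothetical coded unit; not the real bitstream syntax."""
    layer_id: int   # 0 = base layer, 1 and up = enhancement layers
    payload: bytes


def extract_sub_bitstream(packets: List[Packet], target_layer: int) -> List[Packet]:
    """Keep only the packets needed to decode up to target_layer.

    Assumption: a layer depends on every lower-numbered layer, so all
    packets with layer_id <= target_layer are retained, in original order.
    """
    return [p for p in packets if p.layer_id <= target_layer]


# A toy bitstream interleaving one base layer and two enhancement layers.
bitstream = [
    Packet(0, b"base-0"),
    Packet(1, b"enh1-0"),
    Packet(2, b"enh2-0"),
    Packet(0, b"base-1"),
    Packet(1, b"enh1-1"),
]

base_only = extract_sub_bitstream(bitstream, target_layer=0)
base_plus_first = extract_sub_bitstream(bitstream, target_layer=1)
```

In this sketch, `base_only` contains only base layer packets and can be decoded independently, whereas `base_plus_first` also carries the first enhancement layer together with the base layer it depends on.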
SHVC can use inter-layer prediction to increase the coding efficiency of enhancement layer(s) by exploiting the redundancy present between the base layer and the enhancement layer. Certain multiview systems can do the same for inter-view prediction. In SHVC, temporal enhancement layers are known as temporal sub-layers, not layers. The basic principle of inter-layer prediction in scalable video coding schemes is well understood by a person skilled in the art. In SHVC-WD1, inter-layer prediction for scalability (in contrast to multiview) can be performed by inserting a single (potentially upsampled) predictor reference picture (including some of its meta-data, such as motion vectors) into one or more reference picture list(s) maintained by the spatial or SNR enhancement layer encoder or decoder. An encoder can make use of this inter-layer predictor picture just as it can of any other reference picture. A decoder uses the predictor when so indicated in the bitstream, just as it uses other predictors when so indicated.
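The insertion of an inter-layer predictor picture into a reference picture list can be sketched as follows. This is an illustrative simplification under stated assumptions, not the process specified in SHVC-WD1: the `Picture` type and the functions `upsample_for_prediction` and `build_reference_list` are hypothetical, no sample data is modeled, resampling is reduced to rescaling the picture dimensions and motion vectors, and appending the predictor at the end of the list is an arbitrary choice.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Picture:
    """Hypothetical picture record; sample data is not modeled."""
    layer_id: int
    poc: int                      # output order index of the picture
    width: int
    height: int
    motion_vectors: List[Tuple[int, int]] = field(default_factory=list)


def upsample_for_prediction(ref: Picture, width: int, height: int) -> Picture:
    """Stand-in for resampling: rescales dimensions and motion vectors only."""
    if (ref.width, ref.height) == (width, height):
        return ref  # SNR scalability: same resolution, nothing to rescale
    sx, sy = width / ref.width, height / ref.height
    scaled = [(round(mx * sx), round(my * sy)) for mx, my in ref.motion_vectors]
    return Picture(ref.layer_id, ref.poc, width, height, scaled)


def build_reference_list(temporal_refs: List[Picture],
                         inter_layer_ref: Optional[Picture],
                         width: int, height: int) -> List[Picture]:
    """Return the enhancement layer's reference list, with at most one
    (potentially upsampled) inter-layer predictor added to it.

    Assumption: the predictor is appended last; the actual position is
    not modeled here.
    """
    ref_list = list(temporal_refs)
    if inter_layer_ref is not None:
        ref_list.append(upsample_for_prediction(inter_layer_ref, width, height))
    return ref_list


# Spatial scalability example: 960x540 base layer, 1920x1080 enhancement layer.
base_pic = Picture(layer_id=0, poc=4, width=960, height=540,
                   motion_vectors=[(2, -3)])
enh_temporal = [Picture(layer_id=1, poc=3, width=1920, height=1080)]
ref_list = build_reference_list(enh_temporal, base_pic, 1920, 1080)
```

Once the predictor is in `ref_list`, the enhancement layer encoder or decoder in this sketch would address it by index like any other entry, which mirrors the statement above that the inter-layer predictor is used just as any other reference picture.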
Referring to FIG. 1, shown is a layering structure containing a picture of a base layer (101) and pictures of two enhancement layers (103) and (105). In SHVC, those enhancement layer pictures may belong to quality/SNR scalable enhancement layers or spatial enhancement layers. In other scenarios, they can be different views of a multiview system. Potential inter-layer prediction is depicted by solid arrows. The enhancement layer picture (105), belonging to the highest enhancement layer, when using inter-layer prediction, may use as an inter-layer predictor (104) information (for example, the (upsampled) reference picture(s) and associated meta-information such as motion vectors) of the closest reference layer picture, which, in this case, is enhancement layer picture (103). Enhancement layer picture (103) can use as its inter-layer predictor (102) information from the base layer picture (101). According to SHVC-WD1, enhancement layer picture (105) cannot use base layer picture (101) information directly as a prediction reference.
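The dependency structure of FIG. 1 can be expressed as a small sketch in which each picture has at most one direct reference layer picture. The table `DIRECT_REFERENCE` and the functions `may_predict_from` and `decoding_chain` are hypothetical names used for illustration only; the numerals are the reference numerals of FIG. 1, not syntax elements of SHVC-WD1.

```python
from typing import Dict, List, Optional

# Direct reference layer picture of each picture in FIG. 1.
# None marks the base layer picture, which has no inter-layer reference.
DIRECT_REFERENCE: Dict[int, Optional[int]] = {
    101: None,  # base layer picture
    103: 101,   # first enhancement layer picture, predictor (102)
    105: 103,   # highest enhancement layer picture, predictor (104)
}


def may_predict_from(current: int, candidate: int) -> bool:
    """True only if candidate is the closest (direct) reference layer picture."""
    return DIRECT_REFERENCE.get(current) == candidate


def decoding_chain(picture: int) -> List[int]:
    """Pictures that must be decoded, lowest layer first, to decode picture."""
    chain: List[int] = []
    node: Optional[int] = picture
    while node is not None:
        chain.append(node)
        node = DIRECT_REFERENCE[node]
    return chain[::-1]
```

Under these assumptions, `may_predict_from(105, 103)` holds while `may_predict_from(105, 101)` does not, reflecting that picture (105) cannot reference the base layer picture (101) directly, even though `decoding_chain(105)` still includes picture (101) as an indirect dependency.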