HEVC (High Efficiency Video Coding) is the next-generation video coding standard that is currently being standardized. HEVC will substantially improve coding efficiency compared to the state-of-the-art H.264/Advanced Video Coding (AVC). The initial focus of the HEVC development was on mono video, i.e. one camera view, at a fixed quality and bit rate, i.e. non-scalable. Multi-layer extensions to the HEVC standard are now under preparation, e.g. a scalable extension, a multi-view extension, and a 3D extension. Those extensions require multi-layer support. An HEVC bitstream without extensions can be considered a single-layer bitstream, i.e. it represents the video in a single representation, e.g. a single video view, a single resolution and a single quality. In multi-layer extensions, an HEVC single-layer bitstream is typically included as a “base layer”. In multi-view or 3D extensions, additional layers may represent additional video views captured from different camera positions or, for instance, depth information. In scalability extensions, additional layers may represent the video at additional, typically higher, picture resolutions, at higher pixel fidelity, in other color spaces, or the like, providing improved video quality relative to the base layer.
Specific decoders, i.e. scalable or multi-view/3D HEVC decoders, are used to decode HEVC bitstreams with multiple layers. In order to decode multi-layer bitstreams, information about decoding dependencies between layers is necessary. This information needs to be signaled in the bitstream. The information can also be used by network elements to identify layers that can be discarded from the transmission when adaptation is needed. Such adaptation may be bit rate adaptation, e.g. in case of network congestion; format adaptation, e.g. in case a target device can only decode or display a certain maximum resolution; or 2D/3D adaptation, e.g. in case a target device can only decode or display a certain number of views.
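The way a network element could use dependency information to decide which layers are safe to discard can be sketched as follows. This is a minimal illustration only: the function names `layers_needed` and `droppable_layers`, the dictionary representation of direct dependencies, and the four-layer example are assumptions made here and do not appear in any HEVC specification.

```python
from typing import Dict, List, Set


def layers_needed(target: int, direct_refs: Dict[int, List[int]]) -> Set[int]:
    """Return the target layer plus every layer it depends on,
    directly or indirectly (hypothetical helper)."""
    needed: Set[int] = set()
    stack = [target]
    while stack:
        layer = stack.pop()
        if layer in needed:
            continue
        needed.add(layer)
        # Follow the direct dependencies of this layer.
        stack.extend(direct_refs.get(layer, []))
    return needed


def droppable_layers(target: int, direct_refs: Dict[int, List[int]]) -> Set[int]:
    """Return the layers that can be discarded when only `target`
    has to be decoded (hypothetical helper)."""
    all_layers = set(direct_refs)
    for refs in direct_refs.values():
        all_layers.update(refs)
    return all_layers - layers_needed(target, direct_refs)


# Assumed example: layer 0 is the base layer, layers 1 and 2 each depend
# on layer 0, and layer 3 depends on layers 1 and 2.
direct_refs = {0: [], 1: [0], 2: [0], 3: [1, 2]}

print(sorted(layers_needed(1, direct_refs)))
print(sorted(droppable_layers(1, direct_refs)))
```

In this assumed configuration, a receiver that can only handle layer 1 needs layers 0 and 1, so layers 2 and 3 can be removed from the transmission without breaking decoding.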
The dependency information in HEVC is typically conveyed, together with other information, in so-called parameter sets, such as the Picture Parameter Set (PPS), Sequence Parameter Set (SPS), or Video Parameter Set (VPS). Typically, each parameter set is encapsulated in a Network Abstraction Layer (NAL) unit, i.e. a packet in the video bitstream. Since parameter sets contain information that is essential for decoding, they may be sent repeatedly in the bitstream, or be conveyed by “out-of-band” transmission, i.e. transmitted separately from the remaining bitstream, e.g. over a reliable connection. Such an out-of-band transmission can occur, for instance, during session setup, e.g. using the Session Description Protocol (SDP).
If parameter sets are sent at session start-up, the amount of data in the parameter set has an impact on the transmission duration and thus session start-up time. If parameter sets are sent “in-band”, i.e. in the bitstream, the size of parameter sets has an impact on the overall bitrate, and the impact is higher when the parameter sets are repeated in the bitstream for error resiliency reasons. For these reasons it is important that the information conveyed in the parameter sets is expressed in a compact way.
A document of the Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, from the 11th Meeting: Shanghai, CN, 10-19 Oct. 2012, denoted JCTVC-K1007 and relating to NAL unit header and parameter set designs for HEVC extensions, includes specifications of parameter set designs for the HEVC multi-view/3D and scalable coding extensions. According to that document, layer dependencies are signaled as part of the vps_extension syntax structure as indicated below:
vps_extension( ) {                                          Descriptor
  ...
  for( i = 1; i <= vps_max_layers_minus1; i++ ) {
    // layer dependency
    num_direct_ref_layers[ i ]                              u(6)
    for( j = 0; j < num_direct_ref_layers[ i ]; j++ )
      ref_layer_id[ i ][ j ]                                u(6)
  }
}

num_direct_ref_layers[ i ] specifies the number of layers the i-th layer directly depends on.

ref_layer_id[ i ][ j ] identifies the j-th layer the i-th layer directly depends on.
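To make the syntax structure concrete, the layer-dependency loop could be read roughly as in the sketch below. It is a simplified illustration, not a conforming parser: the `BitReader` class and the function `parse_layer_dependencies` are hypothetical names introduced here, the other vps_extension fields elided by "..." are not handled, and the reader is assumed to be already positioned at the start of the loop with any emulation prevention bytes removed.

```python
from typing import Dict, List


class BitReader:
    """Minimal MSB-first bit reader over a bytes object (hypothetical helper)."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current position, counted in bits

    def u(self, n: int) -> int:
        """Read n bits as an unsigned integer, mirroring the u(n) descriptor."""
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def parse_layer_dependencies(reader: BitReader,
                             vps_max_layers_minus1: int) -> Dict[int, List[int]]:
    """Read num_direct_ref_layers[i] and ref_layer_id[i][j] for i = 1..max."""
    ref_layer_id: Dict[int, List[int]] = {}
    for i in range(1, vps_max_layers_minus1 + 1):
        num_direct_ref_layers = reader.u(6)
        ref_layer_id[i] = [reader.u(6) for _ in range(num_direct_ref_layers)]
    return ref_layer_id


# Assumed example payload: layer 1 depends on layer 0, and layer 2 depends
# on layers 0 and 1. The fields are written as 6-bit values, then zero-padded
# to a whole number of bytes.
fields = [1, 0, 2, 0, 1]
bits = "".join(format(v, "06b") for v in fields)
bits = bits.ljust((len(bits) + 7) // 8 * 8, "0")
payload = int(bits, 2).to_bytes(len(bits) // 8, "big")

reader = BitReader(payload)
deps = parse_layer_dependencies(reader, vps_max_layers_minus1=2)
print(deps)        # dependencies per layer
print(reader.pos)  # number of bits consumed
```

For this assumed payload the parser recovers the dependencies of layers 1 and 2 and consumes 30 bits, i.e. five 6-bit fields.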
The above-mentioned solution requires many bits to signal the layer dependencies in the VPS. In particular, for each layer above the base layer in use, six bits are used to code the number of reference layers and another six bits are used to identify each reference layer, i.e. 6 + 6 × num_direct_ref_layers[ i ] bits for the i-th layer. This allows dependencies to be signaled for the relevant cases, but it may be inefficient in terms of bit usage.
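The bit cost described above can be worked through with a short calculation. The function name `dependency_signaling_bits` and the two example configurations are assumptions chosen for illustration; only the six-bit field widths come from the syntax structure above.

```python
from typing import List


def dependency_signaling_bits(num_direct_ref_layers: List[int]) -> int:
    """Total bits spent on layer-dependency signaling.

    `num_direct_ref_layers` holds one entry per layer above the base layer.
    Each layer costs 6 bits for the count plus 6 bits per reference layer.
    """
    return sum(6 + 6 * n for n in num_direct_ref_layers)


# Assumed example 1: three enhancement layers, each depending only on the
# layer directly below it.
chain = [1, 1, 1]

# Assumed example 2: three enhancement layers, each depending on all
# lower layers.
full = [1, 2, 3]

print(dependency_signaling_bits(chain))  # 3 * (6 + 6) = 36
print(dependency_signaling_bits(full))   # 12 + 18 + 24 = 54
```

Even with only three enhancement layers, the dependency information alone takes 36 to 54 bits in these examples, and that cost is paid again each time the VPS is repeated in the bitstream.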