ITU-T Rec. H.265, entitled High Efficiency Video Coding, version 04/2013, (available from International Telecommunication Union (ITU), Place des Nations, 1211 Geneva 20, Switzerland, and incorporated herein by reference in its entirety), is referred to herein as HEVC.
HEVC may be enhanced by a scalable extension known as SHVC (see JCT-VC-P1008, available from http://phenix.it-sudparis.eu/jct/doc_end_user/current_document.php?id=8839, incorporated herein by reference in its entirety). SHVC, in contrast to SVC, may support coding multiple spatial or SNR enhancement layers in addition to a base layer in one scalable bitstream. Other extensions to H.265 may cover, for example, the multiview case.
HEVC and its extensions distinguish between a Video Coding Layer (VCL) and a Network Abstraction Layer (NAL). The VCL refers to those mechanisms and tools described in HEVC and its extensions that cover the bitstream syntax and decoding process of slices and of the syntax elements included in slices. The NAL refers to those mechanisms and tools conceptually above the syntax elements of slices. The term “layer” as used above is meant to distinguish conceptual parts of the HEVC standard document, and is not to be confused with “layer” in layered coding tools. Henceforth, when the term “layer” is used other than as part of the terms Network Abstraction Layer and Video Coding Layer, it refers to a layer as identified by a layer_id.
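The split between VCL and NAL data, and the layer identifier, are both visible in the two-byte HEVC NAL unit header (forbidden_zero_bit: 1 bit, nal_unit_type: 6 bits, nuh_layer_id: 6 bits, nuh_temporal_id_plus1: 3 bits). The following is a minimal illustrative sketch, not part of any described implementation, of reading that header in Python; the names NalHeader, parse_nal_header, and is_vcl are hypothetical. It relies on nal_unit_type values 0 to 31 being VCL NAL unit types and 32 to 63 being non-VCL types.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    """Fields of the two-byte HEVC NAL unit header (hypothetical helper type)."""
    forbidden_zero_bit: int
    nal_unit_type: int
    nuh_layer_id: int
    nuh_temporal_id_plus1: int

    @property
    def is_vcl(self) -> bool:
        # nal_unit_type 0..31: VCL NAL units (coded slice segments and
        # reserved VCL types); 32..63: non-VCL (VPS, SPS, PPS, SEI, ...).
        return self.nal_unit_type < 32


def parse_nal_header(data: bytes) -> NalHeader:
    """Parse the NAL unit header at the start of `data` (start code already removed)."""
    if len(data) < 2:
        raise ValueError("HEVC NAL unit header is two bytes long")
    b0, b1 = data[0], data[1]
    return NalHeader(
        forbidden_zero_bit=b0 >> 7,                      # bit 15
        nal_unit_type=(b0 >> 1) & 0x3F,                  # bits 14..9
        nuh_layer_id=((b0 & 0x01) << 5) | (b1 >> 3),     # bits 8..3, spans both bytes
        nuh_temporal_id_plus1=b1 & 0x07,                 # bits 2..0
    )
```

For example, the header bytes 0x40 0x01 decode to a VPS (nal_unit_type 32, non-VCL) with nuh_layer_id 0, while 0x26 0x09 decode to an IDR_W_RADL slice segment (nal_unit_type 19, VCL) with nuh_layer_id 1.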
The term VCL conformance is also used herein and means that the VCL data structures of a bitstream or parts thereof (including, for example, all slices of a coded video bitstream included in NAL units with a given zero or non-zero nuh_layer_id) conform to a given profile (for example, the main profile), tier, and/or level. For VCL conformance, however, it is not required that the syntax elements syntactically above the slice layer (those belonging to the Network Abstraction Layer, such as parameter sets, the NAL unit header, and similar) conform to the profile in question. For example, a single-layer bitstream that would be main profile conformant except that the nuh_layer_id of all its NAL units is equal to 1 would be VCL conformant with the main profile. It would not be fully conformant, because the main profile requires nuh_layer_id to be equal to zero.
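In the example above, the only difference from a fully conformant main profile bitstream lies in the NAL unit header, not in the slice data. The hypothetical sketch below makes that concrete: it rewrites nuh_layer_id in each NAL unit header while copying the payload byte for byte. It is an illustration under stated assumptions only. It is not the normative rewriting process referred to in JCTVC-Q0078. A real rewriting process would likely also need to handle parameter sets and other non-VCL data, whose syntax can depend on nuh_layer_id, and this sketch does not do that. The function names set_nuh_layer_id, nuh_layer_id, and rewrite_to_base_layer are hypothetical.

```python
from typing import List


def nuh_layer_id(nal_unit: bytes) -> int:
    """Read the 6-bit nuh_layer_id from a two-byte HEVC NAL unit header."""
    return ((nal_unit[0] & 0x01) << 5) | (nal_unit[1] >> 3)


def set_nuh_layer_id(nal_unit: bytes, new_layer_id: int) -> bytes:
    """Return a copy of `nal_unit` with nuh_layer_id replaced.

    Only the NAL unit header changes; the payload (e.g. slice data,
    i.e. the VCL content) is copied unchanged.
    """
    if not 0 <= new_layer_id <= 63:
        raise ValueError("nuh_layer_id is a 6-bit field")
    if len(nal_unit) < 2:
        raise ValueError("HEVC NAL unit header is two bytes long")
    # Keep forbidden_zero_bit and nal_unit_type; set the MSB of nuh_layer_id.
    b0 = (nal_unit[0] & 0xFE) | (new_layer_id >> 5)
    # Set the 5 low bits of nuh_layer_id; keep nuh_temporal_id_plus1.
    b1 = ((new_layer_id & 0x1F) << 3) | (nal_unit[1] & 0x07)
    return bytes([b0, b1]) + nal_unit[2:]


def rewrite_to_base_layer(nal_units: List[bytes]) -> List[bytes]:
    """Hypothetical header-only rewrite of every NAL unit to nuh_layer_id 0."""
    return [set_nuh_layer_id(n, 0) for n in nal_units]
```

For instance, a slice NAL unit starting with 0x26 0x09 (nuh_layer_id 1) becomes 0x26 0x01 (nuh_layer_id 0), with all following bytes untouched.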
The concept of VCL conformance is of practical interest, for example, because the majority of a decoder's computational complexity can lie in decoding the VCL syntax elements. In practice, at least some hardware implementations implement large parts of the VCL decoding process, or all of it, in dedicated hardware, microcode, ROM, and similar “hard coded” techniques, whereas NAL-based mechanisms are implemented in software on an embedded or external general-purpose processor. At least for such architectures, it can be of interest to have VCL-conformant bitstreams (or parts of bitstreams, such as auxiliary output layer sets) clearly identified and their VCL conformance established.
Certain video coding standards such as SHVC and/or other extensions of HEVC may include a feature known as an “auxiliary picture”. An auxiliary picture is a coded picture (or, depending on context, a sequence of coded pictures, or a plurality of layers in an output layer set, where each layer includes a sequence of coded pictures) that may not be intended for direct display, but rather as control information for the display process and other similar tasks. Example applications for auxiliary pictures include closed captioning, overlay, alpha maps, and similar. Another application example for an auxiliary picture is the coding of a second representation of the same content, but potentially at a different resolution/fidelity/bitrate . . . in the same bitstream and potentially with timing and prediction properties similar to those of the primary picture. In that case, the auxiliary picture may be intended for display, but in lieu of the primary coded picture or primary output layer set as described below. Whether such a picture is called an auxiliary picture or a simulcast picture is a matter of definition. When auxiliary pictures are mentioned henceforth, simulcast pictures in the aforementioned sense are meant to be included.
Auxiliary pictures were introduced into video coding standardization, for example, in the form of alpha maps in H.264/SVC.
As auxiliary pictures need to be decoded, a conformance point (for example, in the form of profile/tier/level) may need to be established for the auxiliary picture(s). Without such a conformance point, a decoder may not know whether it can decode the picture, a system cannot decide whether to accept a bitstream containing the auxiliary picture(s) for decoding, and other unwanted consequences may arise. The rationale here can be the same as for establishing conformance points for primary coded pictures, which are known in the art.
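The decision a system faces can be sketched as a comparison between a decoder's capabilities and a signaled profile/tier/level. The Python below is a deliberately simplified, hypothetical model. PTL, DecoderCapability, can_decode, and decide are illustrative names. The profile check ignores profile compatibility flags, and the tier and level checks only capture the general rule that a decoder for a given tier and level can decode bitstreams of the same or a lower tier and level. general_level_idc is 30 times the level number (for example, 93 for level 3.1). The point of the sketch is the last case: with no conformance point signaled, there is nothing to compare against.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class PTL:
    """Signaled conformance point (simplified)."""
    profile_idc: int  # e.g. 1 = Main, 2 = Main 10
    tier_flag: int    # 0 = Main tier, 1 = High tier
    level_idc: int    # 30 x level number, e.g. 93 = level 3.1


@dataclass(frozen=True)
class DecoderCapability:
    """What a (hypothetical) decoder supports."""
    profiles: FrozenSet[int]
    tier_flag: int
    level_idc: int


def can_decode(cap: DecoderCapability, ptl: PTL) -> bool:
    # Simplified: exact profile match, same-or-lower tier, same-or-lower level.
    return (ptl.profile_idc in cap.profiles
            and ptl.tier_flag <= cap.tier_flag
            and ptl.level_idc <= cap.level_idc)


def decide(cap: DecoderCapability, signaled: Optional[PTL]) -> str:
    """Accept, reject, or report that no decision is possible."""
    if signaled is None:
        # No conformance point for the auxiliary picture(s): cannot decide.
        return "undecidable"
    return "accept" if can_decode(cap, signaled) else "reject"
```

For a Main profile, Main tier, level 4.1 decoder (level_idc 123), a Main/Main/3.1 auxiliary output layer set would be accepted, a High-tier or level 5.1 one rejected, and one without any signaled conformance point left undecided.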
In the terminology of HEVC and its extensions, an auxiliary picture can be included in an output layer set. In SHVC, for example, a scalable bitstream can contain more than one output layer set. The output layer set containing a layer with layer_id equal to zero is the output layer set that is primarily intended for display, and is referred to henceforth as the primary output layer set. Output layer sets containing auxiliary pictures, which have layer_id not equal to zero, are referred to as auxiliary output layer set(s).
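Read literally, the definition above classifies an output layer set by whether it contains layer_id 0. A minimal sketch of that rule in Python follows. It assumes a hypothetical in-memory representation (OutputLayerSet with an index and a tuple of layer_ids) rather than the actual VPS extension syntax. It also assumes, as a simplification, that every output layer set without layer_id 0 is an auxiliary output layer set.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class OutputLayerSet:
    """Hypothetical representation of an output layer set."""
    index: int
    layer_ids: Tuple[int, ...]

    @property
    def is_primary(self) -> bool:
        # Per the definition above: contains the layer with layer_id 0.
        return 0 in self.layer_ids


def split_output_layer_sets(
    olss: List[OutputLayerSet],
) -> Tuple[List[OutputLayerSet], List[OutputLayerSet]]:
    """Return (primary, auxiliary) output layer sets, preserving order."""
    primary = [o for o in olss if o.is_primary]
    auxiliary = [o for o in olss if not o.is_primary]
    return primary, auxiliary
```

With output layer sets {0}, {1}, and {1, 2}, the first is classified as primary and the other two as auxiliary. The last is an example of an auxiliary output layer set with more than one layer.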
ITU-T document JCTVC-Q0078 addresses aspects of conformance signaling for auxiliary pictures. The document includes several components, summarized as follows in the High Level Syntax BoG Report JCTVC-Q0223:
JCTVC-Q0078: mechanism for signaling a profile/tier/level conformance point for auxiliary pictures, including the aspects listed below.
signaling of additional layer sets
description of how profile_tier_level applies
define normative rewriting process
VPS rewriting SEI message
output layer set nesting SEI message
The design outlined in JCTVC-Q0078 and its implementation can be highly complex. Further, it may not provide a mechanism to signal conformance of an output layer set that contains, within the same bitstream, multiple auxiliary picture layers that use inter-layer prediction. It also relies on SEI messages, which in at least some implementations may be ignored by the decoder or removed in the transmission path; this may render any operation involving auxiliary pictures unreliable.