The present invention is concerned with scalable data streams such as scalable video streams and network entities dealing with such scalable data streams such as, for example, a decoder or MANE (media-aware network element).
From a transport perspective, adaptation of video bitstreams in temporal or other dimensions is highly desirable, as was already identified and addressed within the standardization of H.264/AVC. The encapsulation of video data into Network Abstraction Layer (NAL) units and the design decision to signal many important but rather invariant parameters outside the video bitstream in so-called Parameter Sets reflect this understanding. The Scalable Video Coding (SVC) and Multiview Video Coding (MVC) extensions of H.264/AVC allow for adaptation beyond the temporal dimension, but a lack of extensibility in the base specification of H.264/AVC led to complicated approaches to extend the H.264/AVC high level syntax in a backward compatible way. In H.264/AVC and its extensions, the signaling of operation points along the scalability dimensions is done by means of NAL unit header extensions that include an extra byte for this purpose. The same information is provided by the so-called prefix NAL units for NAL units that contain pure H.264/AVC video data and constitute the so-called base layer. A mechanism to extend the information provided via the Sequence Parameter Set (SPS) of the base layer for the enhanced layers, coded by the extensions of H.264/AVC, is established via so-called Subset Sequence Parameter Sets (SSPS).
While the development of the HEVC base specification is still ongoing, efforts towards a 3D video coding extension are already being made in order to ensure an extensible syntax in the base specification from the beginning. These so-called scalable hooks in the base specification need to be designed carefully to be future-proof. The following paragraphs give an overview of the current status of the HEVC High Level (HL) syntax and the concepts that are under discussion at the moment.
The current status of the HEVC standardization is as follows:
During the ongoing development of the HEVC base specification and the 3D extensions, numerous participants made proposals on how to proceed from the HL syntax as specified in H.264/AVC. The outcome is reflected in the current working draft of the specification and the numerous contributions of the individual participants. The following paragraphs give an overview of the current discussion.
As stated above, the signalization of operation points within the scalability dimensions of an SVC or MVC video bitstream necessitates (specific) extensions of the H.264/AVC NAL unit headers. This is regarded as an unclean solution, causing extra effort e.g. for parsing multiple different NAL unit header structures and necessitating prefix NAL units to signal the base layer. Therefore, an effort was made to ensure that the base HEVC NAL unit header syntax is versatile enough to satisfy the needs of the future extensions of the base specification.
In the syntax of a NAL unit as in the current working draft, the current consensus is to use a two-byte NAL unit header. In the first byte, nal_ref_flag is signaled with one bit, as opposed to the two bits of nal_ref_idc in H.264/AVC, as this HL feature has not been widely used in applications. The syntax element nal_unit_type therefore has one more bit to signal the type of the NAL unit, which allows for a total of 64 distinguishable types.
The second byte of the NAL unit header is divided into two parts, where 3 bits are used to signal temporal_id of the NAL unit, as temporal scalability is already enabled in the base specification. The remaining 5 bits of the second byte are reserved to be equal to one within an HEVC-conforming bitstream. The current understanding of the usage of the remaining 5 bits is that they can be used to signal scalability identifiers in future extensions, e.g. for a layer_id syntax element.
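The two-byte header layout described above can be sketched as a small parser. This is only an illustration under stated assumptions: the function and type names are hypothetical, and the leading forbidden_zero_bit is assumed to be carried over from H.264/AVC, since the text above only accounts for seven of the eight bits of the first byte.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    nal_ref_flag: int    # 1 bit
    nal_unit_type: int   # 6 bits, 64 distinguishable types
    temporal_id: int     # 3 bits
    reserved_bits: int   # 5 bits, all ones in a conforming bitstream


def parse_nal_header(data: bytes) -> NalHeader:
    """Split the first two bytes of a NAL unit into their header fields."""
    if len(data) < 2:
        raise ValueError("a NAL unit header needs two bytes")
    first, second = data[0], data[1]
    # Assumption: the most significant bit is a forbidden_zero_bit as in H.264/AVC.
    if first & 0x80:
        raise ValueError("forbidden_zero_bit is set")
    return NalHeader(
        nal_ref_flag=(first >> 6) & 0x01,
        nal_unit_type=first & 0x3F,
        temporal_id=(second >> 5) & 0x07,
        reserved_bits=second & 0x1F,
    )


header = parse_nal_header(bytes([0x45, 0x5F]))
```

Because every field sits at a fixed position, a network element could read temporal_id with two shifts and a mask, without parsing any payload.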
While Picture Parameter Set (PPS) and Sequence Parameter Set as defined in the current HEVC base specification are relatively similar to what has formerly been specified in H.264/AVC, two new Parameter Sets, referred to as the Adaptation Parameter Set (APS) and Video Parameter Set (VPS), have been introduced to HEVC of which only the VPS is relevant for the content of this document.
The Video Parameter Set is intended to signal parameters such as the number of (e.g. temporal) levels/layers present in the video bitstream and the profile and level for all operation points within it. Other parameters to be signaled include the dependencies between scalable layers, much as they are signaled in the SVC scalability information SEI messages.
An additional brief explanation is presented below with regard to the semantics of the NAL unit and Video Parameter Set syntax.
profile_idc and level_idc indicate the profile and level to which the coded video sequence conforms.
max_temporal_layers_minus1+1 specifies the maximum number of temporal layers present in the sequence. The value of max_temporal_layers_minus1 shall be in the range of 0 to 7, inclusive.
more_rbsp_data( ) is specified as follows.
- If there is no more data in the RBSP, the return value of more_rbsp_data( ) is equal to FALSE.
- Otherwise, the RBSP data is searched for the last (least significant, right-most) bit equal to 1 that is present in the RBSP. Given the position of this bit, which is the first bit (rbsp_stop_one_bit) of the rbsp_trailing_bits( ) syntax structure, the following applies.
  - If there is more data in an RBSP before the rbsp_trailing_bits( ) syntax structure, the return value of more_rbsp_data( ) is equal to TRUE.
  - Otherwise, the return value of more_rbsp_data( ) is equal to FALSE.
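The rule for more_rbsp_data( ) can be sketched as follows. This is a minimal illustration, not a normative implementation: the function signature, the bit_pos read-position parameter, and the treatment of an RBSP without any bit equal to 1 as "no more data" are assumptions made for the example.

```python
def more_rbsp_data(rbsp: bytes, bit_pos: int) -> bool:
    """Return True if payload bits remain before rbsp_trailing_bits().

    bit_pos is the number of bits already consumed, counted from the
    most significant bit of the first byte (hypothetical convention).
    """
    # Search backwards for the last byte that contains a bit equal to 1.
    for byte_index in range(len(rbsp) - 1, -1, -1):
        byte = rbsp[byte_index]
        if byte:
            # Index of the lowest set bit, counted from the least significant bit.
            lowest_set = (byte & -byte).bit_length() - 1
            # Absolute position of rbsp_stop_one_bit, counted from the stream start.
            stop_bit_pos = byte_index * 8 + (7 - lowest_set)
            # Data remains only if the read position is still before the stop bit.
            return bit_pos < stop_bit_pos
    # Assumption: an empty or all-zero RBSP carries no further data.
    return False


payload = bytes([0b10100000])  # two payload bits "10", then the stop bit
has_more = more_rbsp_data(payload, 1)
```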
nal_ref_flag equal to 1 specifies that the content of the NAL unit contains a sequence parameter set, a picture parameter set, an adaptation parameter set or a slice of a reference picture.
For coded video sequences conforming to one or more of the profiles specified in Annex 10 that are decoded using the decoding process specified in clauses 2-9, nal_ref_flag equal to 0 for a NAL unit containing a slice indicates that the slice is part of a non-reference picture.
nal_ref_flag shall be equal to 1 for sequence parameter set, picture parameter set or adaptation parameter set NAL units. When nal_ref_flag is equal to 0 for one NAL unit with nal_unit_type equal to 1 or 4 of a particular picture, it shall be equal to 0 for all NAL units with nal_unit_type equal to 1 or 4 of the picture.
nal_ref_flag shall be equal to 1 for NAL units with nal_unit_type equal to 5.
nal_ref_flag shall be equal to 0 for all NAL units having nal_unit_type equal to 6, 9, 10, 11, or 12.
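The nal_ref_flag constraints stated above can be collected into a small conformance check. This is a hedged sketch: the function names are hypothetical, and the mapping of the sequence, picture and adaptation parameter sets to nal_unit_type values 7, 8 and 14 is taken from Table 1.

```python
# Types whose nal_ref_flag shall be 1: IDR slices (5) and parameter sets (7, 8, 14).
MUST_BE_ONE = {5, 7, 8, 14}
# Types whose nal_ref_flag shall be 0.
MUST_BE_ZERO = {6, 9, 10, 11, 12}


def ref_flag_allowed(nal_unit_type: int, nal_ref_flag: int) -> bool:
    """Check the per-NAL-unit constraints on nal_ref_flag."""
    if nal_unit_type in MUST_BE_ONE:
        return nal_ref_flag == 1
    if nal_unit_type in MUST_BE_ZERO:
        return nal_ref_flag == 0
    return nal_ref_flag in (0, 1)


def picture_is_consistent(nal_units) -> bool:
    """Check that all type 1 or 4 NAL units of one picture share one flag value.

    nal_units is an iterable of (nal_unit_type, nal_ref_flag) pairs that
    belong to the same picture.
    """
    flags = {flag for unit_type, flag in nal_units if unit_type in (1, 4)}
    return len(flags) <= 1


ok = ref_flag_allowed(5, 1) and picture_is_consistent([(1, 0), (1, 0), (6, 0)])
```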
nal_unit_type specifies the type of RBSP data structure contained in the NAL unit as specified in Table 1.
Decoders shall ignore (remove from the bitstream and discard) the contents of all NAL units that use reserved values of nal_unit_type.
TABLE 1: possible (not exhaustive list of) NAL unit type codes and NAL unit type classes

nal_unit_type | Content of NAL unit and RBSP syntax structure | NAL unit type class
--------------|----------------------------------------------|--------------------
0 | Unspecified | non-VCL
1 | Coded slice of non-IDR, non-CRA and non-TLA picture, slice_layer_rbsp( ) | VCL
2 | Reserved | n/a
3 | Coded slice of TLA picture, slice_layer_rbsp( ) | VCL
4 | Coded slice of a CRA picture, slice_layer_rbsp( ) | VCL
5 | Coded slice of an IDR picture, slice_layer_rbsp( ) | VCL
6 | Supplemental enhancement information (SEI), sei_rbsp( ) | non-VCL
7 | Sequence parameter set, seq_parameter_set_rbsp( ) | non-VCL
8 | Picture parameter set, pic_parameter_set_rbsp( ) | non-VCL
9 | Access unit delimiter, access_unit_delimiter_rbsp( ) | non-VCL
10-11 | Reserved | n/a
12 | Filler data, filler_data_rbsp( ) | non-VCL
13 | Reserved | n/a
14 | Adaptation parameter set, aps_rbsp( ) | non-VCL
15-23 | Reserved | n/a
24 ... 63 | Unspecified | non-VCL
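The type classes of Table 1, together with the rule that decoders ignore NAL units with reserved nal_unit_type values, can be sketched as a lookup and a filter. The function names are hypothetical, and the sketch operates on bare type codes rather than on complete NAL units.

```python
# Reserved type codes per Table 1: 2, 10-11, 13 and 15-23.
RESERVED_TYPES = {2, 10, 11, 13, *range(15, 24)}
# Type codes that carry coded slice data.
VCL_TYPES = {1, 3, 4, 5}


def nal_unit_class(nal_unit_type: int) -> str:
    """Return the NAL unit type class listed in Table 1."""
    if not 0 <= nal_unit_type <= 63:
        raise ValueError("nal_unit_type is a 6-bit value")
    if nal_unit_type in RESERVED_TYPES:
        return "n/a"
    if nal_unit_type in VCL_TYPES:
        return "VCL"
    return "non-VCL"


def drop_reserved(nal_unit_types):
    """Discard every type code that uses a reserved value, keeping the order."""
    return [t for t in nal_unit_types if t not in RESERVED_TYPES]


kept = drop_reserved([1, 2, 7, 15, 63])
```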
A “profile” is a subset of the entire bitstream syntax. Within the bounds imposed by the syntax of a given profile it is still possible to necessitate a very large variation in the performance of encoders and decoders depending upon the values taken by syntax elements in the bitstream such as the specified size of the decoded pictures. In many applications, it is currently neither practical nor economic to implement a decoder capable of dealing with all hypothetical uses of the syntax within a particular profile.
In order to deal with this problem, “levels” are specified within each profile. A level is a specified set of constraints imposed on values of the syntax elements in the bitstream. These constraints may be simple limits on values. Alternatively they may take the form of constraints on arithmetic combinations of values (e.g., picture width multiplied by picture height multiplied by number of pictures decoded per second).
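The two kinds of level constraints mentioned above, a simple limit on a value and a limit on an arithmetic combination of values, can be illustrated with a short check. The class, the field names and in particular the numeric limits are made-up placeholders for the example and are not taken from any specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LevelLimits:
    max_picture_width: int        # simple limit on a single value
    max_samples_per_second: int   # limit on width * height * pictures per second


def conforms(width: int, height: int, pictures_per_second: int,
             limits: LevelLimits) -> bool:
    """Check a stream configuration against both kinds of constraint."""
    if width > limits.max_picture_width:
        return False
    return width * height * pictures_per_second <= limits.max_samples_per_second


# Hypothetical limits, chosen only to make the example concrete.
example_level = LevelLimits(max_picture_width=1920,
                            max_samples_per_second=63_000_000)
fits = conforms(1920, 1080, 30, example_level)
```

With these placeholder limits, the same picture size passes at 30 pictures per second and fails at 60, which shows why a constraint on a combination of values cannot be replaced by separate limits on each value.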
level: A defined set of constraints on the values that may be taken by the syntax elements and variables. The same set of levels is defined for all profiles, with most aspects of the definition of each level being in common across different profiles. Individual implementations may, within specified constraints, support a different level for each supported profile. In a different context, level is the value of a transform coefficient prior to scaling.
profile: A specified subset of the syntax.
In the development of the 3D video coding extension for HEVC, there has also been a proposal to shift certain parameters from the slice header to the Access Unit Delimiter (AUD), a NAL unit that optionally stands at the beginning of a new Access Unit (AU) in H.264/AVC.
Another proposal made during the specification of the HEVC 3D video coding extension is to signal the dependencies between operation points by means of indirection between SPSs. The NAL unit header is supposed to carry a reference to an SPS, and each SPS contains a reference to its relative base SPS. This (potentially cascaded) indirection has to be resolved until the SPS for the lowest (temporal, ...) level is reached. Such an approach puts a high burden on devices such as a MANE, which have to look deep into the bitstream and keep a substantial amount of information available for the purpose of identifying operation points.
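The cost of this cascaded indirection can be made concrete with a small sketch. The data model is an assumption for illustration only: base_of is a hypothetical table that maps each received SPS identifier to the identifier of its base SPS, or to None for the lowest level.

```python
def resolve_sps_chain(sps_id, base_of):
    """Follow base-SPS references from sps_id down to the lowest level.

    Returns the list of SPS identifiers visited, starting with sps_id.
    """
    chain = []
    seen = set()
    current = sps_id
    while current is not None:
        if current in seen:
            raise ValueError("cyclic SPS reference")
        if current not in base_of:
            raise KeyError(f"SPS {current} has not been received")
        seen.add(current)
        chain.append(current)
        current = base_of[current]
    return chain


# Hypothetical stream with three operation points: SPS 2 -> SPS 1 -> SPS 0.
base_of = {0: None, 1: 0, 2: 1}
chain = resolve_sps_chain(2, base_of)
```

Every step of the loop needs an SPS that has already been received, parsed and stored, which is the state a MANE would have to maintain just to identify the operation point of a single NAL unit.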
In any case, it would still be favorable to have a solution at hand which facilitates, or renders more efficient, the handling of scalable data streams by network entities.