HEVC [2] provides several means of High Level Syntax signaling to the application layer: the NAL unit header, Parameter Sets and Supplemental Enhancement Information (SEI) messages. SEI messages are not used in the decoding process. Other means of High Level Syntax signaling originate from the respective transport protocol specifications, such as the MPEG-2 Transport Stream [3] or the Real-time Transport Protocol [4], and their payload-specific specifications, for example the recommendations for H.264/AVC [5], scalable video coding (SVC) [6] or HEVC [7]. Such transport protocols may introduce High Level signaling that employs structures and mechanisms similar to those of the High Level signaling in the respective application layer codec specification, e.g. HEVC [2]. One example of such signaling is the Payload Content Scalability Information (PACSI) NAL unit described in [6], which provides supplementary information for the transport layer.
Among the parameter sets, HEVC includes the Video Parameter Set (VPS), which compiles the most important stream information for use by the application layer in a single, central location. In earlier approaches, this information had to be gathered from multiple parameter sets and NAL unit headers.
Prior to the present application, the status of the standard with respect to the Coded Picture Buffer (CPB) operations of the Hypothetical Reference Decoder (HRD) and all related syntax provided in the Sequence Parameter Set (SPS)/Video Usability Information (VUI), the Picture Timing SEI and the Buffering Period SEI, as well as the definition of the decoding unit describing a sub-picture and the syntax of dependent slices as present in the slice header and the Picture Parameter Set (PPS), was as follows.
To allow for low-delay CPB operation at the sub-picture level, sub-picture CPB operations were proposed and integrated into HEVC draft 7, JCTVC-I1003 [2]. In particular, the decoding unit was defined in section 3 of [2] as:
decoding unit: An access unit or a subset of an access unit. If SubPicCpbFlag is equal to 0, a decoding unit is an access unit. Otherwise, a decoding unit consists of one or more VCL NAL units in an access unit and the associated non-VCL NAL units. For the first VCL NAL unit in an access unit, the associated non-VCL NAL units are the filler data NAL units, if any, immediately following the first VCL NAL unit and all non-VCL NAL units in the access unit that precede the first VCL NAL unit. For a VCL NAL unit that is not the first VCL NAL unit in an access unit, the associated non-VCL NAL units are the filler data NAL unit, if any, immediately following the VCL NAL unit.
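The association rule in this definition can be sketched as a small function over the NAL unit sequence of one access unit. The function name and the string labels ("VCL" for VCL NAL units, "FD" for filler data NAL units, anything else for other non-VCL NAL units) are hypothetical and chosen only for illustration; the sketch also collects a run of consecutive filler data NAL units after any VCL NAL unit, which is an assumption for the non-first case.

```python
from typing import Dict, List, Sequence


def associated_non_vcl(nal_types: Sequence[str]) -> Dict[int, List[int]]:
    """Map each VCL NAL unit index in one access unit to the indices of its
    associated non-VCL NAL units, following the decoding unit definition.

    Labels are illustrative: "VCL" = VCL NAL unit, "FD" = filler data NAL
    unit, any other string = some other non-VCL NAL unit.
    """
    vcl_positions = [i for i, t in enumerate(nal_types) if t == "VCL"]
    result: Dict[int, List[int]] = {}
    for n, v in enumerate(vcl_positions):
        assoc: List[int] = []
        if n == 0:
            # First VCL NAL unit: every non-VCL NAL unit that precedes it.
            assoc.extend(range(v))
        # Filler data NAL units immediately following this VCL NAL unit.
        j = v + 1
        while j < len(nal_types) and nal_types[j] == "FD":
            assoc.append(j)
            j += 1
        result[v] = assoc
    return result


au = ["AUD", "SPS", "PPS", "SEI", "VCL", "FD", "VCL", "VCL", "FD"]
print(associated_non_vcl(au))  # {4: [0, 1, 2, 3, 5], 6: [], 7: [8]}
```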
In the standard as defined up to that time, the “Timing of decoding unit removal and decoding of decoding unit” had been described and added to Annex C “Hypothetical reference decoder”. To signal sub-picture timing, the buffering period SEI message, the picture timing SEI message and the HRD parameters in the VUI had been extended to support decoding units as sub-picture units.
Buffering period SEI message syntax of [2] is shown in FIG. 1.
When NalHrdBpPresentFlag or VclHrdBpPresentFlag is equal to 1, a buffering period SEI message can be associated with any access unit in the bitstream, and a buffering period SEI message shall be associated with each RAP access unit and with each access unit associated with a recovery point SEI message.
For some applications, the frequent presence of a buffering period SEI message may be desirable.
A buffering period was specified as the set of access units between two instances of the buffering period SEI message in decoding order.
The semantics were as follows:
seq_parameter_set_id specifies the sequence parameter set that contains the sequence HRD attributes. The value of seq_parameter_set_id shall be equal to the value of seq_parameter_set_id in the picture parameter set referenced by the primary coded picture associated with the buffering period SEI message. The value of seq_parameter_set_id shall be in the range of 0 to 31, inclusive.
rap_cpb_params_present_flag equal to 1 specifies the presence of the initial_alt_cpb_removal_delay[SchedSelIdx] and initial_alt_cpb_removal_delay_offset[SchedSelIdx] syntax elements. When not present, the value of rap_cpb_params_present_flag is inferred to be equal to 0. When the associated picture is neither a CRA picture nor a BLA picture, the value of rap_cpb_params_present_flag shall be equal to 0.
initial_cpb_removal_delay[SchedSelIdx] and initial_alt_cpb_removal_delay[SchedSelIdx] specify the initial CPB removal delays for the SchedSelIdx-th CPB. The syntax elements have a length in bits given by initial_cpb_removal_delay_length_minus1+1, and are in units of a 90 kHz clock. The values of the syntax elements shall not be equal to 0 and shall not exceed 90000*(CpbSize[SchedSelIdx]÷BitRate[SchedSelIdx]), the time-equivalent of the CPB size in 90 kHz clock units.
initial_cpb_removal_delay_offset[SchedSelIdx] and initial_alt_cpb_removal_delay_offset[SchedSelIdx] are used for the SchedSelIdx-th CPB to specify the initial delivery time of coded data units to the CPB. The syntax elements have a length in bits given by initial_cpb_removal_delay_length_minus1+1 and are in units of a 90 kHz clock. These syntax elements are not used by decoders and may be needed only for the hypothetical stream scheduler (HSS).
Over the entire coded video sequence, the sum of initial_cpb_removal_delay[SchedSelIdx] and initial_cpb_removal_delay_offset[SchedSelIdx] shall be constant for each value of SchedSelIdx, and the sum of initial_alt_cpb_removal_delay[SchedSelIdx] and initial_alt_cpb_removal_delay_offset[SchedSelIdx] shall be constant for each value of SchedSelIdx.
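The three constraints on the initial delays (non-zero, bounded by the time-equivalent of the CPB size, and a constant delay-plus-offset sum over the coded video sequence) can be checked together. The sketch below is a minimal illustration for one SchedSelIdx; the function name and the idea of collecting the values into lists per coded video sequence are assumptions, not part of [2]. The same check applies unchanged to the initial_alt_cpb_removal_delay/offset pair.

```python
from typing import Sequence

CLOCK_HZ = 90000  # initial delays are in units of a 90 kHz clock


def initial_delays_conform(delays: Sequence[int], offsets: Sequence[int],
                           cpb_size_bits: float, bit_rate_bps: float) -> bool:
    """Check initial_cpb_removal_delay / _offset values for one SchedSelIdx,
    collected from the buffering period SEI messages of one coded video sequence."""
    if len(delays) != len(offsets):
        raise ValueError("each delay needs a matching offset")
    # Time-equivalent of the CPB size, in 90 kHz clock units.
    limit = CLOCK_HZ * (cpb_size_bits / bit_rate_bps)
    if any(d == 0 or d > limit for d in delays):
        return False
    # delay + offset must stay constant across the coded video sequence.
    return len({d + o for d, o in zip(delays, offsets)}) <= 1
```

With a 1.5 Mbit CPB drained at 1 Mbit/s the bound is 135000 ticks (1.5 s), so delays of 90000 and 60000 with offsets of 10000 and 40000 conform, while a delay of 140000 does not.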
The picture timing SEI message syntax of [2] is shown in FIG. 2.
The syntax of the picture timing SEI message was dependent on the content of the sequence parameter set that is active for the coded picture associated with the picture timing SEI message. However, unless the picture timing SEI message of an IDR or BLA access unit is preceded by a buffering period SEI message within the same access unit, the activation of the associated sequence parameter set (and, for IDR or BLA pictures that are not the first picture in the bitstream, the determination that the coded picture is an IDR picture or a BLA picture) does not occur until the decoding of the first coded slice NAL unit of the coded picture. Since the coded slice NAL units of the coded picture follow the picture timing SEI message in NAL unit order, there may be cases in which it is useful for a decoder to store the RBSP containing the picture timing SEI message until it has determined the parameters of the sequence parameter set that will be active for the coded picture, and only then parse the picture timing SEI message.
The presence of picture timing SEI messages in the bitstream was specified as follows:
- If CpbDpbDelaysPresentFlag is equal to 1, one picture timing SEI message shall be present in every access unit of the coded video sequence.
- Otherwise (CpbDpbDelaysPresentFlag is equal to 0), no picture timing SEI messages shall be present in any access unit of the coded video sequence.
The semantics were defined as follows:
cpb_removal_delay specifies how many clock ticks to wait after removal from the CPB of the access unit associated with the most recent buffering period SEI message in a preceding access unit before removing from the buffer the access unit data associated with the picture timing SEI message. This value is also used to calculate an earliest possible time of arrival of access unit data into the CPB for the HSS. The syntax element is a fixed length code whose length in bits is given by cpb_removal_delay_length_minus1+1. The cpb_removal_delay is the remainder of a modulo 2^(cpb_removal_delay_length_minus1+1) counter.
The value of cpb_removal_delay_length_minus1 that determines the length (in bits) of the syntax element cpb_removal_delay is the value of cpb_removal_delay_length_minus1 coded in the sequence parameter set that is active for the primary coded picture associated with the picture timing SEI message, although cpb_removal_delay specifies a number of clock ticks relative to the removal time of the preceding access unit containing a buffering period SEI message, which may be an access unit of a different coded video sequence.
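The modulo counter and the tick-based removal time can be illustrated as follows. This is a sketch: the function names are hypothetical, and the removal-time relation t_r(n) = t_r(n_b) + tc * cpb_removal_delay(n) is the nominal HRD relation used in H.264/AVC and HEVC rather than text quoted from this section. Exact rationals are used so that no rounding creeps in.

```python
from fractions import Fraction


def coded_cpb_removal_delay(ticks: int, cpb_removal_delay_length_minus1: int) -> int:
    """Value carried in cpb_removal_delay: the tick count modulo
    2^(cpb_removal_delay_length_minus1 + 1)."""
    return ticks % (1 << (cpb_removal_delay_length_minus1 + 1))


def nominal_removal_time(t_r_bp: Fraction, tc: Fraction, cpb_removal_delay: int) -> Fraction:
    """CPB removal time of an access unit, counted in clock ticks tc from the
    removal time t_r_bp of the access unit holding the most recent buffering
    period SEI message."""
    return t_r_bp + tc * cpb_removal_delay
```

With an 8-bit field (cpb_removal_delay_length_minus1 equal to 7), a delay of 300 ticks is coded as 300 mod 256 = 44, so the counter only identifies the delay unambiguously while it stays below 256 ticks.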
dpb_output_delay is used to compute the DPB output time of the picture. It specifies how many clock ticks to wait after removal of the last decoding unit in an access unit from the CPB before the decoded picture is output from the DPB.
A picture is not removed from the DPB at its output time when it is still marked as “used for short-term reference” or “used for long-term reference”.
Only one dpb_output_delay is specified for a decoded picture.
The length of the syntax element dpb_output_delay is given in bits by dpb_output_delay_length_minus1+1. When sps_max_dec_pic_buffering[max_temporal_layers_minus1] is equal to 0, dpb_output_delay shall be equal to 0.
The output time derived from the dpb_output_delay of any picture that is output from an output timing conforming decoder shall precede the output time derived from the dpb_output_delay of all pictures in any subsequent coded video sequence in decoding order.
The picture output order established by the values of this syntax element shall be the same order as established by the values of PicOrderCntVal.
For pictures that are not output by the “bumping” process because they precede, in decoding order, an IDR or BLA picture with no_output_of_prior_pics_flag equal to 1 or inferred to be equal to 1, the output times derived from dpb_output_delay shall be increasing with increasing value of PicOrderCntVal relative to all pictures within the same coded video sequence.
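Following the dpb_output_delay semantics above, the DPB output time of a picture is the CPB removal time of the last decoding unit of its access unit plus dpb_output_delay clock ticks, and output order must follow PicOrderCntVal. The sketch below computes output times and checks that ordering for a small hierarchical-B example; the tuple layout and function names are hypothetical.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple

# (PicOrderCntVal, CPB removal time of the last decoding unit, dpb_output_delay)
Picture = Tuple[int, Fraction, int]


def dpb_output_times(pics: Sequence[Picture], tc: Fraction) -> List[Fraction]:
    """Output time = removal time of the last decoding unit + tc * dpb_output_delay."""
    return [t_r + tc * delay for _, t_r, delay in pics]


def output_order_follows_poc(pics: Sequence[Picture], tc: Fraction) -> bool:
    """True when output times strictly increase with PicOrderCntVal."""
    times = dpb_output_times(pics, tc)
    by_poc = sorted(range(len(pics)), key=lambda i: pics[i][0])
    return all(times[a] < times[b] for a, b in zip(by_poc, by_poc[1:]))


tc = Fraction(1, 50)
# Decoding order I0, P4, B2, b1, b3, removed one clock tick apart.
pics = [(0, 0 * tc, 2), (4, 1 * tc, 5), (2, 2 * tc, 2), (1, 3 * tc, 0), (3, 4 * tc, 1)]
print(output_order_follows_poc(pics, tc))  # True
```

Here the pictures leave the DPB at ticks 2, 3, 4, 5 and 6 in POC order, even though they were decoded as POC 0, 4, 2, 1, 3.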
num_decoding_units_minus1 plus 1 specifies the number of decoding units in the access unit the picture timing SEI message is associated with. The value of num_decoding_units_minus1 shall be in the range of 0 to PicWidthInCtbs*PicHeightInCtbs−1, inclusive.
num_nalus_in_du_minus1[i] plus 1 specifies the number of NAL units in the i-th decoding unit of the access unit the picture timing SEI message is associated with. The value of num_nalus_in_du_minus1[i] shall be in the range of 0 to PicWidthInCtbs*PicHeightInCtbs −1, inclusive.
The first decoding unit of the access unit consists of the first num_nalus_in_du_minus1 [0]+1 consecutive NAL units in decoding order in the access unit. The i-th (with i greater than 0) decoding unit of the access unit consists of the num_nalus_in_du_minus1 [i]+1 consecutive NAL units immediately following the last NAL unit in the previous decoding unit of the access unit, in decoding order. There shall be at least one VCL NAL unit in each decoding unit. All non-VCL NAL units associated with a VCL NAL unit shall be included in the same decoding unit.
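The partitioning rule driven by num_nalus_in_du_minus1[i] can be sketched as a split of the access unit's NAL unit list. The function name and the "VCL" label are hypothetical; the sketch enforces the at-least-one-VCL rule and, as an added assumption, requires the decoding units to cover the whole access unit. It does not check that associated non-VCL NAL units stay with their VCL NAL unit.

```python
from typing import List, Sequence


def split_into_decoding_units(nal_types: Sequence[str],
                              num_nalus_in_du_minus1: Sequence[int]) -> List[List[str]]:
    """Split one access unit (NAL units in decoding order) into decoding units."""
    dus: List[List[str]] = []
    pos = 0
    for n_minus1 in num_nalus_in_du_minus1:
        count = n_minus1 + 1
        du = list(nal_types[pos:pos + count])  # consecutive NAL units
        if len(du) != count:
            raise ValueError("decoding unit extends past the end of the access unit")
        if "VCL" not in du:
            raise ValueError("each decoding unit needs at least one VCL NAL unit")
        dus.append(du)
        pos += count
    if pos != len(nal_types):
        raise ValueError("NAL units left over after the last decoding unit")
    return dus
```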
du_cpb_removal_delay[i] specifies how many sub-picture clock ticks to wait after removal from the CPB of the first decoding unit in the access unit associated with the most recent buffering period SEI message in a preceding access unit before removing from the CPB the i-th decoding unit in the access unit associated with the picture timing SEI message. This value is also used to calculate an earliest possible time of arrival of decoding unit data into the CPB for the HSS. The syntax element is a fixed length code whose length in bits is given by cpb_removal_delay_length_minus1+1. The du_cpb_removal_delay[i] is the remainder of a modulo 2^(cpb_removal_delay_length_minus1+1) counter.
The value of cpb_removal_delay_length_minus1 that determines the length (in bits) of the syntax element du_cpb_removal_delay[i] is the value of cpb_removal_delay_length_minus1 coded in the sequence parameter set that is active for the coded picture associated with the picture timing SEI message, although du_cpb_removal_delay[i] specifies a number of sub-picture clock ticks relative to the removal time of the first decoding unit in the preceding access unit containing a buffering period SEI message, which may be an access unit of a different coded video sequence.
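Reading the du_cpb_removal_delay[i] semantics literally, each decoding unit is removed du_cpb_removal_delay[i] sub-picture clock ticks after the first decoding unit of the most recent buffering-period access unit. The sketch below turns that into removal times; the function name is hypothetical and counter wrap-around is ignored.

```python
from fractions import Fraction
from typing import List, Sequence


def du_removal_times(t_r_first_du_bp: Fraction, num_units_in_sub_tick: int,
                     time_scale: int, du_cpb_removal_delay: Sequence[int]) -> List[Fraction]:
    """CPB removal time of each decoding unit in one access unit.

    t_r_first_du_bp: removal time of the first decoding unit of the access unit
    that carries the most recent buffering period SEI message."""
    tc_sub = Fraction(num_units_in_sub_tick, time_scale)  # sub-picture clock tick
    return [t_r_first_du_bp + tc_sub * d for d in du_cpb_removal_delay]


# 450 / 90000 s = 5 ms per sub-picture clock tick.
times = du_removal_times(Fraction(0), 450, 90000, [10, 11, 12, 13])
```

Four decoding units with delays 10 to 13 are therefore removed 5 ms apart, at 50, 55, 60 and 65 ms, instead of all at once when the whole access unit would be removed.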
Some information was contained in the VUI syntax of [2]. The VUI parameters syntax of [2] is shown in FIGS. 3A and 3B. The HRD parameters syntax of [2] is shown in FIG. 4. The semantics were defined as follows:
sub_pic_cpb_params_present_flag equal to 1 specifies that sub-picture level CPB removal delay parameters are present and the CPB may operate at access unit level or sub-picture level. sub_pic_cpb_params_present_flag equal to 0 specifies that sub-picture level CPB removal delay parameters are not present and the CPB operates at access unit level. When sub_pic_cpb_params_present_flag is not present, its value is inferred to be equal to 0.
num_units_in_sub_tick is the number of time units of a clock operating at the frequency time_scale Hz that corresponds to one increment (called a sub-picture clock tick) of a sub-picture clock tick counter. num_units_in_sub_tick shall be greater than 0. A sub-picture clock tick is the minimum interval of time that can be represented in the coded data when sub_pic_cpb_params_present_flag is equal to 1.
tiles_fixed_structure_flag equal to 1 indicates that each picture parameter set that is active in the coded video sequence has the same value of the syntax elements num_tile_columns_minus1, num_tile_rows_minus1, uniform_spacing_flag, column_width[i], row_height[i] and loop_filter_across_tiles_enabled_flag, when present. tiles_fixed_structure_flag equal to 0 indicates that tiles syntax elements in different picture parameter sets may or may not have the same value. When the tiles_fixed_structure_flag syntax element is not present, it is inferred to be equal to 0.
The signaling of tiles_fixed_structure_flag equal to 1 is a guarantee to a decoder that each picture in the coded video sequence has the same number of tiles distributed in the same way, which might be useful for workload allocation in multi-threaded decoding.
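The guarantee expressed by tiles_fixed_structure_flag equal to 1 amounts to every active picture parameter set carrying identical tile syntax. As a sketch, assuming each parsed PPS is available as a plain dictionary keyed by syntax element name (a hypothetical representation), an encoder could verify the condition before setting the flag:

```python
from typing import Any, Dict, Iterable

TILE_FIELDS = ("num_tile_columns_minus1", "num_tile_rows_minus1", "uniform_spacing_flag",
               "column_width", "row_height", "loop_filter_across_tiles_enabled_flag")


def tiles_structure_is_fixed(active_pps: Iterable[Dict[str, Any]]) -> bool:
    """True when every PPS active in the CVS has the same tile syntax values.
    Absent syntax elements are compared as None."""
    first = None
    for pps in active_pps:
        key = tuple(pps.get(f) for f in TILE_FIELDS)
        if first is None:
            first = key
        elif key != first:
            return False
    return True
```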
Filler data of [2] was signaled using the filler data RBSP syntax shown in FIG. 5.
The hypothetical reference decoder of [2] used to check bitstream and decoder conformance was defined as follows:
Two types of bitstreams are subject to HRD conformance checking for this Recommendation|International Standard. The first type of bitstream, called a Type I bitstream, is a NAL unit stream containing only the VCL NAL units and filler data NAL units for all access units in the bitstream. The second type of bitstream, called a Type II bitstream, contains, in addition to the VCL NAL units and filler data NAL units for all access units in the bitstream, at least one of the following:
- additional non-VCL NAL units other than filler data NAL units,
- all leading_zero_8bits, zero_byte, start_code_prefix_one_3bytes and trailing_zero_8bits syntax elements that form a byte stream from the NAL unit stream.
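The Type I/Type II distinction can be expressed as a simple classification. In this sketch the function name, the "VCL"/"FD" labels and the boolean standing in for "the stream is in byte stream format" are hypothetical simplifications of the two conditions listed above.

```python
from typing import Iterable


def bitstream_type(nal_types: Iterable[str], byte_stream_format: bool) -> str:
    """Return "I" for a NAL unit stream with only VCL and filler data NAL units,
    otherwise "II"."""
    if byte_stream_format:
        # Start codes and zero bytes are present: always Type II.
        return "II"
    if all(t in ("VCL", "FD") for t in nal_types):
        return "I"
    # Some other non-VCL NAL unit (parameter set, SEI, ...) is present.
    return "II"
```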
FIG. 6 shows the types of bitstream conformance points checked by the HRD of [2].
Two types of HRD parameter sets (NAL HRD parameters and VCL HRD parameters) are used. The HRD parameter sets are signaled through video usability information, which is part of the sequence parameter set syntax structure.
All sequence parameter sets and picture parameter sets referred to in the VCL NAL units, and corresponding buffering period and picture timing SEI messages shall be conveyed to the HRD, in a timely manner, either in the bitstream, or by other means.
The specification for “presence” of non-VCL NAL units is also satisfied when those NAL units (or just some of them) are conveyed to decoders (or to the HRD) by other means not specified by this Recommendation|International Standard. For the purpose of counting bits, only the appropriate bits that are actually present in the bitstream are counted.
As an example, synchronization of a non-VCL NAL unit, conveyed by means other than presence in the bitstream, with the NAL units that are present in the bitstream, can be achieved by indicating two points in the bitstream, between which the non-VCL NAL unit would have been present in the bitstream, had the encoder decided to convey it in the bitstream.
When the content of a non-VCL NAL unit is conveyed for the application by some means other than presence within the bitstream, the representation of the content of the non-VCL NAL unit is not required to use the same syntax specified in this annex.
Note that when HRD information is contained within the bitstream, it is possible to verify the conformance of a bitstream to the requirements of this subclause based solely on information contained in the bitstream. When the HRD information is not present in the bitstream, as is the case for all “stand-alone” Type I bitstreams, conformance can only be verified when the HRD data is supplied by some other means not specified in this Recommendation|International Standard.
The HRD contains a coded picture buffer (CPB), an instantaneous decoding process, a decoded picture buffer (DPB), and output cropping as shown in FIG. 7.
The CPB size (number of bits) is CpbSize[SchedSelIdx]. The DPB size (number of picture storage buffers) for temporal layer X is sps_max_dec_pic_buffering[X] for each X in the range of 0 to sps_max_temporal_layers_minus1, inclusive.
The variable SubPicCpbPreferredFlag is either specified by external means, or when not specified by external means, set to 0.
The variable SubPicCpbFlag is derived as follows:
SubPicCpbFlag = SubPicCpbPreferredFlag && sub_pic_cpb_params_present_flag
If SubPicCpbFlag is equal to 0, the CPB operates at access unit level and each decoding unit is an access unit. Otherwise the CPB operates at sub-picture level and each decoding unit is a subset of an access unit.
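The derivation of SubPicCpbFlag and the resulting CPB operating level fit in a few lines. The function names are hypothetical; the optional argument models SubPicCpbPreferredFlag being specified by external means and defaults to 0 otherwise, as stated above.

```python
from typing import Optional


def derive_sub_pic_cpb_flag(sub_pic_cpb_params_present_flag: int,
                            external_preferred_flag: Optional[int] = None) -> int:
    # SubPicCpbPreferredFlag comes from external means; when none is given it is 0.
    sub_pic_cpb_preferred_flag = 0 if external_preferred_flag is None else external_preferred_flag
    return int(bool(sub_pic_cpb_preferred_flag) and bool(sub_pic_cpb_params_present_flag))


def cpb_operating_level(sub_pic_cpb_flag: int) -> str:
    return "sub-picture" if sub_pic_cpb_flag else "access unit"
```

Sub-picture operation thus needs both the encoder's parameters in the bitstream and an explicit request from the application.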
The HRD operates as follows. Data associated with decoding units that flow into the CPB according to a specified arrival schedule are delivered by the HSS. The data associated with each decoding unit are removed and decoded instantaneously by the instantaneous decoding process at CPB removal times. Each decoded picture is placed in the DPB. A decoded picture is removed from the DPB at the later of the DPB output time or the time that it becomes no longer needed for inter-prediction reference.
The HRD is initialized as specified by the buffering period SEI. The removal timing of decoding units from the CPB and output timing of decoded pictures from the DPB are specified in the picture timing SEI message. All timing information relating to a specific decoding unit shall arrive prior to the CPB removal time of the decoding unit.
The HRD is used to check conformance of bitstreams and decoders.
While conformance is guaranteed under the assumption that all frame-rates and clocks used to generate the bitstream match exactly the values signaled in the bitstream, in a real system each of these may vary from the signaled or specified value.
All the arithmetic is done with real values, so that no rounding errors can propagate. For example, the number of bits in a CPB just prior to or after removal of a decoding unit is not necessarily an integer.
The variable tc is derived as follows and is called a clock tick:
tc = num_units_in_tick ÷ time_scale
The variable tc_sub is derived as follows and is called a sub-picture clock tick:
tc_sub = num_units_in_sub_tick ÷ time_scale
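Since the HRD arithmetic is done with real values, exact rationals are a natural way to model tc and tc_sub in software. The sketch below uses Python's fractions module; the function names and the example numbers (a 59.94 Hz clock whose tick splits into seven sub-picture ticks) are illustrative assumptions.

```python
from fractions import Fraction


def clock_tick(num_units_in_tick: int, time_scale: int) -> Fraction:
    """tc = num_units_in_tick / time_scale, kept as an exact rational."""
    return Fraction(num_units_in_tick, time_scale)


def sub_picture_clock_tick(num_units_in_sub_tick: int, time_scale: int) -> Fraction:
    """tc_sub = num_units_in_sub_tick / time_scale."""
    if num_units_in_sub_tick <= 0:
        raise ValueError("num_units_in_sub_tick shall be greater than 0")
    return Fraction(num_units_in_sub_tick, time_scale)


tc = clock_tick(1001, 60000)                  # about 59.94 pictures per second
tc_sub = sub_picture_clock_tick(143, 60000)   # 7 * 143 == 1001
```

With rationals, seven sub-picture ticks add up to exactly one clock tick, which floating-point arithmetic does not guarantee.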
The following is specified for expressing the constraints:
- Let access unit n be the n-th access unit in decoding order, with the first access unit being access unit 0.
- Let picture n be the coded picture or the decoded picture of access unit n.
- Let decoding unit m be the m-th decoding unit in decoding order, with the first decoding unit being decoding unit 0.
In [2], the slice header syntax allowed for so-called dependent slices.
FIG. 8 shows the slice header syntax of [2].
Slice header semantics were defined as follows:
dependent_slice_flag equal to 1 specifies that the value of each slice header syntax element not present is inferred to be equal to the value of the corresponding slice header syntax element in the preceding slice containing the coding tree block for which the coding tree block address is SliceCtbAddrRS−1. When not present, the value of dependent_slice_flag is inferred to be equal to 0. The value of dependent_slice_flag shall be equal to 0 when SliceCtbAddrRS is equal to 0.
slice_address specifies the address, in slice granularity resolution, at which the slice starts. The length of the slice_address syntax element is (Ceil(Log2(PicWidthInCtbs*PicHeightInCtbs))+SliceGranularity) bits.
The variable SliceCtbAddrRS, specifying the coding tree block in which the slice starts in coding tree block raster scan order, is derived as follows:
SliceCtbAddrRS = (slice_address >> SliceGranularity)
The variable SliceCbAddrZS, specifying the address of the first coding block in the slice in minimum coding block granularity in z-scan order, is derived as follows:
SliceCbAddrZS = slice_address << ((log2_diff_max_min_coding_block_size − SliceGranularity) << 1)
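The slice_address length and the two start-address derivations translate directly into integer operations. This sketch implements the formulas exactly as written above; the function names are hypothetical, and the example sticks to SliceGranularity equal to 0 (CTB-aligned slices), where the meaning of both derived addresses is unambiguous.

```python
from typing import Tuple


def slice_address_length(pic_width_in_ctbs: int, pic_height_in_ctbs: int,
                         slice_granularity: int) -> int:
    """Bits of slice_address: Ceil(Log2(PicWidthInCtbs * PicHeightInCtbs)) + SliceGranularity."""
    num_ctbs = pic_width_in_ctbs * pic_height_in_ctbs
    # (n - 1).bit_length() equals Ceil(Log2(n)) for n >= 1, without floating point.
    return (num_ctbs - 1).bit_length() + slice_granularity


def slice_start(slice_address: int, slice_granularity: int,
                log2_diff_max_min_coding_block_size: int) -> Tuple[int, int]:
    """Return (SliceCtbAddrRS, SliceCbAddrZS)."""
    slice_ctb_addr_rs = slice_address >> slice_granularity
    slice_cb_addr_zs = slice_address << (
        (log2_diff_max_min_coding_block_size - slice_granularity) << 1)
    return slice_ctb_addr_rs, slice_cb_addr_zs
```

For a 1920x1080 picture with 64x64 CTBs (30x17 = 510 CTBs), slice_address needs 9 bits at granularity 0. With 8x8 minimum coding blocks (log2_diff equal to 3), a slice starting at CTB 11 gets SliceCbAddrZS equal to 11 * 64 = 704, i.e. 64 minimum coding blocks per CTB.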
The slice decoding starts with the largest coding unit possible at the slice starting coordinate.
first_slice_in_pic_flag indicates whether the slice is the first slice of the picture. If first_slice_in_pic_flag is equal to 1, the variables SliceCbAddrZS and SliceCtbAddrRS are both set to 0 and the decoding starts with the first coding tree block in the picture.
pic_parameter_set_id specifies the picture parameter set in use. The value of pic_parameter_set_id shall be in the range of 0 to 255, inclusive.
num_entry_point_offsets specifies the number of entry_point_offset[i] syntax elements in the slice header. When tiles_or_entropy_coding_sync_idc is equal to 1, the value of num_entry_point_offsets shall be in the range of 0 to (num_tile_columns_minus1+1)*(num_tile_rows_minus1+1)−1, inclusive. When tiles_or_entropy_coding_sync_idc is equal to 2, the value of num_entry_point_offsets shall be in the range of 0 to PicHeightInCtbs−1, inclusive. When not present, the value of num_entry_point_offsets is inferred to be equal to 0.
offset_len_minus1 plus 1 specifies the length, in bits, of the entry_point_offset[i] syntax elements.
entry_point_offset[i] specifies the i-th entry point offset in bytes, and shall be represented by offset_len_minus1 plus 1 bits. The coded slice data after the slice header consists of num_entry_point_offsets+1 subsets, with subset index values ranging from 0 to num_entry_point_offsets, inclusive. Subset 0 consists of bytes 0 to entry_point_offset[0]−1, inclusive, of the coded slice data; subset k, with k in the range of 1 to num_entry_point_offsets−1, inclusive, consists of bytes entry_point_offset[k−1] to entry_point_offset[k]+entry_point_offset[k−1]−1, inclusive, of the coded slice data; and the last subset (with subset index equal to num_entry_point_offsets) consists of the remaining bytes of the coded slice data.
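A decoder uses the entry points to hand each subset of the slice data to a separate thread. The sketch below reads entry_point_offset[k] as the size in bytes of subset k, so that subset k starts at the sum of the preceding offsets. That reading is an assumption: it matches the formula above for subsets 0 and 1 and the cumulative derivation adopted in the final HEVC text. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def entry_point_subsets(entry_point_offset: Sequence[int],
                        slice_data_len: int) -> List[Tuple[int, int]]:
    """Inclusive (first_byte, last_byte) ranges of the num_entry_point_offsets + 1
    subsets of the coded slice data, reading entry_point_offset[k] as the size in
    bytes of subset k."""
    ranges: List[Tuple[int, int]] = []
    first = 0
    for size in entry_point_offset:
        ranges.append((first, first + size - 1))
        first += size
    if first >= slice_data_len:
        raise ValueError("entry points leave no bytes for the last subset")
    ranges.append((first, slice_data_len - 1))  # last subset: the remaining bytes
    return ranges
```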
When tiles_or_entropy_coding_sync_idc is equal to 1 and num_entry_point_offsets is greater than 0, each subset shall contain all coded bits of exactly one tile, and the number of subsets (i.e., the value of num_entry_point_offsets+1) shall be equal to or less than the number of tiles in the slice.
When tiles_or_entropy_coding_sync_idc is equal to 1, each slice includes either a subset of one tile (in which case signaling of entry points is unnecessary) or an integer number of complete tiles.
When tiles_or_entropy_coding_sync_idc is equal to 2 and num_entry_point_offsets is greater than 0, each subset k with k in the range of 0 to num_entry_point_offsets−1, inclusive, shall contain all coded bits of exactly one row of coding tree blocks, the last subset (with subset index equal to num_entry_point_offsets) shall contain all coded bits of the remaining coding blocks included in the slice, wherein the remaining coding blocks consist of either exactly one row of coding tree blocks or a subset of one row of coding tree blocks, and the number of subsets (i.e., the value of num_entry_point_offsets+1) shall be equal to the number of rows of coding tree blocks in the slice, wherein a subset of one row of coding tree blocks in the slice is also counted.
When tiles_or_entropy_coding_sync_idc is equal to 2, a slice may include a number of rows of coding tree blocks and a subset of a row of coding tree blocks. For example, if a slice includes two and a half rows of coding tree blocks, the number of subsets (i.e., the value of num_entry_point_offsets+1) shall be equal to 3.
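For wavefront operation (tiles_or_entropy_coding_sync_idc equal to 2), the required number of subsets is the number of CTB rows the slice touches, partial rows included. The sketch below works at CTB granularity, which is a simplification; it also applies the PPS constraint that a slice starting mid-row must end in that same row. The function name and parameters are hypothetical.

```python
def wpp_num_subsets(first_ctb_rs: int, num_ctbs: int, pic_width_in_ctbs: int) -> int:
    """Number of subsets (num_entry_point_offsets + 1) for a slice of num_ctbs
    CTBs starting at raster-scan CTB address first_ctb_rs."""
    last_ctb_rs = first_ctb_rs + num_ctbs - 1
    first_row = first_ctb_rs // pic_width_in_ctbs
    last_row = last_ctb_rs // pic_width_in_ctbs
    if first_ctb_rs % pic_width_in_ctbs != 0 and last_row != first_row:
        raise ValueError("a slice starting mid-row must end in the same CTB row")
    # Every touched row, full or partial, is one subset.
    return last_row - first_row + 1
```

With 30 CTBs per row, a slice of 75 CTBs starting at the beginning of a row covers two and a half rows and needs 3 subsets, matching the example above.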
FIG. 9 shows the picture parameter set RBSP syntax of [2], the picture parameter set RBSP semantics of [2] being defined as:
dependent_slice_enabled_flag equal to 1 specifies the presence of the syntax element dependent_slice_flag in the slice header for coded pictures referring to the picture parameter set. dependent_slice_enabled_flag equal to 0 specifies the absence of the syntax element dependent_slice_flag in the slice header for coded pictures referring to the picture parameter set. When tiles_or_entropy_coding_sync_idc is equal to 3, the value of dependent_slice_enabled_flag shall be equal to 1.
tiles_or_entropy_coding_sync_idc equal to 0 specifies that there shall be only one tile in each picture referring to the picture parameter set, there shall be no specific synchronization process for context variables invoked before decoding the first coding tree block of a row of coding tree blocks in each picture referring to the picture parameter set, and the values of cabac_independent_flag and dependent_slice_flag for coded pictures referring to the picture parameter set shall not be both equal to 1.
When cabac_independent_flag and dependent_slice_flag are both equal to 1 for a slice, the slice is an entropy slice.
tiles_or_entropy_coding_sync_idc equal to 1 specifies that there may be more than one tile in each picture referring to the picture parameter set, there shall be no specific synchronization process for context variables invoked before decoding the first coding tree block of a row of coding tree blocks in each picture referring to the picture parameter set, and the values of cabac_independent_flag and dependent_slice_flag for coded pictures referring to the picture parameter set shall not be both equal to 1.
tiles_or_entropy_coding_sync_idc equal to 2 specifies that there shall be only one tile in each picture referring to the picture parameter set, a specific synchronization process for context variables shall be invoked before decoding the first coding tree block of a row of coding tree blocks in each picture referring to the picture parameter set and a specific memorization process for context variables shall be invoked after decoding two coding tree blocks of a row of coding tree blocks in each picture referring to the picture parameter set, and the values of cabac_independent_flag and dependent_slice_flag for coded pictures referring to the picture parameter set shall not be both equal to 1.
tiles_or_entropy_coding_sync_idc equal to 3 specifies that there shall be only one tile in each picture referring to the picture parameter set, there shall be no specific synchronization process for context variables invoked before decoding the first coding tree block of a row of coding tree blocks in each picture referring to the picture parameter set, and the values of cabac_independent_flag and dependent_slice_flag for coded pictures referring to the picture parameter set may both be equal to 1.
When dependent_slice_enabled_flag is equal to 0, tiles_or_entropy_coding_sync_idc shall not be equal to 3.
It is a requirement of bitstream conformance that the value of tiles_or_entropy_coding_sync_idc shall be the same for all picture parameter sets that are activated within a coded video sequence.
For each slice referring to the picture parameter set, when tiles_or_entropy_coding_sync_idc is equal to 2 and the first coding block in the slice is not the first coding block in the first coding tree block of a row of coding tree blocks, the last coding block in the slice shall belong to the same row of coding tree blocks as the first coding block in the slice.
num_tile_columns_minus1 plus 1 specifies the number of tile columns partitioning the picture.
num_tile_rows_minus1 plus 1 specifies the number of tile rows partitioning the picture.
When num_tile_columns_minus1 is equal to 0, num_tile_rows_minus1 shall not be equal to 0.
uniform_spacing_flag equal to 1 specifies that column boundaries and likewise row boundaries are distributed uniformly across the picture. uniform_spacing_flag equal to 0 specifies that column boundaries and likewise row boundaries are not distributed uniformly across the picture but signaled explicitly using the syntax elements column_width[i] and row_height[i].
column_width[i] specifies the width of the i-th tile column in units of coding tree blocks.
row_height[i] specifies the height of the i-th tile row in units of coding tree blocks.
The vector colWidth[i] specifies the width of the i-th tile column in units of CTBs with the column i ranging from 0 to num_tile_columns_minus1, inclusive.
The vector CtbAddrRStoTS[ctbAddrRS] specifies the conversion from a CTB address in raster scan order to a CTB address in tile scan order with the index ctbAddrRS ranging from 0 to (picHeightInCtbs*picWidthInCtbs)−1, inclusive.
The vector CtbAddrTStoRS[ctbAddrTS] specifies the conversion from a CTB address in tile scan order to a CTB address in raster scan order with the index ctbAddrTS ranging from 0 to (picHeightInCtbs*picWidthInCtbs)−1, inclusive.
The vector TileId[ctbAddrTS] specifies the conversion from a CTB address in tile scan order to a tile id with ctbAddrTS ranging from 0 to (picHeightInCtbs*picWidthInCtbs)−1, inclusive.
The values of colWidth, CtbAddrRStoTS, CtbAddrTStoRS and TileId are derived by invoking the CTB raster and tile scanning conversion process as specified in subclause 6.5.1 with PicHeightInCtbs and PicWidthInCtbs as inputs, and the output is assigned to colWidth, CtbAddrRStoTS, CtbAddrTStoRS and TileId.
The values of ColumnWidthInLumaSamples[i], specifying the width of the i-th tile column in units of luma samples, are set equal to colWidth[i]<<Log2CtbSize.
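Subclause 6.5.1 of [2] is not reproduced in this section. The sketch below follows the CTB raster and tile scanning conversion process as it appears in the final HEVC text, as we understand it, including its integer formula for uniform spacing. Function and parameter names are hypothetical; explicit column_width/row_height lists stand in for the non-uniform case.

```python
from typing import List, Optional, Sequence, Tuple


def uniform_sizes(num: int, total: int) -> List[int]:
    # Integer split used when uniform_spacing_flag is equal to 1 (final HEVC form).
    return [((i + 1) * total) // num - (i * total) // num for i in range(num)]


def ctb_scan_conversion(pic_width_in_ctbs: int, pic_height_in_ctbs: int,
                        num_cols: int, num_rows: int,
                        column_width: Optional[Sequence[int]] = None,
                        row_height: Optional[Sequence[int]] = None
                        ) -> Tuple[List[int], List[int], List[int], List[int]]:
    """Return colWidth, CtbAddrRStoTS, CtbAddrTStoRS and TileId."""
    col_w = list(column_width) if column_width is not None else uniform_sizes(num_cols, pic_width_in_ctbs)
    row_h = list(row_height) if row_height is not None else uniform_sizes(num_rows, pic_height_in_ctbs)
    if (len(col_w), len(row_h)) != (num_cols, num_rows) or \
            sum(col_w) != pic_width_in_ctbs or sum(row_h) != pic_height_in_ctbs:
        raise ValueError("tile columns/rows must exactly cover the picture")
    col_bd = [0]
    for w in col_w:
        col_bd.append(col_bd[-1] + w)
    row_bd = [0]
    for h in row_h:
        row_bd.append(row_bd[-1] + h)

    n = pic_width_in_ctbs * pic_height_in_ctbs
    rs_to_ts = [0] * n
    for rs in range(n):
        tb_x, tb_y = rs % pic_width_in_ctbs, rs // pic_width_in_ctbs
        tile_x = max(i for i in range(num_cols) if tb_x >= col_bd[i])
        tile_y = max(j for j in range(num_rows) if tb_y >= row_bd[j])
        # CTBs in earlier tiles of this tile row, then in all earlier tile rows.
        ts = sum(row_h[tile_y] * col_w[i] for i in range(tile_x))
        ts += sum(pic_width_in_ctbs * row_h[j] for j in range(tile_y))
        # Raster position inside the current tile.
        ts += (tb_y - row_bd[tile_y]) * col_w[tile_x] + tb_x - col_bd[tile_x]
        rs_to_ts[rs] = ts

    ts_to_rs = [0] * n
    for rs, ts in enumerate(rs_to_ts):
        ts_to_rs[ts] = rs

    tile_id = [0] * n
    idx = 0
    for j in range(num_rows):
        for i in range(num_cols):
            for y in range(row_bd[j], row_bd[j + 1]):
                for x in range(col_bd[i], col_bd[i + 1]):
                    tile_id[rs_to_ts[y * pic_width_in_ctbs + x]] = idx
            idx += 1
    return col_w, rs_to_ts, ts_to_rs, tile_id
```

ColumnWidthInLumaSamples then follows as `[w << log2_ctb_size for w in col_w]`. For a 4x2-CTB picture split into two tile columns, raster addresses 0, 1, 4, 5 map to tile scan addresses 0 to 3 in tile 0.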
The array MinCbAddrZS[x][y], specifying the conversion from a location (x, y) in units of minimum CBs to a minimum CB address in z-scan order with x ranging from 0 to picWidthInMinCbs−1, inclusive, and y ranging from 0 to picHeightInMinCbs−1, inclusive, is derived by invoking the Z scanning order array initialization process as specified in subclause 6.5.2 with Log2MinCbSize, Log2CtbSize, PicHeightInCtbs, PicWidthInCtbs, and the vector CtbAddrRStoTS as inputs, and the output is assigned to MinCbAddrZS.
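Subclause 6.5.2 is likewise not reproduced here. The sketch below follows the z-scan order array initialization as given in the final HEVC text, as we understand it: the tile-scan CTB address is scaled by the number of minimum CBs per CTB, and the z-order offset inside the CTB is obtained by interleaving the low bits of x and y. Names are hypothetical; without tiles, CtbAddrRStoTS defaults to the identity.

```python
from typing import List, Optional, Sequence


def min_cb_addr_zs(log2_min_cb_size: int, log2_ctb_size: int,
                   pic_height_in_ctbs: int, pic_width_in_ctbs: int,
                   ctb_addr_rs_to_ts: Optional[Sequence[int]] = None) -> List[List[int]]:
    """Return MinCbAddrZS indexed as [x][y] in units of minimum CBs."""
    if ctb_addr_rs_to_ts is None:  # no tiles: tile scan equals raster scan
        ctb_addr_rs_to_ts = list(range(pic_width_in_ctbs * pic_height_in_ctbs))
    d = log2_ctb_size - log2_min_cb_size
    w_min, h_min = pic_width_in_ctbs << d, pic_height_in_ctbs << d
    table = [[0] * h_min for _ in range(w_min)]
    for y in range(h_min):
        for x in range(w_min):
            tb_x = (x << log2_min_cb_size) >> log2_ctb_size
            tb_y = (y << log2_min_cb_size) >> log2_ctb_size
            ctb_addr_rs = pic_width_in_ctbs * tb_y + tb_x
            addr = ctb_addr_rs_to_ts[ctb_addr_rs] << (d * 2)
            # Interleave the low bits of x and y to get the z-order offset.
            p = 0
            for i in range(d):
                m = 1 << i
                p += (m * m if m & x else 0) + (2 * m * m if m & y else 0)
            table[x][y] = addr + p
    return table
```

With 16x16 CTBs, 8x8 minimum CBs and a two-CTB-wide picture, the top row of minimum CBs gets addresses 0, 1, 4, 5 and the second row 2, 3, 6, 7.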
loop_filter_across_tiles_enabled_flag equal to 1 specifies that in-loop filtering operations are performed across tile boundaries. loop_filter_across_tiles_enabled_flag equal to 0 specifies that in-loop filtering operations are not performed across tile boundaries. The in-loop filtering operations include the deblocking filter, sample adaptive offset, and adaptive loop filter operations. When not present, the value of loop_filter_across_tiles_enabled_flag is inferred to be equal to 1.
cabac_independent_flag equal to 1 specifies that CABAC decoding of coding blocks in a slice is independent of any state of the previously decoded slice. cabac_independent_flag equal to 0 specifies that CABAC decoding of coding blocks in a slice is dependent on the states of the previously decoded slice. When not present, the value of cabac_independent_flag is inferred to be equal to 0.
A derivation process for the availability of a coding block with a minimum coding block address was described as follows:
Inputs to this process are                a minimum coding block address minCbAddrZS in z-scan order        the current minimum coding block address currMinCBAddrZS in z-scan order        
Output of this process is the availability cbAvailable of the coding block with minimum coding block address minCbAddrZS in z-scan order.
NOTE – The meaning of availability is determined when this process is invoked.
NOTE – Any coding block, regardless of its size, is associated with a minimum coding block address, which is the address of the coding block with the minimum coding block size in z-scan order.
If one or more of the following conditions are true, cbAvailable is set to FALSE:
– minCbAddrZS is less than 0,
– minCbAddrZS is greater than currMinCBAddrZS,
– the coding block with minimum coding block address minCbAddrZS belongs to a different slice than the coding block with the current minimum coding block address currMinCBAddrZS, and the dependent_slice_flag of the slice containing the coding block with the current minimum coding block address currMinCBAddrZS is equal to 0,
– the coding block with minimum coding block address minCbAddrZS is contained in a different tile than the coding block with the current minimum coding block address currMinCBAddrZS.
Otherwise, cbAvailable is set to TRUE.
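The availability rules above reduce to a short predicate. In the following sketch, slice and tile membership and the per-slice dependent_slice_flag are passed in as plain lists. This is a simplifying assumption made only for illustration: the lists slice_idx, tile_idx and dependent_slice_flag and the function name are hypothetical, whereas a real decoder derives these relations from its own state.

```python
def cb_available(min_cb_addr_zs, curr_min_cb_addr_zs,
                 slice_idx, tile_idx, dependent_slice_flag):
    """Hypothetical sketch of the coding block availability derivation.

    slice_idx[a], tile_idx[a]: slice / tile containing minimum CB address a (z-scan).
    dependent_slice_flag[s]:   dependent_slice_flag of slice s.
    """
    # Outside the picture, or not yet decoded in z-scan order.
    if min_cb_addr_zs < 0 or min_cb_addr_zs > curr_min_cb_addr_zs:
        return False
    cur_slice = slice_idx[curr_min_cb_addr_zs]
    # Crossing a slice boundary is only allowed into a dependent slice.
    if slice_idx[min_cb_addr_zs] != cur_slice and dependent_slice_flag[cur_slice] == 0:
        return False
    # Crossing a tile boundary is never allowed.
    if tile_idx[min_cb_addr_zs] != tile_idx[curr_min_cb_addr_zs]:
        return False
    return True


slices = [0, 0, 0, 0, 1, 1, 1, 1]  # two slices of four minimum CBs each
tiles = [0] * 8                    # a single tile
print(cb_available(2, 5, slices, tiles, [0, 1]))  # True: slice 1 is a dependent slice
print(cb_available(2, 5, slices, tiles, [0, 0]))  # False: slice 1 is independent
```

The example shows the effect of dependent slices: a neighbor in the preceding slice becomes available as soon as the current slice is flagged as dependent.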
The CABAC parsing process for slice data of [2] was as follows:
This process is invoked when parsing syntax elements with descriptor ae(v).
Inputs to this process are a request for a value of a syntax element and values of prior parsed syntax elements.
Output of this process is the value of the syntax element.
When starting the parsing of the slice data of a slice, the initialization process of the CABAC parsing process is invoked.
The minimum coding block address of the coding tree block containing the spatial neighbor block T (FIG. 10A), ctbMinCbAddrT, is derived using the location (x0, y0) of the top-left luma sample of the current coding tree block as follows:
$$x = x_0 + (2 \ll \mathrm{Log2CtbSize}) - 1$$
$$y = y_0 - 1$$
$$\mathrm{ctbMinCbAddrT} = \mathrm{MinCbAddrZS}[x \gg \mathrm{Log2MinCbSize}][y \gg \mathrm{Log2MinCbSize}]$$
The variable availableFlagT is obtained by invoking the coding block availability derivation process with ctbMinCbAddrT as input.
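The neighbor lookup above can be sketched as follows. The luma location (x, y) falls into the coding tree block above and to the right of the current one, and that location is then mapped to a z-scan minimum CB address. The resulting address is the input to the availability process that yields availableFlagT. The function name is hypothetical. Mapping locations outside the picture to −1, so that the availability process reports them as unavailable, is an assumption of this sketch and is not stated in the text.

```python
def top_right_ctb_min_cb_addr(x0, y0, log2_ctb_size, log2_min_cb_size, min_cb_addr_zs):
    """Hypothetical sketch: ctbMinCbAddrT for the CTB containing neighbor T."""
    x = x0 + (2 << log2_ctb_size) - 1  # last luma column of the top-right CTB
    y = y0 - 1                          # last luma row of the CTB row above
    # Assumption: positions outside the picture map to -1 (treated as unavailable).
    if y < 0:
        return -1
    xi = x >> log2_min_cb_size
    yi = y >> log2_min_cb_size
    if xi >= len(min_cb_addr_zs) or yi >= len(min_cb_addr_zs[0]):
        return -1
    return min_cb_addr_zs[xi][yi]


# Arbitrary 4x4 MinCbAddrZS stand-in (values chosen only to be easy to read).
z = [[x * 10 + y for y in range(4)] for x in range(4)]
print(top_right_ctb_min_cb_addr(0, 16, 4, 3, z))  # 31 -> entry [3][1]
print(top_right_ctb_min_cb_addr(0, 0, 4, 3, z))   # -1 -> first CTB row has no row above
```

For a CTB at the start of a row, T lies two CTB widths to the right in the row above. This matches the two-CTB lag that wavefront synchronization relies on.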
When starting the parsing of a coding tree, the following ordered steps apply.
The arithmetic decoding engine is initialized as follows.
If CtbAddrRS is equal to slice_address, dependent_slice_flag is equal to 1 and entropy_coding_reset_flag is equal to 0, the following applies:
– The synchronization process of the CABAC parsing process is invoked with TableStateIdxDS and TableMPSValDS as input.
– The decoding process for binary decisions before termination is invoked, followed by the initialization process for the arithmetic decoding engine.
Otherwise, if tiles_or_entropy_coding_sync_idc is equal to 2 and CtbAddrRS % PicWidthInCtbs is equal to 0, the following applies:
– When availableFlagT is equal to 1, the synchronization process of the CABAC parsing process is invoked with TableStateIdxWPP and TableMPSValWPP as input.
– The decoding process for binary decisions before termination is invoked, followed by the initialization process for the arithmetic decoding engine.
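The initialization branches above can be summarized as a decision function that returns which steps run at the start of a coding tree. The action labels ("sync_DS", "sync_WPP", "decode_terminate", "init_arith_decoder") and the function name are hypothetical shorthand for the processes named in the text. This sketch only reproduces the branching and does not implement the processes themselves.

```python
def ctb_init_actions(ctb_addr_rs, slice_address, dependent_slice_flag,
                     entropy_coding_reset_flag, tiles_or_entropy_coding_sync_idc,
                     pic_width_in_ctbs, available_flag_t):
    """Hypothetical sketch of the arithmetic decoder initialization decision."""
    actions = []
    if (ctb_addr_rs == slice_address and dependent_slice_flag == 1
            and entropy_coding_reset_flag == 0):
        # First CTB of a dependent slice: continue from the stored slice state.
        actions.append("sync_DS")  # TableStateIdxDS / TableMPSValDS
        actions += ["decode_terminate", "init_arith_decoder"]
    elif tiles_or_entropy_coding_sync_idc == 2 and ctb_addr_rs % pic_width_in_ctbs == 0:
        # First CTB of a row in wavefront mode: inherit from the row above if possible.
        if available_flag_t:
            actions.append("sync_WPP")  # TableStateIdxWPP / TableMPSValWPP
        actions += ["decode_terminate", "init_arith_decoder"]
    return actions


print(ctb_init_actions(16, 3, 0, 0, 2, 8, True))
# ['sync_WPP', 'decode_terminate', 'init_arith_decoder']
```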
When cabac_independent_flag is equal to 0 and dependent_slice_flag is equal to 1, or when tiles_or_entropy_coding_sync_idc is equal to 2, the memorization process is applied as follows:
– When tiles_or_entropy_coding_sync_idc is equal to 2 and CtbAddrRS % PicWidthInCtbs is equal to 2, the memorization process of the CABAC parsing process is invoked with TableStateIdxWPP and TableMPSValWPP as output.
– When cabac_independent_flag is equal to 0, dependent_slice_flag is equal to 1, and end_of_slice_flag is equal to 1, the memorization process of the CABAC parsing process is invoked with TableStateIdxDS and TableMPSValDS as output.
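Memorization amounts to taking a snapshot of the current context variables into one of two stores, which the synchronization process later restores. In the following sketch, a context set is modeled as a (state index list, MPS value list) pair, and the class and function names are hypothetical. The deep copy matters: without it, the snapshot would keep changing as decoding of the current row or slice continues.

```python
import copy


class ContextStore:
    """Hypothetical holder for the two saved CABAC context snapshots."""

    def __init__(self):
        self.wpp = None  # stands in for TableStateIdxWPP / TableMPSValWPP
        self.ds = None   # stands in for TableStateIdxDS / TableMPSValDS


def memorize(store, ctx, cabac_independent_flag, dependent_slice_flag,
             tiles_or_entropy_coding_sync_idc, ctb_addr_rs, pic_width_in_ctbs,
             end_of_slice_flag):
    """ctx: (state_idx list, mps_val list) of the current CABAC contexts."""
    # Wavefront snapshot, taken at the third CTB of a row (CtbAddrRS % width == 2).
    if tiles_or_entropy_coding_sync_idc == 2 and ctb_addr_rs % pic_width_in_ctbs == 2:
        store.wpp = copy.deepcopy(ctx)
    # Dependent-slice snapshot, taken at the end of a slice.
    if cabac_independent_flag == 0 and dependent_slice_flag == 1 and end_of_slice_flag == 1:
        store.ds = copy.deepcopy(ctx)


store = ContextStore()
ctx = ([1, 2], [0, 1])
memorize(store, ctx, 1, 0, 2, 10, 8, 0)  # 10 % 8 == 2 -> WPP snapshot
ctx[0][0] = 9                             # later context updates...
print(store.wpp)                          # ...do not affect it: ([1, 2], [0, 1])
```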
The parsing of syntax elements proceeds as follows:
For each requested value of a syntax element a binarization is derived.
The binarization for the syntax element and the sequence of parsed bins determine the decoding process flow.
For each bin of the binarization of the syntax element, which is indexed by the variable binIdx, a context index ctxIdx is derived.
For each ctxIdx the arithmetic decoding process is invoked.
The resulting sequence (b0, ..., bbinIdx) of parsed bins is compared to the set of bin strings given by the binarization process after decoding of each bin. When the sequence matches a bin string in the given set, the corresponding value is assigned to the syntax element.
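The bin-by-bin matching loop can be sketched as follows. A callable stands in for the context index derivation and the arithmetic decoding of one bin, so no real CABAC engine is involved. The function name, the decode_bin parameter and the truncated unary table in the example are illustrative assumptions. The sketch also assumes a prefix-free set of bin strings, so that the first match is the only possible one.

```python
from typing import Callable, Dict


def parse_syntax_element(bin_strings: Dict[str, int],
                         decode_bin: Callable[[int], int]) -> int:
    """Hypothetical sketch: decode bins until they match a bin string.

    bin_strings: binarization as {bin string: syntax element value}, assumed prefix-free.
    decode_bin:  stands in for ctxIdx derivation plus arithmetic decoding of bin binIdx.
    """
    bins = ""
    max_len = max(len(s) for s in bin_strings)
    for bin_idx in range(max_len):
        bins += str(decode_bin(bin_idx))
        # Compare (b0, ..., bbinIdx) against the set after every decoded bin.
        if bins in bin_strings:
            return bin_strings[bins]
    raise ValueError("decoded bins match no bin string")


# Truncated unary binarization with cMax = 3.
tu = {"0": 0, "10": 1, "110": 2, "111": 3}
bits = iter([1, 1, 0])
print(parse_syntax_element(tu, lambda _bin_idx: next(bits)))  # 2
```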
In case the request for a value of a syntax element is processed for the syntax element pcm_flag and the decoded value of pcm_flag is equal to 1, the decoding engine is initialized after the decoding of any pcm_alignment_zero_bit, num_subsequent_pcm, and all pcm_sample_luma and pcm_sample_chroma data.
In the design framework described so far the following problem occurred.
The timing of the decoding units needs to be known before coding and sending the data in a low delay scenario, where NAL units are already sent out by the encoder while the encoder is still coding other parts of the picture, i.e. other sub-picture decoding units. This is because the NAL unit order in an access unit only allows SEI messages to precede the VCL (Video Coding Layer) NAL units in an access unit, but in such a low delay scenario, the non-VCL NAL units need to be already on the wire, i.e. sent out, when the encoder starts encoding the decoding units. FIG. 10B illustrates the structure of an access unit as defined in [2]. [2] did not yet specify end-of-sequence or end-of-stream NAL units, so their presence in the access unit was tentative.
Furthermore, the number of NAL units associated with a sub-picture also needs to be known beforehand in a low delay scenario, as the picture timing SEI message contains this information and has to be compiled and sent out before the encoder starts to encode the actual picture. An application designer who is reluctant to insert filler data NAL units, potentially containing no filler data, merely to comply with the number of NAL units signaled per decoding unit in the picture timing SEI message needs a means to signal this information at the sub-picture level. The same holds for sub-picture timing, which is currently fixed at the beginning of an access unit by the parameters given in the picture timing SEI message.
Further shortcomings of the draft specification [2] concern signaling at the sub-picture level that is needed for specific applications, such as ROI signaling or tile dimension signaling.
The problems outlined above are not specific to the HEVC standard; they occur in connection with other video codecs as well. FIG. 11 shows, more generally, a video transmission scenario in which a pair of encoder 10 and decoder 12 is connected via a network 14 in order to transmit a video 16 from encoder 10 to decoder 12 at short end-to-end delay. The problem already outlined above is the following. The encoder 10 encodes the sequence of frames 18 of the video 16 in accordance with a certain decoding order which substantially, but not necessarily, follows the reproduction order 20 of frames 18, and within each frame 18 travels through the frame area of frames 18 in some defined manner, such as, for example, in a raster scan manner with or without tile-sectioning of frames 18. The decoding order controls the availability of information for coding techniques used by encoder 10 such as, for example, prediction and/or entropy coding, i.e. the availability of information relating to spatially and/or temporally neighboring portions of video 16 available to serve as a basis for prediction or context selection. Even though encoder 10 might be able to use parallel processing in order to encode the frames 18 of video 16, encoder 10 needs some time to encode a certain frame 18, such as the current frame. FIG. 11, for example, illustrates a time instant where encoder 10 has already finished encoding portion 18a of a current frame 18, while another portion 18b of current frame 18 has not yet been encoded. As encoder 10 has not yet encoded portion 18b, encoder 10 cannot forecast how the available bitrate for encoding current frame 18 should be distributed spatially over current frame 18 to achieve an optimum in, for example, a rate/distortion sense.
Accordingly, encoder 10 has merely two choices: either encoder 10 estimates a nearly-optimum distribution of the available bitrate for current frame 18 onto the slices into which current frame 18 is spatially subdivided in advance, accepting that the estimation may be wrong, or encoder 10 finalizes encoding current frame 18 prior to transmitting the packets containing the slices from encoder 10 to decoder 12. In either case, in order to be able to take advantage of any transmission of slice packets of current coded frame 18 prior to the finalization of its encoding, network 14 should be informed of the bitrates associated with each such slice packet in the form of coded picture buffer retrieval times. However, as indicated above, although encoder 10 is, in accordance with the current version of HEVC, able to vary the bitrate distributed over frames 18 by defining decoder buffer retrieval times for sub-picture areas individually, encoder 10 needs to send out such information via network 14 to decoder 12 at the beginning of each access unit collecting all data relating to current frame 18. This forces encoder 10 to choose between the two alternatives just outlined: one leading to lower delay but worse rate/distortion, the other leading to optimum rate/distortion, however at increased end-to-end delay.
Thus, so far there is no video codec that achieves a delay low enough for the encoder to start transmitting packets relating to portions 18a of the current frame prior to encoding a remaining portion 18b of the current frame, with the decoder being able to exploit this intermediate transmission of packets relating to preliminary portions 18a by way of the network 14, which obeys the decoding buffer retrieval timing conveyed within the video data stream sent from encoder 10 to decoder 12. Applications which would take advantage of such low delay include, for example, industrial applications such as work piece or fabrication surveillance for automation or inspection purposes or the like. Until now, there is also no satisfactory solution for informing the decoding side about the packets' association to tiles into which a current frame is structured, and to interesting regions (regions of interest) of a current frame, so that intermediate network entities within network 14 are enabled to gather such information from the data stream without having to deeply inspect the inside of the packets, i.e. the slice syntax.