The High Efficiency Video Coding (HEVC) core standard has recently been finalized by the International Telecommunication Union (ITU) (ITU-T Rec. H.265) and the Moving Picture Experts Group (MPEG) (ISO/IEC 23008-2/MPEG-H Part 2). Layered extensions to the HEVC standard are under development, e.g., the Multi-View extension (MV-HEVC), the 3D extension (3D-HEVC), and the Scalable extension (SHVC). Further extensions, or combinations of existing extensions, may be specified in the future.
HEVC and its extensions make extensive use of predictive coding tools. From the decoder perspective, pixel data is reconstructed using previously decoded pixel data for prediction. In particular, for inter-picture prediction, previously decoded pictures, so-called reference pictures, are used for prediction in the reconstruction process of a current picture.
According to the HEVC specification, each picture is subdivided into one or multiple slices, and each slice can contain multiple blocks (more specifically, block-shaped coding units and prediction units). The pictures which are available as prediction references for decoding a current slice are placed into so-called reference picture lists. According to the HEVC specification, different types of slices exist. For “P slices”, at most one reference picture can be used for prediction of a current block. Accordingly, P slices have one reference picture list, called “list0”. For “B slices”, at most two reference pictures can be used for prediction of a current block, which is also referred to as “bi-prediction”. Accordingly, B slices have two reference picture lists, referred to as “list0” and “list1”.
The reference picture used for reconstructing a particular block can be signaled by means of so-called reference picture indexes. A reference picture index is an index into a reference picture list, such as list0 or list1. The reference picture indexes are coded along with other data in the HEVC bit stream as part of coded slice data. The length of a code word used to send a reference picture index depends on the index value itself, in particular if Variable Length Coding (VLC) is used. Typically, small reference picture indexes require shorter code words. Thus, the further in front of a reference picture list a certain reference picture is placed, the fewer bits are required to indicate its use. Accordingly, in order to achieve high compression efficiency, a typical strategy is to place reference pictures which are frequently used for prediction at the front of a reference picture list.
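The relationship between index value and code word length can be illustrated with a short, non-normative sketch. It uses the unsigned Exp-Golomb code ue(v), which appears as a descriptor in the syntax of this document, purely as an example of a variable length code; the actual binarization of reference picture indexes in an HEVC bit stream may differ. The function name ue_code_length is hypothetical.

```python
def ue_code_length(value: int) -> int:
    """Bit length of the unsigned Exp-Golomb code word ue(v) for a non-negative value."""
    if value < 0:
        raise ValueError("ue(v) encodes non-negative integers only")
    # A ue(v) code word consists of k leading zero bits, a single one bit,
    # and k information bits, where k = floor(log2(value + 1)).
    k = (value + 1).bit_length() - 1
    return 2 * k + 1


# Code word lengths for reference picture indexes 0..7 (illustration only).
lengths = [ue_code_length(idx) for idx in range(8)]
print(lengths)
```

Under this assumption, index 0 costs a single bit while index 7 costs seven bits, which is why frequently used reference pictures are best placed at the front of a list.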
Typically, the reference picture lists are constructed in a two-step process, (1) initial reference picture list construction followed by (2) reference picture list modification. Step 1 is pre-defined through the decoder specification and results in an initial reference picture list. Step 2 involves signaling reference picture list modification commands in slice headers and results in the final reference picture list by applying the reference picture list modification commands on the initial reference picture list. Since sending reference picture list modification commands requires transmission of additional bits, it is desirable that the initial reference picture list is carefully designed, so that frequently used reference pictures can be indicated with few bits, yielding high compression efficiency.
While the HEVC core specification only uses temporally neighboring pictures for inter-picture prediction, i.e., pictures within the same temporal layer, it is likely that multi-layer HEVC extensions, such as scalable and 3D extensions, will use pictures from other layers, e.g., scalability layers and/or view layers, as reference pictures. The current draft SHVC, MV-HEVC, and 3D-HEVC specifications use ad-hoc methods for reference picture list construction. Thus, in order to improve bit efficiency, there is a need for more efficient methods for reference picture list construction for multi-layer HEVC extensions using reference pictures across layers.
In the draft SHVC (JCTVC-L1008) and MV-HEVC specifications (JCT3V-C1004), a layer identifies, i.e., is associated with, a set of pictures corresponding to, e.g., a spatial resolution or quality (for SHVC), to a camera view (for MV-HEVC), or to a depth view (for 3D-HEVC). Each layer has an index i and is identified by a layer identifier layer_id (see syntax element layer_id_in_nuh[i] below). The layer index i is typically an indicator for the decoding order. Thus, for each access unit (i.e., sampling time or moment in time), up to one picture for each layer (view, picture resolution, etc.) is decoded in the order of the layer index i.
Further, a set of scalability identifiers is associated with each layer (see syntax element dimension_id[i][j] below). Examples of scalability identifiers are “ViewId” (identifying a certain camera view), “DepthFlag” (identifying whether a layer carries depth data or not), “DependencyId” (indicating decoding dependencies in case of, e.g., spatial scalability), “QualityId” (indicating a video quality), and others.
In SHVC and MV-HEVC, parameters related to high-level video representations are signaled in extensions of the so-called Video Parameter Set (VPS). The VPS extension syntax and some relevant semantics are shown below. Specifically, layer dependencies are signaled using the syntax element “direct_dependency_flag”, based on which the variable arrays RefLayerId[i][j] and NumDirectRefLayers[i] are derived for each layer i, as described below.
vps_extension( ) {                                      Descriptor
  while( !byte_aligned( ) )
    vps_extension_byte_alignment_reserved_one_bit       u(1)
  avc_base_layer_flag                                   u(1)
  splitting_flag                                        u(1)
  for( i = 0, NumScalabilityTypes = 0; i < 16; i++ ) {
    scalability_mask[ i ]                               u(1)
    NumScalabilityTypes += scalability_mask[ i ]
  }
  for( j = 0; j < NumScalabilityTypes; j++ )
    dimension_id_len_minus1[ j ]                        u(3)
  vps_nuh_layer_id_present_flag                         u(1)
  for( i = 1; i <= vps_max_layers_minus1; i++ ) {
    if( vps_nuh_layer_id_present_flag )
      layer_id_in_nuh[ i ]                              u(6)
    for( j = 0; j < NumScalabilityTypes; j++ )
      dimension_id[ i ][ j ]                            u(v)
  }
  for( lsIdx = 1; lsIdx <= vps_num_layer_sets_minus1; lsIdx++ ) {
    vps_profile_present_flag[ lsIdx ]                   u(1)
    if( !vps_profile_present_flag[ lsIdx ] )
      profile_layer_set_ref_minus1[ lsIdx ]             ue(v)
    profile_tier_level( vps_profile_present_flag[ lsIdx ], vps_max_sub_layers_minus1 )
  }
  num_output_layer_sets                                 ue(v)
  for( i = 0; i < num_output_layer_sets; i++ ) {
    output_layer_set_idx[ i ]                           ue(v)
    lsIdx = output_layer_set_idx[ i ]
    for( j = 0; j <= vps_max_layer_id; j++ )
      if( layer_id_included_flag[ lsIdx ][ j ] )
        output_layer_flag[ lsIdx ][ j ]                 u(1)
  }
  for( i = 1; i <= vps_max_layers_minus1; i++ )
    for( j = 0; j < i; j++ )
      direct_dependency_flag[ i ][ j ]                  u(1)
}
layer_id_in_nuh[i] specifies the value of the nuh_layer_id syntax element in Video Coding Layer (VCL) Network Abstraction Layer (NAL) units of the i-th layer. For i in a range from 0 to vps_max_layers_minus1, inclusive, when not present, the value of layer_id_in_nuh[i] is inferred to be equal to i. When i is greater than 0, layer_id_in_nuh[i] shall be greater than layer_id_in_nuh[i−1]. For i in a range from 0 to vps_max_layers_minus1, inclusive, the variable LayerIdInVps[layer_id_in_nuh[i]] is set equal to i.
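The inference rule and the inverse mapping LayerIdInVps described above can be sketched as follows. This is an illustrative sketch only; the function name derive_layer_id_maps and its parameter signalled_ids (the values read for i = 1 to vps_max_layers_minus1, or None when vps_nuh_layer_id_present_flag is 0) are assumptions introduced for the example.

```python
def derive_layer_id_maps(vps_max_layers_minus1, signalled_ids=None):
    """Return (layer_id_in_nuh, layer_id_in_vps) for layer indexes 0..vps_max_layers_minus1."""
    # layer_id_in_nuh[0] is never signalled and is inferred to be 0.
    layer_id_in_nuh = [0]
    for i in range(1, vps_max_layers_minus1 + 1):
        # When not present, layer_id_in_nuh[i] is inferred to be equal to i.
        value = i if signalled_ids is None else signalled_ids[i - 1]
        # Constraint from the semantics: strictly increasing with the layer index.
        if value <= layer_id_in_nuh[i - 1]:
            raise ValueError("layer_id_in_nuh[i] shall be greater than layer_id_in_nuh[i-1]")
        layer_id_in_nuh.append(value)
    # LayerIdInVps maps a nuh_layer_id value back to its layer index i.
    layer_id_in_vps = {layer_id: i for i, layer_id in enumerate(layer_id_in_nuh)}
    return layer_id_in_nuh, layer_id_in_vps


inferred = derive_layer_id_maps(2)
explicit = derive_layer_id_maps(2, signalled_ids=[2, 5])
print(inferred, explicit)
```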
dimension_id[i][j] specifies the identifier of the j-th present scalability dimension type of the i-th layer. When not present, the value of dimension_id[i][j] is inferred to be equal to 0. The number of bits used for the representation of dimension_id[i][j] is dimension_id_len_minus1[j]+1. When splitting_flag is equal to 1, it is a requirement of bitstream conformance that dimension_id[i][j] shall be equal to ((layer_id_in_nuh[i] & ((1<<dimBitOffset[j+1])−1))>>dimBitOffset[j]).
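The bit-field constraint for splitting_flag equal to 1 can be illustrated with a short sketch. The derivation of dimBitOffset is not given in the text above; the sketch assumes that dimBitOffset[0] is 0 and that each further offset adds the bit length dimension_id_len_minus1[j]+1 of the preceding dimension. The function names are hypothetical.

```python
def dim_bit_offsets(dimension_id_len_minus1):
    """Cumulative bit offsets (assumed definition of dimBitOffset, see lead-in)."""
    offsets = [0]
    for len_minus1 in dimension_id_len_minus1:
        offsets.append(offsets[-1] + len_minus1 + 1)
    return offsets


def split_layer_id(layer_id_in_nuh, dimension_id_len_minus1):
    """Extract one dimension_id value per scalability dimension from a layer identifier."""
    offsets = dim_bit_offsets(dimension_id_len_minus1)
    return [
        (layer_id_in_nuh & ((1 << offsets[j + 1]) - 1)) >> offsets[j]
        for j in range(len(dimension_id_len_minus1))
    ]


# Two dimensions of 2 and 3 bits; layer identifier 22 is binary 10110.
offsets = dim_bit_offsets([1, 2])
dims = split_layer_id(22, [1, 2])
print(offsets, dims)
```

In this example the two low-order bits (10) give the first dimension value 2, and the next three bits (101) give the second dimension value 5.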
The variable ScalabilityId[i][smIdx] specifying the identifier of the smIdx-th scalability dimension type of the i-th layer and the variable ViewId[layer_id_in_nuh[i]] specifying the view identifier of the i-th layer are derived as follows:
for( i = 0; i <= vps_max_layers_minus1; i++ ) {
  for( smIdx = 0, j = 0; smIdx < 16; smIdx++ )
    if( ( i != 0 ) && scalability_mask[ smIdx ] )
      ScalabilityId[ i ][ smIdx ] = dimension_id[ i ][ j++ ]
    else
      ScalabilityId[ i ][ smIdx ] = 0
  ViewId[ layer_id_in_nuh[ i ] ] = ScalabilityId[ i ][ 0 ]
}
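The derivation above can be expressed as a runnable sketch. The function name derive_scalability_ids and the use of Python lists and a dictionary for the arrays are assumptions made for illustration; the example values are invented.

```python
def derive_scalability_ids(scalability_mask, dimension_id, layer_id_in_nuh):
    """Derive ScalabilityId[i][smIdx] and ViewId[layer_id_in_nuh[i]] for all layers."""
    num_layers = len(layer_id_in_nuh)
    scalability_id = [[0] * 16 for _ in range(num_layers)]
    view_id = {}
    for i in range(num_layers):
        j = 0  # index of the next present scalability dimension
        for sm_idx in range(16):
            if i != 0 and scalability_mask[sm_idx]:
                scalability_id[i][sm_idx] = dimension_id[i][j]
                j += 1
            # otherwise the entry keeps its initial value 0
        view_id[layer_id_in_nuh[i]] = scalability_id[i][0]
    return scalability_id, view_id


# Three layers, two scalability dimensions present (mask bits 0 and 1).
mask = [1, 1] + [0] * 14
dimension_id = [[], [1, 0], [0, 1]]  # no values are read for layer index 0
scalability_id, view_id = derive_scalability_ids(mask, dimension_id, [0, 1, 2])
print(view_id)
```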
direct_dependency_flag[i][j] equal to 0 specifies that the layer with index j is not a direct reference layer for the layer with index i. direct_dependency_flag[i][j] equal to 1 specifies that the layer with index j may be a direct reference layer for the layer with index i. When direct_dependency_flag[i][j] is not present for i and j in the range of 0 to vps_max_layers_minus1, it is inferred to be equal to 0.
The variables NumDirectRefLayers[i] and RefLayerId[i][j] are derived as follows:
for( i = 1; i <= vps_max_layers_minus1; i++ )
  for( j = 0, NumDirectRefLayers[ i ] = 0; j < i; j++ )
    if( direct_dependency_flag[ i ][ j ] == 1 )
      RefLayerId[ i ][ NumDirectRefLayers[ i ]++ ] = layer_id_in_nuh[ j ]
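A runnable sketch of this derivation is given below. The function name derive_ref_layers is hypothetical, and the entry for layer index 0, which the pseudocode above leaves undefined, is set to zero reference layers as an assumption.

```python
def derive_ref_layers(direct_dependency_flag, layer_id_in_nuh):
    """Derive NumDirectRefLayers[i] and RefLayerId[i][j] from the dependency flags."""
    num_layers = len(layer_id_in_nuh)
    num_direct_ref_layers = [0] * num_layers
    ref_layer_id = [[] for _ in range(num_layers)]
    for i in range(1, num_layers):
        for j in range(i):
            if direct_dependency_flag[i][j] == 1:
                # Reference layers are collected in increasing order of layer index j.
                ref_layer_id[i].append(layer_id_in_nuh[j])
        num_direct_ref_layers[i] = len(ref_layer_id[i])
    return num_direct_ref_layers, ref_layer_id


# Layer index 1 depends on layer index 0; layer index 2 depends on indexes 0 and 1.
flags = [[], [1], [1, 1]]
num_direct_ref_layers, ref_layer_id = derive_ref_layers(flags, [0, 2, 5])
print(num_direct_ref_layers, ref_layer_id)
```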
Based on RefLayerId[i][j] and NumDirectRefLayers[i], a so-called inter-layer reference picture set is constructed, as is described below.
The output of the decoding process for an inter-layer reference picture set is an updated list of inter-layer pictures RefPicSetInterLayer.
The list RefPicSetInterLayer is first emptied and then derived as follows:
for( i = 0; i < NumDirectRefLayers[ LayerIdInVps[ nuh_layer_id ] ]; i++ ) {
  RefPicSetInterLayer[ i ] = the picture with picture order count equal to PicOrderCnt
    and nuh_layer_id equal to RefLayerId[ LayerIdInVps[ nuh_layer_id ] ][ i ]
  RefPicSetInterLayer[ i ] is marked as “used for long-term reference”
}
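The construction of the inter-layer reference picture set can be sketched as below. The Picture class, the list dpb standing in for the decoded picture buffer, and the function name build_inter_layer_rps are assumptions introduced for the example and are not part of the draft specifications.

```python
LONG_TERM = "used for long-term reference"
SHORT_TERM = "used for short-term reference"


class Picture:
    """Minimal stand-in for a decoded picture (hypothetical helper type)."""

    def __init__(self, poc, nuh_layer_id, marking=SHORT_TERM):
        self.poc = poc
        self.nuh_layer_id = nuh_layer_id
        self.marking = marking


def build_inter_layer_rps(dpb, current_poc, nuh_layer_id, layer_id_in_vps, ref_layer_id):
    """Collect, for the current picture, the same-POC pictures of its direct reference layers."""
    rps = []
    for ref_id in ref_layer_id[layer_id_in_vps[nuh_layer_id]]:
        matches = [p for p in dpb if p.poc == current_poc and p.nuh_layer_id == ref_id]
        if not matches:
            raise LookupError(f"no decoded picture for layer {ref_id} at POC {current_poc}")
        matches[0].marking = LONG_TERM
        rps.append(matches[0])
    return rps


dpb = [Picture(8, 0), Picture(8, 2), Picture(4, 0)]
# The current picture has POC 8 and nuh_layer_id 5, whose reference layers are 0 and 2.
rps = build_inter_layer_rps(dpb, 8, 5, {0: 0, 2: 1, 5: 2}, [[], [0], [0, 2]])
print([(p.poc, p.nuh_layer_id) for p in rps])
```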
The output of the marking process for ending the decoding of a coded picture with nuh_layer_id greater than zero is a potentially updated marking as “used for short-term reference” for some decoded pictures.
The following applies:
for( i = 0; i < NumDirectRefLayers[ LayerIdInVps[ nuh_layer_id ] ]; i++ )
  RefPicSetInterLayer[ i ] is marked as “used for short-term reference”
Temporal reference pictures and inter-layer reference pictures are combined into two temporary reference picture lists, RefPicListTemp0 and RefPicListTemp1, as is described below. Finally, potential reference picture list modification commands are applied and the final reference picture lists RefPicList0 and RefPicList1 are obtained, as is described below.
The decoding process for reference picture lists construction is invoked at the beginning of the decoding process for each P or B slice.
Reference pictures are addressed through reference indices as specified in sub-clause 8.5.3.3.2 of the HEVC base spec (JCTVC-L1003). A reference index is an index into a reference picture list. When decoding a P slice, there is a single reference picture list RefPicList0. When decoding a B slice, there is a second independent reference picture list RefPicList1 in addition to RefPicList0.
At the beginning of the decoding process for each slice, the reference picture lists RefPicList0 and, for B slices, RefPicList1 are derived as follows.
The variable NumRpsCurrTempList0 is set equal to Max(num_ref_idx_l0_active_minus1+1, NumPocTotalCurr) and the list RefPicListTemp0 is constructed as follows:
rIdx = 0
while( rIdx < NumRpsCurrTempList0 ) {
  for( i = 0; i < NumPocStCurrBefore && rIdx < NumRpsCurrTempList0; rIdx++, i++ )
    RefPicListTemp0[ rIdx ] = RefPicSetStCurrBefore[ i ]
  for( i = 0; i < NumPocStCurrAfter && rIdx < NumRpsCurrTempList0; rIdx++, i++ )
    RefPicListTemp0[ rIdx ] = RefPicSetStCurrAfter[ i ]
  for( i = 0; i < NumPocLtCurr && rIdx < NumRpsCurrTempList0; rIdx++, i++ )
    RefPicListTemp0[ rIdx ] = RefPicSetLtCurr[ i ]
  for( i = 0; i < NumDirectRefLayers[ LayerIdInVps[ nuh_layer_id ] ]; rIdx++, i++ )
    RefPicListTemp0[ rIdx ] = RefPicSetInterLayer[ i ]
}
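The construction of the temporary list can be sketched as follows. The function name build_temp_list, the string labels used for pictures, and the guard against an empty input are assumptions for illustration. As in the pseudocode above, the inter-layer pictures are appended without checking the list size. The same function also covers RefPicListTemp1 when the two short-term subsets are passed in swapped order.

```python
def build_temp_list(num_entries, first, second, long_term, inter_layer):
    """Build a temporary reference picture list by cycling through the reference subsets."""
    if num_entries > 0 and not (first or second or long_term or inter_layer):
        raise ValueError("no reference pictures available")  # guard added for the sketch
    temp = []
    while len(temp) < num_entries:
        # Temporal subsets are added only while the list is shorter than num_entries.
        for subset in (first, second, long_term):
            for pic in subset:
                if len(temp) >= num_entries:
                    break
                temp.append(pic)
        # Inter-layer pictures follow in the fixed order of RefPicSetInterLayer.
        temp.extend(inter_layer)
    return temp


st_before = ["T-1", "T-2"]     # RefPicSetStCurrBefore
st_after = ["T+1"]             # RefPicSetStCurrAfter
lt = []                        # RefPicSetLtCurr
inter_layer = ["IL0", "IL1"]   # RefPicSetInterLayer

temp0 = build_temp_list(5, st_before, st_after, lt, inter_layer)
temp1 = build_temp_list(5, st_after, st_before, lt, inter_layer)
print(temp0, temp1)
```

In both lists of this example the inter-layer pictures occupy the last two positions, in the same order.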
The list RefPicList0 is constructed as follows:
for( rIdx = 0; rIdx <= num_ref_idx_l0_active_minus1; rIdx++ )
  RefPicList0[ rIdx ] = ref_pic_list_modification_flag_l0 ?
    RefPicListTemp0[ list_entry_l0[ rIdx ] ] : RefPicListTemp0[ rIdx ]
When the slice is a B slice, the variable NumRpsCurrTempList1 is set equal to Max(num_ref_idx_l1_active_minus1+1, NumPocTotalCurr) and the list RefPicListTemp1 is constructed as follows:
rIdx = 0
while( rIdx < NumRpsCurrTempList1 ) {
  for( i = 0; i < NumPocStCurrAfter && rIdx < NumRpsCurrTempList1; rIdx++, i++ )
    RefPicListTemp1[ rIdx ] = RefPicSetStCurrAfter[ i ]
  for( i = 0; i < NumPocStCurrBefore && rIdx < NumRpsCurrTempList1; rIdx++, i++ )
    RefPicListTemp1[ rIdx ] = RefPicSetStCurrBefore[ i ]
  for( i = 0; i < NumPocLtCurr && rIdx < NumRpsCurrTempList1; rIdx++, i++ )
    RefPicListTemp1[ rIdx ] = RefPicSetLtCurr[ i ]
  for( i = 0; i < NumDirectRefLayers[ LayerIdInVps[ nuh_layer_id ] ]; rIdx++, i++ )
    RefPicListTemp1[ rIdx ] = RefPicSetInterLayer[ i ]
}
When the slice is a B slice, the list RefPicList1 is constructed as follows:
for( rIdx = 0; rIdx <= num_ref_idx_l1_active_minus1; rIdx++ )
  RefPicList1[ rIdx ] = ref_pic_list_modification_flag_l1 ?
    RefPicListTemp1[ list_entry_l1[ rIdx ] ] : RefPicListTemp1[ rIdx ]
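The final step, which applies to RefPicList0 and RefPicList1 alike, can be sketched as below. The function name finalize_ref_pic_list and the example picture labels are assumptions for illustration.

```python
def finalize_ref_pic_list(temp_list, num_ref_idx_active_minus1, modification_flag, list_entry=()):
    """Select the final list entries, optionally remapped by list modification commands."""
    final = []
    for r_idx in range(num_ref_idx_active_minus1 + 1):
        # With modification, list_entry[r_idx] names the temporary-list position to use.
        source_idx = list_entry[r_idx] if modification_flag else r_idx
        final.append(temp_list[source_idx])
    return final


temp0 = ["T-1", "T-2", "T+1", "IL0", "IL1"]
default_list0 = finalize_ref_pic_list(temp0, 2, False)
modified_list0 = finalize_ref_pic_list(temp0, 2, True, list_entry=[3, 0, 1])
print(default_list0, modified_list0)
```

With three active entries and no modification commands, neither inter-layer picture is reachable in this example. Moving IL0 to the front requires signaling list_entry values in the slice header, which costs additional bits.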
In the reference picture list initialization procedure summarized hereinbefore, inter-layer reference pictures are appended to the reference picture list according to the order of reference layers in the ordered array RefPicSetInterLayer[ ]. The order of reference layers in RefPicSetInterLayer[ ] is fixed according to the layer index i of the reference layers, from small to large values of i. Thus, the inter-layer reference pictures in both initial reference picture lists are always inserted with increasing order of the layer index i. This order does not take into account potential similarities or dissimilarities of different layers, and is thus not optimal in terms of compression efficiency, or bitrate efficiency.