High Efficiency Video Coding (HEVC) is a new coding standard that has been developed in recent years. In the HEVC system, the fixed-size macroblock of H.264/AVC is replaced by a flexible block, named coding unit (CU). Pixels in the CU share the same coding parameters to improve coding efficiency. A CU may begin with a largest CU (LCU), which is also referred to as a coding tree unit (CTU) in HEVC. In addition to the concept of coding unit, the concept of prediction unit (PU) is also introduced in HEVC. Once the splitting of the CU hierarchical tree is done, each leaf CU is further split into one or more prediction units (PUs) according to prediction type and PU partition.
Along with the HEVC standard development, the development of extensions of HEVC has also started. The HEVC extensions include range extensions (RExt), which target non-4:2:0 colour formats, such as 4:2:2 and 4:4:4, and higher bit-depth video such as 12, 14 and 16 bits per sample. One of the likely applications is screen content coding (SCC), for which various coding tools have been developed and have demonstrated significant gains in coding efficiency. Among them, the palette coding (a.k.a. major colour based coding) techniques represent a block of pixels using indices to the palette (major colours), and encode the palette and the indices by exploiting spatial redundancy.
Decoded Picture Buffer Consideration in Current SCC Draft Standard (SCM)
In HEVC, all the reference pictures are stored in a buffer referred to as the decoded picture buffer (DPB). Each time a picture is decoded, the current decoded picture after the loop filtering operation (referred to as the filtered version of the current decoded picture) is put into the DPB. For IntraBC mode (Intra-block copy mode), the reference picture is the current decoded picture prior to the loop filter (referred to as the unfiltered version of the current decoded picture), which is an extra picture compared to HEVC version 1. In JCTVC-U0181 (X. Xu, et al., On storage of filtered and unfiltered current decoded pictures, Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 21st Meeting: Warsaw, PL, 19-26 Jun. 2015, Document: JCTVC-U0181), both the filtered and unfiltered versions of the current decoded picture are placed into the DPB for picture buffer management. The unfiltered version of the current picture is discarded after decoding of the current picture is complete, and the storage buffer for this picture is released.
In JCTVC-U0181, the case where the two versions of the current decoded picture are identical is also considered. This happens when no loop filters (deblocking or SAO) are used for the current picture. In the current HEVC SCC working draft, a variable referred to as TwoVersionsOfCurrDecPicFlag is used to identify whether the current picture is used as a reference picture and there are two different versions of the current picture due to the use of loop filters in the picture.
In the SPS, a syntax element is used to specify the maximum required DPB size for the current CVS (coded video sequence). Another syntax element is used to specify the maximum allowed number of pictures that precede a given picture in decoding order but follow that picture in output order.
Syntax element sps_max_dec_pic_buffering_minus1[i] plus 1 specifies the maximum required size of the decoded picture buffer for the CVS in units of picture storage buffers when HighestTid is equal to i. The variable HighestTid identifies the highest temporal sub-layer to be decoded. The value of sps_max_dec_pic_buffering_minus1[i] shall be in the range of 0 to MaxDpbSize−1, inclusive, where MaxDpbSize is as specified in clause A.4 of JCTVC-U1005 (R. Joshi, et al, “HEVC Screen Content Coding Draft Text 4”, JCTVC-U1005, Warsaw, PL, June 2015). When i is greater than 0, sps_max_dec_pic_buffering_minus1[i] shall be greater than or equal to sps_max_dec_pic_buffering_minus1[i−1]. The value of sps_max_dec_pic_buffering_minus1[i] shall be less than or equal to vps_max_dec_pic_buffering_minus1[i] for each value of i. When sps_max_dec_pic_buffering_minus1[i] is not present for i in the range of 0 to sps_max_sub_layers_minus1−1, inclusive, due to sps_sub_layer_ordering_info_present_flag being equal to 0, it is inferred to be equal to sps_max_dec_pic_buffering_minus1[sps_max_sub_layers_minus1].
Syntax element sps_max_num_reorder_pics[i] indicates the maximum allowed number of pictures with PicOutputFlag equal to 1 that can precede any picture with PicOutputFlag equal to 1 in the CVS in decoding order and follow that picture with PicOutputFlag equal to 1 in output order when HighestTid is equal to i. The value of sps_max_num_reorder_pics[i] shall be in the range of 0 to sps_max_dec_pic_buffering_minus1[i], inclusive. When i is greater than 0, sps_max_num_reorder_pics[i] shall be greater than or equal to sps_max_num_reorder_pics[i−1]. The value of sps_max_num_reorder_pics[i] shall be less than or equal to vps_max_num_reorder_pics[i] for each value of i. When sps_max_num_reorder_pics[i] is not present for i in the range of 0 to sps_max_sub_layers_minus1−1, inclusive, due to sps_sub_layer_ordering_info_present_flag being equal to 0, it is inferred to be equal to sps_max_num_reorder_pics[sps_max_sub_layers_minus1].
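The range and monotonicity constraints on the two syntax elements can be illustrated with a small checker. This is a minimal sketch in Python, not part of the draft text: the function name and the list-based representation of the per-sub-layer values are assumptions made for illustration, and the comparison against the VPS values is omitted.

```python
def check_sub_layer_ordering(max_dec_pic_buffering_minus1, max_num_reorder_pics, max_dpb_size):
    """Return a list of violated constraints (empty when all checks pass).

    Both input lists are indexed by sub-layer i (hypothetical representation).
    The VPS upper bounds described in the text are not checked here.
    """
    errors = []
    pairs = list(zip(max_dec_pic_buffering_minus1, max_num_reorder_pics))
    for i, (buf, reorder) in enumerate(pairs):
        # sps_max_dec_pic_buffering_minus1[i] in [0, MaxDpbSize - 1]
        if not 0 <= buf <= max_dpb_size - 1:
            errors.append(f"sub-layer {i}: buffering value {buf} out of range")
        # sps_max_num_reorder_pics[i] in [0, sps_max_dec_pic_buffering_minus1[i]]
        if not 0 <= reorder <= buf:
            errors.append(f"sub-layer {i}: reorder value {reorder} out of range")
        if i > 0:
            prev_buf, prev_reorder = pairs[i - 1]
            # Both values shall be non-decreasing in i.
            if buf < prev_buf:
                errors.append(f"sub-layer {i}: buffering value decreases")
            if reorder < prev_reorder:
                errors.append(f"sub-layer {i}: reorder value decreases")
    return errors
```

For example, `check_sub_layer_ordering([3, 4], [2, 3], 6)` returns an empty list, while `[4, 3]` for the buffering values is reported as decreasing at sub-layer 1.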
In the above example based on HEVC, sps_max_dec_pic_buffering_minus1[i] and sps_max_num_reorder_pics[i] are specified for each sub-layer i in a system arranged in multiple sub-layers. Nevertheless, the system may also have only one layer. The following sections from JCTVC-U1005 specify how pictures stored in DPB are removed from DPB.
C.5.2.2 Output and Removal of Pictures From the DPB
The output and removal of pictures from the DPB before the decoding of the current picture (but after parsing the slice header of the first slice of the current picture) happens instantaneously when the first decoding unit of the access unit containing the current picture is removed from the CPB (Coded Picture Buffer) and proceeds as follows:
    The decoding process for RPS (Reference Picture Set) as specified in clause 8.3.2 is invoked.
    If the current picture is an IRAP (Intra Random Access Point) picture with NoRaslOutputFlag equal to 1 that is not picture 0, the following ordered steps are applied:
        1. The variable NoOutputOfPriorPicsFlag is derived for the decoder under test as follows:
            If the current picture is a CRA (Clean Random Access) picture, NoOutputOfPriorPicsFlag is set equal to 1 (regardless of the value of no_output_of_prior_pics_flag).
            Otherwise, if the value of pic_width_in_luma_samples, pic_height_in_luma_samples, chroma_format_idc, separate_colour_plane_flag, bit_depth_luma_minus8, bit_depth_chroma_minus8, or sps_max_dec_pic_buffering_minus1[HighestTid] derived from the active SPS is different from the value of pic_width_in_luma_samples, pic_height_in_luma_samples, chroma_format_idc, separate_colour_plane_flag, bit_depth_luma_minus8, bit_depth_chroma_minus8, or sps_max_dec_pic_buffering_minus1[HighestTid], respectively, derived from the SPS active for the preceding picture, NoOutputOfPriorPicsFlag may (but should not) be set to 1 by the decoder under test, regardless of the value of no_output_of_prior_pics_flag.
                NOTE: Although setting NoOutputOfPriorPicsFlag equal to no_output_of_prior_pics_flag is preferred under these conditions, the decoder under test is allowed to set NoOutputOfPriorPicsFlag to 1 in this case.
            Otherwise, NoOutputOfPriorPicsFlag is set equal to no_output_of_prior_pics_flag.
        2. The value of NoOutputOfPriorPicsFlag derived for the decoder under test is applied for the HRD as follows:
            If NoOutputOfPriorPicsFlag is equal to 1, all picture storage buffers in the DPB are emptied without output of the pictures they contain, and the DPB fullness is set equal to 0.
            Otherwise (NoOutputOfPriorPicsFlag is equal to 0), all picture storage buffers containing a picture that is marked as “not needed for output” and “unused for reference” are emptied (without output), and all non-empty picture storage buffers in the DPB are emptied by repeatedly invoking the “bumping” process specified in clause C.5.2.4, and the DPB fullness is set equal to 0.
    Otherwise (the current picture is not an IRAP picture with NoRaslOutputFlag equal to 1), all picture storage buffers containing a picture that is marked as “not needed for output” and “unused for reference” are emptied (without output). For each picture storage buffer that is emptied, the DPB fullness is decremented by one. When one or more of the following conditions are true, the “bumping” process specified in clause C.5.2.4 is invoked repeatedly while further decrementing the DPB fullness by one for each additional picture storage buffer that is emptied, until none of the following conditions are true:
        The number of pictures in the DPB that are marked as “needed for output” is greater than sps_max_num_reorder_pics[HighestTid].
        sps_max_latency_increase_plus1[HighestTid] is not equal to 0 and there is at least one picture in the DPB that is marked as “needed for output” for which the associated variable PicLatencyCount is greater than or equal to SpsMaxLatencyPictures[HighestTid].
        The number of pictures in the DPB is greater than or equal to sps_max_dec_pic_buffering_minus1[HighestTid]+1.
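The three conditions that trigger repeated “bumping” for a non-IRAP picture can be sketched as a single predicate. The Python below is an illustrative sketch only: the `Pic` class, the function name and the parameter names are hypothetical, the DPB is modelled as a plain list of pictures, and the values for HighestTid are assumed to be passed in already resolved.

```python
from dataclasses import dataclass


@dataclass
class Pic:
    # Hypothetical minimal picture record for illustration.
    needed_for_output: bool
    pic_latency_count: int = 0


def bumping_needed(dpb, max_num_reorder_pics, max_latency_increase_plus1,
                   max_latency_pictures, max_dec_pic_buffering_minus1):
    """Return True while at least one of the three conditions holds."""
    waiting = [p for p in dpb if p.needed_for_output]
    # Condition 1: too many pictures waiting for output.
    too_many_reorder = len(waiting) > max_num_reorder_pics
    # Condition 2: latency limit enabled and reached by some waiting picture.
    latency_exceeded = (max_latency_increase_plus1 != 0 and
                        any(p.pic_latency_count >= max_latency_pictures
                            for p in waiting))
    # Condition 3: no free picture storage buffer for the current picture.
    dpb_full = len(dpb) >= max_dec_pic_buffering_minus1 + 1
    return too_many_reorder or latency_exceeded or dpb_full
```

A decoder model would call the “bumping” process in a loop for as long as this predicate returns true.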
C.5.2.3 Additional Bumping
The processes specified in this clause happen instantaneously when the last decoding unit of access unit n containing the current picture is removed from the CPB.
When the current picture has PicOutputFlag equal to 1, for each picture in the DPB that is marked as “needed for output” and follows the current picture in output order, the associated variable PicLatencyCount is set equal to PicLatencyCount+1.
The following applies:
    If the current decoded picture has PicOutputFlag equal to 1, it is marked as “needed for output” and its associated variable PicLatencyCount is set equal to 0.
    Otherwise (the current decoded picture has PicOutputFlag equal to 0), it is marked as “not needed for output”.
When one or more of the following conditions are true, the “bumping” process specified in clause C.5.2.4 is invoked repeatedly until none of the following conditions are true:
    The number of pictures in the DPB that are marked as “needed for output” is greater than sps_max_num_reorder_pics[HighestTid].
    sps_max_latency_increase_plus1[HighestTid] is not equal to 0 and there is at least one picture in the DPB that is marked as “needed for output” for which the associated variable PicLatencyCount is greater than or equal to SpsMaxLatencyPictures[HighestTid].
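The latency bookkeeping performed when the last decoding unit of the current picture leaves the CPB can be sketched as follows. This Python sketch is illustrative only: the `Picture` class and function name are hypothetical, output order is assumed to be given by increasing PicOrderCntVal within the CVS, and the subsequent bumping loop is omitted.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Picture:
    # Hypothetical minimal picture record for illustration.
    poc: int                        # PicOrderCntVal
    pic_output_flag: bool = True    # PicOutputFlag
    needed_for_output: bool = False
    pic_latency_count: int = 0      # PicLatencyCount


def update_output_marking(dpb, current):
    """Update PicLatencyCount values and mark the current picture."""
    if current.pic_output_flag:
        for pic in dpb:
            # Pictures waiting for output that follow the current picture
            # in output order have waited one more picture.
            if (pic is not current and pic.needed_for_output
                    and pic.poc > current.poc):
                pic.pic_latency_count += 1
        current.needed_for_output = True
        current.pic_latency_count = 0
    else:
        current.needed_for_output = False
```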
C.5.2.4 “Bumping” Process
The “bumping” process consists of the following ordered steps:
    1. The picture that is first for output is selected as the one having the smallest value of PicOrderCntVal of all pictures in the DPB marked as “needed for output”.
    2. The picture is cropped, using the conformance cropping window specified in the active SPS for the picture, the cropped picture is output, and the picture is marked as “not needed for output”.
    3. When the picture storage buffer that included the picture that was cropped and output contains a picture marked as “unused for reference”, the picture storage buffer is emptied.
    NOTE: For any two pictures picA and picB that belong to the same CVS and are output by the “bumping process”, when picA is output earlier than picB, the value of PicOrderCntVal of picA is less than the value of PicOrderCntVal of picB.
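The ordered steps of the “bumping” process can be sketched in a few lines. The Python below is a minimal illustration, not a normative implementation: the `Picture` class and `bump` function are hypothetical names, the DPB is modelled as a list, and conformance cropping is not modelled (the selected picture is simply returned).

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Picture:
    # Hypothetical minimal picture record for illustration.
    poc: int                          # PicOrderCntVal
    needed_for_output: bool = True
    used_for_reference: bool = True


def bump(dpb):
    """Output one picture from the DPB; return it, or None if none is waiting."""
    candidates = [p for p in dpb if p.needed_for_output]
    if not candidates:
        return None
    # Step 1: the smallest PicOrderCntVal among pictures needed for output.
    first = min(candidates, key=lambda p: p.poc)
    # Step 2: (cropping omitted) output and mark as not needed for output.
    first.needed_for_output = False
    # Step 3: free the buffer when the picture is also unused for reference.
    if not first.used_for_reference:
        dpb.remove(first)
    return first
```

Note that a picture still used for reference stays in the DPB after being output; only its output marking changes.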
8.1.3 Decoding Process for a Coded Picture With nuh_layer_id Equal to 0
The decoding processes specified in this clause apply to each coded picture with nuh_layer_id equal to 0, referred to as the current picture and denoted by the variable CurrPic, in BitstreamToDecode.
Depending on the value of chroma_format_idc, the number of sample arrays of the current picture is as follows:
    If chroma_format_idc is equal to 0, the current picture consists of 1 sample array SL.
    Otherwise (chroma_format_idc is not equal to 0), the current picture consists of 3 sample arrays SL, SCb, SCr.
The decoding process for the current picture takes as inputs the syntax elements and upper-case variables from clause 7. When interpreting the semantics of each syntax element in each NAL unit, the term “the bitstream” (or part thereof, e.g., a CVS of the bitstream) refers to BitstreamToDecode (or part thereof).
When the current picture is a BLA picture that has nal_unit_type equal to BLA_W_LP or is a CRA picture, the following applies:
    If some external means not specified in this Specification is available to set the variable UseAltCpbParamsFlag to a value, UseAltCpbParamsFlag is set equal to the value provided by the external means.
    Otherwise, the value of UseAltCpbParamsFlag is set equal to 0.
When the current picture is an IRAP picture, the following applies:
    If the current picture is an IDR picture, a BLA picture, the first picture in the bitstream in decoding order, or the first picture that follows an end of sequence NAL unit in decoding order, the variable NoRaslOutputFlag is set equal to 1.
    Otherwise, if some external means not specified in this Specification is available to set the variable HandleCraAsBlaFlag to a value for the current picture, the variable HandleCraAsBlaFlag is set equal to the value provided by the external means and the variable NoRaslOutputFlag is set equal to HandleCraAsBlaFlag.
    Otherwise, the variable HandleCraAsBlaFlag is set equal to 0 and the variable NoRaslOutputFlag is set equal to 0.
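The derivation of NoRaslOutputFlag for an IRAP picture is a three-way decision, sketched below in Python. The function and parameter names are hypothetical; the “external means” is modelled as an optional argument that is None when no external value is available.

```python
def derive_no_rasl_output_flag(is_idr, is_bla, first_in_bitstream,
                               first_after_end_of_sequence,
                               external_handle_cra_as_bla=None):
    """Return (HandleCraAsBlaFlag, NoRaslOutputFlag) for an IRAP picture.

    HandleCraAsBlaFlag is returned as None in the first branch, where the
    text does not assign it.
    """
    if is_idr or is_bla or first_in_bitstream or first_after_end_of_sequence:
        return None, 1
    if external_handle_cra_as_bla is not None:
        # Value supplied by external means drives both variables.
        return external_handle_cra_as_bla, external_handle_cra_as_bla
    return 0, 0
```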
Depending on the value of separate_colour_plane_flag, the decoding process is structured as follows:
    If separate_colour_plane_flag is equal to 0, the decoding process is invoked a single time with the current picture being the output.
    Otherwise (separate_colour_plane_flag is equal to 1), the decoding process is invoked three times. Inputs to the decoding process are all NAL units of the coded picture with identical value of colour_plane_id. The decoding process of NAL units with a particular value of colour_plane_id is specified as if only a CVS with monochrome colour format with that particular value of colour_plane_id would be present in the bitstream. The output of each of the three decoding processes is assigned to one of the 3 sample arrays of the current picture, with the NAL units with colour_plane_id equal to 0, 1, and 2 being assigned to SL, SCb, and SCr, respectively.
        NOTE 1: The variable ChromaArrayType is derived as equal to 0 when separate_colour_plane_flag is equal to 1 and chroma_format_idc is equal to 3. In the decoding process, the value of this variable is evaluated resulting in operations identical to that of monochrome pictures (when chroma_format_idc is equal to 0).
The decoding process operates as follows for the current picture CurrPic:
    1. The decoding of NAL units is specified in clause 8.2.
    2. The processes in clause 8.3 specify the following decoding processes using syntax elements in the slice segment layer and above:
        Variables and functions relating to picture order count are derived as specified in clause 8.3.1. This needs to be invoked only for the first slice segment of a picture.
        The decoding process for RPS in clause 8.3.2 is invoked, wherein reference pictures may be marked as “unused for reference” or “used for long-term reference”. This needs to be invoked only for the first slice segment of a picture.
        A picture storage buffer in the DPB is allocated for storage of the decoded sample values of the current picture after the invocation of the in-loop filter process as specified in clause 8.7. When TwoVersionsOfCurrDecPicFlag is equal to 0 and pps_curr_pic_ref_enabled_flag is equal to 1, this picture storage buffer is marked as “used for long-term reference”. When TwoVersionsOfCurrDecPicFlag is equal to 1, another picture storage buffer in the DPB is allocated for storage of the decoded sample values of the current picture immediately before the invocation of the in-loop filter process as specified in clause 8.7, and is marked as “used for long-term reference”. This needs to be invoked only for the first slice segment of a picture.
        When the current picture is a BLA picture or is a CRA picture with NoRaslOutputFlag equal to 1, the decoding process for generating unavailable reference pictures specified in clause 8.3.3 is invoked, which needs to be invoked only for the first slice segment of a picture.
    3. The processes in clauses 8.4, 8.5, 8.6 and 8.7 specify decoding processes using syntax elements in all syntax structure layers. It is a requirement of bitstream conformance that the coded slices of the picture shall contain slice segment data for every coding tree unit of the picture, such that the division of the picture into slices, the division of the slices into slice segments, and the division of the slice segments into coding tree units each forms a partitioning of the picture.
    4. After all slices of the current picture have been decoded, the current decoded picture after the invocation of the in-loop filter process as specified in clause 8.7, is marked as “used for short-term reference”. When TwoVersionsOfCurrDecPicFlag is equal to 1, the current decoded picture before the invocation of the in-loop filter process as specified in clause 8.7, is marked as “unused for reference”.
8.3.4 Decoding Process for Reference Picture Lists Construction
This process is invoked at the beginning of the decoding process for each P or B slice. The variables CurrPicInList0Flag and CurrPicInList1Flag are both set equal to 0.
Reference pictures are addressed through reference indices as specified in clause 8.5.3.3.2. A reference index is an index into a reference picture list. When decoding a P slice, there is a single reference picture list RefPicList0. When decoding a B slice, there is a second independent reference picture list RefPicList1 in addition to RefPicList0.
At the beginning of the decoding process for each slice, the reference picture lists RefPicList0 and, for B slices, RefPicList1 are derived as follows:
When TwoVersionsOfCurrDecPicFlag is equal to 1, let the variable currPic point to the current decoded picture before the invocation of the in-loop filter process as specified in clause 8.7; otherwise (TwoVersionsOfCurrDecPicFlag is equal to 0), let the variable currPic point to the current decoded picture after the invocation of the in-loop filter process as specified in clause 8.7.
In JCTVC-U1005, the maximum DPB size is described as follows. When the specified level is not level 8.5 defined in JCTVC-U1005, the value of sps_max_dec_pic_buffering_minus1[HighestTid]+1 shall be less than or equal to MaxDpbSize, which is derived as follows:
    if (PicSizeInSamplesY <= (MaxLumaPs >> 2))
        MaxDpbSize = Min(4 * maxDpbPicBuf, 16)
    else if (PicSizeInSamplesY <= (MaxLumaPs >> 1))
        MaxDpbSize = Min(2 * maxDpbPicBuf, 16)
    else if (PicSizeInSamplesY <= ((3 * MaxLumaPs) >> 2))
        MaxDpbSize = Min((4 * maxDpbPicBuf) / 3, 16)
    else
        MaxDpbSize = maxDpbPicBuf
In the above derivation, MaxLumaPs is specified in Table A.6 of JCTVC-U1005, and maxDpbPicBuf is equal to 6 for all profiles where the value of sps_curr_pic_ref_enabled_flag is required to be equal to 0, and equal to 7 for all profiles where the value of sps_curr_pic_ref_enabled_flag is not required to be equal to 0.
The value of MaxDpbSize therefore depends on the picture size (PicSizeInSamplesY) relative to the maximum luma picture size of the level (MaxLumaPs). When the resolution is low relative to the level limit, more pictures are allowed in the DPB.
In JCTVC-U1005, the processes of storing the current picture into the DPB and removing the current picture from the DPB are disclosed in Clauses C.3.4 and C.3.5 as follows.
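The MaxDpbSize derivation can be written as a small function to show how the picture size and maxDpbPicBuf interact. This Python sketch mirrors the pseudocode above; the function name is hypothetical, integer division is assumed for the (4*maxDpbPicBuf)/3 term, and the MaxLumaPs value used in the usage note (8,912,896) is only an illustrative input rather than a statement about any particular level.

```python
def max_dpb_size(pic_size_in_samples_y, max_luma_ps, max_dpb_pic_buf=6):
    """Derive MaxDpbSize from the picture size and the level's MaxLumaPs.

    max_dpb_pic_buf is 6 or 7, depending on whether the profile allows
    sps_curr_pic_ref_enabled_flag to be non-zero.
    """
    if pic_size_in_samples_y <= (max_luma_ps >> 2):
        return min(4 * max_dpb_pic_buf, 16)
    elif pic_size_in_samples_y <= (max_luma_ps >> 1):
        return min(2 * max_dpb_pic_buf, 16)
    elif pic_size_in_samples_y <= ((3 * max_luma_ps) >> 2):
        # Integer division assumed, as for C-style integer arithmetic.
        return min((4 * max_dpb_pic_buf) // 3, 16)
    else:
        return max_dpb_pic_buf
```

With `max_luma_ps = 8_912_896`, a 1920x1080 picture yields 16 for either value of maxDpbPicBuf because of the cap, whereas a 3840x2160 picture yields 6 or 7. The extra buffer for the unfiltered current picture therefore only changes the limit for pictures larger than a quarter of MaxLumaPs.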
C.3.4 Current Decoded Picture Marking and Storage
The current decoded picture after the invocation of the in-loop filter process as specified in clause 8.7 is stored in the DPB in an empty picture storage buffer, and the DPB fullness is incremented by one. When TwoVersionsOfCurrDecPicFlag is equal to 0 and pps_curr_pic_ref_enabled_flag is equal to 1, this picture is marked as “used for long-term reference”. After all the slices of the current picture have been decoded, this picture is marked as “used for short-term reference”.
When TwoVersionsOfCurrDecPicFlag is equal to 1, the current decoded picture before the invocation of the in-loop filter process as specified in clause 8.7 is stored in the DPB in an empty picture storage buffer, the DPB fullness is incremented by one, and this picture is marked as “used for long-term reference”.
Note that unless more memory than required by the level limit is available for storage of decoded pictures, decoders should start storing decoded parts of the current picture into the DPB when the first slice segment is decoded and continue storing more decoded samples as the decoding process proceeds.
C.3.5 Removal of Pictures From the DPB After Decoding of the Current Picture
When TwoVersionsOfCurrDecPicFlag is equal to 1, immediately after decoding of the current picture, at the CPB removal time of the last decoding unit of access unit n (containing the current picture), the current decoded picture before the invocation of the in-loop filter process as specified in clause 8.7 is removed from the DPB, and the DPB fullness is decremented by one.
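The effect of clauses C.3.4 and C.3.5 on DPB fullness can be traced with a small model. The Python below is a sketch under stated assumptions: the class and function names are hypothetical, the DPB is a list whose length stands for the DPB fullness, and the marking of the filtered picture while TwoVersionsOfCurrDecPicFlag is equal to 1 and decoding is still in progress is left as None because the quoted text does not state it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class StoredPicture:
    # Hypothetical record for one picture storage buffer.
    poc: int
    filtered: bool
    marking: Optional[str] = None


def store_current_picture(dpb, poc, two_versions, curr_pic_ref_enabled):
    """C.3.4: store the current picture (one or two versions) in the DPB."""
    filtered = StoredPicture(poc, filtered=True)
    if not two_versions and curr_pic_ref_enabled:
        filtered.marking = "used for long-term reference"
    dpb.append(filtered)  # DPB fullness + 1
    if two_versions:
        # The unfiltered version takes a second picture storage buffer.
        dpb.append(StoredPicture(poc, filtered=False,
                                 marking="used for long-term reference"))


def finish_current_picture(dpb, poc, two_versions):
    """After all slices are decoded: re-mark, then C.3.5 removal."""
    for pic in dpb:
        if pic.poc == poc and pic.filtered:
            pic.marking = "used for short-term reference"
    if two_versions:
        # The unfiltered version is removed; DPB fullness - 1.
        dpb[:] = [p for p in dpb if not (p.poc == poc and not p.filtered)]
```

In this model, the DPB temporarily holds two entries for the current picture when two versions exist, which is the situation the constraints discussed next have to account for.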
With the changes in decoded picture buffer management when the current picture is a reference picture, some constraints need to be imposed to ensure that decoded picture buffer management works properly.
FIG. 1A illustrates an example of the decoded picture buffer management process for a coding system using temporal hierarchy 110. The maximum decoded picture buffer size is set to 5 and the maximum number of reordered pictures is 4. The DPB status 120 at different time instances is shown, where each filled box indicates a picture buffer occupied by a decoded picture and each white box indicates an available picture buffer that can hold the current or a future decoded picture. The pictures output and removed from the DPB are listed for the different time instances.
FIG. 1B illustrates an example of the decoded picture buffer management process, wherein the maximum decoded picture buffer size is set to 5 and the maximum number of reordered pictures is 3.
In this invention, several methods related to palette predictor initialization in the PPS or SPS, and to DPB management when the current picture is a reference picture, are disclosed.