In the ongoing standardization tracks that build on H.265/High Efficiency Video Coding (HEVC) while preserving backward compatibility, namely scalable video coding and Three-Dimensional Video (3DV) coding, the latter including MV-HEVC (the HEVC multi-view video coding extension framework) and 3D-HEVC (3D High Efficiency Video Coding), a unified high-level structural design has been adopted. This design is based on multi-layer video coding, which introduces the concept of a “layer” to represent the texture and depth components of MV-HEVC and 3D-HEVC as well as the different scalable layers of scalable video coding, and which distinguishes views and scalable layers by means of layer identifiers (Layer Ids). The currently issued H.265/HEVC standard is referred to as the H.265/HEVC Version 1 standard.
In multi-layer video coding, the video pictures obtained at the same time instant, together with their coded bits, constitute an Access Unit (AU). Within one AU, the picture of each layer may be coded with a different method. Thus, within one AU, the picture of one layer may be an Intra Random Access Point (IRAP) picture usable as a random access point, while the pictures of one or more other layers are ordinary inter-frame and inter-layer predicted pictures. In practical applications, each layer may choose its own IRAP picture insertion policy according to network transmission conditions, changes in video content and the like. For example, a shorter IRAP picture insertion period may be adopted for pictures on a Base Layer (BL) compatible with H.265/HEVC, and a relatively longer IRAP picture insertion period may be adopted for pictures on an Enhancement Layer (EL). With such a layer-wise access structure, the random access performance of a multi-layer video coding bitstream can be ensured without a sharp increase in coding bit-rate.
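The layer-wise insertion policy described above can be sketched as follows. This is only an illustration: the function name `irap_layers` and the periods (8 AUs for the BL, 32 AUs for the EL) are assumptions chosen for the example, not values taken from any standard.

```python
def irap_layers(au_index: int, irap_period: dict[int, int]) -> list[int]:
    """Return the ids of the layers whose picture in this AU is an IRAP picture.

    irap_period maps a layer id to its (hypothetical) IRAP insertion period in AUs.
    """
    return [layer for layer, period in sorted(irap_period.items())
            if au_index % period == 0]


# Hypothetical policy: layer 0 (BL) every 8 AUs, layer 1 (EL) every 32 AUs.
periods = {0: 8, 1: 32}
for au in (0, 8, 16, 32):
    print(au, irap_layers(au, periods))
```

In this sketch, AUs 8 and 16 carry an IRAP picture on the BL only, so those AUs mix IRAP and non-IRAP pictures.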
The BL bitstream in the multi-layer video coding bitstream should be compatible with the H.265/HEVC Version 1 standard. That is, the multi-layer video coding bitstream should ensure that a decoder designed according to the H.265/HEVC Version 1 standard can correctly decode the BL bitstream extracted from the multi-layer video coding bitstream. Specifically, for MV-HEVC and 3D-HEVC, the BL corresponds to a base view or an independent view, and the EL corresponds to an enhancement view or a dependent view. In practical applications, a base view bitstream intended only for playback on a traditional two-dimensional television, a dual-view bitstream supporting three-dimensional display, and a multi-view bitstream for three-dimensional display may each be obtained by extraction from the multi-layer video coding bitstream.
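Extraction by layer can be pictured with a deliberately simplified model in which the bitstream is a list of units, each tagged with a layer identifier. The `NalUnit` type and the `extract_layers` function below are hypothetical names introduced for illustration; a real extraction process also has to handle parameter sets, temporal sub-layers and other details that are omitted here.

```python
from typing import NamedTuple


class NalUnit(NamedTuple):
    layer_id: int   # layer identifier carried by the unit (0 is assumed to be the BL)
    payload: bytes  # opaque coded data


def extract_layers(bitstream: list[NalUnit], target_layers: set[int]) -> list[NalUnit]:
    """Keep only the units that belong to the requested layers, in original order."""
    return [nal for nal in bitstream if nal.layer_id in target_layers]


stream = [NalUnit(0, b"a"), NalUnit(1, b"b"), NalUnit(2, b"c"), NalUnit(0, b"d")]
base_view = extract_layers(stream, {0})      # e.g. for a two-dimensional display
dual_view = extract_layers(stream, {0, 1})   # e.g. for a dual-view display
```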
In the H.265/HEVC Version 1 standard, there are three types of IRAP pictures, namely an Instantaneous Decoding Refresh (IDR) picture, a Broken Link Access (BLA) picture and a Clean Random Access (CRA) picture. All three picture types are coded in an intra coding mode and decoded without depending on other pictures. They differ in their operations on the Picture Order Count (POC) and the Decoded Picture Buffer (DPB).
The POC is an order count that identifies the display order of pictures in H.265/HEVC Version 1. According to the H.265/HEVC Version 1 standard, the POC value of a picture is composed of two parts. If the POC value of the picture is represented by PicOrderCntVal, then PicOrderCntVal = PicOrderCntMsb + PicOrderCntLsb, where PicOrderCntMsb represents the Most Significant Bit (MSB) part of the POC value of the picture, and PicOrderCntLsb represents the Least Significant Bit (LSB) part. Generally, PicOrderCntMsb is equal to the PicOrderCntMsb of the previous picture with TemporalId equal to 0 that precedes the current picture in decoding order, and PicOrderCntLsb is equal to the value of the slice_pic_order_cnt_lsb field in the slice header information. The number of bits used to represent the slice_pic_order_cnt_lsb field in the bitstream is signalled by log2_max_pic_order_cnt_lsb_minus4 in a Sequence Parameter Set (SPS) and is equal to log2_max_pic_order_cnt_lsb_minus4 + 4.
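A minimal sketch of this general POC derivation follows. The function names are hypothetical. The text states only that the MSB part "generally" equals that of the previous picture; the wrap-around adjustment in the sketch is recalled from the POC decoding process of H.265/HEVC Version 1 rather than taken from the text, so it should be checked against the standard before being relied on.

```python
def max_poc_lsb(log2_max_pic_order_cnt_lsb_minus4: int) -> int:
    """Number of distinct LSB values, i.e. 2 ** (bits used for slice_pic_order_cnt_lsb)."""
    return 1 << (log2_max_pic_order_cnt_lsb_minus4 + 4)


def derive_poc(slice_lsb: int, prev_lsb: int, prev_msb: int, max_lsb: int) -> int:
    """Return PicOrderCntVal = PicOrderCntMsb + PicOrderCntLsb.

    prev_lsb / prev_msb belong to the previous TemporalId = 0 picture in decoding order.
    """
    if slice_lsb < prev_lsb and (prev_lsb - slice_lsb) >= max_lsb // 2:
        msb = prev_msb + max_lsb   # LSB wrapped around going forward
    elif slice_lsb > prev_lsb and (slice_lsb - prev_lsb) > max_lsb // 2:
        msb = prev_msb - max_lsb   # LSB wrapped around going backward
    else:
        msb = prev_msb             # the common case described in the text
    return msb + slice_lsb


max_lsb = max_poc_lsb(4)                  # 8 bits for slice_pic_order_cnt_lsb
poc = derive_poc(2, 250, 0, max_lsb)      # LSB wrapped from 250 to 2
```

With 8 LSB bits, an LSB that drops from 250 to 2 is read as a forward wrap, so the MSB part advances by 256.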
In H.265/HEVC Version 1, if the current picture is an IDR picture, PicOrderCntMsb is set to 0, the slice header information does not contain a slice_pic_order_cnt_lsb field, and PicOrderCntLsb defaults to 0. If the current picture is a BLA picture, PicOrderCntMsb is set to 0, and the slice header information contains a slice_pic_order_cnt_lsb field that determines PicOrderCntLsb. If the current picture is a CRA picture and the flag HandleCraAsBlaFlag is equal to 0, the POC is calculated by the general method; if the current picture is a CRA picture and HandleCraAsBlaFlag is equal to 1, the POC value of the CRA picture is calculated in the same way as for a BLA picture.
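The per-type rules above can be summarized in a short sketch. The function name, the string picture-type labels and the parameter names are assumptions made for illustration; the general branch includes an LSB wrap-around adjustment recalled from the H.265/HEVC Version 1 decoding process, which the text does not spell out.

```python
from typing import Optional


def picture_poc(pic_type: str, slice_lsb: Optional[int], prev_msb: int,
                prev_lsb: int, max_lsb: int,
                handle_cra_as_bla: bool = False) -> int:
    """POC of a picture given its type ("IDR", "BLA", "CRA" or anything else)."""
    if pic_type == "IDR":
        # MSB is 0; no slice_pic_order_cnt_lsb is sent, so LSB defaults to 0.
        return 0
    if pic_type == "BLA" or (pic_type == "CRA" and handle_cra_as_bla):
        # MSB is 0; LSB comes from slice_pic_order_cnt_lsb.
        return slice_lsb
    # General method (CRA with HandleCraAsBlaFlag = 0, and non-IRAP pictures).
    if slice_lsb < prev_lsb and (prev_lsb - slice_lsb) >= max_lsb // 2:
        msb = prev_msb + max_lsb
    elif slice_lsb > prev_lsb and (slice_lsb - prev_lsb) > max_lsb // 2:
        msb = prev_msb - max_lsb
    else:
        msb = prev_msb
    return msb + slice_lsb
```

For the same slice_pic_order_cnt_lsb, a CRA picture therefore yields different POC values depending on HandleCraAsBlaFlag whenever the previous MSB part is non-zero.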
It is important to note that, in the multi-layer video coding standard, the slice header information of the EL always contains a slice_pic_order_cnt_lsb field regardless of the corresponding picture type.
On this basis, for the multi-layer video coding bitstream, all pictures in an AU are required to have the same POC value, so that pictures of the same time instant can be detected in the DPB detection process and so that a decoder can conveniently determine the start and end positions of each AU in the bitstream from the POC value. With a layer-wise coding structure, however, an AU may contain IRAP pictures and non-IRAP pictures at the same time. If the IRAP picture is an IDR picture or a BLA picture, the POC values of the pictures contained in that AU will differ. It is therefore necessary to design a POC alignment function for the multi-layer video coding standard, so that all pictures in an AU have the same POC value when the layer-wise structure is used.
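The misalignment can be made concrete with a small check. The function name and the POC values are illustrative assumptions: the BL picture is taken to be an IDR picture (POC 0), while the EL picture in the same AU is a non-IRAP picture whose POC has kept counting.

```python
def au_is_poc_aligned(poc_by_layer: dict[int, int]) -> bool:
    """True when every picture in the AU carries the same POC value."""
    return len(set(poc_by_layer.values())) <= 1


# Hypothetical AU: layer 0 (BL) is an IDR picture, layer 1 (EL) is a non-IRAP picture.
mixed_au = {0: 0, 1: 16}
regular_au = {0: 17, 1: 17}
print(au_is_poc_aligned(mixed_au), au_is_poc_aligned(regular_au))
```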
To solve this problem, a POC alignment method was proposed in the JCT-VC standard conference proposal JCTVC-N0244. The method adds a 1-bit poc_reset_flag field by using a reserved bit in the slice header information. When the value of the field is equal to 1, the POC value of the picture is first calculated by the general method; then a so-called POC shifting operation is performed, in which the calculated POC value is subtracted from the POC values of the pictures of the same layer (including the BL) in the DPB; and finally the POC value of the picture is set to 0.
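The reset-and-shift steps can be sketched as below, following only the description in the text. The function name and the representation of the DPB as a plain list of POC values for one layer are assumptions; the proposal itself is not quoted here.

```python
def apply_poc_reset(dpb_pocs: list[int], current_poc: int) -> tuple[list[int], int]:
    """Shift the POCs of same-layer DPB pictures, then reset the current picture to 0.

    current_poc is the value already calculated by the general method.
    """
    shifted = [poc - current_poc for poc in dpb_pocs]  # POC shifting operation
    return shifted, 0                                  # current picture's POC becomes 0


dpb_after, poc_after = apply_poc_reset([8, 12, 14], current_poc=16)
```

The shift keeps the POC distances between the current picture and the stored pictures unchanged; in the example the stored pictures end up at -8, -4 and -2 relative to the reset picture at 0.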
The main defect of the method is that the BL bitstream is not compatible with the H.265/HEVC Version 1 standard; that is, it cannot be ensured that a decoder conforming to the H.265/HEVC Version 1 standard can decode the BL bitstream extracted from the multi-layer video coding bitstream.
To solve the compatibility problem, the JCT-VC conference proposals JCTVC-O0140 and JCTVC-O0213 proposed, on the basis of JCTVC-N0244, that only the MSB part of the POC be set to 0 when POC alignment is needed. Furthermore, JCTVC-O0213 adds an option of delaying POC alignment, to deal with the scenario in which a slice carrying the flag that indicates a POC reset is lost and the scenario in which the layers have different frame rates. JCTVC-O0176 proposed that POC alignment be performed directly in the case of an IDR picture, without explicit signalling by a flag bit in the slice header, and that a reserved bit be added to the slice header of an IDR picture in the BL bitstream for calculating the POC value the picture would have if it were a CRA picture rather than an IDR picture. The calculated POC value is used to perform a POC shifting operation on the pictures stored in the DPB of the ELs. JCTVC-O0275 proposed the concept of a layer POC, under which two different sets of POCs are maintained for the pictures on the EL. The layer POC is the POC value obtained when POC alignment is not performed, and it is used for the relevant operations of the decoding algorithm, such as those for the Reference Picture Set (RPS). The other set is the POC after POC alignment. The aligned POC value is consistent with that of the picture on the BL in the same AU, and it is used to control the picture output and display processes. According to the method proposed in JCTVC-O0275, information of the BL is used in the POC alignment process, a variable flag maintained inside the encoder/decoder triggers the POC alignment process, and the value of the flag is associated with the picture type information of the BL.
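The MSB-only variant can be sketched as follows. The text says only that the MSB part of the POC is set to 0; the assumption made here, and not confirmed by the text, is that the pictures of the same layer in the DPB are shifted by the removed MSB amount so that POC distances are preserved. The function name and the list-based DPB are hypothetical.

```python
def reset_poc_msb(dpb_pocs: list[int], current_poc: int,
                  max_lsb: int) -> tuple[list[int], int]:
    """Set only the MSB part of the current POC to 0.

    Assumption: same-layer DPB pictures are shifted by the removed MSB amount.
    """
    lsb = current_poc % max_lsb   # LSB part, always in [0, max_lsb)
    msb = current_poc - lsb       # MSB part, a multiple of max_lsb
    shifted = [poc - msb for poc in dpb_pocs]
    return shifted, lsb


dpb_after_msb, poc_after_msb = reset_poc_msb([250, 254, 258], current_poc=260, max_lsb=256)
```

Unlike a full reset to 0, the current picture keeps its LSB part (4 in the example), which is the value carried in slice_pic_order_cnt_lsb.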
Under most conditions, POC alignment needs to be performed on a multi-layer video so that the pictures of all layers within the same AU have the same POC value, which facilitates picture output control, AU boundary detection and other operations. Nevertheless, POC alignment is unnecessary in some applications. For example, in uncoordinated simulcast, only the video bitstream of the BL or of one individual EL is used during any given period of time, so POC alignment is unnecessary in that bitstream; and if a simulcast bitstream is extracted, edited and recombined, POC alignment is unnecessary in the bitstream generation process. In addition, in hybrid scalable video coding, the BL and the one or more ELs are coded using different video coding standards. Since different coding standards adopt different POC mechanisms and different POC-based picture output control operations, it may also be unnecessary to execute a POC alignment operation under hybrid scalable video coding. Moreover, pictures acquired at the same time may be aligned at display time by means of timing information added in the system layer or media file packaging of the multi-layer video coding bitstream, in which case POC alignment of the video bitstream is unnecessary.
It can be seen that the method proposed in JCTVC-N0244 implicitly derives the POC alignment operations from BL information or prediction structure information rather than explicitly signalling them with a flag bit. As a result, when a prediction structure meets the conditions under which the POC alignment operations are to be executed but POC alignment is not actually needed in that instance, no option is available to disable the unnecessary POC alignment operations, either partially or entirely.