H.264, also referred to as Moving Picture Experts Group-4 (MPEG-4) Advanced Video Coding (AVC), is the state-of-the-art video coding standard. It is a block-based hybrid video coding scheme that exploits temporal and spatial redundancies.
H.264 uses previously decoded pictures for temporal prediction when decoding encoded pictures. These pictures are called reference pictures, and more than one reference picture may be used for decoding a picture. For each reference picture in H.264, there is a codeword frame_num that acts as a label for the reference picture. The frame_num indicates the decoding order and must increase by 1 for each reference picture in decoding order; otherwise the bitstream does not comply with the standard. H.264 also specifies a picture order count (POC) for each picture, which the decoder uses to output (display) the pictures in the correct order, either through a process called the bumping process or by using picture timing information. In short, the bumping process delays display for as long as possible. Then the picture with the lowest POC that exists in the decoded picture buffer is output (displayed). To determine which picture to display, it is important that the lowest POC is properly defined. An H.264 bitstream always starts with a picture that has POC=0. In contrast to frame_num, POC does not need to be incremented by 1; it can be incremented arbitrarily. The maximum POC value is 2^31−1. It is common to use POC type 0 in H.264. For this POC type, the n least significant bits of the POC are signaled in the bitstream. POC is then calculated as:
PicOrderCnt = PicOrderCntMsb + pic_order_cnt_lsb
where pic_order_cnt_lsb is the least significant bits as signaled in the slice header and PicOrderCntMsb is the most significant bits, calculated using the syntax element pic_order_cnt_lsb together with PrevPicOrderCntLsb and PrevPicOrderCntMsb, where PrevPicOrderCntLsb and PrevPicOrderCntMsb are the values of the previous reference picture in decoding order.
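The POC type 0 calculation and the bumping process described above can be sketched as follows. This is a minimal illustration, not the normative decoding process: the function names are hypothetical, max_lsb stands for 2^n (the range of pic_order_cnt_lsb), the MSB update follows the comparison against half of that range as commonly described for H.264, and the bumping sketch ignores reference marking and picture timing and simply outputs the lowest POC whenever an assumed fixed-size decoded picture buffer is full.

```python
def derive_poc_type0(pic_order_cnt_lsb, prev_lsb, prev_msb, max_lsb):
    """Return (PicOrderCntMsb, PicOrderCnt) for POC type 0.

    prev_lsb / prev_msb are the values of the previous reference picture
    in decoding order; max_lsb is 2**n for n signaled LSB bits.
    """
    half = max_lsb // 2
    if pic_order_cnt_lsb < prev_lsb and (prev_lsb - pic_order_cnt_lsb) >= half:
        # The LSB wrapped forward past the top of its range.
        msb = prev_msb + max_lsb
    elif pic_order_cnt_lsb > prev_lsb and (pic_order_cnt_lsb - prev_lsb) > half:
        # The LSB belongs to the previous MSB cycle.
        msb = prev_msb - max_lsb
    else:
        msb = prev_msb
    return msb, msb + pic_order_cnt_lsb


def bumping_output_order(pocs_in_decoding_order, dpb_capacity):
    """Simplified bumping: output the lowest POC only when the buffer is full."""
    dpb = []      # POCs currently held in the (assumed fixed-size) buffer
    output = []   # POCs in the order they are output (displayed)
    for poc in pocs_in_decoding_order:
        if len(dpb) == dpb_capacity:
            lowest = min(dpb)
            dpb.remove(lowest)
            output.append(lowest)
        dpb.append(poc)
    output.extend(sorted(dpb))  # flush the remaining pictures at end of stream
    return output


# With 4 LSB bits (max_lsb = 16): the LSB drops from 14 to 2, so the MSB advances.
msb, poc = derive_poc_type0(2, prev_lsb=14, prev_msb=0, max_lsb=16)
# msb == 16, poc == 18

# Decoding order differs from output order; bumping restores ascending POC.
order = bumping_output_order([0, 8, 4, 2, 6], dpb_capacity=3)
# order == [0, 2, 4, 6, 8]
```

Because the MSB is carried forward from the previous reference picture, the derived POC depends on which picture the decoder treats as "previous", even though only the LSB is read from the slice header.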
Note that this differs from frame_num, which wraps around. Wrap-around means that the frame_num of successive pictures increases up to a specific value, after which it is reset to zero. POC does not wrap around.
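The wrap-around behavior of frame_num can be illustrated with a short sketch. The function name and the parameter max_frame_num (the value at which frame_num returns to zero) are assumptions introduced for illustration, and the sequence is assumed to start at a picture with frame_num equal to 0.

```python
def frame_num_sequence(num_reference_pictures, max_frame_num):
    """frame_num of successive reference pictures in decoding order.

    The label increases by 1 per reference picture and wraps to zero
    once it reaches max_frame_num (an assumed, stream-specific limit).
    """
    return [i % max_frame_num for i in range(num_reference_pictures)]


labels = frame_num_sequence(6, max_frame_num=4)
# labels == [0, 1, 2, 3, 0, 1]
```

A POC derived as an MSB plus an LSB keeps growing across the same pictures, which is why POC, unlike frame_num, can identify a picture without ambiguity over a long sequence.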
High Efficiency Video Coding (HEVC) is a new video coding standard currently being developed by the Joint Collaborative Team on Video Coding (JCT-VC). JCT-VC is a collaborative project between MPEG and the International Telecommunication Union Telecommunication Standardization Sector (ITU-T).
In H.264/AVC and HEVC, all encoded data is put in Network Abstraction Layer (NAL) units. A NAL unit consists of the encoded data and a NAL unit header. In HEVC, the NAL unit header contains a temporal_id syntax element that identifies the temporal layer of the current picture. HEVC specifies that a picture with temporal_id=tIdA cannot reference a picture with temporal_id=tIdB if tIdA is less than tIdB.
Thus, pictures in higher temporal layers, identified by temporal_id, cannot be used for prediction in lower temporal layers, but pictures in lower temporal layers can be used for prediction in higher temporal layers, as illustrated in FIG. 1. The decoding order is the order in which pictures are decoded, which is also the order in which compressed pictures are fed into the decoder. The system is responsible for feeding the decoder with the pictures in the right order. If pictures are not in the right order, decoding may not be possible. The picture order count (POC) is assigned to each picture and is used by the decoder to output (display) the pictures in the correct order. Depending on the coding structure, some or all pictures in one temporal layer may be used for prediction by other pictures in the same temporal layer. There are very few, if any, practical use cases for pictures that are not used for prediction at all in any temporal layer other than the highest temporal layer. It can therefore be assumed that all pictures in temporal layers lower than the highest temporal layer will be used for prediction by at least one picture in the same or a higher temporal layer.
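The temporal layer constraint above can be expressed as a simple check. The following sketch is illustrative only: the Picture class, its field names and the picture names are hypothetical, and a real decoder would read temporal_id from the NAL unit header rather than from such a structure.

```python
from dataclasses import dataclass, field


@dataclass
class Picture:
    name: str                                  # hypothetical label for the picture
    temporal_id: int                           # temporal layer of the picture
    refs: list = field(default_factory=list)   # names of pictures used for prediction


def may_reference(current_tid, reference_tid):
    """A picture may only reference pictures in the same or a lower temporal layer."""
    return reference_tid <= current_tid


def find_violations(pictures):
    """Return (picture, reference) name pairs that break the temporal layer rule."""
    tid = {p.name: p.temporal_id for p in pictures}
    return [
        (p.name, r)
        for p in pictures
        for r in p.refs
        if not may_reference(p.temporal_id, tid[r])
    ]


structure = [
    Picture("I0", 0),
    Picture("P4", 0, ["I0"]),
    Picture("B2", 1, ["I0", "P4"]),
    Picture("b1", 2, ["I0", "B2"]),
]
# find_violations(structure) == []

broken = structure + [Picture("X", 0, ["B2"])]
# find_violations(broken) == [("X", "B2")]
```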
A sub-stream containing pictures of a limited range of temporal_id values can be created from an HEVC bitstream by removing all pictures that belong to temporal layers higher than temporal layer T, for any chosen T. For example, if a bitstream has four temporal layers {0, 1, 2, 3}, a bitstream from which temporal layers 2 and 3 have been removed is fully decodable by an HEVC decoder.
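Sub-stream creation by layer removal can be sketched as a filter over NAL units. The NalUnit class and the function name are assumptions made for illustration; the sketch treats every NAL unit as carrying picture data and does not model other NAL unit types or header parsing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NalUnit:
    temporal_id: int   # temporal layer taken from the NAL unit header
    payload: bytes     # encoded data (opaque in this sketch)


def extract_sub_stream(nal_units, max_temporal_id):
    """Keep only NAL units in temporal layers 0..max_temporal_id, in order."""
    return [n for n in nal_units if n.temporal_id <= max_temporal_id]


# A hypothetical stream with four temporal layers {0, 1, 2, 3}.
stream = [NalUnit(t, b"") for t in (0, 3, 2, 3, 1, 3, 2, 3, 0)]

# Removing layers 2 and 3 corresponds to choosing T = 1.
sub_stream = extract_sub_stream(stream, max_temporal_id=1)
# [n.temporal_id for n in sub_stream] == [0, 1, 0]
```

Since no retained picture may reference a picture in a higher layer, every reference needed by the sub-stream is still present after the removal.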
Further, a picture in HEVC is partitioned into one or more slices, where each slice is an independently decodable segment of the picture. This means that if a slice is missing, for instance because it was lost during transmission, the other slices of that picture can still be decoded correctly. To make slices independent, no bitstream element of one slice is required for decoding any element of another slice.
Each slice contains a slice header that provides all the data required for the slice to be independently decodable. One example of a data element present in the slice header is the slice address, which the decoder uses to determine the spatial location of the slice. Another example is the slice quantization delta, which the decoder uses to determine the quantization parameter to use at the start of the slice. There are many more data elements in the slice header.
In HEVC, absolute signaling of reference pictures is used instead of signaling reference picture modifications in a relative way as in previous standards, such as H.264. The absolute signaling is realized by a list of reference pictures, referred to as a Reference Picture Set that is signaled for each picture either explicitly or by using a reference to a Sequence Parameter Set (SPS). Picture Order Count (POC) is used in HEVC to define the display order of pictures and also to identify reference pictures.
In the H.264 design, POC is most often signaled by its least significant bits. In HEVC, POC is always signaled by its least significant bits, except for Instantaneous Decoding Refresh (IDR) pictures, for which POC is inferred to be equal to 0.
POC is calculated using values from the previous reference picture in decoding order. When temporal layers are present in an HEVC bitstream, a decoder may choose to decode only a subset of the pictures in the bitstream, i.e. those pictures with a temporal_id lower than a specific value. Thus, which picture is the previous reference picture for a certain picture P may depend on the number of layers decoded by the decoder, which may result in different POC values for P in different decoders. This must be avoided in order to have a stable temporally scalable specification.
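The problem can be made concrete with a small worked example. The sketch below is built on assumptions: a hypothetical three-picture stream, 4 signaled LSB bits (so the LSB range is 16), every picture treated as a reference picture, and the MSB update rule commonly described for POC type 0. The names Pic, derive_poc and decode_pocs are introduced for illustration only, and max_temporal_id is the highest layer that the decoder keeps.

```python
from dataclasses import dataclass

MAX_LSB = 16  # assumed: 4 least significant bits of POC are signaled


@dataclass(frozen=True)
class Pic:
    name: str
    temporal_id: int
    poc_lsb: int
    is_idr: bool = False


def derive_poc(poc_lsb, prev_lsb, prev_msb, max_lsb=MAX_LSB):
    """MSB/LSB derivation relative to the previous reference picture."""
    half = max_lsb // 2
    if poc_lsb < prev_lsb and (prev_lsb - poc_lsb) >= half:
        msb = prev_msb + max_lsb
    elif poc_lsb > prev_lsb and (poc_lsb - prev_lsb) > half:
        msb = prev_msb - max_lsb
    else:
        msb = prev_msb
    return msb, msb + poc_lsb


def decode_pocs(pictures, max_temporal_id):
    """POC per picture as seen by a decoder that drops higher temporal layers."""
    prev_lsb, prev_msb = 0, 0
    result = {}
    for pic in pictures:                    # pictures are in decoding order
        if pic.temporal_id > max_temporal_id:
            continue                        # this decoder never sees the picture
        if pic.is_idr:
            msb, poc = 0, 0                 # POC of an IDR picture is 0
        else:
            msb, poc = derive_poc(pic.poc_lsb, prev_lsb, prev_msb)
        result[pic.name] = poc
        prev_lsb, prev_msb = pic.poc_lsb, msb   # every picture is a reference here
    return result


stream = [
    Pic("A", temporal_id=0, poc_lsb=0, is_idr=True),
    Pic("B", temporal_id=1, poc_lsb=6),
    Pic("C", temporal_id=0, poc_lsb=12),
]

all_layers = decode_pocs(stream, max_temporal_id=1)
# all_layers == {"A": 0, "B": 6, "C": 12}

base_layer = decode_pocs(stream, max_temporal_id=0)
# base_layer == {"A": 0, "C": -4}
```

In this example the decoder of all layers derives POC 12 for picture C, because its previous reference picture is B, while the decoder of only the lowest layer derives POC −4, because its previous reference picture is A and the LSB jump from 0 to 12 exceeds half of the LSB range.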