There are numerous video coding standards, including ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, and ITU-T H.264 or ISO/IEC MPEG-4 AVC. H.264/AVC is the work output of the Joint Video Team (JVT) of the ITU-T Video Coding Experts Group (VCEG) and ISO/IEC MPEG.
In addition, efforts are under way toward new video coding standards. One is the development of a scalable video coding (SVC) standard in MPEG, which will become MPEG-21 Part 13. The second is the development of China video coding standards organized by the China Audio Visual coding Standard Work Group (AVS). AVS has finalized its first video coding specification, AVS 1.0, targeted at SDTV and HDTV applications. Since then, the focus has moved to mobile video services. The two resulting standards, AVS-M Stage 1 and AVS-M Stage 2, are under development.
Instantaneous Decoding Refresh (IDR) Picture
The instantaneous decoding refresh (IDR) picture was first introduced in H.264 and was later also introduced in AVS-M. IDR pictures are natural random access points. No subsequent picture can refer to pictures that are earlier than the IDR picture in decoding order. Any picture preceding an IDR picture in decoding order shall also be output/displayed earlier than the IDR picture. Each IDR picture leads a coded video sequence, which consists of the pictures from that IDR picture, inclusive, up to the next IDR picture, exclusive, in decoding order.
In the AVS-M committee draft (CD), there is an 8-bit syntax element, picture_distance, that indicates the temporal reference of each picture in one coded sequence. The value of picture_distance is equal to the picture_distance value of the previous picture in output/display order, plus 1, plus the number of skipped pictures between the current picture and the previous picture, with the sum taken modulo 256. For the first picture of a coded video sequence (the IDR picture), the picture_distance value is 0.
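The picture_distance rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic as described here, not text from the AVS-M draft; the function name and its parameters are hypothetical.

```python
def next_picture_distance(prev_distance: int, skipped: int, is_idr: bool = False) -> int:
    """Illustrative picture_distance derivation (hypothetical helper).

    prev_distance: picture_distance of the previous picture in output/display order.
    skipped: number of skipped pictures between the previous and current picture.
    is_idr: True if the current picture starts a new coded video sequence.
    """
    if is_idr:
        # The first picture of a coded video sequence always carries 0.
        return 0
    # 8-bit syntax element, so the value wraps around modulo 256.
    return (prev_distance + 1 + skipped) % 256
```

For example, a previous value of 254 followed by two skipped pictures gives (254 + 1 + 2) mod 256 = 1, which shows the wrap-around of the 8-bit field.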
Hypothetical Reference Decoder
In video coding standards, a compliant bitstream must be decodable by a hypothetical reference decoder that is conceptually connected to the output of an encoder and consists of at least a pre-decoder buffer, a decoder, and an output/display unit. This virtual decoder is known as the hypothetical reference decoder (HRD) in H.263 and H.264, and as the video buffering verifier (VBV) in MPEG. PSS Annex G, that is, Annex G of the 3GPP packet-switched streaming service standard (3GPP TS 26.234), specifies a server buffering verifier that can also be considered an HRD, with the difference that it is conceptually connected to the output of a streaming server. The virtual decoder and the buffering verifier are collectively referred to as the hypothetical reference decoder (HRD) in this document. A stream is compliant if it can be decoded by the HRD without buffer overflow or underflow. Buffer overflow happens if more bits are to be placed into the buffer when it is already full. Buffer underflow happens if some bits are not yet in the buffer when they are to be fetched from the buffer for decoding/playback.
HRD parameters can be used to impose constraints on the encoded sizes of pictures and to help determine the required buffer sizes and start-up delay.
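The overflow and underflow conditions can be illustrated with a deliberately simplified buffer check. This sketch is not the HRD of any of the standards named above. It assumes a constant input bit rate, a fixed interval between picture removals, and instantaneous removal of each picture's bits. The function and all parameter names are hypothetical.

```python
from typing import List


def check_buffer(picture_bits: List[int], bit_rate: float, buffer_size: float,
                 initial_delay: float, picture_interval: float) -> str:
    """Return "ok", "overflow", or "underflow" for a simplified pre-decoder buffer.

    Assumptions (not taken from any standard): bits arrive at a constant
    bit_rate starting at time 0, and picture i is removed all at once at
    time initial_delay + i * picture_interval.
    """
    total_bits = sum(picture_bits)
    removed = 0.0  # bits already fetched from the buffer for decoding
    for i, bits in enumerate(picture_bits):
        removal_time = initial_delay + i * picture_interval
        # Bits that have entered the buffer by the removal time.
        delivered = min(total_bits, bit_rate * removal_time)
        fullness = delivered - removed
        # Fullness peaks just before each removal, so checking here suffices.
        if fullness > buffer_size:
            return "overflow"
        # Underflow: the picture's bits are not all present when needed.
        if fullness < bits:
            return "underflow"
        removed += bits
    return "ok"
```

With three 1000-bit pictures, a rate of 1000 bits per second, and one picture removed per second, a 1-second start-up delay and a 2000-bit buffer pass the check. Shortening the delay to 0.5 seconds causes underflow, while a 2-second delay with a 1500-bit buffer causes overflow. This shows how buffer size and start-up delay interact with encoded picture sizes.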
In HRD specifications earlier than PSS Annex G and H.264, only the operation of the pre-decoder buffer (also called the coded picture buffer, CPB, in H.264) is specified. The HRD in PSS Annex G and the H.264 HRD also specify the operation of the post-decoder buffer (also called the decoded picture buffer, DPB, in H.264). Further, earlier HRD specifications enable only one HRD operation point, while the HRD in PSS Annex G and the H.264 HRD allow for multiple HRD operation points. Each HRD operation point corresponds to a set of HRD parameter values.
The HRD in PSS Annex G is much simpler than the H.264 HRD in two respects: 1) the specifications of CPB and DPB operations are much simpler, and 2) no timing information from the bitstream is required. From this point of view, it is therefore beneficial to use the HRD in PSS Annex G as the basis of the HRD of a video coding standard.
A shortcoming of this HRD design is that it relies on the presentation time (or capturing time) of each picture being provided by external means rather than by the bitstream itself. However, it may sometimes be necessary or desirable for the bitstream itself to be verifiable. One solution is to use the relative presentation time indicated by the temporal reference information (e.g. picture_distance in AVS-M), provided that the time duration corresponding to a temporal reference difference of 1 is also signaled in the bitstream.
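As a rough illustration of this solution, the sketch below derives relative presentation times within a single coded video sequence from picture_distance values and a signaled tick duration. It assumes the values are given in output/display order, that the first picture is the IDR picture with value 0, and that consecutive pictures are fewer than 256 ticks apart so the modulo-256 wrap can be undone. The function name and the tick_duration parameter are hypothetical.

```python
from typing import List


def relative_presentation_times(distances: List[int], tick_duration: int) -> List[int]:
    """Relative presentation times for one coded video sequence (illustrative).

    distances: picture_distance values in output/display order.
    tick_duration: duration of a temporal reference difference of 1,
                   in any integer time unit (e.g. milliseconds).
    """
    times = []
    elapsed_ticks = 0
    prev = 0  # the leading IDR picture has picture_distance 0
    for d in distances:
        # Undo the 8-bit wrap-around; assumes gaps shorter than 256 ticks.
        elapsed_ticks += (d - prev) % 256
        times.append(elapsed_ticks * tick_duration)
        prev = d
    return times
```

For instance, with a tick duration of 40 ms, the values 0, 1, 3, 255, 2 map to 0, 40, 120, 10200, and 10320 ms. The final value has wrapped past 255 and is unwrapped to tick 258.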
There is at least one problem associated with the above-described HRD design based on the HRD in PSS Annex G and on the relative presentation time derived from the temporal reference information. If the bitstream consists of more than one coded video sequence, then the relative presentation time of a picture in any coded video sequence other than the first cannot be derived, because the temporal reference value is reset to 0 at the leading IDR picture of each coded video sequence. Therefore, the temporal gap, in output/display order, between the last picture of a coded video sequence and the leading IDR picture of the subsequent coded video sequence is unknown. This can make the HRD suboptimal. The problem becomes more severe if the bitstream was spliced from coded video sequences of different origins, for example, when a commercial video clip is inserted into another video bitstream.
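The ambiguity can be made concrete with a small sketch. It maps absolute presentation ticks to picture_distance values that reset at each IDR picture, and shows that two timelines with different gaps between coded video sequences yield identical signaled values. The function and parameter names are hypothetical, and the model ignores everything except the reset and the modulo-256 wrap.

```python
from typing import List, Set


def signal_picture_distances(presentation_ticks: List[int], idr_indices: Set[int]) -> List[int]:
    """Map absolute presentation ticks to picture_distance values (illustrative).

    presentation_ticks: absolute tick of each picture in output/display order.
    idr_indices: positions of the IDR pictures; index 0 is expected to be one.
    """
    distances = []
    base = 0
    for i, tick in enumerate(presentation_ticks):
        if i in idr_indices:
            base = tick  # the value restarts from 0 at every IDR picture
        distances.append((tick - base) % 256)
    return distances


# Second sequence starts 1 tick after the first one ends.
tight = signal_picture_distances([0, 1, 2, 3, 4, 5], {0, 3})
# Second sequence starts 48 ticks after the first one ends.
loose = signal_picture_distances([0, 1, 2, 50, 51, 52], {0, 3})
```

Both timelines produce the values 0, 1, 2, 0, 1, 2, so a verifier that sees only picture_distance cannot tell how much time separates the two coded video sequences.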