The amount of video data sent over the internet, broadcast networks and mobile networks is increasing every year. This trend is driven by the growing use of over-the-top (OTT) services such as Netflix, Hulu and YouTube, by increased demand for high-quality video, and by demand for more flexible ways of watching TV and other video services.
To keep up with the increasing bitrate demand for video, good video compression is important. Recently, JCT-VC, a joint team of ITU-T VCEG and ISO/IEC MPEG, developed version 1 of the HEVC video codec (H.265), which cuts the bitrate roughly in half for the same quality compared to its predecessor AVC/H.264.
HEVC and Random Access
HEVC is a block-based video codec that uses both temporal and spatial prediction. Spatial prediction is achieved using intra (I) prediction from within the current picture. A picture consisting only of intra coded blocks is referred to as an I-picture. Temporal prediction is achieved using inter (P) or bi-directional inter (B) prediction at the block level. In inter prediction, a block is predicted from a previously decoded picture. In bi-directional inter prediction, a block is predicted from one or two previously decoded pictures, which may come after the current picture in output order (display time). A picture containing at least one inter coded block but no bi-directional inter coded blocks is referred to as a P-picture. A picture containing at least one bi-directional inter coded block is referred to as a B-picture. Both P-pictures and B-pictures may also contain intra coded blocks. Intra coded blocks are typically much more expensive to encode than P-blocks, which in turn are typically more expensive to encode than B-blocks.
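The picture-type rules above depend only on which block modes occur in a picture. The following is a minimal sketch of that classification; the `BlockMode` enum and the `picture_type` function are hypothetical names used for illustration, not part of any HEVC API.

```python
from enum import Enum
from typing import Iterable


class BlockMode(Enum):
    INTRA = "intra"  # spatial prediction within the current picture
    INTER = "inter"  # uni-directional inter (P) prediction
    BI = "bi"        # bi-directional inter (B) prediction


def picture_type(blocks: Iterable[BlockMode]) -> str:
    """Classify a picture as "I", "P" or "B" from the modes of its blocks."""
    modes = set(blocks)
    if not modes:
        raise ValueError("a picture must contain at least one block")
    if BlockMode.BI in modes:
        return "B"  # at least one bi-directional inter block
    if BlockMode.INTER in modes:
        return "P"  # inter blocks, but no bi-directional ones
    return "I"      # intra blocks only
```

Intra blocks do not affect the result once an inter block is present, which matches the statement that P- and B-pictures may also contain intra coded blocks.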
An instantaneous decoding refresh (IDR) picture is an I-picture for which no following picture may reference a picture prior to the IDR picture. A clean random access (CRA) picture is an I-picture that allows a random access skipped leading (RASL) picture to reference a picture that precedes the CRA picture in both output order and decoding order. If decoding starts at the CRA picture, its associated RASL pictures must be dropped. Broken link access (BLA) pictures are I-pictures used to indicate splicing points in the bitstream. Bitstream splicing can be performed by changing the picture type of a CRA picture in a first bitstream to a BLA picture and concatenating that stream at a proper position in a second bitstream.
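The RASL dropping rule can be sketched as a filter over a bitstream in decoding order. This is a simplified model under stated assumptions: pictures carry a hypothetical string label instead of real NAL unit types, RADL pictures and other types are ignored, and RASL pictures following a BLA are always dropped because the pictures they reference are not present after a splice.

```python
from dataclasses import dataclass
from typing import List

IRAP_TYPES = {"IDR", "CRA", "BLA"}


@dataclass
class Picture:
    poc: int       # picture order count, i.e. position in output order
    nal_type: str  # simplified label: "IDR", "CRA", "BLA", "RASL" or "TRAIL"


def pictures_to_decode(stream: List[Picture], start: int) -> List[Picture]:
    """Pictures decoded when random access starts at stream[start].

    `stream` is in decoding order. RASL pictures associated with the starting
    IRAP reference pictures that were never decoded, so they are dropped.
    """
    if stream[start].nal_type not in IRAP_TYPES:
        raise ValueError("random access must start at an IRAP picture")
    decoded = [stream[start]]
    skip_rasl = True
    for pic in stream[start + 1:]:
        if pic.nal_type in IRAP_TYPES:
            # RASL pictures of a later CRA are decodable, since their references
            # precede that CRA and have already been decoded; those of a BLA are not.
            skip_rasl = pic.nal_type == "BLA"
        if pic.nal_type == "RASL" and skip_rasl:
            continue
        decoded.append(pic)
    return decoded
```

For example, with the decoding order IDR(0), TRAIL(8), CRA(16), RASL(12), RASL(14), TRAIL(24), starting at the CRA yields only the pictures with POC 16 and 24. Starting at the IDR decodes all six pictures.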
An intra random access point (IRAP) picture may be an IDR, CRA or BLA picture. Every IRAP picture guarantees that pictures following it in both decoding and output order do not reference any picture preceding it in decoding order. The first picture of a bitstream must be an IRAP picture, but there may be many other IRAP pictures throughout the bitstream. IRAP pictures make it possible to tune in to a video bitstream, for example when starting to watch TV or when switching from one TV channel to another. IRAP pictures can also be used for seeking in a video clip, for example by moving the play position using the control bar of a video player. Moreover, an IRAP picture refreshes the video in case of errors or losses in the video bitstream.
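For seeking, a player typically needs the latest IRAP picture at or before the requested position, from which it decodes forward. The sketch below assumes a hypothetical, sorted list of IRAP presentation times in seconds. The name `seek_start` is illustrative only.

```python
import bisect
from typing import List


def seek_start(irap_times: List[float], target: float) -> float:
    """Return the time of the latest IRAP picture at or before `target`.

    `irap_times` must be sorted in ascending order. Decoding would start at the
    returned IRAP; pictures before `target` would be decoded but not displayed.
    """
    i = bisect.bisect_right(irap_times, target) - 1
    if i < 0:
        raise ValueError("no IRAP picture at or before the seek target")
    return irap_times[i]
```

With IRAP pictures every second, seeking to 2.4 s starts decoding at 2.0 s. The distance between IRAP pictures therefore bounds how much extra decoding a seek may need.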
Video sequences are typically compressed using a fixed maximum distance between IRAP pictures. More frequent IRAP pictures make channel switching faster and give finer granularity when seeking in a video clip. This must be balanced against the bit cost of IRAP pictures. As illustrative examples, common IRAP intervals range from 0.5 to 1.0 seconds.
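The trade-off can be made concrete with simple arithmetic: an IRAP interval in seconds corresponds to a period in pictures, and it bounds how long a viewer waits for the next IRAP picture after switching channel. The sketch below is illustrative and assumes a switch time uniformly distributed within the period, so the mean wait is half the worst case. It also ignores network and decoder buffering delays.

```python
from typing import Tuple


def irap_period_in_pictures(fps: float, interval_s: float) -> int:
    """Number of pictures from one IRAP picture to the next (rounded)."""
    return max(1, round(fps * interval_s))


def tune_in_wait_s(fps: float, interval_s: float) -> Tuple[float, float]:
    """Approximate (worst-case, mean) wait for the next IRAP picture.

    Assumes the channel switch happens at a uniformly random time.
    """
    worst = irap_period_in_pictures(fps, interval_s) / fps
    return worst, worst / 2
```

At 50 fps, a 0.5 s interval gives an IRAP picture every 25 pictures, with a worst-case wait of about 0.5 s and a mean wait of about 0.25 s.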
One way of looking at the difference between IRAP and temporal predictive pictures is that the IRAP picture is like an independent still picture, while a temporal predictive picture is a dependent delta picture relative to previous pictures.
FIG. 1 shows an example video sequence in which the first picture is an IRAP picture and the following pictures are P-pictures. The top row shows what is sent in the bitstream and the bottom row shows the decoded pictures. As can be seen, the IRAP picture conveys a full picture, while the P-pictures are delta pictures. Since the IRAP picture does not use temporal prediction, its compressed size is usually many times larger than that of a corresponding temporal predictive picture, as indicated by the number of bits for each compressed picture in FIG. 1.
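The "full picture versus delta picture" distinction can be illustrated with a toy lossless scheme. This is not how HEVC codes pictures: there is no motion compensation, transform or entropy coding, and the count of nonzero values is only a crude stand-in for bit cost. The first frame is stored whole, like an IRAP picture, and each later frame is stored as its difference from the previous frame, like a P-picture.

```python
from typing import List, Tuple

Frame = List[int]  # toy "picture": a flat list of sample values


def encode(frames: List[Frame]) -> List[Tuple[str, Frame]]:
    """First frame coded whole ("I"); later frames as residuals vs the previous one ("P")."""
    coded = [("I", list(frames[0]))]
    for prev, cur in zip(frames, frames[1:]):
        coded.append(("P", [c - p for c, p in zip(cur, prev)]))
    return coded


def decode(coded: List[Tuple[str, Frame]]) -> List[Frame]:
    """Rebuild frames; each "P" frame depends on the previously decoded frame."""
    out: List[Frame] = []
    for kind, data in coded:
        if kind == "I":
            out.append(list(data))
        else:
            out.append([r + p for r, p in zip(data, out[-1])])
    return out


def cost(data: Frame) -> int:
    """Crude bit-cost proxy: number of nonzero values to transmit."""
    return sum(1 for v in data if v != 0)
```

For mostly static content, the residual frames are almost entirely zeros. The independent first frame then dominates the cost, as in FIG. 1.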
By looking at actual coded sequences, one can get an indication of how many more bits are spent on IRAP pictures than on P-pictures. Let us look at the common test conditions bitstreams for the HEVC codec provided by the JCT-VC standardization group.
Tables 1 and 2 report, for two sets of sequences and for different values of the quantization parameter (QP), an estimate of the bitrate savings achievable by converting every IRAP picture except the first into a P-picture.
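A simple model suggests what such savings depend on. This is an illustrative assumption, not the method used to produce the measured tables. Suppose each P-picture costs 1 unit, each IRAP picture costs k units, and there is one IRAP picture every n pictures. One period then costs k + (n − 1) units, and replacing the IRAP picture with a P-picture brings it down to n units. The saving is (k − 1)/(k + n − 1).

```python
def estimated_saving(k: float, n: int) -> float:
    """Fraction of bits saved per IRAP period in a toy cost model.

    Hypothetical assumptions: a P-picture costs 1 unit, an IRAP picture costs
    k units, there is one IRAP picture every n pictures, and the converted
    picture costs the same as any other P-picture.
    """
    if k < 1 or n < 1:
        raise ValueError("expected k >= 1 and n >= 1")
    return (k - 1) / (k + n - 1)
```

For example, k = 10 and n = 32 give 9/41, or about 22%. The saving grows as P-pictures become cheaper relative to the IRAP picture. This is consistent with the largest savings appearing for static content such as slides and desktop screens.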
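TABLE 1: HEVC HM11.0, 8-bit YUV 4:2:0 (bitrate change per QP)

| Sequence | Resolution | Frame rate (fps) | QP22 | QP27 | QP32 | QP37 |
|---|---|---|---|---|---|---|
| Kimono | 1920 × 1080 | 24 | −10.50% | −11.40% | −12.10% | −12.10% |
| Nebuta | 2560 × 1600 | 60 | −0.60% | −1.00% | −2.80% | −8.90% |
| ParkScene | 1920 × 1080 | 24 | −13.70% | −20.40% | −25.80% | −29.30% |
| PartyScene | 832 × 480 | 50 | −6.60% | −10.30% | −14.80% | −19.60% |
| PeopleOnStreet | 2560 × 1600 | 30 | −2.50% | −3.80% | −4.30% | −4.40% |
| RaceHorses | 416 × 240 | 30 | −4.00% | −5.80% | −6.70% | −7.70% |
| RaceHorses | 832 × 480 | 30 | −2.50% | −4.30% | −6.50% | −8.40% |
| SlideEditing | 1280 × 720 | 30 | −56.50% | −57.70% | −57.60% | −59.90% |
| SlideShow | 1280 × 720 | 20 | −14.80% | −17.20% | −20.50% | −20.30% |
| SteamLocomotive | 2560 × 1600 | 60 | −2.60% | −5.00% | −7.80% | −10.40% |
| Traffic | 2560 × 1600 | 30 | −12.80% | −21.90% | −28.90% | −33.90% |
| Average | | | −11.55% | −14.44% | −17.07% | −19.54% |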
TABLE 2: SCC HM14.0, 8-bit YUV 4:4:4 (bitrate change per QP)

| Sequence | Resolution | Frame rate (fps) | QP22 | QP27 | QP32 | QP37 |
|---|---|---|---|---|---|---|
| Basketball_Screen | 2560 × 1440 | 60 | −26.30% | −34.00% | −40.10% | −44.80% |
| EBURainFruits | 1920 × 1080 | 50 | −8.90% | −12.30% | −14.90% | −17.10% |
| Kimono | 1920 × 1080 | 24 | −3.80% | −4.20% | −4.40% | −5.90% |
| MissionControlClip2 | 2560 × 1440 | 60 | −5.70% | −7.10% | −8.70% | −9.30% |
| MissionControlClip3 | 1920 × 1080 | 60 | −7.20% | −8.70% | −11.50% | −17.10% |
| sc_console | 1920 × 1080 | 60 | −4.10% | −4.40% | −5.10% | −5.50% |
| sc_desktop | 1920 × 1080 | 60 | −32.70% | −31.40% | −29.80% | −28.10% |
| sc_flyingGraphics | 1920 × 1080 | 60 | −0.60% | −0.80% | −1.40% | −2.10% |
| sc_map | 1280 × 720 | 60 | −10.10% | −10.70% | −10.30% | −13.00% |
| sc_programming | 1280 × 720 | 60 | −3.60% | −5.20% | −8.40% | −13.00% |
| sc_robot | 1280 × 720 | 30 | −13.40% | −21.20% | −27.20% | −31.30% |
| sc_slideshow | 1280 × 720 | 20 | −16.10% | −18.10% | −20.10% | −19.10% |
| sc_web_browsing | 1280 × 720 | 30 | −14.20% | −17.00% | −20.40% | −19.70% |
| Average | | | −11.28% | −13.47% | −15.56% | −17.38% |
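The Average row in each table appears to be the plain arithmetic mean of the per-sequence values, not a bitrate-weighted mean. The sketch below recomputes the Table 1 averages from the listed values as a consistency check. The dictionary names are illustrative.

```python
# Per-sequence bitrate changes (%) from Table 1, in table order.
TABLE_1 = {
    "QP22": [-10.50, -0.60, -13.70, -6.60, -2.50, -4.00, -2.50, -56.50, -14.80, -2.60, -12.80],
    "QP27": [-11.40, -1.00, -20.40, -10.30, -3.80, -5.80, -4.30, -57.70, -17.20, -5.00, -21.90],
    "QP32": [-12.10, -2.80, -25.80, -14.80, -4.30, -6.70, -6.50, -57.60, -20.50, -7.80, -28.90],
    "QP37": [-12.10, -8.90, -29.30, -19.60, -4.40, -7.70, -8.40, -59.90, -20.30, -10.40, -33.90],
}
# The Average row as reported in Table 1.
REPORTED_AVERAGE = {"QP22": -11.55, "QP27": -14.44, "QP32": -17.07, "QP37": -19.54}


def mean(values):
    """Unweighted arithmetic mean: every sequence counts equally."""
    return sum(values) / len(values)


for qp, values in TABLE_1.items():
    print(qp, f"{mean(values):.2f}%", "reported:", f"{REPORTED_AVERAGE[qp]:.2f}%")
```

Each recomputed value rounds to the reported average. Because the mean is unweighted, a single outlier such as SlideEditing (about −57%) pulls the Table 1 averages noticeably.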
DRAP
IRAP pictures can be used in HEVC to enable random access operations and to refresh the video in case of errors. This functionality comes at a cost, since intra pictures typically require significantly more bits than P- or B-pictures. Dependent RAP (DRAP) pictures have therefore been proposed [1] for HEVC. When performing random access at a DRAP picture, the associated IRAP picture must first be decoded. It is asserted that DRAP pictures may improve the compression efficiency of random-access-coded video, especially for video services with largely static content, such as screen sharing and surveillance video.
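Random access at a DRAP picture can be sketched as decoding the associated IRAP picture first, then the DRAP picture and the pictures after it. The sketch rests on simplifying assumptions that go beyond the text above and are not taken from [1]. First, the associated IRAP picture is taken to be the closest preceding IRAP picture in decoding order. Second, the DRAP picture is assumed to reference only that IRAP picture. Third, later pictures are assumed not to reference the pictures skipped between the two. The `Pic` type and its labels are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pic:
    idx: int   # position in decoding order
    kind: str  # hypothetical label: "IRAP", "DRAP" or "P"


def drap_random_access(stream: List[Pic], start: int) -> List[Pic]:
    """Pictures to decode when tuning in at the DRAP picture stream[start].

    Assumes the DRAP picture references only its associated IRAP picture
    and that pictures after the DRAP skip everything in between.
    """
    if stream[start].kind != "DRAP":
        raise ValueError("stream[start] is not a DRAP picture")
    # Associated IRAP: closest preceding IRAP picture in decoding order.
    irap = next((p for p in reversed(stream[:start]) if p.kind == "IRAP"), None)
    if irap is None:
        raise ValueError("no associated IRAP picture")
    return [irap] + stream[start:]
```

Only one intra picture has to be decoded before the DRAP picture, instead of every picture between the IRAP and the tune-in point. The DRAP picture itself can then be coded as an inter picture.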
Recovery Point SEI
In HEVC, as well as in AVC/H.264, there is a Supplemental Enhancement Information (SEI) message called the recovery point SEI message. It assists a decoder in determining when the decoding process will produce acceptable pictures for display after the decoder initiates random access or after the encoder indicates a broken link in the bitstream. When decoding starts with the picture associated with the recovery point SEI message, all decoded pictures at or after the recovery point in output order, as specified in the SEI message, are indicated to be correct or approximately correct in content.
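In HEVC, the recovery point is signalled by the `recovery_poc_cnt` syntax element of the recovery point SEI message, as an offset in picture order count (POC) from the picture associated with the SEI message. The sketch below shows how a player might decide which decoded pictures to display. It is simplified: POC wrap-around, `exact_match_flag` and `broken_link_flag` are ignored, and the function name is hypothetical.

```python
from typing import List


def displayable_pocs(decoded_pocs: List[int], sei_poc: int, recovery_poc_cnt: int) -> List[int]:
    """POCs of decoded pictures indicated as (approximately) correct, in output order.

    decoded_pocs: POCs of pictures decoded after starting at the SEI picture.
    sei_poc: POC of the picture associated with the recovery point SEI message.
    recovery_poc_cnt: offset from sei_poc to the recovery point.
    """
    recovery_poc = sei_poc + recovery_poc_cnt
    return sorted(poc for poc in decoded_pocs if poc >= recovery_poc)
```

Pictures before the recovery point are still decoded, because later pictures may reference them, but a player may choose not to display them.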
ISO Base Media File Format (ISOBMFF)
The ISO base media file format defines a general structure for time-based media files, such as video and audio. It serves as the basis for other media file formats, for example the MP4 (MPEG-4 Part 14) and 3GPP (3GP) container formats.
It is designed as a flexible, extensible format that allows editing and presentation of the media. The presentation may be local, or via a network or other stream delivery mechanism including Real-time Transport Protocol (RTP) and MPEG Dynamic Adaptive Streaming over HTTP (MPEG-DASH).
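This general structure is a sequence of boxes. Each box starts with a 32-bit big-endian size and a four-character type. A size of 1 means a 64-bit size follows, and a size of 0 means the box extends to the end of the enclosing data. The minimal sketch below lists the boxes at one nesting level. It does not handle `uuid` extended types or interpret box contents, and `iter_boxes` is an illustrative name rather than a library function.

```python
import struct
from typing import Iterator, Optional, Tuple


def iter_boxes(data: bytes, offset: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[str, int, int]]:
    """Yield (box_type, offset, size) for each box in data[offset:end]."""
    if end is None:
        end = len(data)
    while offset + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:  # 64-bit "largesize" follows the type
            if offset + 16 > end:
                raise ValueError("truncated largesize field")
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:  # box extends to the end of the enclosing data
            size = end - offset
        if size < header or offset + size > end:
            raise ValueError("malformed box at offset %d" % offset)
        yield box_type.decode("latin-1"), offset, size
        offset += size
```

Container boxes can be walked recursively by calling `iter_boxes(data, box_offset + header_size, box_offset + box_size)` on their payload.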
A sync sample, e.g. an ISOBMFF sync sample, is a sample at which decoding may start and such that no sample following it in decoding order references any sample preceding it.
A random access point (RAP) sample, e.g. an ISOBMFF RAP sample, is similar to a sync sample, except that samples following the RAP sample in decoding order are allowed to reference samples preceding it. A sync sample is also a RAP sample.
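The two definitions can be compared with a small classifier over a hypothetical reference graph in decoding order. This simplified model is not the ISOBMFF signalling itself. It treats a sample as usable for random access only if it references nothing, and it returns the most specific label, so a sync sample is reported as "sync" even though it is also a RAP sample.

```python
from typing import List, Set


def classify(refs: List[Set[int]], i: int) -> str:
    """Classify sample i as "sync", "rap" or "none".

    refs[k] holds the decoding-order indices of the samples that sample k
    references. Simplified model: only self-contained samples qualify.
    """
    if refs[i]:
        return "none"  # sample i itself depends on other samples
    later_refs_back = any(r < i for k in range(i + 1, len(refs)) for r in refs[k])
    # A later sample referencing something before i (e.g. a leading picture)
    # makes i a RAP sample but not a sync sample.
    return "rap" if later_refs_back else "sync"
```

In HEVC terms, an IDR picture would typically map to a sync sample, while a CRA picture with RASL pictures would map to a RAP sample that is not a sync sample.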