The amount of video data sent over the internet, broadcast networks and mobile networks is increasing every year. This trend is driven by the growing use of over-the-top (OTT) services such as Netflix, Hulu and YouTube, as well as by increased demand for high-quality video and for more flexible ways of watching TV and other video services.
To keep up with the growing bitrate demand for video, good video compression is essential. Recently, the Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T VCEG and ISO/IEC MPEG developed version 1 of the High Efficiency Video Coding (HEVC) standard, which roughly halves the bitrate at the same quality compared with its predecessor AVC/H.264.
HEVC, also referred to as H.265, is a block-based video codec that uses both temporal and spatial prediction. Spatial prediction is achieved using intra (I) prediction from within the current picture. A picture consisting only of intra-coded blocks is referred to as an I-picture. Temporal prediction is achieved at block level using either inter prediction (P), also referred to as uni-predictive prediction, or bi-directional inter prediction (B), also referred to as bi-predictive prediction. In uni-predictive inter prediction, the prediction is made from a single previously decoded picture. In bi-directional inter prediction, the prediction is made from a combination of two predictions, which may reference either the same previously decoded picture or two different previously decoded pictures. Each reference picture is decoded before the current picture but may precede or follow the current picture in display time (output order). A picture containing at least one inter-coded block but no bi-directionally coded inter blocks is referred to as a P-picture. A picture containing at least one bi-directional inter block is referred to as a B-picture. Both P-pictures and B-pictures may also contain intra-coded blocks. For a typical block, intra coding is generally much more expensive in bits than uni-predictive inter coding, which in turn is generally more expensive than bi-predictive coding.
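The I/P/B picture classification above follows directly from the block modes a picture contains. A minimal Python sketch of that rule is shown below; `BlockMode` and `picture_type` are hypothetical names for illustration, not part of any HEVC reference software, and the sketch mirrors the definitions in this text rather than how an HEVC bitstream signals prediction types (HEVC signals I, P or B per slice).

```python
from enum import Enum
from typing import Iterable


class BlockMode(Enum):
    """Prediction mode of one block (hypothetical labels for illustration)."""
    INTRA = "intra"  # spatial prediction from within the current picture
    UNI = "uni"      # uni-predictive inter prediction (P)
    BI = "bi"        # bi-predictive inter prediction (B)


def picture_type(blocks: Iterable[BlockMode]) -> str:
    """Classify a picture as "I", "P" or "B" from the modes of its blocks."""
    modes = set(blocks)
    if not modes:
        raise ValueError("a picture must contain at least one block")
    if BlockMode.BI in modes:
        return "B"  # at least one bi-predictive block
    if BlockMode.UNI in modes:
        return "P"  # inter blocks present, but none bi-predictive
    return "I"      # intra-coded blocks only


if __name__ == "__main__":
    print(picture_type([BlockMode.INTRA, BlockMode.INTRA]))  # I
    print(picture_type([BlockMode.INTRA, BlockMode.UNI]))    # P
    print(picture_type([BlockMode.UNI, BlockMode.BI]))       # B
```

Checking for bi-predictive blocks first matters, since a B-picture may also contain uni-predictive and intra blocks.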
An instantaneous decoding refresh (IDR) picture is an I-picture for which no picture following it in decoding order may reference a picture preceding the IDR picture in decoding order. A clean random access (CRA) picture is an I-picture that allows leading pictures, i.e. pictures that follow the CRA picture in decoding order but precede it in display (output) order, to reference pictures that precede the CRA picture in decoding order; such leading pictures are referred to as random access skipped leading (RASL) pictures. If decoding starts at the CRA picture, the associated RASL pictures must be dropped, since they may predict from pictures preceding the CRA picture that are not available when the CRA picture is used for random access. Broken link access (BLA) pictures are I-pictures used to indicate splicing points in the bitstream. A bitstream splicing operation can be performed by changing the picture type of a CRA picture in a first bitstream to BLA and concatenating that stream at a suitable position in the other bitstream.
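The RASL rule can be sketched as a filter over pictures in decoding order. This assumes a simplified model in which each picture carries only its picture order count (POC, its position in output order) and one of the labels `IDR`, `CRA`, `BLA`, `TRAIL` or `RASL`; real HEVC streams have more NAL unit types (for example decodable leading pictures, RADL), and the `Picture` class and `decodable_after_random_access` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List

IRAP_TYPES = {"IDR", "CRA", "BLA"}


@dataclass
class Picture:
    poc: int       # picture order count, i.e. position in output order
    nal_type: str  # "IDR", "CRA", "BLA", "TRAIL" or "RASL" (simplified model)


def decodable_after_random_access(stream: List[Picture], start: int) -> List[Picture]:
    """Return the pictures that can be decoded when decoding starts at stream[start].

    `stream` is in decoding order. RASL pictures associated with the starting
    IRAP picture (those before the next IRAP picture in decoding order) are
    dropped, because they may reference pictures that were never decoded.
    """
    if stream[start].nal_type not in IRAP_TYPES:
        raise ValueError("random access must start at an IRAP picture")
    decoded = []
    skip_rasl = True  # RASL pictures of the starting IRAP are never decodable
    for offset, pic in enumerate(stream[start:]):
        if offset > 0 and pic.nal_type in IRAP_TYPES:
            # A later CRA's RASL pictures can be decoded, since their references
            # were decoded earlier; a BLA's RASL pictures are always skipped.
            skip_rasl = pic.nal_type == "BLA"
        if pic.nal_type == "RASL" and skip_rasl:
            continue  # its reference pictures precede the start point
        decoded.append(pic)
    return decoded


if __name__ == "__main__":
    stream = [Picture(8, "CRA"), Picture(6, "RASL"), Picture(7, "RASL"),
              Picture(9, "TRAIL"), Picture(16, "CRA"), Picture(14, "RASL"),
              Picture(17, "TRAIL")]
    print([p.poc for p in decodable_after_random_access(stream, 0)])  # [8, 9, 16, 14, 17]
```

Starting at the first CRA picture drops its two RASL pictures, while the RASL picture of the second CRA picture is kept, because the pictures it references have already been decoded by then.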
An intra random access point (IRAP) picture may be an IDR, CRA or BLA picture. Every IRAP picture guarantees that pictures following it in both decoding and output order do not reference any picture preceding the IRAP picture in decoding order. The first picture of a bitstream must be an IRAP picture, but there may be many other IRAP pictures throughout the bitstream. IRAP pictures make it possible to tune in to a video bitstream, for example when starting to watch TV or switching from one TV channel to another. IRAP pictures can also be used for seeking in a video clip, for example by moving the play position using the control bar of a video player. Moreover, an IRAP picture refreshes the video in case there are errors or losses in the video bitstream.
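Tune-in and seeking amount to finding the closest IRAP picture at or before the requested position and decoding from there. A minimal sketch, assuming the decoding-order positions of the IRAP pictures are already known (for example from a container index); `seek_start` is a hypothetical helper, and real players typically work with presentation timestamps rather than decoding-order indices.

```python
from bisect import bisect_right
from typing import List


def seek_start(irap_positions: List[int], target: int) -> int:
    """Return the decoding-order index of the last IRAP picture at or before `target`.

    `irap_positions` must be sorted in increasing order and start with 0,
    since the first picture of a bitstream is always an IRAP picture.
    """
    if not irap_positions or irap_positions[0] != 0:
        raise ValueError("the first picture of a bitstream must be an IRAP picture")
    if target < 0:
        raise ValueError("target must be a non-negative picture index")
    # bisect_right finds the first IRAP position strictly after target.
    return irap_positions[bisect_right(irap_positions, target) - 1]


if __name__ == "__main__":
    iraps = [0, 32, 64]  # e.g. one IRAP picture every 32 pictures
    print(seek_start(iraps, 40))  # 32
```

In this simplified model, the pictures between the returned IRAP picture and the target still have to be decoded before the target can be shown, so more frequent IRAP pictures shorten tune-in and seek delay at the cost of more intra-coded bits.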
Specific screen content services, such as screen sharing and screen monitoring, are becoming increasingly popular. Screen content puts different demands on video coding than general video content does. Screen content typically includes windows with sharp edges, graphics and text, distinct colors and tends to have areas of the video picture that are not updated for long periods of time.
FIG. 1 shows a typical screen content scene with windows. In this particular scene, the background and some windows, such as the browser window and the command line window, rarely change, whereas the video window at the top left and the Matlab simulation at the bottom left may change in every picture.
During the development of HEVC version 1, the special characteristics of screen content were not explicitly addressed. JCT-VC is therefore now working on an extension to HEVC that explicitly targets screen content coding.
As mentioned above, error robustness can be provided by inserting IRAP pictures periodically. For low-delay video scenarios, it is also common to use periodic intra block updates, which refresh the blocks of the video picture with intra coding one or a few blocks at a time, so that over time every block in the picture has been intra refreshed. However, for video with motion, errors are still likely to propagate over long periods of time: since the blocks are not all refreshed at the same time, inter prediction can carry an error from a block that has not yet been refreshed into one that already has.
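A cyclic intra refresh schedule of this kind can be sketched as follows. The helpers `refresh_period` and `refreshed_blocks` are hypothetical, and the sketch assumes the simplest possible order: blocks refreshed in raster order, a fixed number per picture, wrapping around at the end of the picture. Actual encoders may instead refresh whole columns or rows.

```python
import math
from typing import List


def refresh_period(num_blocks: int, blocks_per_picture: int) -> int:
    """Number of pictures needed until every block has been intra refreshed once."""
    if num_blocks <= 0 or blocks_per_picture <= 0:
        raise ValueError("block counts must be positive")
    return math.ceil(num_blocks / blocks_per_picture)


def refreshed_blocks(num_blocks: int, blocks_per_picture: int, picture_index: int) -> List[int]:
    """Block indices (raster order) that are intra coded in the given picture.

    Hypothetical cyclic schedule: picture p refreshes the next
    `blocks_per_picture` blocks after those refreshed in picture p - 1,
    wrapping around at the end of the picture.
    """
    if num_blocks <= 0 or blocks_per_picture <= 0:
        raise ValueError("block counts must be positive")
    per_picture = min(blocks_per_picture, num_blocks)
    first = (picture_index * per_picture) % num_blocks
    return [(first + k) % num_blocks for k in range(per_picture)]


if __name__ == "__main__":
    print(refresh_period(510, 10))    # 51
    print(refreshed_blocks(10, 3, 3))  # [9, 0, 1]
```

With 510 blocks (for example a 1920x1080 picture in 64x64 CTUs, 30 x 17) and 10 blocks refreshed per picture, a full refresh cycle takes 51 pictures, which is why a single error can remain visible for a long time.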
In HEVC and its predecessors, a picture may be divided into slices, each containing one or more coding tree units (CTUs) in HEVC (macroblocks in its predecessors). Each slice is encoded independently of the other slices in the same picture. Although the main advantage of the slice tool is to enable parallel encoding and decoding, it also offers some error robustness, since an error does not propagate across slice boundaries within a picture.
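The containment effect of slices can be illustrated with a toy model of a single picture. The sketch assumes, hypothetically, that each CTU predicts only from its left and above neighbours and that slices are runs of consecutive CTUs in raster order; it ignores in-loop filtering across slice boundaries, which HEVC can optionally enable. `corrupted_ctus` is an illustrative name, not a codec API.

```python
from typing import Set


def corrupted_ctus(width: int, height: int, ctus_per_slice: int,
                   lost_slice: int, respect_slices: bool = True) -> Set[int]:
    """CTU addresses (raster order) corrupted in one picture when a slice is lost.

    Toy model: every CTU predicts from its left and above neighbours in the
    same picture. With respect_slices=True, as for slices, prediction across
    a slice boundary is not allowed, so the error stays inside the lost slice.
    """
    slice_of = [addr // ctus_per_slice for addr in range(width * height)]
    corrupted: Set[int] = set()
    for addr in range(width * height):  # raster-scan (decoding) order
        if slice_of[addr] == lost_slice:
            corrupted.add(addr)
            continue
        x, y = addr % width, addr // width
        neighbours = ([addr - 1] if x > 0 else []) + ([addr - width] if y > 0 else [])
        for n in neighbours:
            if respect_slices and slice_of[n] != slice_of[addr]:
                continue  # neighbour in another slice: not used for prediction
            if n in corrupted:
                corrupted.add(addr)  # prediction from a corrupted neighbour
                break
    return corrupted


if __name__ == "__main__":
    # 4x3 CTUs, one CTU row per slice, first slice lost.
    print(sorted(corrupted_ctus(4, 3, 4, 0)))                     # [0, 1, 2, 3]
    print(len(corrupted_ctus(4, 3, 4, 0, respect_slices=False)))  # 12
```

Losing the first slice corrupts only its four CTUs; if prediction were allowed across slice boundaries, the error would spread to all 12 CTUs. Later pictures that inter predict from the corrupted area can still inherit the error, which this single-picture model does not capture.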
A problem with both periodic IRAP pictures and periodic intra block updates in a screen content scenario is that all blocks are refreshed, regardless of whether a block has changed since the last refresh. For video in which parts are not updated for long periods of time, such as screen content, this way of encoding becomes unnecessarily expensive in terms of bits, since intra coding typically has a very high bit cost.
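The cost argument can be made concrete with simple arithmetic. The sketch below assumes a fixed bit cost per intra-coded block and per skipped (unchanged) block; these constants, the `refresh_cost` helper and all numbers in the example are hypothetical, chosen only to show how the redundant cost grows with the share of static blocks.

```python
from dataclasses import dataclass


@dataclass
class RefreshCost:
    full_refresh_bits: int  # every block intra coded once per refresh cycle
    changed_only_bits: int  # changed blocks intra coded, static blocks coded as skip
    redundant_bits: int     # bits spent re-coding blocks that did not change


def refresh_cost(num_blocks: int, changed_blocks: int,
                 intra_bits: int, skip_bits: int) -> RefreshCost:
    """Compare one refresh cycle under fixed, hypothetical per-block bit costs."""
    if not 0 <= changed_blocks <= num_blocks:
        raise ValueError("changed_blocks must be between 0 and num_blocks")
    static_blocks = num_blocks - changed_blocks
    full = num_blocks * intra_bits
    changed_only = changed_blocks * intra_bits + static_blocks * skip_bits
    return RefreshCost(full, changed_only, full - changed_only)


if __name__ == "__main__":
    # Hypothetical numbers: 510 blocks, 10% changed, 2000 bits per intra
    # block, 8 bits per skipped block.
    print(refresh_cost(510, 51, 2000, 8).redundant_bits)  # 914328
```

Under these assumed numbers, roughly 90% of the bits of a full refresh go to re-coding content that has not changed, which is the inefficiency described above for mostly static screen content.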
Another problem with periodic intra block updates is that an error occurring in video with motion may propagate over time, since typically only a few blocks are refreshed at a time.
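How motion can defeat a gradual refresh is easiest to see in a deliberately simplified 1-D model. The sketch assumes a single row of blocks, one block refreshed per picture, and all remaining blocks inter predicted from the block `shift` positions to the left in the previous picture, with nothing preventing prediction from blocks that have not been refreshed yet. `corrupted_counts` and all parameters are hypothetical.

```python
from typing import List


def corrupted_counts(num_blocks: int, shift: int, error_block: int,
                     num_pictures: int) -> List[int]:
    """Number of corrupted blocks per picture in a 1-D toy model.

    Hypothetical assumptions, for illustration only:
    - the picture is one row of `num_blocks` blocks;
    - picture t intra refreshes block t % num_blocks (one block at a time);
    - the remaining blocks are inter predicted from block (i - shift) % num_blocks
      of the previous picture, i.e. content moves `shift` blocks per picture,
      and nothing prevents prediction from blocks not yet refreshed;
    - an error corrupts `error_block` in picture 0.
    """
    corrupted = [i == error_block for i in range(num_blocks)]
    counts = []
    for t in range(num_pictures):
        if t > 0:
            # Inter prediction carries the corruption along with the motion.
            corrupted = [corrupted[(i - shift) % num_blocks] for i in range(num_blocks)]
        corrupted[t % num_blocks] = False  # intra refresh cleans this block
        counts.append(sum(corrupted))
    return counts


if __name__ == "__main__":
    print(corrupted_counts(8, shift=0, error_block=3, num_pictures=8))  # [1, 1, 1, 0, 0, 0, 0, 0]
    print(corrupted_counts(8, shift=1, error_block=3, num_pictures=8))  # [1, 1, 1, 1, 1, 1, 1, 1]
```

For static content (`shift=0`) the error disappears once the refresh reaches the corrupted block. When the content moves in step with the refresh sweep (`shift=1`), the corrupted block stays ahead of the refresh position and is never cleaned in this worst case. Schemes that restrict inter prediction from not-yet-refreshed areas avoid this case; the model deliberately leaves such restrictions out.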