The amount of video data sent over the internet, broadcast networks and mobile networks increases every year. This trend is driven by the growing use of over-the-top (OTT) services such as Netflix, Hulu and YouTube, as well as by increased demand for high-quality video and for more flexible ways of watching TV and other video services.
To keep up with the increasing bitrate demand for video, good video compression is important. Recently, JCT-VC in collaboration with MPEG developed the high efficiency video coding (HEVC) version 1 video codec, which cuts the bitrate roughly in half for the same quality compared to its predecessor AVC/H.264.
HEVC, also referred to as H.265, is a block-based video codec that utilizes both temporal and spatial prediction. Spatial prediction is achieved using intra (I) prediction from within the current picture. A picture consisting of only intra-coded blocks is referred to as an I-picture. Temporal prediction is achieved on block level using inter prediction (P), also referred to as uni-predictive prediction, or bi-directional inter prediction (B), also referred to as bi-predictive prediction. In uni-predictive inter prediction, the prediction is made from a single previously decoded picture. In bi-directional inter prediction, the prediction is made from a combination of two predictions that may reference either the same previously decoded picture or two different previously decoded pictures. Each reference picture is decoded before the current picture but may come before or after the current picture in display time (output order). A picture containing at least one inter-coded block but no bi-directional inter-coded blocks is referred to as a P-picture. A picture containing at least one bi-directional inter-coded block is referred to as a B-picture. Both P-pictures and B-pictures may also contain intra-coded blocks. For a typical block, intra coding generally costs considerably more bits than uni-predictive inter coding, which in turn generally costs more bits than bi-predictive coding.
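The picture-type definitions above can be sketched as a small classification rule. This is an illustrative sketch only, not part of any codec API: the `BlockMode` enum and the `classify_picture` function are hypothetical names, and a picture is modelled simply as a list of per-block prediction modes.

```python
from enum import Enum


class BlockMode(Enum):
    """Hypothetical per-block prediction modes (names are assumptions)."""
    INTRA = "intra"          # spatial prediction within the current picture
    UNI_INTER = "uni"        # prediction from a single decoded picture
    BI_INTER = "bi"          # combination of two predictions


def classify_picture(block_modes):
    """Return "I", "P" or "B" following the definitions in the text.

    - B-picture: at least one bi-directional inter block.
    - P-picture: at least one inter block, but no bi-directional block.
    - I-picture: only intra-coded blocks.
    """
    modes = set(block_modes)
    if not modes:
        raise ValueError("a picture must contain at least one block")
    if BlockMode.BI_INTER in modes:
        return "B"
    if BlockMode.UNI_INTER in modes:
        return "P"
    return "I"
```

For example, a picture with intra and uni-predictive blocks but no bi-predictive block is classified as a P-picture, which reflects the statement that P-pictures and B-pictures may also contain intra-coded blocks.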
An instantaneous decoding refresh (IDR) picture is an I-picture for which no following picture may reference a picture prior to the IDR picture. A clean random access (CRA) picture is an I-picture that may be followed in decoding order by random access skipped leading (RASL) pictures, which precede the CRA picture in display or output order and are allowed to reference pictures that precede the CRA picture in decoding order. If decoding starts at the CRA picture, the RASL pictures must be dropped, since the pictures preceding the CRA picture that they may predict from are not available when the CRA picture is used for random access. Broken link access (BLA) pictures are I-pictures used to indicate splicing points in the bitstream. A bitstream splicing operation can be performed by changing the picture type of a CRA picture in a first bitstream to a BLA picture and concatenating that bitstream at a proper position in the other bitstream.
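The random-access behaviour at a CRA picture can be illustrated with a simplified model. The sketch below is an assumption-laden illustration rather than decoder code: `Picture`, its `kind` labels (including "TRAIL" for ordinary non-leading pictures) and `pictures_to_decode` are hypothetical, and only the RASL rule described above is modelled.

```python
from dataclasses import dataclass

IRAP_KINDS = {"IDR", "CRA", "BLA"}


@dataclass(frozen=True)
class Picture:
    output_pos: int   # position in display (output) order
    kind: str         # simplified label: "IDR", "CRA", "BLA", "RASL" or "TRAIL"


def pictures_to_decode(decoding_order, start_index):
    """Return the pictures kept when decoding starts at an IRAP picture.

    `decoding_order` lists pictures in decoding order. If decoding starts
    at a CRA picture, the RASL pictures associated with it (those between
    it and the next IRAP picture) are dropped, because they may reference
    pictures before the CRA picture that were never decoded.
    """
    start = decoding_order[start_index]
    if start.kind not in IRAP_KINDS:
        raise ValueError("decoding must start at an IRAP picture")

    kept = [start]
    associated_with_start = True
    for pic in decoding_order[start_index + 1:]:
        if pic.kind in IRAP_KINDS:
            # Later RASL pictures belong to this newer IRAP picture and
            # their references are available, so they are decodable.
            associated_with_start = False
        if pic.kind == "RASL" and associated_with_start and start.kind == "CRA":
            continue  # dropped: references may be unavailable
        kept.append(pic)
    return kept
```

With a stream such as IDR, TRAIL, CRA, RASL, RASL, TRAIL, starting at the CRA picture keeps only the CRA picture and the trailing picture, whereas starting at the IDR picture keeps all six pictures.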
An intra random access point (IRAP) picture may be any one of an IDR, CRA or BLA picture. All IRAP pictures guarantee that pictures that follow the IRAP picture in both decoding and output order do not reference any picture prior to the IRAP picture in decoding order. The first picture of a bitstream must be an IRAP picture, but there may be many other IRAP pictures throughout the bitstream. IRAP pictures make it possible to tune in to a video bitstream, for example when starting to watch TV or when switching from one TV channel to another. IRAP pictures can also be used for seeking in a video clip, for example by moving the play position using the control bar of a video player, and in dynamic streaming services. Moreover, an IRAP picture refreshes the video in case there are errors or losses in the video bitstream and thereby improves the error robustness of the bitstream.
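As a rough illustration of seeking, a player could start decoding at the latest IRAP picture at or before the requested play position. The sketch below assumes that the player already holds a sorted list of IRAP presentation times in seconds; the function name `seek_start_time` and that list are hypothetical, and leading pictures are ignored for simplicity.

```python
import bisect


def seek_start_time(irap_times, target_time):
    """Return the time of the IRAP picture where decoding should start.

    `irap_times` must be sorted in ascending order. The latest IRAP
    picture at or before `target_time` is chosen; if the target lies
    before the first IRAP picture, the first one is used.
    """
    if not irap_times:
        raise ValueError("the bitstream must contain at least one IRAP picture")
    index = bisect.bisect_right(irap_times, target_time) - 1
    return irap_times[max(index, 0)]
```

With IRAP pictures at 0, 2, 4 and 6 seconds, a seek to 5.3 seconds starts decoding at 4 seconds, so the pictures between 4 and 5.3 seconds are decoded before the requested position is reached. Sparser IRAP pictures therefore lengthen this catch-up, while denser ones cost bitrate.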
Digital TV exists in three forms that are generally referred to as broadcasting services, namely terrestrial, satellite and cable, and in one form that is generally referred to as a multicast service, namely Internet Protocol Television (IPTV). In all of these services, a receiver receives the video bitstream of one TV channel, decodes it and displays the decoded video to the end user. Commonly, the receiver is additionally capable of receiving the video bitstreams of one or more additional channels, so that the user can watch those channels or programs later.
In adaptive streaming services, the bitrate received by the receiver is adjusted to match the capabilities of the network. In dynamic adaptive streaming over hypertext transfer protocol (HTTP) (DASH), HTTP live streaming (HLS) and smooth streaming, the user client selects the bitrate for each chunk or segment, typically representing 10 seconds of video, out of a set of different representations provided by a server.
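The per-segment choice made by the user client can be sketched with a simple throughput-based rule. This is one possible heuristic under stated assumptions, not the selection logic prescribed by DASH, HLS or smooth streaming: the function name `select_representation`, the measured throughput input and the `safety_factor` margin are all hypothetical.

```python
def select_representation(bitrates_bps, throughput_bps, safety_factor=0.8):
    """Pick the bitrate of the representation to request for the next segment.

    Chooses the highest offered bitrate that fits within a fraction
    (`safety_factor`) of the measured network throughput. If none fits,
    the lowest offered bitrate is returned so playback can continue.
    """
    if not bitrates_bps:
        raise ValueError("the server must offer at least one representation")
    budget = throughput_bps * safety_factor
    affordable = [rate for rate in bitrates_bps if rate <= budget]
    return max(affordable) if affordable else min(bitrates_bps)
```

For representations at 0.5, 1.5, 3 and 6 Mbit/s and a measured throughput of 4 Mbit/s, this rule selects the 3 Mbit/s representation. The client repeats the choice for every segment, which is why each segment must begin at a point where decoding can start.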
In video conferencing and video telephony services, there is two-way communication between user clients. It is possible to use feedback messages to indicate packet losses or corruption in decoded pictures. Reference Picture Selection Indication (RPSI) is a feedback message that makes it possible for a receiver to indicate that an old picture should be used for reference because one or more recently transmitted pictures may not have been decodable. By using an old picture that was correctly received and decoded, such as the previous IRAP picture, for reference, the encoder does not have to send a new intra picture. However, after having sent a feedback message such as RPSI, the receiver does not know exactly when the sender has acknowledged the message and started using only the selected reference picture for reference.
In broadcast and multicast services, there is a desire to keep the channel switching time as short as possible. In order to switch to another channel, however, there needs to be a random access point (RAP) in the video bitstream of the other channel. Using IRAP pictures as RAPs makes the video bitstream more expensive to encode and consequently increases the bitrate substantially compared to a video bitstream without intra pictures.
In adaptive streaming, in order to switch from one representation to another, there needs to be an access point in the representation to which the user client chooses to switch. Today, this is typically realized with IRAP pictures. IRAP pictures are also used when a user chooses to jump to a different position in the video bitstream. These IRAP pictures increase the bitrate substantially compared to a video bitstream without IRAP pictures.