Many applications for video coding currently exist, including applications for transmission and storage of video data. Many video coding standards have also been developed and others are currently in development. Recent developments in video coding standardisation have led to the formation of a group called the “Joint Collaborative Team on Video Coding” (JCT-VC). The Joint Collaborative Team on Video Coding (JCT-VC) includes members of Study Group 16, Question 6 (SG16/Q6) of the Telecommunication Standardisation Sector (ITU-T) of the International Telecommunication Union (ITU), known as the Video Coding Experts Group (VCEG), and members of the International Organisation for Standardisation/International Electrotechnical Commission Joint Technical Committee 1/Subcommittee 29/Working Group 11 (ISO/IEC JTC1/SC29/WG11), also known as the Moving Picture Experts Group (MPEG).
The Joint Collaborative Team on Video Coding (JCT-VC) has produced a new video coding standard that significantly outperforms the “H.264/MPEG-4 AVC” (ISO/IEC 14496-10) video coding standard. The new video coding standard has been named “high efficiency video coding (HEVC)”. Further development of high efficiency video coding (HEVC) is directed towards introducing improved support for content known variously as ‘screen content’ or ‘discontinuous tone content’. Such content is typical of video output from a computer or a tablet device, e.g. from a DVI connector or as would be transmitted over a wireless HDMI link. Such content is poorly handled by previous video compression standards and thus a new activity directed towards improving the achievable coding efficiency for this type of content is underway.
Video data includes one or more colour channels. Generally there is one primary colour channel and two secondary colour channels. The primary colour channel is generally referred to as the ‘luma’ channel and the secondary colour channel(s) are generally referred to as the ‘chroma’ channels. Video data is represented using a colour space, such as ‘YCbCr’ or ‘RGB’.
For screen content applications, ‘RGB’ is commonly used, as this is the format generally used to drive LCD panels. The greatest signal strength is present in the ‘G’ (green) channel, so generally the G channel is coded using the primary colour channel, and the remaining channels (i.e. ‘B’ and ‘R’) are coded using the secondary colour channels. Such a coding method may be referred to as ‘GBR’. When the ‘YCbCr’ colour space is in use, the ‘Y’ channel is coded using the primary colour channel and the ‘Cb’ and ‘Cr’ channels are coded using the secondary colour channels.
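The relationship between the ‘RGB’ and ‘YCbCr’ colour spaces described above can be sketched as follows, assuming illustrative full-range BT.601-style conversion coefficients (the coefficients actually applied depend on the colour-space variant signalled for the video data, and the function name is chosen here for illustration only):

```python
def rgb_to_ycbcr(r, g, b):
    # Illustrative full-range BT.601-style conversion of 8-bit samples;
    # the actual coefficients depend on the signalled colour space.
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr
```

Note that the ‘Y’ channel is dominated by the green contribution, consistent with the greatest signal strength being present in the ‘G’ channel.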
Video data is also represented using a particular chroma format. When the 4:4:4 chroma format is in use, the primary colour channel and the secondary colour channels are sampled at the same spatial density. For screen content, the commonly used chroma format is 4:4:4, as LCD panels generally provide pixels in a 4:4:4 format. The bit-depth defines the bit width of samples in the respective colour channel, which implies a range of available sample values. Generally, all colour channels have the same bit-depth, although the colour channels may also have different bit-depths.
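The range of sample values implied by a given bit-depth can be sketched as follows (the function name is chosen here for illustration only):

```python
def sample_value_range(bit_depth):
    # An n-bit sample can take values from 0 to 2**n - 1 inclusive,
    # e.g. 0..255 for 8-bit samples and 0..1023 for 10-bit samples.
    return 0, (1 << bit_depth) - 1
```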
In high efficiency video coding (HEVC), there are three types of prediction methods used: intra-prediction, intra-block copy prediction and inter-prediction. Intra-prediction and intra-block copy methods allow content of one part of a video frame to be predicted from other parts of the same video frame. Intra-prediction methods typically produce a block of samples having a directional texture, with an intra-prediction mode specifying the direction of the texture and neighbouring samples within a frame used as a basis to produce the texture. The intra-block copy prediction method uses a block of samples from the current frame as a prediction for a current block of samples.
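The production of a block of samples having a directional texture from neighbouring samples within the frame may be sketched as follows, using only horizontal, vertical and DC modes as a simplified illustration (HEVC itself defines many additional angular intra-prediction modes):

```python
def intra_predict(left, above, mode, size):
    # Toy intra prediction: 'left' holds the column of reconstructed
    # neighbouring samples to the left of the block, 'above' the row
    # above it; 'mode' selects the direction of the produced texture.
    if mode == 'horizontal':
        # Each row replicates the neighbouring sample to its left.
        return [[left[row]] * size for row in range(size)]
    if mode == 'vertical':
        # Each row replicates the row of samples above the block.
        return [list(above) for _ in range(size)]
    if mode == 'dc':
        # DC mode fills the block with the mean of the neighbours.
        dc = (sum(left) + sum(above) + size) // (2 * size)
        return [[dc] * size for _ in range(size)]
    raise ValueError('unsupported mode: %s' % mode)
```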
Inter-prediction methods predict the content of a block of samples within a video frame from blocks in previous video frames. The previous video frames (i.e. previous in ‘decoding order’, which may differ from ‘display order’) may be referred to as ‘reference frames’. An inter-predicted prediction unit may reference one reference frame when configured for ‘uni-directional’ inter-prediction or two reference frames when configured for ‘bi-directional’ inter-prediction. One block of samples is obtained from each reference frame. Each reference frame is selected from a list of reference pictures. The block of samples is spatially offset relative to the location of the considered prediction unit (PU) using a motion vector. Where the prediction unit (PU) is configured for bi-directional inter-prediction, separate motion vectors are applied to obtain blocks of samples from each reference frame.
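The use of a motion vector to obtain a spatially offset block of samples from a reference frame, and the combination of two such blocks under bi-directional inter-prediction, can be sketched as follows, assuming integer-pel motion only (HEVC additionally supports sub-pel motion via interpolation filters; the function names are chosen here for illustration only):

```python
def motion_compensate(ref_frame, x, y, mv, size):
    # Fetch a size-by-size block from 'ref_frame' at the position of
    # the prediction unit (x, y) offset by motion vector mv = (dx, dy).
    dx, dy = mv
    return [row[x + dx : x + dx + size]
            for row in ref_frame[y + dy : y + dy + size]]

def bi_predict(block0, block1):
    # Bi-directional inter-prediction: average the two blocks of
    # samples obtained from the two reference frames.
    return [[(a + b + 1) >> 1 for a, b in zip(r0, r1)]
            for r0, r1 in zip(block0, block1)]
```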
The first video frame within a sequence of video frames typically uses intra-prediction for all blocks of samples within the frame, as no prior frame is available for reference. Subsequent video frames may use one or more previous video frames from which to predict blocks of samples. To maximise coding efficiency, the prediction method that produces a predicted block that is closest to the captured frame data is typically used. The remaining difference between the predicted block of samples and the captured frame data is known as the ‘residual’. The spatial domain representation of the difference is generally transformed into a frequency domain representation. Generally, the frequency domain representation compactly stores information present in the spatial domain representation for content captured from an imaging sensor, also referred to as ‘natural content’, ‘continuous tone content’ or ‘camera captured content’. For screen content, the sharper edges resulting from the software rendering of content such as fonts, window edges and icons are less efficiently represented in the frequency domain. The frequency domain representation includes a block of ‘residual coefficients’ that results from applying a transform, such as an integer discrete cosine transform (DCT). Moreover, the residual coefficients (or ‘scaled transform coefficients’) are quantised, which introduces loss but also further reduces the amount of information required to be encoded in the bitstream. The lossy frequency domain representation of the residual, also known as ‘transform coefficients’, may be stored in the bitstream. Each residual coefficient having a nonzero value is referred to as a ‘significant’ residual coefficient. The amount of “lossiness” in the residual recovered in a decoder affects the distortion of video data decoded from the bitstream relative to the captured frame data, as well as the size of the bitstream.
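The transform and quantisation steps described above can be sketched in one dimension as follows, using a naive orthonormal DCT-II in place of the integer transform actually specified by HEVC, and an illustrative quantisation step size standing in for the quantiser derived from the quantisation parameter:

```python
import math

def dct_1d(residual):
    # Naive orthonormal DCT-II of a 1-D residual (HEVC specifies an
    # integer approximation of the DCT instead).
    n = len(residual)
    coeffs = []
    for k in range(n):
        s = sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i, x in enumerate(residual))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * s)
    return coeffs

def quantise(coeffs, step):
    # Uniform quantisation: the lossy step. Coefficients that remain
    # nonzero are the 'significant' residual coefficients.
    return [int(round(c / step)) for c in coeffs]
```

For a flat residual such as [10, 10, 10, 10], all of the information concentrates into the first (DC) coefficient, illustrating why the frequency domain representation is compact for smooth, continuous tone content; a residual containing a sharp edge spreads energy across many coefficients, consistent with the poorer frequency domain efficiency noted above for screen content.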