Many applications for video coding currently exist, including applications for transmission and storage of video data. Many video coding standards have also been developed and others are currently in development. Recent developments in video coding standardisation have led to the formation of a group called the “Joint Collaborative Team on Video Coding” (JCT-VC). The Joint Collaborative Team on Video Coding (JCT-VC) includes members of Study Group 16, Question 6 (SG16/Q6) of the Telecommunication Standardisation Sector (ITU-T) of the International Telecommunication Union (ITU), known as the Video Coding Experts Group (VCEG), and members of the International Organisation for Standardisation/International Electrotechnical Commission Joint Technical Committee 1/Subcommittee 29/Working Group 11 (ISO/IEC JTC1/SC29/WG11), also known as the Moving Picture Experts Group (MPEG).
The Joint Collaborative Team on Video Coding (JCT-VC) has produced a new video coding standard that significantly outperforms the “H.264/MPEG-4 AVC” video coding standard. The new video coding standard has been named “high efficiency video coding (HEVC)”. Further development of an extension to high efficiency video coding (HEVC) is directed towards improving compression efficiency for a category of video data sometimes referred to as ‘screen content’. Screen content includes video data produced from devices such as personal computers. Such content is characterised by a large amount of high frequency content (i.e. sharp edges). Such content is generally not compressed very well by traditional transform-based video compression techniques. Generally, transform-based video compression introduces substantial artifacts, which reduce the subjective quality of the decoded video data. Although high efficiency video coding (HEVC) supports some tools to assist in compressing such content, further tools are under study within the Joint Collaborative Team on Video Coding (JCT-VC) for possible inclusion into a future amendment of high efficiency video coding (HEVC). Efficient coding of screen content is highly desirable for applications such as remote desktop, cloud gaming and virtualisation and wireless HDMI, as high resolution screen displays need to be transmitted over networks having limited or otherwise costly bandwidth.
Video data includes one or more colour channels. Typically three colour channels are supported and colour information is represented using a ‘colour space’. One example colour space is known as ‘YCbCr’, although other colour spaces are also possible. The ‘YCbCr’ colour space enables fixed-precision representation of colour information and thus is well suited to digital implementations. The ‘YCbCr’ colour space includes a ‘luma’ channel (Y) and two ‘chroma’ channels (Cb and Cr). Each colour channel has a particular bit-depth. The bit-depth defines the width of samples in the respective colour channel in bits. Generally, all colour channels have the same bit-depth, although having different bit-depths is also possible. The relationship between the spatial sampling of the luma channel and the spatial sampling of the chroma channels is referred to as the ‘chroma format’. When a ‘4:4:4’ chroma format is used, the chroma channels are spatially sampled with the same frequency as the luma channel. When a ‘4:2:0’ or a ‘4:2:2’ chroma format is selected, the chroma channels are sampled less frequently than the luma channel. In the case of 4:2:0, one chroma sample in each chroma channel is present for every 2×2 set of luma samples. In the case of 4:2:2, one chroma sample in each chroma channel is present for every 2×1 set of luma samples.
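The relationship between chroma format and sample counts described above can be illustrated with a short sketch. The function names (chroma_plane_size, frame_size_bits) and the rounding-up of odd dimensions are assumptions made for illustration only and are not taken from any standard. The sketch also assumes that all colour channels share one bit-depth.

```python
# Horizontal and vertical subsampling factors of the chroma channels
# relative to the luma channel, for each chroma format named in the text.
CHROMA_SUBSAMPLING = {
    "4:4:4": (1, 1),  # chroma sampled as often as luma
    "4:2:2": (2, 1),  # one chroma sample per 2x1 set of luma samples
    "4:2:0": (2, 2),  # one chroma sample per 2x2 set of luma samples
}


def chroma_plane_size(luma_width, luma_height, chroma_format):
    """Return (width, height) of each chroma channel.

    Odd luma dimensions are rounded up; this is an illustrative assumption.
    """
    sub_w, sub_h = CHROMA_SUBSAMPLING[chroma_format]
    return ((luma_width + sub_w - 1) // sub_w,
            (luma_height + sub_h - 1) // sub_h)


def frame_size_bits(luma_width, luma_height, chroma_format, bit_depth=8):
    """Uncompressed size of one frame: one luma and two chroma channels."""
    chroma_w, chroma_h = chroma_plane_size(luma_width, luma_height, chroma_format)
    samples = luma_width * luma_height + 2 * chroma_w * chroma_h
    return samples * bit_depth


if __name__ == "__main__":
    for fmt in ("4:4:4", "4:2:2", "4:2:0"):
        print(fmt, chroma_plane_size(1920, 1080, fmt),
              frame_size_bits(1920, 1080, fmt))
```

For a 1920×1080 frame, the 4:2:0 chroma format carries half as many samples as the 4:4:4 chroma format, which is one reason subsampled formats are common for natural content.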
In high efficiency video coding (HEVC), there are three types of prediction available: intra-prediction, intra block copy prediction and inter-prediction. Intra-prediction methods allow content of one part of a video frame to be predicted from other parts of the same video frame. Intra-prediction methods typically produce a block having a directional texture, with an intra-prediction mode specifying the direction of the texture and neighbouring samples within a frame used as a basis to produce the texture. Intra block copy prediction allows a spatially local block of samples from the current frame to be used as a prediction for a current block. Inter-prediction methods allow the content of a block within a video frame to be predicted from blocks in previous video frames. The previous video frames (i.e. in ‘decoding order’ as opposed to ‘display order’ which may be different) are referred to as ‘reference frames’. Blocks in the first frame of a sequence typically use intra-prediction or intra block copy mode. Inter-prediction is not available to such blocks because no reference frame(s) are available. To maximise coding efficiency, the prediction method that produces a predicted block that is closest to captured frame data is typically used. The remaining difference between the predicted block and the captured frame data is known as the ‘residual’. This spatial domain representation of the difference is generally transformed into a frequency domain representation and quantised. Generally, the frequency domain representation compactly stores the information present in the spatial domain representation for ‘natural content’, i.e. content that was captured by an imaging sensor. The frequency domain representation includes a block of ‘residual coefficients’ that results from applying a transform, such as an integer discrete cosine transform (DCT). 
Moreover, the residual coefficients (or ‘scaled transform coefficients’) are quantised, which introduces loss but also further reduces the amount of information required to be encoded in a bitstream. The lossy frequency domain representation of the residual, also known as ‘transform coefficients’, may be stored in the bitstream. The amount of lossiness in the residual recovered in a decoder affects the distortion of video data decoded from the bitstream compared to the captured frame data and the size of the bitstream.
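The residual, transform and quantisation steps described above may be sketched as follows. This is a minimal illustration under stated assumptions, not the HEVC method: it uses a one-dimensional floating-point DCT rather than the two-dimensional integer transform of the standard, and a single uniform quantiser step size. All function names (dct_1d, idct_1d, quantise, dequantise, code_block) are hypothetical.

```python
import math


def _scale(k, n):
    # Orthonormal DCT-II scaling factor for coefficient index k.
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct_1d(samples):
    """Floating-point orthonormal DCT-II (illustrative, not the HEVC transform)."""
    n = len(samples)
    return [_scale(k, n) * sum(s * math.cos(math.pi * (i + 0.5) * k / n)
                               for i, s in enumerate(samples))
            for k in range(n)]


def idct_1d(coeffs):
    """Inverse of dct_1d."""
    n = len(coeffs)
    return [sum(_scale(k, n) * c * math.cos(math.pi * (i + 0.5) * k / n)
                for k, c in enumerate(coeffs))
            for i in range(n)]


def quantise(coeffs, step):
    """Uniform quantisation: this is the step that introduces loss."""
    return [int(round(c / step)) for c in coeffs]


def dequantise(levels, step):
    return [level * step for level in levels]


def code_block(captured, predicted, step):
    """Return the quantised levels and the block a decoder would reconstruct."""
    residual = [c - p for c, p in zip(captured, predicted)]
    levels = quantise(dct_1d(residual), step)        # stored in the bitstream
    recovered = idct_1d(dequantise(levels, step))    # lossy residual
    reconstructed = [p + r for p, r in zip(predicted, recovered)]
    return levels, reconstructed


if __name__ == "__main__":
    captured = [100, 102, 104, 106]
    predicted = [100, 100, 100, 100]
    for step in (1.0, 4.0, 1000.0):
        print(step, code_block(captured, predicted, step))
```

A larger step size yields smaller levels, and hence fewer bits, at the cost of a reconstruction that departs further from the captured data. With a sufficiently large step size, every level is zero and the reconstruction equals the prediction.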
A video bitstream includes a sequence of encoded syntax elements. The syntax elements are ordered according to a hierarchy of ‘syntax structures’. Each syntax element is composed of one or more ‘bins’, which are encoded using a ‘context adaptive binary arithmetic coding’ (CABAC) algorithm. A given bin may be ‘bypass’ coded, in which case there is no ‘context’ associated with the bin. Alternatively, a bin may be ‘context’ coded, in which case there is context associated with the bin. Each context coded bin has one context associated with the bin, selected from a set of one or more contexts from a context memory. The selected context is retrieved from the context memory and each time a context is used (i.e. selected), the context is also updated and then stored back in the context memory. When encoding or decoding the bin, prior information available in the bitstream is used to select which context to use. Context information in the decoder necessarily tracks context information in the encoder (otherwise a decoder could not parse a bitstream produced by an encoder). The context includes two parameters: a likely bin value (or ‘valMPS’) and a probability level (or ‘pStateIdx’). A syntax element with two distinct values may also be referred to as a ‘flag’ and is generally encoded and decoded using one context coded bin. A syntax element with more distinct values requires more than one bin, and may use a combination of context coded bins and bypass coded bins. In the high efficiency video coding (HEVC) standard, syntax elements are grouped into syntax structures. A given syntax structure defines the possible syntax elements that can be included in the video bitstream and the circumstances in which each syntax element is included in the video bitstream. Each instance of a syntax element contributes to the size of the video bitstream. An objective of video compression is to enable representation of a given sequence using a video bitstream having minimal size (e.g. in bytes) for a given quality level, i.e. distortion of the output frames compared to the input frame data for lossy encoding. At the same time, video decoders are invariably required to decode video bitstreams in real time, placing limits on the complexity of the algorithms that can be used. As such, a trade-off between algorithmic complexity and compression performance is made. In particular, modifications that can improve or maintain compression performance while reducing algorithmic complexity are desirable.
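The handling of context coded and bypass coded bins described above may be sketched as follows. The sketch models only the retrieval and update of contexts in a context memory, not the arithmetic coding itself. The one-step state transition is a deliberate simplification and is not the state transition table of the standard. The names Context, MAX_STATE and process_bins, and the chosen upper bound on the probability level, are assumptions made for illustration.

```python
class Context:
    """Adaptive context holding a likely bin value and a probability level."""

    MAX_STATE = 62  # illustrative upper bound on the probability level

    def __init__(self, val_mps=0, p_state_idx=0):
        self.val_mps = val_mps          # likely bin value ('valMPS')
        self.p_state_idx = p_state_idx  # probability level ('pStateIdx')

    def update(self, bin_value):
        """Simplified adaptation; the standard uses lookup tables instead."""
        if bin_value == self.val_mps:
            # The likely value occurred: become more confident.
            self.p_state_idx = min(self.p_state_idx + 1, self.MAX_STATE)
        elif self.p_state_idx == 0:
            # Least confident state and a miss: swap the likely value.
            self.val_mps = 1 - self.val_mps
        else:
            # The unlikely value occurred: become less confident.
            self.p_state_idx -= 1

    def state(self):
        return (self.val_mps, self.p_state_idx)


def process_bins(context_memory, bins):
    """Apply a sequence of (context index, bin value) pairs.

    A context index of None marks a bypass coded bin, which touches no context.
    """
    for ctx_idx, bin_value in bins:
        if ctx_idx is None:
            continue                      # bypass coded: no context involved
        context = context_memory[ctx_idx]  # retrieve the selected context
        context.update(bin_value)          # update; stored back in place


if __name__ == "__main__":
    bins = [(0, 0), (0, 0), (None, 1), (1, 1), (0, 1)]
    encoder_memory = [Context(), Context()]
    decoder_memory = [Context(), Context()]
    process_bins(encoder_memory, bins)
    process_bins(decoder_memory, bins)
    print([c.state() for c in encoder_memory])
    print([c.state() for c in decoder_memory])
```

Because the encoder and the decoder apply the same updates to the same bins in the same order, the two context memories remain identical, which is the tracking property noted above.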
Coding tools that achieve improvement in coding screen content are desirable; however, the complexity of new coding tools (in particular for real-time and low-cost implementation) must be balanced against the coding improvement obtained.