Many applications for video coding currently exist, including applications for transmission and storage of video data. Many video coding standards have also been developed and others are currently in development. Recent developments in video coding standardisation have led to the formation of a group called the “Joint Collaborative Team on Video Coding” (JCT-VC). The Joint Collaborative Team on Video Coding (JCT-VC) includes members of Study Group 16, Question 6 (SG16/Q6) of the Telecommunication Standardisation Sector (ITU-T) of the International Telecommunication Union (ITU), known as the Video Coding Experts Group (VCEG), and members of the International Organisation for Standardisation/International Electrotechnical Commission Joint Technical Committee 1/Subcommittee 29/Working Group 11 (ISO/IEC JTC1/SC29/WG11), also known as the Moving Picture Experts Group (MPEG).
The Joint Collaborative Team on Video Coding (JCT-VC) has produced a new video coding standard that significantly outperforms the “H.264/MPEG-4 AVC” (ISO/IEC 14496-10) video coding standard. The new video coding standard has been named “high efficiency video coding (HEVC)”. Further development of high efficiency video coding (HEVC) is directed towards introducing improved support for content known variously as ‘screen content’ or ‘discontinuous tone content’. Such content is typical of output from a computer or a tablet device. Such content is poorly handled by previous video compression standards and thus a new activity directed towards improving the achievable coding efficiency for this type of content is underway.
Video data includes one or more colour channels. Typically three colour channels are supported and colour information is represented using a ‘colour space’. One example colour space is known as ‘YCbCr’, although other colour spaces are also possible. The ‘YCbCr’ colour space enables fixed-precision representation of colour information and thus is well suited to digital implementations. The ‘YCbCr’ colour space includes a ‘luma’ channel (Y) and two ‘chroma’ channels (Cb and Cr). Each colour channel has a particular bit-depth. The bit-depth defines the bit width of samples in the respective colour channel, and therefore the range of available sample values. Generally, all colour channels have the same bit-depth, although they may also have different bit-depths. Screen content is often encoded using an ‘RGB’ (i.e. red, green, blue) colour space.
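The relationship between bit-depth and the range of available sample values can be illustrated with a minimal sketch. The function names `sample_range` and `clip_sample` are hypothetical, and unsigned samples are assumed.

```python
def sample_range(bit_depth):
    """Return (minimum, maximum) for unsigned samples of the given bit-depth."""
    if bit_depth < 1:
        raise ValueError("bit_depth must be at least 1")
    return 0, (1 << bit_depth) - 1


def clip_sample(value, bit_depth):
    """Clamp a value into the range implied by the bit-depth (assumed behaviour)."""
    lo, hi = sample_range(bit_depth)
    return max(lo, min(hi, value))


if __name__ == "__main__":
    for depth in (8, 10):
        print(depth, sample_range(depth))
```

Under these assumptions an 8-bit channel spans 0 to 255 and a 10-bit channel spans 0 to 1023.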
In high efficiency video coding (HEVC), there are three types of prediction methods used: intra-prediction, intra block copy prediction and inter-prediction. Intra-prediction methods allow content of one part of a video frame to be predicted from other parts of the same video frame. Intra-prediction methods typically produce a block having a directional texture, with an intra-prediction mode specifying the direction of the texture and neighbouring samples within a frame used as a basis to produce the texture. The intra block copy prediction method uses a block of samples from the current frame as a prediction for a current block. Inter-prediction methods predict the content of a block within a video frame from blocks in previous video frames. The previous video frames (i.e. in ‘decoding order’ as opposed to ‘display order’ which may be different) may be referred to as ‘reference frames’.
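As an illustrative sketch only, intra block copy prediction can be viewed as copying a square block of samples from elsewhere in the current frame, displaced by a vector. The function name `intra_block_copy`, the square block shape and the bounds check are assumptions made for this example. Whether the referenced region has already been reconstructed is not modelled here.

```python
def intra_block_copy(frame, x, y, size, block_vector):
    """Predict the size x size block at (x, y) from the same frame.

    frame is a list of rows of samples; block_vector is (dx, dy) relative
    to the current block position. Only frame bounds are checked here.
    """
    dx, dy = block_vector
    ref_x, ref_y = x + dx, y + dy
    height, width = len(frame), len(frame[0])
    if ref_x < 0 or ref_y < 0 or ref_x + size > width or ref_y + size > height:
        raise ValueError("reference block lies outside the frame")
    return [row[ref_x:ref_x + size] for row in frame[ref_y:ref_y + size]]


if __name__ == "__main__":
    # 4x4 frame whose sample value encodes its position (row * 4 + column).
    frame = [[r * 4 + c for c in range(4)] for r in range(4)]
    print(intra_block_copy(frame, 2, 2, 2, (-2, -2)))
```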
The first video frame within a sequence of video frames typically uses intra-prediction for all blocks of samples within the frame, as no prior frame is available for reference. Subsequent video frames may use one or more previous video frames from which to predict blocks of samples. To maximise coding efficiency, the prediction method that produces a predicted block that is closest to captured frame data is typically used. The remaining difference between the predicted block of samples and the captured frame data is known as the ‘residual’. This spatial domain representation of the difference is generally transformed into a frequency domain representation. Generally, the frequency domain representation compactly stores information present in the spatial domain representation. The frequency domain representation includes a block of ‘residual coefficients’ that results from applying a transform, such as an integer discrete cosine transform (DCT). Moreover, the residual coefficients (or ‘scaled transform coefficients’) are quantised, which introduces loss but also further reduces the amount of information required to be encoded in a bitstream. The lossy frequency domain representation of the residual, also known as ‘transform coefficients’, may be stored in the bitstream. The amount of lossiness in the residual recovered in a decoder affects the distortion of video data decoded from the bitstream compared to the captured frame data and the size of the bitstream.
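The residual, transform and quantisation steps described above can be sketched as follows. This is a simplified illustration: a floating-point orthonormal DCT-II stands in for the integer transform, and a single uniform step size stands in for the quantiser, both of which are assumptions. All function names are hypothetical.

```python
import math


def residual(captured, predicted):
    """Sample-wise difference between captured and predicted blocks."""
    return [[c - p for c, p in zip(c_row, p_row)]
            for c_row, p_row in zip(captured, predicted)]


def dct_1d(values):
    """Orthonormal DCT-II of a one-dimensional list of values."""
    n = len(values)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                    for i, v in enumerate(values))
        out.append(scale * total)
    return out


def dct_2d(block):
    """Separable 2D transform: rows first, then columns."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def quantise(coeffs, step):
    """Uniform quantisation; this is the step that introduces loss."""
    return [[int(round(c / step)) for c in row] for row in coeffs]


if __name__ == "__main__":
    captured = [[14] * 4 for _ in range(4)]
    predicted = [[10] * 4 for _ in range(4)]
    print(quantise(dct_2d(residual(captured, predicted)), 8))
```

A flat residual concentrates into a single coefficient, which reflects the compaction property noted above. A larger step size discards more information and so yields a smaller bitstream at higher distortion.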
A video bitstream includes a sequence of encoded syntax elements. The syntax elements are ordered according to a hierarchy of ‘syntax structures’. Each syntax element is composed of one or more ‘bins’, which are encoded using a ‘context adaptive binary arithmetic coding’ (CABAC) algorithm. A given bin may be ‘bypass’ coded, in which case there is no ‘context’ associated with the bin. Alternatively, a bin may be ‘context’ coded, in which case there is a context associated with the bin. Each context coded bin has one context associated with the bin, where the context is selected from a set of one or more contexts. The selected context is retrieved from a context memory and, each time a context is used (i.e. selected), the context is also updated and then stored back in the context memory. When encoding or decoding the bin, prior information in the bitstream is used to select which context to use. Context information in a decoder necessarily tracks context information in the encoder (otherwise a decoder could not parse a bitstream produced by an encoder). The context includes two parameters: a likely bin value (or ‘valMPS’) and a probability level (or ‘pStateIdx’).
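The retrieve, update and store-back cycle for a context can be sketched as below. This is not the state-transition table of any standard: the update on a less probable bin (a fixed decrement of two) is a hypothetical stand-in, and only the general shape (increase confidence when the likely value occurs, decrease it otherwise, and swap the likely value at the lowest state) is intended. The names `Context`, `update_context` and `code_bins` are assumptions.

```python
from dataclasses import dataclass

MAX_STATE = 62  # assumed upper bound for the probability level


@dataclass
class Context:
    val_mps: int = 0      # likely bin value ('valMPS')
    p_state_idx: int = 0  # probability level ('pStateIdx')


def update_context(ctx, bin_val):
    """Return the updated context after coding one bin (simplified rule)."""
    if bin_val == ctx.val_mps:
        return Context(ctx.val_mps, min(ctx.p_state_idx + 1, MAX_STATE))
    # Less probable value seen: swap the likely value at the lowest state.
    val_mps = 1 - ctx.val_mps if ctx.p_state_idx == 0 else ctx.val_mps
    # Hypothetical stand-in for a table-driven transition.
    return Context(val_mps, max(ctx.p_state_idx - 2, 0))


def code_bins(context_memory, ctx_idx, bins):
    """Retrieve, update and store back the selected context for each bin."""
    for bin_val in bins:
        ctx = context_memory[ctx_idx]
        context_memory[ctx_idx] = update_context(ctx, bin_val)


if __name__ == "__main__":
    encoder_memory = [Context()]
    decoder_memory = [Context()]
    bins = [0, 0, 0, 1, 1, 1]
    code_bins(encoder_memory, 0, bins)
    code_bins(decoder_memory, 0, bins)
    print(encoder_memory[0], encoder_memory == decoder_memory)
```

Because the update depends only on the bin values already coded, a decoder applying the same rule to the same bins arrives at the same context state as the encoder.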
A syntax element with two distinct values may also be referred to as a ‘flag’ and is generally encoded and decoded using one context coded bin. A syntax element with more distinct values requires more than one bin, and may use a combination of context coded bins and bypass coded bins. In the high efficiency video coding (HEVC) standard, syntax elements are grouped into syntax structures. A given syntax structure defines the possible syntax elements that can be included in a video bitstream and the circumstances in which each syntax element is included in the video bitstream. Each instance of a syntax element contributes to the size of the video bitstream.
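To illustrate how a syntax element with more than two distinct values maps to several bins, the sketch below uses a truncated unary binarisation, one common scheme chosen here purely as an example. The function name `truncated_unary` is hypothetical, and the choice of which bins are context coded or bypass coded is not modelled.

```python
def truncated_unary(value, c_max):
    """Binarise value as a run of 1s ended by a 0, omitting the 0 at c_max."""
    if not 0 <= value <= c_max:
        raise ValueError("value must lie in [0, c_max]")
    bins = [1] * value
    if value < c_max:
        bins.append(0)
    return bins


if __name__ == "__main__":
    # A flag (c_max = 1) always produces exactly one bin.
    print([truncated_unary(v, 1) for v in range(2)])
    # A four-valued element produces between one and three bins.
    print([truncated_unary(v, 3) for v in range(4)])
```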
An objective of video compression is to enable representation of a given sequence using a video bitstream having minimal size (e.g. in bytes) for a given quality level (including both lossy and lossless cases). At the same time, video decoders are invariably required to decode video bitstreams in real time, placing limits on the complexity of the algorithms that can be used. As such, a trade-off between algorithmic complexity and compression performance is made. In particular, modifications that can improve or maintain compression performance while reducing algorithmic complexity are desirable.
For each block predicted using intra block copy mode, a vector or delta vector is present in the bitstream to signal the location of the reference block, relative to the position of the current block. The statistical distribution of block vector values can vary greatly, and efficient methods to encode such block vectors are highly desirable.
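The difference between signalling a vector and signalling a delta vector can be sketched as follows. The choice of predictor (here simply a previously used block vector) and all function names are assumptions made for illustration.

```python
def encode_block_vector(block_vector, predictor):
    """Return the delta vector to be signalled, given an assumed predictor."""
    return (block_vector[0] - predictor[0], block_vector[1] - predictor[1])


def decode_block_vector(delta, predictor):
    """Recover the block vector from the delta and the same predictor."""
    return (delta[0] + predictor[0], delta[1] + predictor[1])


def reference_position(block_position, block_vector):
    """Locate the reference block relative to the current block position."""
    return (block_position[0] + block_vector[0],
            block_position[1] + block_vector[1])


if __name__ == "__main__":
    predictor = (-16, 0)      # hypothetical: the previous block's vector
    block_vector = (-20, -4)
    delta = encode_block_vector(block_vector, predictor)
    print(delta, decode_block_vector(delta, predictor))
    print(reference_position((32, 16), block_vector))
```

When consecutive block vectors are similar, the delta components are small in magnitude, which is the kind of statistical property an efficient encoding method may exploit.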