Many applications for video coding currently exist, including applications for transmission and storage of video data. Many video coding standards have also been developed and others are currently under development. Recent developments in video coding standardisation have led to the formation of a group called the “Joint Collaborative Team on Video Coding” (JCT-VC). The Joint Collaborative Team on Video Coding (JCT-VC) includes members of Study Group 16, Question 6 (SG16/Q6) of the Telecommunication Standardisation Sector (ITU-T) of the International Telecommunication Union (ITU), known as the Video Coding Experts Group (VCEG), and members of the International Organisation for Standardisation/International Electrotechnical Commission Joint Technical Committee 1/Subcommittee 29/Working Group 11 (ISO/IEC JTC1/SC29/WG11), also known as the Moving Picture Experts Group (MPEG).
The Joint Collaborative Team on Video Coding (JCT-VC) has the goal of producing a new video coding standard to significantly outperform a presently existing video coding standard, known as “H.264/MPEG-4 AVC”. The H.264/MPEG-4 AVC standard is itself a large improvement on previous video coding standards, such as MPEG-4 and ITU-T H.263. The new video coding standard under development has been named “high efficiency video coding (HEVC)”. The Joint Collaborative Team on Video Coding (JCT-VC) is also considering implementation challenges arising from technology proposed for high efficiency video coding (HEVC) that create difficulties when scaling implementations of the standard to operate at high resolutions or high frame rates.
One area of the H.264/MPEG-4 AVC video coding standard that presents difficulties for achieving high compression efficiency is the coding of residual coefficients used to represent video data. Video data is formed by a sequence of frames, with each frame having a two-dimensional array of samples. Typically, frames include one luminance and two chrominance channels. Each frame is decomposed into an array of largest coding units (LCUs). The largest coding units (LCUs) have a fixed size, with edge dimensions being a power of two and having equal width and height, such as 64 luma samples. A coding tree enables the subdivision of each largest coding unit (LCU) into four coding units (CUs), each having half the width and height of a parent largest coding unit (LCU). Each of the coding units (CUs) may be further subdivided into four equally-sized coding units (CUs). Such a subdivision process may be applied recursively until a smallest coding unit (SCU) size is reached, enabling coding units (CUs) to be defined down to a minimum supported size. The recursive subdivision of a largest coding unit (LCU) into a hierarchy of coding units (CUs) has a quadtree structure and is referred to as the coding tree. This subdivision process is encoded in a communications bitstream as a sequence of flags, coded as bins. Coding units therefore have a square shape.
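The recursive subdivision described above can be sketched as a parser that consumes one split flag per coding tree node. This is a minimal illustration rather than the syntax of the standard under development: the function name `parse_coding_tree`, the depth-first flag order, the child visiting order and the rule that no flag is read at the smallest coding unit (SCU) size are all assumptions made for the example.

```python
def parse_coding_tree(flags, x, y, size, scu_size, leaves):
    """Consume split flags depth-first and collect leaf CUs as (x, y, size)."""
    # Assumption: at the smallest coding unit size no flag is read and
    # the node is treated as not split.
    split = flags.pop(0) if size > scu_size else 0
    if split:
        half = size // 2
        # Visit the four equally sized children row by row.
        for dy in (0, half):
            for dx in (0, half):
                parse_coding_tree(flags, x + dx, y + dy, half, scu_size, leaves)
    else:
        leaves.append((x, y, size))
    return leaves


# A 64x64 LCU split once, with its second child split again into 16x16 CUs.
flags = [1, 0, 1, 0, 0, 0, 0, 0, 0]
leaves = parse_coding_tree(list(flags), 0, 0, 64, 8, [])
for leaf in leaves:
    print(leaf)
```

Because every split replaces one square with four squares of half the edge length, the leaf coding units always tile the largest coding unit exactly and are always square.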
A set of coding units exists in the coding tree that are not further sub-divided, occupying the leaf nodes of the coding tree. Transform trees exist at these coding units. A transform tree may further decompose a coding unit using a quadtree structure as used for the coding tree. At the leaf nodes of the transform tree, residual data is encoded using transform units (TUs). In contrast to the coding tree, the transform tree may subdivide coding units into transform units having a non-square shape. Further, the transform tree structure does not require that transform units (TUs) occupy all of the area provided by the parent coding unit.
Each coding unit at the leaf nodes of the coding trees is subdivided into one or more arrays of predicted data samples, each known as a prediction unit (PU). Each prediction unit (PU) contains a prediction of a portion of the input video frame data, derived by applying an intra-prediction or an inter-prediction process. Several methods may be used for coding prediction units (PUs) within a coding unit (CU). A single prediction unit (PU) may occupy an entire area of the coding unit (CU), or the coding unit (CU) may be split into two equal-sized rectangular prediction units (PUs), either horizontally or vertically. Additionally, the coding unit (CU) may be split into four equal-sized square prediction units (PUs).
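The four partitionings described above can be illustrated with a short sketch. The mode names (`whole`, `two_rows`, `two_columns`, `quad`) and the function `prediction_units` are hypothetical labels chosen for this example and are not taken from the standard under development.

```python
def prediction_units(cu_size, mode):
    """Return the PUs of a square CU as (x, y, width, height) tuples."""
    half = cu_size // 2
    if mode == "whole":
        # A single PU occupying the entire CU.
        return [(0, 0, cu_size, cu_size)]
    if mode == "two_rows":
        # Two equal rectangles, one above the other.
        return [(0, 0, cu_size, half), (0, half, cu_size, half)]
    if mode == "two_columns":
        # Two equal rectangles, side by side.
        return [(0, 0, half, cu_size), (half, 0, half, cu_size)]
    if mode == "quad":
        # Four equal squares.
        return [(x, y, half, half) for y in (0, half) for x in (0, half)]
    raise ValueError("unknown mode: " + mode)


for mode in ("whole", "two_rows", "two_columns", "quad"):
    print(mode, prediction_units(16, mode))
```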
A video encoder compresses the video data into a bitstream by converting the video data into a sequence of syntax elements. A context adaptive binary arithmetic coding (CABAC) scheme is defined within the high efficiency video coding (HEVC) standard under development, using an arithmetic coding scheme identical to that defined in the H.264/MPEG-4 AVC video compression standard. In the high efficiency video coding (HEVC) standard under development, when context adaptive binary arithmetic coding (CABAC) is in use, each syntax element is expressed as a sequence of bins, where the bins are selected from a set of available bins. The set of available bins is obtained from a context model, with one context per bin. Each context holds a likely bin value (the ‘valMPS’), and a probability state for the arithmetic encoding or arithmetic decoding operation. Note that bins may also be bypass coded, where there is no association with a context. Bypass coded bins consume one bit in the bitstream and therefore are suited to bins with equal probability of being one-valued or zero-valued. Creating such a sequence of bins from a syntax element is known as “binarising” the syntax element.
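The reason bypass coding suits equiprobable bins can be seen from the ideal cost of an arithmetic-coded bin, which is the negative base-two logarithm of the probability assigned to the bin value that occurs. The sketch below computes this information-theoretic cost only; it does not model the probability state machine of context adaptive binary arithmetic coding (CABAC), and the function name `ideal_bin_cost` and the parameter `p_mps` (the probability assigned to the likely bin value) are assumptions introduced for the example.

```python
import math


def ideal_bin_cost(bin_value, val_mps, p_mps):
    """Ideal cost in bits of coding one bin, given the likely value and its probability."""
    # The likely value costs -log2(p_mps); the other value costs -log2(1 - p_mps).
    p = p_mps if bin_value == val_mps else 1.0 - p_mps
    return -math.log2(p)


# A well-predicted bin costs a fraction of a bit; a mispredicted one costs several bits.
print(ideal_bin_cost(0, 0, 0.9))
print(ideal_bin_cost(1, 0, 0.9))
# With equal probabilities a context offers no gain over the one bit of a bypass bin.
print(ideal_bin_cost(1, 0, 0.5))
```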
In a video encoder or video decoder, as separate context information is available for each bin, context selection for bins provides a means to improve coding efficiency. In particular, coding efficiency may be improved by selecting a particular bin such that statistical properties from previous instances of the bin, where the associated context information was used, correlate with statistical properties of a current instance of the bin. Such context selection frequently utilises spatially local information to determine the optimal context.
In the high efficiency video coding (HEVC) standard under development and in H.264/MPEG-4 AVC, a prediction for a current block is derived, based on reference sample data either from other frames, or from neighbouring regions within the current block that have been previously decoded. The difference between the prediction and the desired sample data is known as the residual. A frequency domain representation of the residual is a two-dimensional array of residual coefficients. By convention, the upper-left corner of the two-dimensional array contains residual coefficients representing low-frequency information.
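As a small worked example of the residual defined above, the sketch below subtracts a prediction from the desired samples and shows that adding the residual back to the prediction recovers the desired samples. The helper names `residual` and `reconstruct` and the sample values are illustrative assumptions; the transform and quantisation stages are omitted.

```python
def residual(desired, prediction):
    """Sample-wise difference between the desired block and its prediction."""
    return [[d - p for d, p in zip(d_row, p_row)]
            for d_row, p_row in zip(desired, prediction)]


def reconstruct(prediction, res):
    """Add a residual back onto the prediction."""
    return [[p + r for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(prediction, res)]


desired = [[52, 55], [61, 59]]
prediction = [[50, 50], [60, 60]]
res = residual(desired, prediction)
print(res)
print(reconstruct(prediction, res) == desired)
```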
In typical video data, the majority of the changes in sample values are gradual, resulting in a predominance of low-frequency information within the residual. This manifests as larger magnitudes for residual coefficients located in the upper-left corner of the two-dimensional array.
The property of low-frequency information being predominant in the upper-left corner of the two-dimensional array of residual coefficients may be exploited by the chosen binarisation scheme to minimise the size of the residual coefficients in the bitstream.
HM-5.0 divides the transform unit (TU) into a number of sub-sets and scans the residual coefficients in each sub-set in two passes. The first pass encodes flags indicating the status of the residual coefficients as being nonzero-valued (significant) or zero-valued (non-significant). This data is known as a significance map. A second pass encodes the magnitude and sign of significant residual coefficients, known as the coefficient levels.
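The two passes can be sketched for a single sub-set whose residual coefficients have already been placed in scan order. This is a simplified illustration, not the HM-5.0 implementation: it produces the values that each pass would code but performs no entropy coding, and the function name `two_pass` and the sign convention (1 for negative) are assumptions.

```python
def two_pass(coeffs):
    """Split scanned residual coefficients into a significance map and coefficient levels."""
    # First pass: one flag per coefficient, 1 if nonzero (significant).
    significance_map = [1 if c != 0 else 0 for c in coeffs]
    # Second pass: magnitude and sign, for significant coefficients only.
    levels = [(abs(c), 1 if c < 0 else 0) for c in coeffs if c != 0]
    return significance_map, levels


sig_map, levels = two_pass([5, 0, -2, 0, 0, 1, 0, 0])
print(sig_map)
print(levels)
```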
A provided scan pattern enables scanning the two-dimensional array of residual coefficients into a one-dimensional array. In HM-5.0, the provided scan pattern is used for processing both the significance map and the coefficient levels. By scanning the significance map using the provided scan pattern, the location of the last significant coefficient in the two-dimensional significance map may be determined. Scan patterns may be horizontal, vertical or diagonal.
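The following sketch generates the three kinds of scan pattern as lists of (x, y) positions and uses one to locate the last significant coefficient. It is illustrative only: the direction in which each diagonal is traversed is an assumption and may differ from the HM-5.0 tables, and the names `scan_positions` and `last_significant` are hypothetical.

```python
def scan_positions(width, height, pattern):
    """Return the (x, y) positions of a width x height block in scan order."""
    if pattern == "horizontal":
        return [(x, y) for y in range(height) for x in range(width)]
    if pattern == "vertical":
        return [(x, y) for x in range(width) for y in range(height)]
    if pattern == "diagonal":
        # Walk each anti-diagonal x + y == d, starting at the upper-left corner.
        return [(x, d - x)
                for d in range(width + height - 1)
                for x in range(width)
                if 0 <= d - x < height]
    raise ValueError("unknown pattern: " + pattern)


def last_significant(block, order):
    """Index in scan order of the last nonzero coefficient, or None if all are zero."""
    last = None
    for index, (x, y) in enumerate(order):
        if block[y][x] != 0:
            last = index
    return last


block = [[9, 3, 0, 0],
         [2, 0, 0, 0],
         [1, 0, 0, 0],
         [0, 0, 0, 0]]
for pattern in ("horizontal", "vertical", "diagonal"):
    order = scan_positions(4, 4, pattern)
    print(pattern, last_significant(block, order))
```

For the same block, the position of the last significant coefficient in scan order depends on the pattern, which is one reason the choice of scan pattern affects how many significance flags must be coded.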
The high efficiency video coding (HEVC) test model 5.0 (HM-5.0) provides support for residual blocks, also known as transform units (TUs), having both a square shape and a non-square shape. Each transform unit (TU) contains a set of residual coefficients. Residual blocks having equally sized side dimensions are known as square transform units (TUs) and residual blocks having unequally sized side dimensions are known as non-square transform units (TUs).
Transform unit (TU) sizes supported in HM-5.0 are 4×4, 8×8, 16×16, 32×32, 4×16, 16×4, 8×32 and 32×8. Transform unit (TU) sizes are typically described in terms of luma samples; however, when a chroma format of 4:2:0 is used, each chroma sample occupies the area of 2×2 luma samples. Accordingly, scanning transform units (TUs) to encode chroma residual data uses scan patterns of half the horizontal and vertical dimensions, such as 2×2 for a 4×4 luma residual block. For the purpose of scanning and coding the residual coefficients, the 16×16, 32×32, 4×16, 16×4, 8×32 and 32×8 transform units (TUs) are divided into a number of sub-blocks, i.e. a lower layer of the transform unit (TU) scan, each having a size of 4×4, with a corresponding map existing within HM-5.0. In HM-5.0, sub-blocks for these transform unit (TU) sizes are co-located with sub-sets in the transform unit (TU). The set of significant coefficient flags within the portion of the significance map co-located with one sub-block is referred to as a significant coefficient group. For the 16×16, 32×32, 4×16, 16×4, 8×32 and 32×8 transform units (TUs), the significance map coding makes use of a two-level scan. The upper-level scan performs a scan, such as a backward diagonal down-left scan, to code or infer flags representing the significant coefficient groups of each sub-block. Within the sub-blocks, a scan, such as the backward diagonal down-left scan, is performed to code the significant coefficient flags for sub-blocks having a one-valued significant coefficient group flag. For a 16×16 transform unit (TU), a 4×4 upper-level scan is used. For a 32×32 transform unit (TU), an 8×8 upper-level scan is used. For 16×4, 4×16, 32×8 and 8×32 transform unit (TU) sizes, 4×1, 1×4, 8×2 and 2×8 upper-level scans are used respectively.
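The upper-level scan dimensions listed above follow from dividing each transform unit (TU) dimension by the 4×4 sub-block size. The sketch below derives those dimensions and computes one group flag per sub-block, set when the sub-block contains at least one significant coefficient. It is a simplified illustration under stated assumptions: the names `upper_level_size` and `group_flags` are hypothetical, and the sketch computes flag values only, without modelling the scan order or the cases where HM-5.0 infers a flag rather than coding it.

```python
SUB_BLOCK = 4  # edge length of a sub-block, in coefficients


def upper_level_size(tu_width, tu_height):
    """Width and height of the upper-level scan, counted in sub-blocks."""
    return tu_width // SUB_BLOCK, tu_height // SUB_BLOCK


def group_flags(block):
    """One flag per 4x4 sub-block: 1 if it holds any nonzero coefficient."""
    groups_w, groups_h = upper_level_size(len(block[0]), len(block))
    return [[int(any(block[gy * SUB_BLOCK + y][gx * SUB_BLOCK + x] != 0
                     for y in range(SUB_BLOCK)
                     for x in range(SUB_BLOCK)))
             for gx in range(groups_w)]
            for gy in range(groups_h)]


for size in ((16, 16), (32, 32), (16, 4), (4, 16), (32, 8), (8, 32)):
    print(size, upper_level_size(*size))

# A 16x4 TU with a single significant coefficient at x=5, y=2.
tu = [[0] * 16 for _ in range(4)]
tu[2][5] = 7
print(group_flags(tu))
```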
At each transform unit (TU), residual coefficient data may be encoded into a bitstream. Each “residual coefficient” is a number representing image characteristics within a transform unit in the frequency (DCT) domain and occupying a unique location within the transform unit. A transform unit is a block of residual data samples that may be transformed between the spatial and the frequency domains. In the frequency domain, the transform unit (TU) encodes the residual data samples as residual coefficient data. Side dimensions of transform units are sized in powers of two (2), ranging from 4 samples to 32 samples for a “Luma” channel, and 2 to 16 samples for a “Chroma” channel. The leaf nodes of the transform unit (TU) tree may contain either a transform unit (TU) or nothing at all, in the case where no residual coefficient data is required.
As the spatial representation of the transform unit is a two-dimensional array of residual data samples, as described in detail below, a frequency domain representation resulting from a transform, such as a modified discrete cosine transform (DCT), is also a two-dimensional array of residual coefficients. The spectral characteristics of typical sample data within a transform unit (TU) are such that the frequency domain representation is more compact than the spatial representation. Further, the predominance of lower-frequency spectral information typical in a transform unit (TU) results in a clustering of larger-valued residual coefficients towards the upper-left of the transform unit (TU), where low-frequency residual coefficients are represented.
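The compaction of smooth sample data into the upper-left of the coefficient array can be demonstrated with a textbook transform. The sketch below uses a floating-point orthonormal DCT-II applied separably to rows and then columns. This is an assumption made for illustration and is not the modified integer transform used in HM-5.0; the function names `dct_1d` and `dct_2d` are likewise hypothetical.

```python
import math


def dct_1d(values):
    """Orthonormal DCT-II of a one-dimensional sequence."""
    n = len(values)
    out = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        out.append(scale * sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i, v in enumerate(values)))
    return out


def dct_2d(block):
    """Separable two-dimensional DCT: transform rows, then columns."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    # Transpose back so that result[v][u] has vertical frequency v
    # and horizontal frequency u.
    return [list(row) for row in zip(*cols)]


# A smooth ramp: sample values change gradually across the block.
ramp = [[x + y for x in range(4)] for y in range(4)]
for row in dct_2d(ramp):
    print([round(c, 2) for c in row])
```

For this ramp, all of the energy lands in the first row and first column of the coefficient array, with the largest magnitude at the upper-left position.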
Modified discrete cosine transforms (DCTs) or modified discrete sine transforms (DSTs) may be used to implement the residual transform. Implementations of the residual transform are configured to support each required transform unit (TU) size. In a video encoder, the residual coefficients from the residual transform are scaled and quantised. The scaling and quantisation reduce the magnitude of the residual coefficients, reducing the size of the data coded into the bitstream at the cost of reducing the image quality.
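The trade-off between coefficient magnitude and image quality can be illustrated with a uniform scalar quantiser. This is a deliberately simplified sketch: the single integer step size `step` and the round-to-nearest rule are assumptions made for the example and do not reproduce the scaling and quantisation of HM-5.0.

```python
def quantise(coeff, step):
    """Map a residual coefficient to a smaller-magnitude level (round to nearest)."""
    sign = -1 if coeff < 0 else 1
    return sign * ((abs(coeff) + step // 2) // step)


def dequantise(level, step):
    """Approximate the original coefficient from its level."""
    return level * step


step = 10
for coeff in (37, -37, 4, 120):
    level = quantise(coeff, step)
    print(coeff, level, dequantise(level, step))
```

The levels are smaller than the original coefficients and so are cheaper to code, but the dequantised values differ from the originals by up to half a step, and small coefficients collapse to zero. That difference is the loss in image quality mentioned above.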
One aspect of the complexity of the high efficiency video coding (HEVC) standard under development is the number of look-up tables required in order to perform the scanning. Each additional look-up table results in an undesirable consumption of memory and hence reducing the number of look-up tables required is one aspect of complexity reduction.