Data compression occurs in a number of contexts. It is very commonly used in communications and computer networking to store, transmit, and reproduce information efficiently. It finds particular application in the encoding of images, audio and video. Video presents a significant challenge to data compression because of the large amount of data required for each video frame and the speed with which encoding and decoding often need to occur. The current state of the art for video encoding is the ITU-T H.264/AVC video coding standard. It defines a number of different profiles for different applications, including the Main profile, Baseline profile and others. A next-generation video encoding standard, High Efficiency Video Coding (HEVC), is currently under development through a joint initiative of MPEG-ITU.
There are a number of standards for encoding/decoding images and videos, including H.264, that use block-based coding processes. In these processes, the image or frame is divided into blocks, typically 4×4 or 8×8. The blocks are then spectrally transformed into coefficients, quantized, and entropy encoded. In many cases, the data being transformed is not the actual pixel data but residual data following a prediction operation. Predictions can be intra-frame, i.e. block-to-block within the frame/image, or inter-frame, i.e. between frames (also called motion prediction). HEVC (which may also be called H.265) is expected to have these features as well.
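As an illustration of the block-based structure described above, the sketch below splits a frame into N×N blocks in raster order and forms the residual as the difference between a block and its prediction. The function names `partition` and `residual` are hypothetical. The frame is assumed to be a list of rows whose dimensions are multiples of N. The prediction is taken as given, standing in for whatever intra-frame or inter-frame predictor produced it.

```python
def partition(frame, n):
    """Split a frame (list of rows) into n x n blocks in raster order.

    Assumes the frame height and width are both multiples of n.
    """
    h, w = len(frame), len(frame[0])
    blocks = []
    for top in range(0, h, n):
        for left in range(0, w, n):
            blocks.append([row[left:left + n] for row in frame[top:top + n]])
    return blocks


def residual(block, prediction):
    """Element-wise difference between a block and its (given) prediction."""
    return [[x - p for x, p in zip(brow, prow)]
            for brow, prow in zip(block, prediction)]


if __name__ == "__main__":
    frame = [[8 * r + c for c in range(8)] for r in range(8)]
    blocks = partition(frame, 4)
    print(len(blocks), blocks[1][0])
    print(residual([[5, 6], [7, 8]], [[4, 4], [4, 4]]))
```

The residual, not the raw block, is what the transform stage normally receives when prediction is used.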
When spectrally transforming residual data, many of these standards prescribe the use of a discrete cosine transform (DCT) or some variant thereof. The resulting DCT coefficients are then quantized using a quantizer to produce quantized transform domain coefficients, or indices.
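The transform-then-quantize step can be sketched as follows. This uses a floating-point orthonormal 2-D DCT-II followed by a plain uniform scalar quantizer with a rounding offset of 0.5. Both are simplifying assumptions: H.264 uses an integer approximation of the DCT with scaling folded into quantization, and practical encoders often use dead-zone rounding offsets. The names `dct2` and `quantize` and the step size `qstep` are hypothetical.

```python
import math


def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block (rows first, then columns)."""
    n = len(block)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    def dct1(v):
        return [alpha(k) * sum(v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i in range(n))
                for k in range(n)]

    rows = [dct1(row) for row in block]
    cols = [dct1([rows[i][j] for i in range(n)]) for j in range(n)]
    # cols[j][i] is vertical frequency i, horizontal frequency j; transpose back.
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def quantize(coeffs, qstep):
    """Uniform scalar quantizer: index = sign(c) * floor(|c| / qstep + 0.5)."""
    return [[int(math.copysign(int(abs(c) / qstep + 0.5), c)) for c in row]
            for row in coeffs]


if __name__ == "__main__":
    flat = [[10] * 4 for _ in range(4)]
    print(quantize(dct2(flat), 10))  # only the DC index is non-zero
```

For a flat residual block, all of the energy lands in the DC coefficient. This is why quantized blocks are typically sparse and why the significance map described next is worth coding explicitly.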
The block or matrix of quantized transform domain coefficients (sometimes referred to as a “transform unit”) is then entropy encoded using a particular context model. In H.264/AVC and in the current development work for HEVC, the quantized transform coefficients are encoded by (a) encoding a last significant coefficient position indicating the location of the last non-zero coefficient in the block, (b) encoding a significance map indicating the positions in the block (other than the last significant coefficient position) that contain non-zero coefficients, (c) encoding the magnitudes of the non-zero coefficients, and (d) encoding the signs of the non-zero coefficients. This encoding of the quantized transform coefficients often occupies 30-80% of the encoded data in the bitstream.
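The four-part representation (a) to (d) can be illustrated by splitting a list of quantized coefficients into those parts and reassembling it. Several simplifying assumptions apply here:

- The coefficients are assumed to be already arranged in scan order.
- Positions are scan indices rather than (x, y) coordinates.
- Magnitudes and signs are listed in forward scan order for readability, although actual codecs typically code levels in reverse scan order.
- An all-zero block returns `None`.

The names `decompose` and `recompose` are hypothetical, and no entropy coding is performed.

```python
def decompose(coeffs):
    """Split scan-ordered quantized coefficients into the four coded parts."""
    nonzero = [i for i, v in enumerate(coeffs) if v != 0]
    if not nonzero:
        return None  # all-zero block: nothing to code here (assumption)
    last = nonzero[-1]                                        # (a) last position
    sig_map = [1 if coeffs[i] != 0 else 0 for i in range(last)]  # (b) before last
    levels = [coeffs[i] for i in nonzero]
    magnitudes = [abs(v) for v in levels]                     # (c) magnitudes
    signs = [1 if v < 0 else 0 for v in levels]               # (d) 1 = negative
    return {"last": last, "sig_map": sig_map,
            "magnitudes": magnitudes, "signs": signs}


def recompose(parts, length):
    """Rebuild the scan-ordered coefficient list from the four parts."""
    out = [0] * length
    if parts is None:
        return out
    # Significant positions are the flagged ones plus the last position itself.
    positions = [i for i, f in enumerate(parts["sig_map"]) if f] + [parts["last"]]
    for pos, mag, neg in zip(positions, parts["magnitudes"], parts["signs"]):
        out[pos] = -mag if neg else mag
    return out


if __name__ == "__main__":
    coeffs = [7, -2, 0, 1, 0, 0, -1] + [0] * 9
    parts = decompose(coeffs)
    print(parts)
    print(recompose(parts, len(coeffs)) == coeffs)
```

The significance map stops just before the last position, because that position is known to be significant once it has been signalled.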
Transform units are typically N×N. Common sizes include 4×4, 8×8, 16×16, and 32×32, although other sizes are possible. The entropy encoding of the symbols in the significance map is based upon a context model. In the case of a 4×4 luma or chroma block or transform unit (TU), a separate context is associated with each coefficient position in the TU. That is, excluding the bottom-right corner position, the encoder and decoder track 15 contexts for 4×4 luma TUs and 15 for 4×4 chroma TUs, a total of 30 separate contexts. The 8×8 TUs are partitioned (conceptually, for the purpose of context association) into 2×2 blocks such that one distinct context is associated with each 2×2 block in the 8×8 TU. Accordingly, the encoder and decoder track a total of 16+16=32 contexts for the 8×8 luma and chroma TUs. This means the encoder and decoder keep track of and look up 62 different contexts during the encoding and decoding of the significance map. When 16×16 TUs and 32×32 TUs are taken into account, the total number of distinct contexts involved is 88. Of the additional 26 contexts, 13 are for luma TUs and 13 are for chroma TUs. The 13 contexts are assigned to the coefficient positions in a 16×16 or 32×32 TU as follows. Let (r, c) denote a position in the TU, where 0<=r, c<=15 if the TU is of size 16×16, and 0<=r, c<=31 if the TU is of size 32×32. Then 3 distinct contexts are assigned to the three positions (0, 0), (0, 1), (1, 0) at the top-left corner, including the DC position (0, 0); 5 distinct contexts are assigned to positions in the region {(r, c): 2<=r+c<5}; and the last 5 distinct contexts are assigned to all the remaining positions.
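The context bookkeeping above can be checked by enumerating every context that the encoder and decoder would need to track. The sketch below uses a hypothetical numbering:

- 4×4 contexts are numbered 4r+c, with none for the bottom-right position.
- 8×8 contexts are numbered by 2×2 sub-block.
- 16×16 and 32×32 TUs share one set of 13 contexts per component, split into a corner group of 3 and two neighbourhood-driven groups of 5.

The function names and tuple keys are illustrative only.

```python
def ctx_4x4(r, c):
    """One context per 4x4 position; the bottom-right position (3, 3) has none."""
    if (r, c) == (3, 3):
        return None
    return 4 * r + c


def ctx_8x8(r, c):
    """One context per 2x2 sub-block of an 8x8 TU, giving 16 contexts."""
    return 4 * (r // 2) + (c // 2)


def count_contexts():
    """Collect every distinct significance-map context as a hashable key."""
    contexts = set()
    for comp in ("luma", "chroma"):
        for r in range(4):
            for c in range(4):
                k = ctx_4x4(r, c)
                if k is not None:
                    contexts.add((comp, "4x4", k))
        for r in range(8):
            for c in range(8):
                contexts.add((comp, "8x8", ctx_8x8(r, c)))
        # 16x16 and 32x32 share 13 contexts per component:
        # 3 corner contexts, then 5 + 5 neighbourhood-driven ones (values 0..4).
        for k in range(3):
            contexts.add((comp, "large", "corner", k))
        for region in ("near", "far"):
            for k in range(5):
                contexts.add((comp, "large", region, k))
    return contexts


if __name__ == "__main__":
    ctx = count_contexts()
    for size in ("4x4", "8x8", "large"):
        print(size, sum(1 for t in ctx if t[1] == size))
    print("total", len(ctx))
```

The per-size counts reproduce the 30 + 32 + 26 = 88 figure given above.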
Except for the first 3 contexts for (0, 0), (0, 1), and (1, 0), the derivation of the context for a position in the region {(r, c): 2<=r+c<5} depends on its lower-right neighborhood. Let s(r, c) denote the significance flag of a coefficient at position (r, c), i.e., s(r, c)=1 if the coefficient is not zero and s(r, c)=0 otherwise. The context for position (r, c) is equal to min(s(r+1, c)+s(r, c+1)+s(r+2, c)+s(r, c+2)+s(r+1, c+1), 4), where min(a, b) returns the smaller of a and b. The result therefore takes one of the 5 values 0 through 4, selecting among the 5 contexts of this region. The context of a position (r, c) in the remaining region {(r, c): r+c>=5} is derived similarly, selecting among that region's own 5 contexts.
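The neighbourhood rule can be sketched as a function that maps a position in a 16×16 or 32×32 significance map to one of the 13 contexts.

Several assumptions apply:

- The numbering is hypothetical: the corner contexts are 0 to 2, the region 2<=r+c<5 uses 3 to 7, and the remaining region uses 8 to 12.
- Positions outside the TU are treated as insignificant.
- The remaining region uses the same five-neighbour template, following the statement that it is "similarly derived".
- The lower-right flags are already known when (r, c) is coded, as would be the case if the map were processed in reverse scan order.

```python
def sig(s, r, c):
    """Significance flag at (r, c); positions outside the TU count as 0 (assumption)."""
    n = len(s)
    return s[r][c] if r < n and c < n else 0


def sig_ctx_large(s, r, c):
    """Hypothetical context index 0..12 for position (r, c) of a 16x16/32x32 map."""
    corner = {(0, 0): 0, (0, 1): 1, (1, 0): 2}
    if (r, c) in corner:
        return corner[(r, c)]
    # Five lower-right neighbours, capped at 4, as in the formula above.
    total = (sig(s, r + 1, c) + sig(s, r, c + 1) + sig(s, r + 2, c)
             + sig(s, r, c + 2) + sig(s, r + 1, c + 1))
    local = min(total, 4)
    # Every non-corner position has r + c >= 2, so r + c < 5 means the near region.
    base = 3 if r + c < 5 else 8
    return base + local


if __name__ == "__main__":
    s = [[0] * 16 for _ in range(16)]
    s[2][1] = s[1][2] = 1
    print(sig_ctx_large(s, 1, 1))  # near region, two significant neighbours
```

Each call reads up to five neighbouring flags, which illustrates the memory-access cost discussed below.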
The contexts for 4×4 and 8×8 significance maps are determined by the position of the coefficient in the TU. The contexts for 16×16 and 32×32 significance maps are mostly determined by the values of neighboring significance flags. Determining the context for the 16×16 and 32×32 significance maps is therefore computationally intensive, because in most cases the processor determines context by reading the values of neighboring significance flags, which involves costly memory access operations.
Similar reference numerals may have been used in different figures to denote similar components.