To exploit the non-stationary characteristics of input video content, a video encoder relies on entropy coding to map an input video signal to a bitstream of variable-length coded syntax elements. Frequently occurring symbols are represented with short codewords, while less common symbols are represented with long codewords.
The International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Moving Picture Experts Group-4 (MPEG-4) Part 10 Advanced Video Coding (AVC) Standard/International Telecommunication Union, Telecommunication Standardization Sector (ITU-T) H.264 Recommendation (hereinafter the “MPEG-4 AVC Standard”) supports two entropy coding methods. In particular, symbols are coded using either variable-length codes (VLCs) or context-adaptive binary arithmetic coding (CABAC), depending on the entropy encoding mode. Using CABAC, the process of coding a data symbol includes the following three elementary steps:
1. Binarization: A given non-binary valued syntax element is uniquely mapped to a binary sequence, called a bin string. This process is similar to converting a symbol into a variable-length code, except that the resulting bins are further encoded.
2. Context modeling: A “context model” is a probability model for one or more bins of the bin string, chosen from a set of available models depending on the statistics of recently coded data symbols. The context model stores the probability of each bin being “1” or “0”, and it is updated based on the actual coded value.
3. Binary arithmetic coding: An arithmetic coder encodes each bin according to the selected probability model.
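As a minimal sketch of the binarization step alone, the Python below implements unary (U) and truncated unary (TU) binarization, two of the binarization schemes used by CABAC. The function names are hypothetical, and the context modeling and arithmetic coding of the resulting bins are deliberately left out.

```python
def unary_bins(value: int) -> str:
    """Unary (U) binarization: `value` ones followed by a terminating zero."""
    if value < 0:
        raise ValueError("unary binarization requires a non-negative value")
    return "1" * value + "0"


def truncated_unary_bins(value: int, c_max: int) -> str:
    """Truncated unary (TU) binarization with largest possible value c_max.

    Identical to unary, except that the terminating zero is omitted when
    value == c_max, because a decoder can then infer the end of the bin string.
    """
    if not 0 <= value <= c_max:
        raise ValueError("value must lie in [0, c_max]")
    return "1" * value + ("" if value == c_max else "0")


if __name__ == "__main__":
    # Print each value with its U and TU (c_max = 3) bin strings
    for v in range(4):
        print(v, unary_bins(v), truncated_unary_bins(v, c_max=3))
```

With c_max = 3, the value 3 binarizes to "111" under TU instead of "1110" under U, saving one bin that the decoder does not need.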
The MPEG-4 AVC Standard defines the context models and binarization schemes for each syntax element. The context model of each bin is identified by a context index γ, and each probability model related to a given context index γ is determined by a pair of values, namely a probability state index σγ and the (binary) value ωγ of the most probable symbol (MPS).
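The pair (σγ, ωγ) behaves as a small state machine that is updated after each coded bin. The Python sketch below is an approximation, not the normative table-driven process: it assumes the exponential state model commonly used to describe CABAC (least probable symbol (LPS) probability 0.5·α^σ with α = (0.01875/0.5)^(1/63)) and derives the LPS transition by snapping the updated probability to the nearest state instead of reading the standard's transition table. All names here are hypothetical.

```python
import math
from dataclasses import dataclass

P_LPS_MAX = 0.5                       # LPS probability of state sigma = 0 (equiprobable)
ALPHA = (0.01875 / 0.5) ** (1 / 63)   # ratio between successive state probabilities
MAX_ADAPTIVE_STATE = 62               # highest state reachable by adaptation


def lps_probability(sigma: int) -> float:
    """LPS probability of state sigma under the assumed exponential model."""
    return P_LPS_MAX * ALPHA ** sigma


def next_state_after_lps(sigma: int) -> int:
    """Approximate LPS transition: raise the LPS probability, then snap to a state."""
    p_new = ALPHA * lps_probability(sigma) + (1 - ALPHA)
    sigma_new = round(math.log(p_new / P_LPS_MAX) / math.log(ALPHA))
    return min(max(sigma_new, 0), MAX_ADAPTIVE_STATE)


@dataclass
class ContextModel:
    sigma: int = 0  # probability state index (sigma_gamma)
    mps: int = 0    # binary value of the most probable symbol (omega_gamma)

    def update(self, bin_value: int) -> None:
        if bin_value == self.mps:
            # MPS observed: the LPS becomes less likely, so step to the next state
            self.sigma = min(self.sigma + 1, MAX_ADAPTIVE_STATE)
        else:
            # LPS observed in the equiprobable state: swap the roles of 0 and 1
            if self.sigma == 0:
                self.mps = 1 - self.mps
            self.sigma = next_state_after_lps(self.sigma)
```

The design keeps only an integer state and one MPS bit per context, which is why a context model in this scheme is fully described by the pair (σγ, ωγ).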
Suppose a predefined set of past symbols, called a context template T, and a related set C={0, . . . , C−1} of contexts are given, where the contexts are specified by a modeling function F: T→C operating on the template T. For each symbol x to be coded, a conditional probability p(x|F(z)) is estimated by switching between different probability models according to the already coded neighboring symbols z ∈ T. After x is encoded using the estimated conditional probability p(x|F(z)), the probability model is updated with the value of the encoded symbol x. Thus, p(x|F(z)) is estimated on the fly by tracking the actual source statistics. To reduce the model cost and avoid inaccurate estimates of p(x|F(z)) caused by a large number C of contexts, the MPEG-4 AVC Standard places two restrictions on the choice of context models. First, only very limited context templates T, consisting of a few neighbors of the current symbol to be encoded, are employed. Second, context modeling is restricted to selected bins of the binarized symbols.
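The switching between probability models can be illustrated with a short Python sketch. Here the template T is hypothetically the left and upper neighbors in a 2-D binary map, the modeling function F(z) = left + 2·top yields C = 4 contexts, and each p(x|F(z)) is estimated from symbol counts rather than with the standard's state machine. The cost is reported as the ideal code length −log2 p(x|F(z)), which an arithmetic coder approaches; no actual bitstream is produced.

```python
import math
from collections import defaultdict


def context_of(left: int, top: int) -> int:
    """Hypothetical modeling function F: T -> C for the template T = {left, top}.

    Returns one of C = 4 contexts, {0, 1, 2, 3}.
    """
    return left + 2 * top


class CountingModel:
    """Adaptive estimate of p(x | context) from symbol counts (not the AVC state machine)."""

    def __init__(self) -> None:
        self.counts = [1, 1]  # start from an equiprobable estimate

    def prob(self, x: int) -> float:
        return self.counts[x] / (self.counts[0] + self.counts[1])

    def update(self, x: int) -> None:
        self.counts[x] += 1  # track the actual source statistics


def ideal_code_length(bit_map: list[list[int]]) -> float:
    """Code a 2-D binary map in raster order; return -sum(log2 p(x | F(z))) in bits."""
    models = defaultdict(CountingModel)  # one probability model per context
    total_bits = 0.0
    for y, row in enumerate(bit_map):
        for x_pos, x in enumerate(row):
            # Already coded neighbors z; positions outside the map count as 0
            left = row[x_pos - 1] if x_pos > 0 else 0
            top = bit_map[y - 1][x_pos] if y > 0 else 0
            model = models[context_of(left, top)]
            total_bits += -math.log2(model.prob(x))  # cost of coding x with p(x | F(z))
            model.update(x)                            # update only after coding x
    return total_bits
```

For a 1×4 row of zeros, every symbol falls into context 0 and the cost is log2(2 · 1.5 · 4/3 · 5/4) = log2 5 ≈ 2.32 bits, below the 4 bits of a fixed one-bit code.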
At the beginning of each coded slice, the context models are initialized depending on the initial value of the quantization parameter (QP), since the QP has a significant effect on the probability of occurrence of the various data symbols.
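In the MPEG-4 AVC Standard, this QP-dependent initialization is a linear function of the slice QP, parameterized per context by a pair (m, n) taken from tables in the standard. A minimal Python sketch of that mapping follows; the (m, n) values in the usage lines are arbitrary illustrations, not entries from the standard's tables, and the function names are hypothetical.

```python
def clip3(lo: int, hi: int, v: int) -> int:
    """Clip v to the closed range [lo, hi]."""
    return max(lo, min(hi, v))


def init_context(m: int, n: int, slice_qp: int) -> tuple[int, int]:
    """Return (sigma, omega), i.e. (probability state index, MPS value), at slice start."""
    # Python's >> on negative ints is an arithmetic (flooring) shift
    pre_ctx_state = clip3(1, 126, ((m * clip3(0, 51, slice_qp)) >> 4) + n)
    if pre_ctx_state <= 63:
        return 63 - pre_ctx_state, 0  # MPS is 0
    return pre_ctx_state - 64, 1      # MPS is 1


if __name__ == "__main__":
    # Same (arbitrary) context parameters, two different slice QPs
    for qp in (22, 37):
        print(qp, init_context(m=-28, n=127, slice_qp=qp))
```

With these arbitrary parameters the same context starts in state (24, 1), a fairly confident predictor of “1”, at QP 22, but in state (1, 0), close to equiprobable, at QP 37, which is the QP dependence the paragraph describes.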
CABAC Entropy Coding of Residual Data in the MPEG-4 AVC Standard
For the CABAC coding of residual data in accordance with the MPEG-4 AVC Standard, the syntax elements and their related coding scheme are characterized by the following distinct features:
- A one-bit symbol coded_block_flag and a binary-valued significance map are used to indicate the occurrence and the location of nonzero transform coefficients (namely, significant coefficients) in a given block.
- Nonzero levels are encoded in reverse scanning order.
- Context models for coding nonzero transform coefficients are chosen based on the number of previously transmitted nonzero levels within the reverse scanning path.
Turning to FIG. 1, an example of the significance map encoding procedure in accordance with the MPEG-4 AVC Standard is indicated generally by the reference numeral 100. The procedure 100 includes a start block 110 that passes control to a function block 120. The function block 120 encodes a syntax element coded_block_flag, and passes control to a decision block 130. The decision block 130 determines whether or not coded_block_flag is equal to one. If so, then control is passed to a function block 140. Otherwise, control is passed to an end block 199. The function block 140 performs steps relating to encoding the significance map, and passes control to a function block 150. The function block 150 performs steps relating to encoding level information, and passes control to the end block 199. Thus, regarding decision block 130, if coded_block_flag indicates that a block has significant coefficients, then a binary-valued significance map is encoded by function block 140. For each coefficient in the scanning order, a one-bit symbol significant_coeff_flag is transmitted by function block 140. If the significant_coeff_flag symbol is equal to one, i.e., if a nonzero coefficient exists at this scanning position, then a further one-bit symbol last_significant_coeff_flag is sent by function block 140. This symbol last_significant_coeff_flag indicates whether the current significant coefficient is the last one inside the block or whether further significant coefficients follow.
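The symbol order of the FIG. 1 procedure can be sketched in Python as below. Coefficients are assumed to be already arranged in scanning order, symbols are emitted as tuples instead of being arithmetic coded, and the level information is simplified to hypothetical (“level”, position, value) tuples rather than the standard's level syntax. Following the MPEG-4 AVC Standard, no flags are sent for the final scanning position, since a decoder that reaches it can infer that it holds the last significant coefficient.

```python
def residual_block_symbols(coeffs: list[int]) -> list[tuple]:
    """Symbols produced for one block whose coefficients are given in scanning order."""
    symbols: list[tuple] = []
    significant = [i for i, c in enumerate(coeffs) if c != 0]

    # Function block 120 / decision block 130: any significant coefficient at all?
    coded_block_flag = int(bool(significant))
    symbols.append(("coded_block_flag", coded_block_flag))
    if not coded_block_flag:
        return symbols

    # Function block 140: significance map in scanning order
    last_pos = significant[-1]
    for i in range(len(coeffs) - 1):  # flags for the final position are inferred
        sig = int(coeffs[i] != 0)
        symbols.append(("significant_coeff_flag", i, sig))
        if sig:
            is_last = int(i == last_pos)
            symbols.append(("last_significant_coeff_flag", i, is_last))
            if is_last:
                break

    # Function block 150: level information in reverse scanning order
    for i in reversed(significant):
        symbols.append(("level", i, coeffs[i]))
    return symbols
```

For the scanned block [3, 0, −1, 0], the significance map stops right after position 2 is flagged as last, so position 3 costs no bins at all.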
When encoding the significance map of the transform coefficients, the choice of context models for significant_coeff_flag and last_significant_coeff_flag depends on the scanning position. In the MPEG-4 AVC Standard, for transform sizes of 4×4 or smaller, a separate context model is assigned to significant_coeff_flag and to last_significant_coeff_flag for each scanning position. For the 8×8 transform size and larger, several transform coefficient positions share one context model in order to reduce the number of context models.
Significance Map Coding in KTA
The Video Coding Experts Group (VCEG) “key technical area” (KTA) software has provided a common platform for integrating new advances in video coding after the MPEG-4 AVC Standard was finalized. Proposals to use extended block sizes and large transforms were adopted into KTA. In the current KTA software, motion partitions larger than 16×16 pixels are implemented. In particular, macroblocks of sizes 64×64, 64×32, 32×64, 32×32, 32×16, and 16×32 are used in addition to the existing MPEG-4 AVC Standard partition sizes. Larger block transforms, including those of sizes 16×16, 16×8, and 8×16, are also used to better capture the smoother content in high-definition video. To keep the number of context models low, the 8×16, 16×8, and 16×16 transforms each use 15 or fewer context models for significant_coeff_flag or last_significant_coeff_flag. Turning to FIG. 2, an example of using 15 contexts for the syntax element significant_coeff_flag for an 8×8 block is indicated generally by the reference numeral 200. In further detail, example 200 illustrates how multiple transform coefficient positions in a block share one context model when an 8×8 transform is used for significant_coeff_flag. Each distinct number represents a context model; when a number is repeated at multiple positions, those positions share one context model. In this approach, the way in which multiple transform coefficient positions share one context, referred to as context sharing, is designed separately for each transform size. The exact pattern of context sharing is referred to as the context sharing map.
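A context sharing map can be represented as a 2-D table indexed by coefficient position, where the stored value is the shared context index. The Python sketch below uses a hypothetical diagonal pattern in which all positions with the same x + y share a context; it happens to need 15 contexts for an 8×8 block, but it is not the pattern shown in FIG. 2.

```python
Map = list[list[int]]


def diagonal_sharing_map(size: int) -> Map:
    """Hypothetical map: positions on the same anti-diagonal (x + y) share a context."""
    return [[x + y for x in range(size)] for y in range(size)]


def context_index(sharing_map: Map, x: int, y: int) -> int:
    """Context model used for significant_coeff_flag at horizontal x, vertical y."""
    return sharing_map[y][x]


def num_contexts(sharing_map: Map) -> int:
    """Number of distinct context models the map requires."""
    return len({ctx for row in sharing_map for ctx in row})
```

For example, num_contexts(diagonal_sharing_map(8)) evaluates to 15, and the positions (3, 0) and (0, 3) share a single context because they lie on the same diagonal.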
Turning to FIG. 3, an example of using 15 contexts for the syntax element significant_coeff_flag for a 16×16 block is indicated generally by the reference numeral 300. In further detail, example 300 illustrates how multiple transform coefficient positions in a block share one context model when a 16×16 transform is used for significant_coeff_flag. The context sharing pattern of the 16×16 transform is approximately an upsampled version of that of the 8×8 transform. However, such a pattern may fail to capture the differences in coefficient distributions between different transforms.
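Deriving a 16×16 sharing map by upsampling an 8×8 map amounts to nearest-neighbor replication: each 2×2 group of 16×16 positions inherits the context of one 8×8 position, so the number of contexts is unchanged. The sketch below assumes the same hypothetical diagonal 8×8 map (not the pattern of FIG. 2 or FIG. 3) and exact 2× replication, whereas the text describes the KTA pattern as only approximately upsampled.

```python
Map = list[list[int]]


def diagonal_sharing_map(size: int) -> Map:
    """Hypothetical base map: positions with equal x + y share a context."""
    return [[x + y for x in range(size)] for y in range(size)]


def upsample_sharing_map(base: Map, factor: int = 2) -> Map:
    """Nearest-neighbor upsampling: (x, y) reuses the context of (x // factor, y // factor)."""
    size = len(base) * factor
    return [[base[y // factor][x // factor] for x in range(size)] for y in range(size)]


if __name__ == "__main__":
    map16 = upsample_sharing_map(diagonal_sharing_map(8))
    # Block dimension and number of distinct contexts of the derived map
    print(len(map16), len({ctx for row in map16 for ctx in row}))
```

Because every 16×16 position simply copies an 8×8 entry, the derived map cannot express any difference between the coefficient statistics of the two transform sizes, which is the limitation noted for this approach.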
Significance Map Coding in a Particular Prior Art Approach
In a particular prior art approach, a new context modeling approach was proposed for transform sizes of 8×8 and larger. To model the contexts for the syntax element significant_coeff_flag for 8×8 blocks, the transform block is decomposed into 16 sub-blocks of 2×2 samples, and each of these sub-blocks is associated with a separate context. The context model selection for larger transform blocks (i.e., blocks larger than 8×8) is based on the number of already coded significant transform coefficients in a predefined neighborhood inside the transform block. For coding last_significant_coeff_flag, a context modeling scheme was designed that depends on a distance measure from the current scanning position to the top-left corner of the given transform block. More specifically, the context model for coding last_significant_coeff_flag is chosen based on the scan diagonal on which the current scanning position lies (i.e., it is chosen based on x+y, where x and y represent the horizontal and vertical location of a scanning position inside the transform block, respectively). To avoid over-fitting, the distance measure x+y is mapped onto a reduced set of context models (e.g., by quantizing x+y).
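The three context selection rules of this prior art approach can be sketched in Python. The 2×2 sub-block mapping for 8×8 blocks follows directly from the description; the neighborhood used for larger blocks, the cap on the neighbor count, and the quantization of x + y are not specified here, so the choices below (left, upper, and upper-left neighbors; a cap of 2; halving x + y with a cap of 7) are hypothetical.

```python
NEIGHBORHOOD = ((-1, 0), (0, -1), (-1, -1))  # hypothetical: left, upper, upper-left
MAX_NEIGHBOR_CTX = 2                         # hypothetical cap on the neighbor count
MAX_LAST_CTX = 7                             # hypothetical largest last-flag context


def sig_ctx_8x8(x: int, y: int) -> int:
    """significant_coeff_flag context for 8x8: one context per 2x2 sub-block (16 total)."""
    return (y // 2) * 4 + (x // 2)


def sig_ctx_large(x: int, y: int, coded_significant: set[tuple[int, int]]) -> int:
    """Larger blocks: count already coded significant neighbors, then cap the count."""
    count = sum((x + dx, y + dy) in coded_significant for dx, dy in NEIGHBORHOOD)
    return min(count, MAX_NEIGHBOR_CTX)


def last_ctx(x: int, y: int) -> int:
    """last_significant_coeff_flag context from the scan diagonal x + y, quantized."""
    return min((x + y) // 2, MAX_LAST_CTX)
```

Under these assumptions, a 16×16 block uses only eight last-flag contexts (0 to 7) even though x + y ranges from 0 to 30, which is the kind of reduction the quantization step is meant to provide. With a zigzag scan, the chosen neighbors always have a smaller x + y than the current position and are therefore already coded.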
In the particular prior art approach, the context sharing pattern of an 8×8 transform is approximately an upsampled version of that of a 4×4 transform. However, this pattern may also fail to capture the differences in coefficient distributions between different transforms.
In existing video coding standards, when the significance map of the transform coefficients of 8×8 or larger transforms is coded, one context is shared among several transform coefficient positions to reduce the number of contexts. A separate context sharing design is used for each transform, and such designs cannot easily be extended to future standards in which more transforms may be used.