I. Technical Field of the Invention
The present invention is related to coding a video frame or picture and, in particular, to an arithmetic coding scheme for transform data units or sub-units thereof.
II. Description of the Prior Art
Entropy coders map an input bit stream of binarizations of data values to an output bit stream, the output bit stream being compressed relative to the input bit stream, i.e., consisting of fewer bits than the input bit stream. This data compression is achieved by exploiting the redundancy in the information contained in the input bit stream.
Entropy coding is used in video coding applications. Natural camera-view video signals show non-stationary statistical behavior. The statistics of these signals largely depend on the video content and the acquisition process. Traditional concepts of video coding that rely on mapping from the video signal to a bit stream of variable length-coded syntax elements exploit some of the non-stationary characteristics but certainly not all of them. Moreover, higher-order statistical dependencies on a syntax element level are mostly neglected in existing video coding schemes. Designing an entropy coding scheme for a video coder by taking into consideration these typically observed statistical properties, however, offers significant improvements in coding efficiency.
Entropy coding in today's hybrid block-based video coding standards such as MPEG-2 and MPEG-4 is generally based on fixed tables of variable length codes (VLC). For coding the residual data in these video coding standards, a block of transform coefficient levels is first mapped into a one-dimensional list using an inverse scanning pattern. This list of transform coefficient levels is then coded using a combination of run-length and variable length coding. The set of fixed VLC tables does not allow an adaptation to the actual symbol statistics, which may vary over space and time as well as for different source material and coding conditions. Finally, since there is a fixed assignment of VLC tables and syntax elements, existing inter-symbol redundancies cannot be exploited within these coding schemes.
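The combination of run-length and variable length coding described above can be sketched as follows. This is only an illustration: the table FIXED_VLC, the ESCAPE code word, and the 4-bit run / 8-bit level escape format are hypothetical and are not taken from MPEG-2 or MPEG-4.

```python
def run_level_pairs(levels):
    """Convert a scanned list of coefficient levels into (run, level) pairs.

    'run' counts the zeros preceding each non-zero 'level'. Trailing zeros
    are dropped; a real codec would signal the end of the block explicitly.
    """
    pairs = []
    run = 0
    for level in levels:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return pairs


# Hypothetical fixed, prefix-free VLC table (not from any standard).
FIXED_VLC = {(0, 1): "10", (0, -1): "11", (1, 1): "010", (2, 1): "0110"}
ESCAPE = "0000"  # hypothetical escape prefix for pairs missing from the table


def encode_block(levels):
    """Encode a scanned block with the fixed table above."""
    bits = []
    for run, level in run_level_pairs(levels):
        code = FIXED_VLC.get((run, level))
        if code is None:
            # Escape: 4-bit run followed by 8-bit two's-complement level.
            code = ESCAPE + format(run, "04b") + format(level & 0xFF, "08b")
        bits.append(code)
    return "".join(bits)
```

Because FIXED_VLC never changes, a pair that happens to be frequent in the actual material but is missing from the table always costs the full escape length, which is the lack of adaptation noted above.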
It is known that this deficiency of Huffman codes can be resolved by arithmetic codes. In arithmetic codes, each symbol is associated with a respective probability value, the probability values for all symbols defining a probability estimation. A code word is coded in an arithmetic code bit stream by dividing an actual probability interval on the basis of the probability estimation into several sub-intervals, each sub-interval being associated with a possible symbol, and reducing the actual probability interval to the sub-interval associated with the symbol of the data value to be coded. The arithmetic code defines the resulting interval limits or some probability value inside the resulting probability interval.
As may be clear from the above, the compression effectiveness of an arithmetic coder strongly depends on the probability estimation as well as on the symbols on which the probability estimation is defined.
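The interval subdivision can be illustrated with a minimal sketch that uses exact fractions and a fixed probability estimation. The function name narrow_interval and the example alphabet are assumptions made for illustration; practical arithmetic coders use finite-precision integer arithmetic with renormalization instead.

```python
from fractions import Fraction


def narrow_interval(symbols, probabilities):
    """Return the final [low, high) interval after coding 'symbols'.

    'probabilities' maps each symbol to its probability; the insertion
    order of the dict fixes the order of the sub-intervals.
    """
    low, width = Fraction(0), Fraction(1)
    for symbol in symbols:
        cumulative = Fraction(0)
        for candidate, p in probabilities.items():
            if candidate == symbol:
                # Reduce the interval to the sub-interval of this symbol.
                low += width * cumulative
                width *= p
                break
            cumulative += p
        else:
            raise ValueError(f"unknown symbol: {symbol!r}")
    return low, low + width


# Example probability estimation (hypothetical values).
estimation = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
low, high = narrow_interval("ab", estimation)
```

Any value inside the final interval identifies the coded sequence. The width of the interval equals the product of the symbol probabilities, so a better probability estimation yields a wider interval and therefore fewer bits.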
A special kind of context-based adaptive binary arithmetic coding, called CABAC, is employed in the H.264/AVC video coding standard. The standard provides an option to use macroblock adaptive frame/field (MBAFF) coding for interlaced video sources. Macroblocks are units into which the pixel samples of a video frame are grouped. The macroblocks, in turn, are grouped into macroblock pairs. Each macroblock pair covers a certain area of the video frame or picture. Furthermore, several macroblocks are grouped into slices. Slices that are coded in MBAFF coding mode can contain both macroblocks coded in frame mode and macroblocks coded in field mode. When coded in frame mode, a macroblock pair is spatially sub-divided into a top and a bottom macroblock, the top and the bottom macroblock each comprising both pixel samples captured at a first time instant and pixel samples captured at a second time instant different from the first time instant. When coded in field mode, the pixel samples of a macroblock pair are distributed to the top and the bottom macroblock of the macroblock pair in accordance with their capture time.
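The two ways of distributing the rows of a macroblock pair can be sketched as follows. The sketch assumes a pair of 32 rows in which even-numbered rows were captured at the first time instant and odd-numbered rows at the second; the function name split_macroblock_pair is hypothetical.

```python
def split_macroblock_pair(rows, field_mode):
    """Split the 32 pixel rows of a macroblock pair into (top, bottom).

    Frame mode: spatial split, each macroblock mixes both capture times.
    Field mode: split by capture time (assumed: even rows = first field).
    """
    if len(rows) != 32:
        raise ValueError("a macroblock pair is assumed to have 32 rows")
    if field_mode:
        return rows[0::2], rows[1::2]
    return rows[:16], rows[16:]


rows = list(range(32))  # row indices stand in for rows of pixel samples
frame_top, frame_bottom = split_macroblock_pair(rows, field_mode=False)
field_top, field_bottom = split_macroblock_pair(rows, field_mode=True)
```

In field mode, vertically adjacent rows inside one macroblock are two picture rows apart, which is why the vertical pixel pitch differs from that of a frame coded macroblock.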
FIG. 19 shows the CABAC process in the H.264/AVC video coding standard for transform data units. The process starts with the picture or video frame 900 being divided up into frame coded and field coded macroblock pairs. By way of example, FIG. 19 shows a field coded macroblock pair 902a and a frame coded macroblock pair 902b. Several processing steps 904 are performed on each macroblock pair 902a and 902b, such as determining the difference between each macroblock and a prediction of the macroblock and performing a discrete cosine transformation on the differences. The results of the processing steps 904 are transform coefficient blocks or 2-dimensional transform coefficient arrays 906a for macroblock pair 902a and transform coefficient arrays 906b for macroblock pair 902b, more particularly, one transform coefficient array for each sub-array of a macroblock of a macroblock pair.
These 2-dimensional transform coefficient arrays 906a and 906b are pre-coded in a step 908 in order to map the transform coefficients in the arrays 906a and 906b into sequences 910a and 910b of transform data units. The mapping is performed by use of a scanning pattern defining a scanning order among the transform coefficients in a certain array. While the transform coefficient blocks 906b of frame macroblocks 902b are scanned in a zig-zag fashion, an alternate scan is used for field macroblocks 902a. The reason is that the pixel samples contained in a field coded macroblock have a different spatial relationship to each other, in particular, a different pixel pitch compared to the pixel samples of frame coded macroblocks.
In particular, the pre-coding step 908 is performed in three steps. First, a binary symbol coded_block_flag indicating the presence of significant, i.e. non-zero, transform coefficients is coded. Second, if the coded_block_flag indicates significant coefficients, a significance map specifying the location of significant coefficients is coded. This is done by use of a significant_coeff_flag that indicates whether the respective transform coefficient is significant or not. For significant transform coefficients a further flag is determined, i.e. last_significant_coeff_flag. This flag indicates whether the respective transform coefficient is the last significant transform coefficient with respect to scanning order. The latter two flags define the significance map. These flags are coded for consecutive transform coefficients in scanning order until the last significant transform coefficient has been reached. Third, the non-zero levels are coded in reverse scanning order.
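The effect of the scanning pattern can be sketched for a 4x4 array. The zig-zag order is generated along anti-diagonals. The function column_first_scan is only a hypothetical stand-in for a vertically biased alternate scan; it is not the field scan table of H.264/AVC.

```python
def zigzag_scan(n=4):
    """Return (row, column) positions of an n x n array in zig-zag order."""
    order = []
    for s in range(2 * n - 1):
        # All positions on the anti-diagonal row + column == s.
        diagonal = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            diagonal.reverse()  # alternate the direction on each diagonal
        order.extend(diagonal)
    return order


def column_first_scan(n=4):
    """Hypothetical alternate scan: visit each column from top to bottom."""
    return [(r, c) for c in range(n) for r in range(n)]


def scan(block, order):
    """Map a 2-dimensional array to a 1-dimensional sequence."""
    return [block[r][c] for r, c in order]


block = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
frame_sequence = scan(block, zigzag_scan())
field_sequence = scan(block, column_first_scan())
```

The same scanning position therefore refers to different frequency positions under the two patterns: position 3 is (row 2, column 0) in the zig-zag order but (row 3, column 0) in the stand-in alternate order.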
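The three pre-coding steps can be sketched as a function that turns a scanned list of levels into a list of syntax elements. The function name precode_block and the tuple representation are assumptions for illustration; the sketch ignores the binarization of the levels and any special cases of the standard.

```python
def precode_block(levels):
    """Pre-code transform coefficient levels given in scanning order.

    Returns a list of (syntax_element_name, value) tuples.
    """
    significant = [i for i, level in enumerate(levels) if level != 0]

    # Step 1: signal whether the block has any non-zero coefficient.
    elements = [("coded_block_flag", 1 if significant else 0)]
    if not significant:
        return elements

    # Step 2: significance map, up to the last significant coefficient.
    last = significant[-1]
    for i in range(last + 1):
        is_significant = 1 if levels[i] != 0 else 0
        elements.append(("significant_coeff_flag", is_significant))
        if is_significant:
            elements.append(("last_significant_coeff_flag", 1 if i == last else 0))

    # Step 3: non-zero levels in reverse scanning order.
    for i in reversed(significant):
        elements.append(("level", levels[i]))
    return elements


elements = precode_block([5, 0, -2, 0])
```

For the example input, the significance map stops at scanning position 2, and the levels follow as -2 and then 5.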
FIG. 19 exemplarily shows the transform data units for the first four and three, respectively, scanning positions for the transform coefficient arrays 906a and 906b. Each transform data unit comprises at least the significant_coeff_flag and, in case of a significant transform coefficient, a last_significant_coeff_flag and an indication of the non-zero level. It is noted that the transform data units are illustrated in FIG. 19 as continuous data blocks in a precoded video signal 912 merely for illustration purposes and that in fact, the significance map precedes the non-zero levels of the significant transform coefficients.
The syntax elements, such as the significant_coeff_flags and last_significant_coeff_flags, are then passed to a binary arithmetic coding stage. For both syntax elements, significant_coeff_flag and last_significant_coeff_flag, the context model, which specifies the probability estimation to be used in the binary arithmetic coding, is chosen among the same set of context models and merely based on the scanning position within sequences 910a and 910b. This step of choosing the context model is shown at 914. Afterwards, in step 916, the syntax elements are binary arithmetically encoded by use of the context model chosen in step 914.
As is apparent from the above discussion, the same set of context models is used for the significance map flags of both field coded macroblocks and frame coded macroblocks, thereby neglecting the fact that different scanning orders or different scanning patterns have been used in order to map the two-dimensional arrays 906a and 906b to sequences 910a and 910b, respectively. Therefore, in case of macroblock adaptive frame/field coding, where both scans can be used within the same slice, the significance indication, i.e., significant_coeff_flag and last_significant_coeff_flag, of transform coefficients related to different frequencies in transform blocks 906a and 906b may be coded using the same probability model. As a consequence, the probability model cannot be well adapted to the actual symbol statistics, which are in general different for frame and field macroblocks. In addition, the initialisation tables for the probability models of the syntax elements significant_coeff_flag and last_significant_coeff_flag cannot be suitable for both frame and field macroblocks.
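Choosing a context model by scanning position, as in step 914, can be sketched as follows. The frequency-count estimator in ContextModel is an assumption made for illustration and is not the probability state machine of CABAC; the ideal cost in bits stands in for the arithmetic coding of step 916.

```python
import math


class ContextModel:
    """Adaptive probability estimation for one binary decision (sketch)."""

    def __init__(self):
        self.counts = [1, 1]  # assumed initialisation: both values equally likely

    def probability(self, bit):
        return self.counts[bit] / sum(self.counts)

    def update(self, bit):
        self.counts[bit] += 1


def choose_context(contexts, syntax_element, scan_position):
    """Select the context model merely from element name and scan position."""
    key = (syntax_element, scan_position)
    return contexts.setdefault(key, ContextModel())


def ideal_cost_bits(flags):
    """Sum of -log2(p) over all flags, a lower bound on the coded length.

    'flags' is an iterable of (syntax_element, scan_position, bit).
    """
    contexts = {}
    total = 0.0
    for syntax_element, scan_position, bit in flags:
        context = choose_context(contexts, syntax_element, scan_position)
        total += -math.log2(context.probability(bit))
        context.update(bit)
    return total
```

Note that the key passed to choose_context contains no information about whether the flag stems from a frame coded or a field coded macroblock, so both kinds of macroblocks update the same estimates.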