The present invention relates to video encoding and decoding, and more particularly, to a method and an associated apparatus for performing parallel encoding and decoding with ordered entropy slices.
Context-based Adaptive Binary Arithmetic Coding (CABAC) is a powerful entropy coding tool and has been adopted in video compression standards such as H.264/AVC. However, conventional sequential CABAC is a bottleneck for parallel processing because of its inherently serial, bit-level processing order. Recently, parallelization of CABAC has been a topic of discussion, since parallel CABAC can greatly accelerate the coding procedure when a multi-core processor is used.
According to a decoding flow of a prior art for parallel CABAC (see A. Segall and J. Zhao, “Entropy slices for parallel entropy decoding,” ITU-T SG16/Q.6 Doc. COM16-C405, Geneva, CH, April 2008, and J. Zhao and A. Segall, “New results using entropy slices for parallel decoding,” ITU-T SG16/Q.6 Doc. VCEG-AI32, Berlin, Germany, July 2008), context formation cannot be applied across entropy slices, which results in a loss of compression efficiency in comparison with conventional sequential CABAC. Moreover, while all macroblocks are being parsed by CABAC, the prediction residues and motion vector differences of the entire picture have to be stored and then accessed for further decoding. Thus, although parsing may be accelerated by parallel processing, significant side effects are introduced, as stated in X. Guo, Y-W. Huang, and S. Lei, “Ordered entropy slices for parallel CABAC,” ITU-T SG16/Q.6 Doc. VCEG-AK25, Yokohama, Japan, April 2009.
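As a rough illustration of why context formation suffers at entropy-slice boundaries, the following Python sketch models the context index increment (ctxIdxInc) of the H.264/AVC syntax element mb_skip_flag, which is derived from the left (A) and top (B) neighboring macroblocks. A neighbor lying in a different entropy slice is treated as unavailable, reflecting the restriction described above. The class Macroblock, its field entropy_slice_id, and the function names are hypothetical and are used only for illustration; this is a simplified sketch, not the method of the cited documents.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Macroblock:
    entropy_slice_id: int  # hypothetical: index of the entropy slice containing this MB
    mb_skip_flag: int      # 1 if the macroblock was coded as skipped


def cond_term_flag(cur: Macroblock, nb: Optional[Macroblock]) -> int:
    """condTermFlagN for mb_skip_flag, modeled on H.264/AVC CABAC.

    A neighbor outside the picture (None) or, under the entropy-slice
    restriction, in a different entropy slice is unavailable -> 0.
    A skipped neighbor also yields 0; otherwise the flag is 1.
    """
    if nb is None or nb.entropy_slice_id != cur.entropy_slice_id:
        return 0
    return 0 if nb.mb_skip_flag == 1 else 1


def skip_ctx_idx_inc(cur: Macroblock, left: Optional[Macroblock],
                     top: Optional[Macroblock]) -> int:
    """ctxIdxInc = condTermFlagA (left) + condTermFlagB (top), in the range 0..2."""
    return cond_term_flag(cur, left) + cond_term_flag(cur, top)


if __name__ == "__main__":
    cur = Macroblock(entropy_slice_id=1, mb_skip_flag=0)
    left = Macroblock(entropy_slice_id=1, mb_skip_flag=0)
    top_same = Macroblock(entropy_slice_id=1, mb_skip_flag=0)   # sequential CABAC case
    top_other = Macroblock(entropy_slice_id=0, mb_skip_flag=0)  # row above is another entropy slice
    print(skip_ctx_idx_inc(cur, left, top_same))   # 2
    print(skip_ctx_idx_inc(cur, left, top_other))  # 1
```

In this sketch, a non-skipped top neighbor in another entropy slice contributes 0 exactly as a missing neighbor would, so the selected context can be less well matched to the local statistics than in sequential CABAC; this is one way the compression loss mentioned above can arise.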
More particularly, for both software and hardware implementations, the buffer size and the amount of data access required for the prediction residues and motion vector differences of a picture are exceedingly large. In addition, when the buffer is too large to be implemented as an on-chip memory, it has to be implemented as an off-chip memory. As a result, the processing speed will be severely decreased, because off-chip memory access is typically about 10 times slower than on-chip memory access, and the power consumption will be greatly increased, because off-chip memory power dissipation is typically about 10 times larger than that of on-chip memory.
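To give a rough sense of scale, the following Python sketch estimates the picture-level buffer for prediction residues and motion vector differences under assumed storage parameters: a 1920x1088 picture in 4:2:0 format, 16-bit storage for each residue and each MVD component, and one MVD per 4x4 block for each of two reference lists. All of these parameters and the function names are illustrative assumptions, not figures taken from the cited documents.

```python
# Illustrative assumptions only; none of these values come from the cited documents.
WIDTH, HEIGHT = 1920, 1088      # luma samples (1080p padded to a multiple of 16)
MB_SIZE = 16                    # macroblock width/height in luma samples
BYTES_PER_COEFF = 2             # assumed 16-bit storage per prediction residue
BYTES_PER_MVD_COMPONENT = 2     # assumed 16-bit storage per mvd_x / mvd_y
MVDS_PER_MB = 16 * 2            # assumed one MVD per 4x4 block, for each of 2 reference lists


def bytes_per_macroblock() -> int:
    """Residue plus MVD storage needed for one macroblock."""
    luma = MB_SIZE * MB_SIZE                 # 256 luma residues
    chroma = 2 * (MB_SIZE // 2) ** 2         # 4:2:0 -> two 8x8 chroma blocks = 128
    residue_bytes = (luma + chroma) * BYTES_PER_COEFF
    mvd_bytes = MVDS_PER_MB * 2 * BYTES_PER_MVD_COMPONENT  # x and y components
    return residue_bytes + mvd_bytes


def picture_buffer_bytes(width: int = WIDTH, height: int = HEIGHT) -> int:
    """Storage needed when every macroblock of the picture is parsed before reconstruction."""
    num_mbs = (width // MB_SIZE) * (height // MB_SIZE)
    return num_mbs * bytes_per_macroblock()


if __name__ == "__main__":
    per_mb = bytes_per_macroblock()
    per_pic = picture_buffer_bytes()
    print(f"per macroblock: {per_mb} bytes")                          # 896 bytes
    print(f"per picture:    {per_pic} bytes ({per_pic / 2**20:.2f} MiB)")  # 7311360 bytes (6.97 MiB)
```

Under these assumptions, the picture-level buffer is roughly 8160 times the per-macroblock amount, on the order of several megabytes. This helps explain why such a buffer is likely to be placed in off-chip memory, with the speed and power penalties described above.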