1. Technical Field
The present invention relates to systems and methods for motion estimation and mode decision for low-complexity H.264 standard encoders/decoders.
2. Background Art
Emerging video coding standards like H.264 achieve significant advances in improving video quality and reducing bandwidth, but generally at the cost of greatly increased computational complexity at both the encoder and the decoder. Playing encoded videos produced by such compression standards requires substantial computational resources and thus results in substantial power consumption. This may be a serious concern in power-sensitive applications, such as handheld devices and other devices used in mobile applications.
Portable media devices such as mobile handheld devices are becoming increasingly popular. The computational resources available on these devices are becoming relatively scarce as the devices run applications of increasing number and complexity. Accordingly, there is growing interest in complexity-aware/power-aware video coding solutions.
Most of today's video coding systems encode video bit streams to achieve the best video quality (e.g., the minimal signal distortion) while satisfying certain bit rate constraints. Specifically, the following optimization formulation is often adopted:
$$\min_{P} D(P) \quad \text{s.t.} \quad R(P) \le R_T \qquad (1)$$
where P represents the control variables (CV) which eventually determine the final video quality and bit rate. Typical CVs include quantization parameter (QP), motion vector, motion estimation block mode, etc. D is the distortion introduced by the encoding process. R is the bit rate of the encoded video and R_T is the target bit rate. The solution of the above problem aims at identifying the optimal control variables for each coding unit in order to minimize the average distortion while satisfying the bit rate constraint. Though in practice, some design choices for the control variables may be made based on real-world resource limitations (e.g., memory and computational complexity), Equation (1) does not explicitly model this required complexity in video encoding or decoding. As a matter of fact, many recent advances in coding efficiency are accomplished by using increasingly complex computational modules.
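As an illustration only, the rate-constrained formulation of Equation (1) can be sketched as an exhaustive selection over a small set of candidate settings. The `Candidate` type, the `best_under_rate` function, and the distortion and rate figures below are hypothetical and are not taken from any encoder; practical encoders typically use faster approximations rather than enumerating every setting.

```python
from typing import NamedTuple, Optional, Sequence


class Candidate(NamedTuple):
    """One hypothetical setting of the control variables P."""
    name: str          # label for the setting, e.g. a QP value
    distortion: float  # D(P), e.g. mean squared error
    rate: float        # R(P), e.g. kbit/s


def best_under_rate(candidates: Sequence[Candidate],
                    rate_target: float) -> Optional[Candidate]:
    """Return the candidate with minimal D(P) among those with R(P) <= R_T.

    Returns None when no candidate satisfies the rate constraint.
    """
    feasible = [c for c in candidates if c.rate <= rate_target]
    if not feasible:
        return None
    return min(feasible, key=lambda c: c.distortion)


# Made-up numbers, for illustration only.
candidates = [
    Candidate("QP=24", distortion=4.0, rate=900.0),
    Candidate("QP=28", distortion=6.5, rate=600.0),
    Candidate("QP=32", distortion=10.0, rate=400.0),
]
choice = best_under_rate(candidates, rate_target=650.0)
```

Note that nothing in this selection accounts for the computational cost of decoding the chosen setting, which mirrors the limitation of Equation (1) discussed above.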
Several methods for reducing computational complexity are known in the prior art. ARM and National Semiconductor developed a systematic approach called PowerWise technology, which can efficiently reduce the power consumption of mobile multimedia applications through adaptive voltage scaling (AVS). (See National's PowerWise™ technology, described at http://www.national.com/appinfo/power/powerwise.html, which is fully incorporated herein by reference). Zhou et al. implemented an H.264 decoder based on Intel's single-instruction-multiple-data (SIMD) architecture that reduced the decoding complexity and improved the H.264 decoding speed by up to three times. (See X. Zhou, E. Li, and Y.-K. Chen, “Implementation of H.264 Decoder on General-Purpose Processors with Media Instructions”, in Proc. of SPIE Visual Communications and Image Processing, January 2003, which is fully incorporated herein by reference). Ray and Radha proposed a method to reduce the decoding complexity by selectively replacing the I-B-P Group of Pictures (GOP) structure with one using I-P only. (See A. Ray and H. Radha, “Complexity-Distortion Analysis of H.264/JVT Decoder on Mobile Devices,” Picture Coding Symposium (PCS), December 2004, which is fully incorporated herein by reference). Lengwehasatit and Ortega developed a method to reduce the decoding complexity by optimizing the Inverse DCT implementation. (See K. Lengwehasatit and A. Ortega, “Rate Complexity Distortion Optimization for Quadtree-Based DCT Coding”, ICIP 2000, Vancouver, BC, Canada, September 2000, which is fully incorporated herein by reference). He et al. optimized the power-rate-distortion performance by constraining the number of sum of absolute difference (SAD) operations during the motion estimation process at the encoder. (See Z. He, Y. Liang, L. Chen, I. Ahmad, and D. Wu, “Power-Rate-Distortion Analysis for Wireless Video Communication under Energy Constraints,” IEEE Transactions on Circuits and Systems for Video Technology, Special Issue on Integrated Multimedia Platforms, 2004, which is fully incorporated herein by reference).
In addition, power-aware joint source channel coding is also an active topic for mobile wireless video communication. (See Y. Eisenberg, C. E. Luna, T. N. Pappas, R. Berry, A. K. Katsaggelos, “Joint source coding and transmission power management for energy efficient wireless video communications,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 12, No. 6, June 2002, pp. 411-424; Q. Zhang, W. Zhu, Zu Ji, and Y. Zhang, “A Power-Optimized Joint Source Channel Coding for Scalable Video Streaming over Wireless Channel”, IEEE International Symposium on Circuits and Systems (ISCAS) 2001, May, 2001, Sydney, Australia; X. Lu, E. Erkip, Y. Wang and D. Goodman, “Power efficient multimedia communication over wireless channels”, IEEE Journal on Selected Areas on Communications, Special Issue on Recent Advances in Wireless Multimedia, Vol. 21, No. 10, pp. 1738-1751, December, 2003, all of which are fully incorporated herein by reference). Unlike the conventional paradigm using complex encoding and light decoding, Girod et al. proposed a distributed video coding system which transfers the motion estimation process from the encoder to the decoder so that the encoding complexity can be greatly reduced. (See B. Girod, A. Aaron, S. Rane and D. Rebollo-Monedero, “Distributed video coding,” Proc. of the IEEE, Special Issue on Video Coding and Delivery, 2005, which is fully incorporated herein by reference).
Furthermore, the computational complexity of each component of a video decoding system varies. Some components have a relatively constant cost that is independent of the encoded data, while others depend heavily on the coding results. For example, inverse quantization and inverse transform have a nearly fixed computational cost per coding unit, while motion compensation has a variable complexity that depends on the block mode and the type of motion vector. In particular, when the motion vectors are sub-pixel, the decoder complexity is dominated by the interpolation filtering process used in motion compensation. Other parts of the decoding system, such as entropy decoding and inverse transform, do not incur significant computational cost when compared to the interpolation process.
At the encoder, motion estimation is usually the most computationally complex process since it involves searching over a large range of possible reference locations, each of which may require interpolation filtering. Among the components in the decoding system, the interpolation procedure used in the motion compensation component consumes the most computational resources (about 50%) due to the use of sub-pixel motion vectors. Accordingly, one way to improve power efficiency in video decoding would be to reduce the major computational cost of the motion compensation interpolation procedure.
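To make the cost difference concrete, the following minimal sketch fetches one luma sample along a single row at quarter-sample precision, using the 6-tap filter (1, -5, 20, 20, -5, 1) and the rounded averaging that H.264 defines for horizontal sub-pixel positions. It is a simplified, one-dimensional illustration, not a decoder implementation: the function names are hypothetical, and clamping indices at the row ends is an assumption made to keep the example self-contained.

```python
def clip255(value):
    """Clamp a filtered value to the 8-bit sample range."""
    return max(0, min(255, value))


def half_sample(row, i):
    """Half-sample value between row[i] and row[i + 1] (6-tap filter)."""
    def at(k):
        # Assumption: replicate edge samples when the filter runs off the row.
        return row[max(0, min(len(row) - 1, k))]

    acc = (at(i - 2) - 5 * at(i - 1) + 20 * at(i)
           + 20 * at(i + 1) - 5 * at(i + 2) + at(i + 3))
    return clip255((acc + 16) >> 5)


def sample_at(row, x_q):
    """Sample at horizontal position x_q, given in quarter-sample units."""
    i, frac = x_q >> 2, x_q & 3
    if frac == 0:
        return row[i]                  # integer position: plain copy
    b = half_sample(row, i)            # six multiply-accumulate taps
    if frac == 2:
        return b                       # half-sample position
    # Quarter-sample position: average the half sample with the
    # nearest integer sample, with rounding.
    neighbour = row[i] if frac == 1 else row[min(i + 1, len(row) - 1)]
    return (neighbour + b + 1) >> 1


row = [0, 10, 20, 30, 40, 50, 60, 70]
values = [sample_at(row, x_q) for x_q in (12, 13, 14, 15)]
```

An integer-position fetch is a single memory read, whereas every fractional position in this sketch runs the 6-tap filter at least once per sample, which is the source of the variable decoding cost described above.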
Many fast motion estimation algorithms have been developed to reduce the motion estimation complexity during encoding. (See A. M. Tourapis, “Enhanced Predictive Zonal Search for Single and Multiple Frame Motion Estimation,” Proceedings of Visual Communications and Image Processing 2002 (VCIP-2002), San Jose, Calif., January 2002, pp. 1069-79; H.-Y. Cheong, A. M. Tourapis, “Fast Motion Estimation within the H.264 codec,” in proceedings of ICME-2003, Baltimore, Md., Jul. 6-9, 2003, both of which are incorporated herein by reference). Other work proposes scalable methods for motion estimation to control the coding complexity. (See M. Schaar, H. Radha, “Adaptive motion-compensation fine-granular-scalability (AMC-FGS) for wireless video,” IEEE Trans. on CSVT, vol. 12, no. 6, 360-371, 2002, which is incorporated herein by reference). Nevertheless, these methods all focus on reducing the encoding complexity rather than the decoding complexity.
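For context, the baseline that such fast algorithms try to improve on can be sketched as an exhaustive integer-pixel search that scores every displacement in a window with the sum of absolute differences (SAD). This is a generic textbook illustration, not the method of any reference cited above; the function names, frame layout (lists of rows), and test pattern are assumptions.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block of cur at (bx, by) and the block of
    ref displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, search_range):
    """Try every integer displacement within +/- search_range.

    Returns ((best_sad, dx, dy), number_of_candidates_evaluated).
    """
    height, width = len(ref), len(ref[0])
    best = None
    evaluated = 0
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x0, y0 = bx + dx, by + dy
            # Skip candidates that fall outside the reference frame.
            if x0 < 0 or y0 < 0 or x0 + n > width or y0 + n > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            evaluated += 1
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best, evaluated


# Synthetic 16 x 16 reference frame; the current frame is the reference
# shifted by (dx, dy) = (2, 1), with edge samples repeated.
ref = [[(7 * x + 13 * y) % 251 for x in range(16)] for y in range(16)]
cur = [[ref[min(y + 1, 15)][min(x + 2, 15)] for x in range(16)]
       for y in range(16)]
best, evaluated = full_search(cur, ref, bx=4, by=4, n=4, search_range=2)
```

The number of candidates grows as (2 × search_range + 1)² per block before any sub-pixel refinement, which is why encoder-side fast search has received so much attention; the search itself, however, adds no work at the decoder.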
Accordingly, there exists a need in the art for an improved system and method for video encoding/decoding with improved motion estimation which reduces computational costs and power consumption in the decoder.