In a multimedia embedded system, the video encoding module contains several major components, including DCT (Discrete Cosine Transform)/IDCT (Inverse DCT), motion estimation (ME), motion compensation, quantization, inverse quantization, bit rate control and VLC (Variable Length Coding) encoding. The most computationally expensive of these is motion estimation, which generally takes around 50% of the total computational power in an optimized system. Further optimizing motion estimation is therefore critical to reducing the cost of real-time video encoding in an embedded multimedia system.
Many fast search algorithms have been developed, including the three-step search, the 2-D logarithmic search, the conjugate directional search, the genetic search, the diamond search, feature-based block motion estimation using integral projection, and sub-sampled motion field estimation with alternating pixel-decimation patterns. These search approaches reduce complexity at the expense of motion vector accuracy, because they may select only a local minimum of the mean absolute difference (MAD), whereas a conventional full search algorithm finds the global minimum.
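The trade-off described above can be illustrated with a minimal sketch in Python. It assumes frames stored as lists of rows of integer luma values; the function names (`mad`, `in_bounds`, `full_search`, `three_step_search`) and their parameters are hypothetical and chosen only for this illustration. The full search tests every displacement in the window, while the three-step search tests nine points per round and can therefore settle on a local minimum.

```python
def mad(cur, ref, bx, by, dx, dy, n):
    """Mean absolute difference between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total / (n * n)


def in_bounds(ref, bx, by, dx, dy, n):
    """True if the displaced n x n block lies entirely inside `ref`."""
    return (0 <= by + dy and by + dy + n <= len(ref)
            and 0 <= bx + dx and bx + dx + n <= len(ref[0]))


def full_search(cur, ref, bx, by, n, r):
    """Test every displacement in [-r, r] x [-r, r].
    Returns (cost, dx, dy) for the global MAD minimum in the window."""
    candidates = [(mad(cur, ref, bx, by, dx, dy, n), dx, dy)
                  for dy in range(-r, r + 1)
                  for dx in range(-r, r + 1)
                  if in_bounds(ref, bx, by, dx, dy, n)]
    return min(candidates)


def three_step_search(cur, ref, bx, by, n, step=4):
    """Test 9 points around the current best, halving the step each round.
    Returns (cost, dx, dy); the result may be only a local minimum."""
    best = (mad(cur, ref, bx, by, 0, 0, n), 0, 0)
    while step >= 1:
        _, cx, cy = best  # centre of this round's 3 x 3 pattern
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                if in_bounds(ref, bx, by, cx + dx, cy + dy, n):
                    cost = mad(cur, ref, bx, by, cx + dx, cy + dy, n)
                    best = min(best, (cost, cx + dx, cy + dy))
        step //= 2
    return best
```

With `step=4` the three-step search covers displacements up to 7 pixels using at most 27 MAD evaluations after the initial one, against 225 for a full search with `r=7`. Its cost can never be lower than the full search result over the same window.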
Conventional multi-resolution motion estimation techniques perform the search with a much smaller window, proceeding from lower to higher resolution layers. The motion vectors are refined gradually at each layer, so the effective search area is equivalent to that of a full search at much lower complexity. To further reduce the complexity, conventional binary motion estimation algorithms significantly decrease both the computational complexity and the bus bandwidth by reducing the bit depth. Based on a binary pyramid structure, Song et al. disclose a fast binary motion estimation algorithm, namely fast binary pyramid motion estimation (FBPME), in “New fast binary pyramid motion estimation for MPEG2 and HDTV encoding”, IEEE Trans. on Circuits and Systems for Video Technology, vol. 10, no. 7, pp. 1015–1028, October 2000. The pyramidal structure of FBPME contains one integer layer at the lowest resolution (smallest picture size) and three binary layers that contain detail information. FBPME performs the tiled motion search with an XOR (Exclusive OR) Boolean block matching criterion on the binary layers and with MAD on the integer layer. Block matching with XOR operations is much simpler and faster to implement than block matching with MAD operations.
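The XOR matching criterion can be sketched as follows. This is an illustration only, not the FBPME implementation: it assumes each row of a binary plane is packed into one Python integer with bit k holding the pixel at column k, and the names `xor_distortion` and `extract_block` are hypothetical.

```python
def xor_distortion(cur_rows, ref_rows):
    """Number of mismatching pixels between two packed binary blocks.

    Each block is a list of ints, one per row; XOR sets a bit wherever
    the two blocks differ, and counting the set bits gives the distortion.
    """
    return sum(bin(c ^ r).count("1") for c, r in zip(cur_rows, ref_rows))


def extract_block(plane_rows, x, y, n):
    """Cut the n x n block whose top-left corner is (x, y) out of a packed plane."""
    mask = (1 << n) - 1
    return [(row >> x) & mask for row in plane_rows[y:y + n]]
```

Because a whole row of binary pixels is compared with a single XOR, the per-pixel subtraction, absolute value and accumulation of MAD are replaced by one logic operation and a bit count.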
However, the FBPME structure uses an integer layer, which requires two distortion computation modules, one for MAD operations and one for XOR operations. This increases code size and hardware complexity. The FBPME structure also needs more complicated pre-processing, including filtering, decimation, binarization and interpolation. Supporting both MAD and XOR operations, together with the more complicated pre-processing of the multi-layer approach, results in higher power consumption in a hardware implementation.
Another conventional fast binary motion estimation algorithm, presented by Natarajan, Bhaskaran, and Konstantinides, is based on a simple one-bit transform combined with conventional search schemes. It performs single-layer motion estimation on one-bit representations derived for the current and reference blocks. However, this binary representation does not use any hierarchical structure. When a hierarchical structure is adopted, it is more challenging to obtain an accurate binary representation at a lower resolution.
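To make the idea of a one-bit representation concrete, the sketch below binarizes a frame by comparing each pixel with the mean of its local neighbourhood. This is a simplified stand-in, not the filter used by Natarajan et al.; the function name `binarize`, the `radius` parameter and the border clipping are assumptions made for this illustration.

```python
def binarize(frame, radius=1):
    """Return a 0/1 plane the same size as `frame`.

    A pixel maps to 1 when it is at least the mean of its
    (2*radius+1) x (2*radius+1) neighbourhood, clipped at the frame borders.
    """
    h, w = len(frame), len(frame[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            total = sum(frame[j][i] for j in ys for i in xs)
            count = len(ys) * len(xs)
            # Compare pixel * count with the sum to avoid a division.
            row.append(1 if frame[y][x] * count >= total else 0)
        out.append(row)
    return out
```

Applying a transform of this kind to both the current and the reference frame yields the binary planes on which a bit-level matching criterion can operate. At a decimated resolution each binary pixel summarizes a larger image area, which is one way to see why an accurate binary representation is harder to obtain there.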