Block-based video compression standards such as H.261, H.263, MPEG-1, MPEG-2, and MPEG-4 achieve efficient compression by reducing both temporal redundancies between video frames and spatial redundancies within a video frame. An intra-coded frame is self-contained and reduces only the spatial redundancies within a video frame. Inter-coded frames, however, are predicted via motion compensation from previously coded frames to reduce temporal redundancies. The difference between the inter-coded video frame and its corresponding prediction is then coded to reduce spatial redundancies. This methodology achieves high compression efficiency.
Each video frame comprises an array of pixels. A macroblock (MB) is a group of pixels, such as a 16×16 block. In the simplest approach, the difference between a macroblock in the current video frame and the co-located block in the previous video frame would be encoded. This is inefficient because camera motion and object motion shift image content between frames, so co-located blocks often differ substantially. Instead, it is common to estimate how the image has moved between the frames. This process is called motion estimation. Since different parts of the image may move in different directions (e.g. if the camera is rotated), motion estimation is performed for each macroblock in the current video frame. Motion estimation usually comprises comparing a macroblock in the current frame to a number of candidate blocks from the previous frame and finding the one that is most similar. The spatial shift between the macroblock in the current video frame and the most similar block in the previous video frame is denoted by a motion vector. The candidate blocks in the previous frame are not restricted to macroblock boundaries. The motion vector may also be estimated to within a fraction of a pixel by interpolating pixel values from the previous video frame.
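The block-matching search described above can be sketched as follows. This is a minimal illustration only, assuming an exhaustive (full) search at integer-pixel positions with the sum of absolute differences (SAD) as the similarity measure. The function names `sad` and `full_search`, the parameter `search_range`, and the frame layout (lists of rows of luminance values) are hypothetical choices made for this example and are not taken from any of the cited standards or devices.

```python
def sad(cur, ref, cx, cy, rx, ry, n):
    """Sum of absolute differences between the n-by-n block of `cur`
    at (cx, cy) and the n-by-n block of `ref` at (rx, ry)."""
    total = 0
    for j in range(n):
        for i in range(n):
            total += abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
    return total


def full_search(cur, ref, cx, cy, n=16, search_range=7):
    """Return ((dx, dy), cost) for the best integer-pixel match.

    Every displacement within +/- search_range pixels is tried
    (an assumed search strategy); candidates that would fall
    outside the reference frame are skipped.
    """
    height, width = len(ref), len(ref[0])
    # Start from the co-located block (zero motion vector).
    best_mv = (0, 0)
    best_cost = sad(cur, ref, cx, cy, cx, cy, n)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, n)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

The returned pair (dx, dy) is the motion vector: the offset from the macroblock position in the current frame to the best-matching block in the previous frame. Fractional-pixel refinement by interpolation is not shown in this sketch.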
Motion estimation (ME) is the most computationally intensive task in a video compression system and may account for as much as 80% of the complexity in current schemes. For real-time video coding, the ME unit may be required to perform billions of operations per second and requires a large memory bandwidth. Prior video systems have utilized hardwired Application Specific Integrated Circuit (ASIC) implementations. These meet the performance requirements of a video CODEC. However, they are able to implement only a limited set of algorithms. They lack the flexibility of a general-purpose processor core, such as a RISC core or a DSP core, and cannot be modified to execute other algorithms without major redesign. On the other hand, general-purpose processor cores, such as RISC or DSP cores, are not well suited to mobile applications, such as wireless videoconferencing, digital video cameras, or 3G cellular devices, where low power consumption is required. Their general-purpose nature makes them inefficient compared to an ASIC, and more hardware resources are needed to achieve the same performance. An example of such a general-purpose processor is the TMS320C64x series of DSPs manufactured by Texas Instruments.
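The order of magnitude of this workload can be checked with a short calculation. The figures used here are illustrative assumptions, not values taken from the text above: a CIF frame (352×288 pixels), 30 frames per second, 16×16 macroblocks, and an exhaustive integer-pixel search over displacements of ±16 pixels, with no allowance for candidates clipped at the frame edges.

```python
# Assumed, illustrative parameters (not specified in the text above).
FRAME_WIDTH, FRAME_HEIGHT = 352, 288   # CIF resolution
FRAMES_PER_SECOND = 30
BLOCK_SIZE = 16                        # 16x16 macroblock
SEARCH_RANGE = 16                      # +/- 16 pixels, integer positions

macroblocks_per_frame = (FRAME_WIDTH // BLOCK_SIZE) * (FRAME_HEIGHT // BLOCK_SIZE)
candidates_per_macroblock = (2 * SEARCH_RANGE + 1) ** 2
differences_per_candidate = BLOCK_SIZE * BLOCK_SIZE

# One absolute-difference operation per pixel, per candidate, per macroblock.
ops_per_frame = (macroblocks_per_frame
                 * candidates_per_macroblock
                 * differences_per_candidate)
ops_per_second = ops_per_frame * FRAMES_PER_SECOND

print(f"{macroblocks_per_frame} macroblocks per frame")
print(f"{candidates_per_macroblock} candidates per macroblock")
print(f"{ops_per_second:,} absolute-difference operations per second")
```

Under these assumptions the count is roughly 3.3 billion absolute-difference operations per second, before the accumulations and comparisons are included. Each such operation also reads one pixel from the current frame and one from the previous frame, which indicates why memory bandwidth is a concern.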
U.S. Pat. Nos. 5,594,813 and 5,901,248 describe the combination of a RISC controller with a scalar data processing path for video processing. No instruction set architecture is defined, so the device does not have the capability to execute general-purpose control code. Further, a single arithmetic logic unit is used, so a very high clock rate is needed for real-time video processing. In contrast, some ASIC devices, such as that described in “A family of VLSI designs for motion compensation block-matching algorithm”, IEEE Transactions on Circuits and Systems, Vol. 36, No. 10, October 1989, by Kun-Min Yang et al., use multiple processing elements to perform a number of operations in parallel, thus reducing the need for a high clock rate. However, ASICs, such as the Sti3220 Motion Estimation Processor Codec from SGS Thomson Microelectronics, lack the flexibility to implement a variety of motion estimation algorithms.
A programmable chip incorporating a DSP, a 32-bit RISC processor, and several motion estimation (ME) coprocessors is described in “A Summary of A336™/8/E Parallel Video DSP Chip” published by Oxford Micro Devices, Inc. The ME coprocessor of this device is accessible only through a single ‘PixDist’ instruction and requires both instructions and data to be issued to perform a computation. Its functionality is limited to calculating sums of absolute differences on data from various memory locations, and so the device has limited flexibility.
There is therefore an unfilled need for a motion estimation apparatus that is flexible and has low power consumption.