Motion-compensated inter-frame prediction, a well-known technique in video compression, exploits temporal redundancy between adjacent frames. Motion compensation is typically performed at the block level, commonly on a macroblock (MB) of 16×16 pixels. Advanced video standards support a range of block sizes and fine sub-pixel motion vectors. This improves compression performance but increases the complexity of the encoding and decoding processes.
For instance, the H.264 video compression standard, also known as MPEG-4 Part 10, “Advanced Video Coding” (AVC), supports block sizes ranging from 4×4 to 16×16 and fine motion vectors at full-pel, half-pel, and quarter-pel precision. The standard uses a relatively complex 6-tap Finite Impulse Response (FIR) interpolation filter for sub-pixel interpolation to achieve better compensation performance.
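As an illustration, the H.264 standard computes half-pel luma samples with the 6-tap FIR filter whose coefficients are (1, −5, 20, 20, −5, 1), followed by rounding, a right shift by 5, and clipping to the 8-bit sample range. The sketch below shows this for a single sample; the function name and signature are illustrative, not taken from any codec implementation.

```c
#include <stdint.h>

/* Half-pel luma interpolation per the H.264 6-tap FIR filter
 * (1, -5, 20, 20, -5, 1): six integer-pel neighbors e..j produce
 * one half-pel sample. Each output costs 6 multiply-accumulates. */
static uint8_t half_pel(int e, int f, int g, int h, int i, int j)
{
    int v = e - 5 * f + 20 * g + 20 * h - 5 * i + j; /* 6-tap FIR */
    v = (v + 16) >> 5;                               /* round, scale by 32 */
    if (v < 0)   v = 0;                              /* clip to [0, 255] */
    if (v > 255) v = 255;
    return (uint8_t)v;
}
```

Note that a constant input region is reproduced exactly (the coefficients sum to 32), while an edge such as (0, 0, 255, 255, 0, 0) overshoots and must be clipped, which is why the clipping step is part of the standard's interpolation process.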
Sub-pixel interpolation for a macroblock consumes a significant share of the processor cycles in a video codec. Most of these cycles are spent on the multiply-accumulate (MAC) operations required by the FIR filters. Such operations run considerably faster on a digital signal processor (DSP) or a dedicated hardware (H/W) block than on a general-purpose microprocessor. DSPs in the Texas Instruments TMS320C6400 family can achieve up to 8 MACs per cycle using software-pipelined SIMD (single instruction, multiple data) instructions. The DSP can sustain this performance level only by exploiting the inherent parallelism of its very long instruction word (VLIW) architecture. However, the parallelism available to next-generation video codecs is severely limited when the codec operates on smaller block sizes.
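The MAC cost and the block-size effect can be sketched with a simple row filter: each half-pel output takes 6 MACs, so a 16-wide row gives the compiler a 16-iteration loop to software-pipeline and vectorize, whereas a 4-wide row gives only 4 iterations, leaving little parallelism to exploit. This scalar C version is a minimal sketch (the function name and layout are assumptions, not from any particular codec); a DSP implementation would map the inner sum onto packed SIMD MAC instructions.

```c
#include <stdint.h>

/* Compute w half-pel samples from w+5 integer-pel inputs using the
 * 6-tap FIR filter (1, -5, 20, 20, -5, 1). Each iteration performs
 * 6 MACs; the loop trip count w (e.g. 16 vs 4) bounds how deeply a
 * VLIW compiler can software-pipeline and unroll this kernel. */
static void half_pel_row(const uint8_t *src, uint8_t *dst, int w)
{
    for (int x = 0; x < w; x++) {
        int v = src[x] - 5 * src[x + 1] + 20 * src[x + 2]
              + 20 * src[x + 3] - 5 * src[x + 4] + src[x + 5];
        v = (v + 16) >> 5;          /* round and scale */
        if (v < 0)   v = 0;         /* clip to [0, 255] */
        if (v > 255) v = 255;
        dst[x] = (uint8_t)v;
    }
}
```

With this shape, interpolating one 16×16 macroblock costs 6 MACs per output sample, but partitioning the same area into 4×4 blocks replaces one long, pipeline-friendly loop with many short ones, which is the parallelism limitation noted above.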