Digital video compression is an important feature in many products. Many video coding standards (e.g., MPEG-1, MPEG-2, MPEG-4, H.263, and H.264) specify requirements that video coding architectures implementing digital video compression must satisfy. For instance, the Moving Picture Experts Group (MPEG) standards are aimed at video images for digital television, and the H.264 standard published by the International Telecommunication Union (ITU) is aimed at video coding for low bit rates. The challenge, however, is to satisfy the requirements of the various video coding standards in an architecture that also provides high performance. Conventional video coding architectures have been unable to meet this challenge.
The data path that performs digital video compression for the various standards discussed above includes motion compensation. In essence, motion compensation exploits image redundancy between video frames to achieve high video-compression ratios. A video sequence consists of a series of video frames, and a previous video frame is selected as a reference frame. Current and subsequent frames can then be predicted, in part, from the reference frame using motion compensation techniques. That is, the movement of areas of previous frames is estimated and compensated for so that those areas can be included in a current frame.
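The prediction step described above can be sketched as follows. This is a minimal illustration only, not the method of any particular standard: the function names, the integer-only motion vector, and the residual (the correction added to the predicted block) are assumptions introduced for the example.

```python
def predict_block(reference, top, left, mv_y, mv_x, size):
    """Copy a size x size block from the reference frame.

    (top, left) is the block position in the current frame and
    (mv_y, mv_x) is a hypothetical integer motion vector pointing to
    where that block was located in the reference frame.
    """
    src_top = top + mv_y
    src_left = left + mv_x
    return [row[src_left:src_left + size]
            for row in reference[src_top:src_top + size]]


def reconstruct_block(predicted, residual):
    """Add an assumed residual (prediction error) to the predicted block."""
    return [[p + r for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(predicted, residual)]


# A 6 x 6 reference frame whose pixel value encodes its position (10*row + col).
reference = [[10 * r + c for c in range(6)] for r in range(6)]

# The block at (2, 2) in the current frame moved from (1, 3) in the reference.
predicted = predict_block(reference, top=2, left=2, mv_y=-1, mv_x=1, size=2)
current = reconstruct_block(predicted, [[1, 0], [0, -1]])
```

Only the motion vector and the (typically small) residual need to be coded for the block, which is where the compression gain comes from.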
In particular, motion compensation performs finite impulse response (FIR) filtering on a two-dimensional (2D) block of pixel data. FIR filtering is used to filter out noise in the input signal that can ultimately degrade the video output signal. The amount of FIR filtering required to perform video decompression in real time requires multiple computation data paths.
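As a rough sketch of FIR filtering on a 2D block, the example below applies a one-dimensional filter first along the rows and then along the columns. The helper names and the two-tap kernel are assumptions chosen for illustration; the kernels actually used depend on the coding standard.

```python
def fir_1d(samples, taps):
    """Apply an FIR filter to a 1D sequence.

    output[i] = sum(taps[k] * samples[i + k]); only positions where the
    whole kernel fits are produced, so the output is shorter than the input.
    """
    n = len(taps)
    return [sum(taps[k] * samples[i + k] for k in range(n))
            for i in range(len(samples) - n + 1)]


def fir_rows(block, taps):
    """Horizontal filtering: each row is filtered independently."""
    return [fir_1d(row, taps) for row in block]


def fir_columns(block, taps):
    """Vertical filtering: each column is filtered independently."""
    columns = [list(col) for col in zip(*block)]
    filtered = [fir_1d(col, taps) for col in columns]
    return [list(row) for row in zip(*filtered)]


block = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
taps = [1, 1]  # illustrative, unnormalized two-tap kernel

horizontal = fir_rows(block, taps)          # [[3, 5], [9, 11], [15, 17]]
both = fir_columns(horizontal, taps)        # [[12, 16], [24, 28]]
```

Every output sample is an independent multiply-and-sum, which is why the workload maps naturally onto several parallel data paths.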
Conventionally, a single instruction multiple data (SIMD) structure is used in the video coding architecture to execute multiple instances of the same operation, such as FIR filtering, in parallel on different data. Because the same control is used for all instances of a replicated data path, the SIMD structure minimizes the amount of control logic relative to the compute logic. The SIMD structure is therefore efficient, since each additional data path adds only minimal overhead.
However, while the SIMD structure is efficient, it can be difficult to fully utilize because each data path must perform the same computation in lock step. That is, conventional architectures are unable to present the appropriate data to the appropriate data path in order to perform motion compensation efficiently. In particular, conventional video coding architectures have been unable to efficiently meet various requirements specified by the video coding standards, such as: providing both vertical and horizontal filtering, which implies that both rows and columns of data need to be presented to the replicated data paths; providing real-time edge replication on the border of reference frames; and providing the ability to swap X and Y coordinates for rotating the display.
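Two of the requirements listed above, edge replication and X/Y coordinate swapping, can be illustrated in software as address manipulations applied when a pixel is fetched. This is a sketch under stated assumptions: the function names and the `swap_xy` flag are hypothetical, and clamping the coordinates is one common way to model edge replication, not necessarily how a given architecture implements it.

```python
def fetch_pixel(frame, y, x, swap_xy=False):
    """Fetch one pixel, replicating the border for out-of-range coordinates.

    If swap_xy is set, the X and Y coordinates are exchanged before the
    fetch. Coordinates outside the frame are clamped to the nearest edge,
    which repeats the border pixels outward.
    """
    if swap_xy:
        y, x = x, y
    height = len(frame)
    width = len(frame[0])
    y = min(max(y, 0), height - 1)
    x = min(max(x, 0), width - 1)
    return frame[y][x]


def fetch_block(frame, top, left, rows, cols, swap_xy=False):
    """Fetch a rows x cols block whose top-left corner is (top, left)."""
    return [[fetch_pixel(frame, top + r, left + c, swap_xy)
             for c in range(cols)]
            for r in range(rows)]


frame = [[1, 2, 3],
         [4, 5, 6]]

# A block that starts one pixel above and to the left of the frame.
padded = fetch_block(frame, -1, -1, 3, 3)   # [[1, 1, 2], [1, 1, 2], [4, 4, 5]]

# Swapping X and Y reads the frame transposed.
swapped = fetch_block(frame, 0, 0, 3, 2, swap_xy=True)  # [[1, 4], [2, 5], [3, 6]]
```

Swapping the coordinates transposes the data, which is one ingredient of rotating the display. In a lock-step SIMD structure the difficulty is performing such per-pixel address changes while still delivering the right sample to each data path on every cycle.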
For instance, FIR filtering is required during motion compensation. A multiply and accumulate (MAC) unit is a data path element that can be used to compute an FIR filter one tap at a time. A common optimization is to fold the filter when the number of taps is even and the filter kernel is symmetric. Folding requires an additional adder, but allows two taps to be computed per clock. While this makes more efficient use of the MAC element, it can be difficult to access the appropriate data, because each multiply requires two unrelated reads, one for each of the two samples that share a coefficient.
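The folding optimization can be sketched as follows. The function names and the six-tap kernel are illustrative assumptions; the point is only that a symmetric kernel lets two samples share one multiplication.

```python
def fir_direct(window, taps):
    """One output sample computed one tap at a time (one multiply per tap)."""
    acc = 0
    for tap, sample in zip(taps, window):
        acc += tap * sample
    return acc


def fir_folded(window, taps):
    """One output sample using a folded, symmetric, even-length kernel.

    The two samples that share a coefficient are added first (the extra
    adder), then multiplied once, so each step covers two taps.
    """
    n = len(taps)
    if n % 2 != 0 or taps != taps[::-1]:
        raise ValueError("folding needs an even number of symmetric taps")
    acc = 0
    for k in range(n // 2):
        # Two reads from opposite ends of the window: window[k] and window[n-1-k].
        acc += taps[k] * (window[k] + window[n - 1 - k])
    return acc


taps = [1, -5, 20, 20, -5, 1]   # illustrative symmetric six-tap kernel
window = [1, 2, 3, 4, 5, 6]

direct = fir_direct(window, taps)   # 112, using six multiplies
folded = fir_folded(window, taps)   # 112, using three multiplies
```

The two samples read in each step come from opposite ends of the filter window, so their addresses move in opposite directions as `k` advances. This is what makes the data access awkward.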
As such, conventional architectures for performing motion compensation are unable to efficiently meet the requirements specified by the various video coding standards, such as providing both rows and columns of data, folding the filter, replicating the edges of a reference frame, and swapping X and Y coordinates.