Digital video data is compressed for many applications, including but not limited to transmission over bandwidth-constrained channels, such as satellite broadcasts, and storage on optical media. To achieve very efficient compression, complex, computationally intensive processes are used for encoding (compressing) and decoding (decompressing) video. For example, even though MPEG-2 (Moving Picture Experts Group) is known as a very efficient method for compressing video, a new, more efficient standard (i.e., H.264) is being developed.
Part of the encoding process involves so-called motion compensation. Based on a determined motion vector, an encoder fetches a block of data from an already transmitted reference frame, computes the difference between a to-be-encoded block and the block from the reference frame, and compresses and transmits the difference. A decoder uses the same motion vector, fetches the same reference block, decompresses the difference information and adds the decompressed difference to the reference block.
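The encoder and decoder steps described above can be sketched for integer-pixel motion vectors as follows. This is a minimal illustration only: the function names are hypothetical, frames are plain lists of rows, and the compression of the difference is omitted, so the difference is passed through losslessly.

```python
def fetch_block(frame, top, left, size):
    """Return the size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def encode_difference(current_block, reference_frame, top, left, mv):
    """Encoder side: subtract the motion-compensated reference block."""
    dy, dx = mv
    size = len(current_block)
    reference = fetch_block(reference_frame, top + dy, left + dx, size)
    return [[c - r for c, r in zip(c_row, r_row)]
            for c_row, r_row in zip(current_block, reference)]


def decode_block(difference, reference_frame, top, left, mv):
    """Decoder side: fetch the same reference block and add the difference."""
    dy, dx = mv
    size = len(difference)
    reference = fetch_block(reference_frame, top + dy, left + dx, size)
    return [[d + r for d, r in zip(d_row, r_row)]
            for d_row, r_row in zip(difference, reference)]
```

Because both sides fetch the same reference block with the same motion vector, the decoder reconstructs the to-be-encoded block exactly whenever the difference is carried without loss.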
Like other compression standards, the H.264 standard employs sub-pixel motion vectors. Both components (i.e., horizontal and vertical) of a motion vector are given in quarter-pixel units. When either component does not lie on the integer-pixel grid, the encoder interpolates the reference frame to find the values in-between the actual integer pixels, computes the difference between a to-be-encoded block and the interpolated block from the reference frame, and compresses and transmits the difference. The decoder performs the same interpolation before adding the reference block and the decompressed difference.
Referring to FIG. 1, an illustration of a conventional 8×8 block 20 within a frame of a video signal is shown. Pixels at integer positions within the block 20 (i.e., integer pixels) are represented by the letter I. Each pixel that lies on an integer position vertically but halfway between two integer locations horizontally (i.e., H) is computed as a weighted sum of three integer pixels I in integer positions to the left and three integer pixels I in integer positions to the right. The pixels H are referred to as (1,½) pixels. Each pixel that lies on an integer position horizontally but halfway between two integer locations vertically (i.e., V) is computed as a weighted sum of three integer pixels I in integer positions above and three integer pixels I in integer positions below. The pixels V are referred to as (½,1) pixels. Each pixel that lies halfway between integer pixels vertically and halfway between integer pixels horizontally (i.e., T) is computed as either (i) a weighted sum of three (1,½) pixels above and three (1,½) pixels below or (ii) a weighted sum of three (½,1) pixels to the left and three (½,1) pixels to the right. Computation of the pixels V, H and T transforms the block 20 into a 16×16 pixel grid having a half-pixel resolution.
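The weighted sum for the pixels H can be sketched as follows. The weights are not stated above; the sketch assumes the 6-tap kernel (1, -5, 20, 20, -5, 1)/32 that the H.264 specification defines for luma half-sample positions, with rounding and clipping to an assumed 8-bit range. The function names are hypothetical. The same filter applied down a column would give the pixels V; the pixels T are not covered here.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # assumed H.264 luma half-sample kernel, sum = 32


def clip8(value):
    """Clamp a filtered value to the assumed 8-bit pixel range."""
    return max(0, min(255, value))


def half_pel(six_pixels):
    """Filter three pixels on each side of the half-pixel position."""
    acc = sum(t * p for t, p in zip(TAPS, six_pixels))
    return clip8((acc + 16) >> 5)  # divide by 32 with rounding


def horizontal_half_pels(row):
    """Half-pixel values between row[i] and row[i + 1] wherever
    three integer pixels are available on both sides."""
    return [half_pel(row[i - 2:i + 4]) for i in range(2, len(row) - 3)]
```

For a row of eight integer pixels, only the three innermost half-pixel positions have a full six-pixel support, which is why an interpolated block needs extra integer pixels fetched around its border.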
Pixels on a quarter-pixel resolution grid (i.e., Q) having vertical and/or horizontal components that are not integer multiples of ½, are computed from the pixels I, H, V and T of the half-pixel resolution grid. A process for generating the pixels Q is fairly simple, involving a bi-linear interpolation process. In the bi-linear interpolation process, only the half-pixel grid neighbors are used to calculate the pixels Q. An exact approach for pixel Q generation depends on a position of the interpolated pixels Q relative to the integer pixels I. Details for quarter-pixel resolution interpolation can be found in the H.264 specification. A technique that the H.264 specification employs is to use long (i.e., 6-tap) filters for sub-pixel motion compensation. The sub-pixel interpolation process in accordance with H.264 can be very computationally intensive.
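The bi-linear step for the pixels Q can be sketched in one dimension as follows. The sketch assumes the rounded two-neighbor average (a + b + 1) >> 1 and a single row in which each half-pixel value lies between two integer pixels; which neighbors are paired in two dimensions depends on the position and is defined in the H.264 specification. The function names are hypothetical.

```python
def quarter_pel(a, b):
    """Rounded average of two neighbors on the half-pixel grid."""
    return (a + b + 1) >> 1


def quarter_resolution_row(integers, halves):
    """Interleave integer, quarter and half pixels along one row.

    halves[i] is assumed to lie between integers[i] and integers[i + 1].
    """
    out = []
    for i, h in enumerate(halves):
        left, right = integers[i], integers[i + 1]
        out += [left, quarter_pel(left, h), h, quarter_pel(h, right)]
    out.append(integers[len(halves)])
    return out
```

Each quarter pixel costs one addition and one shift, which is why the 6-tap half-pixel stage, not this stage, dominates the interpolation cost.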
A conventional encoder commonly employs one of the following two techniques for sub-pixel interpolation. In a first technique, each reference frame of the video signal is interpolated to quarter-pixel resolution and stored in a memory. For motion compensation or motion estimation, the needed pixels I, H, V, T and Q are fetched from the memory. Therefore, the motion compensation and motion estimation processes are computationally efficient because each sub-pixel position is computed only once. The first technique is conceptually simple and used in conventional software decoders. However, the first technique is not appropriate for a low-cost hardware decoder. The drawbacks of the first technique include (i) using a large amount of memory, since each reference frame uses 16 times as much memory as is otherwise needed, and (ii) greatly increasing the memory bandwidth used for motion estimation or motion compensation.
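The factor of 16 follows from storing four positions horizontally and four positions vertically for every integer pixel. A small calculation makes the cost concrete; the frame size and the one-byte-per-pixel, luma-only assumption are illustrative and are not taken from the text above.

```python
def integer_frame_bytes(width, height, bytes_per_pixel=1):
    """Memory for one reference frame stored at integer-pixel resolution."""
    return width * height * bytes_per_pixel


def interpolated_frame_bytes(width, height, bytes_per_pixel=1):
    """Memory for the same frame stored at quarter-pixel resolution:
    4 horizontal positions x 4 vertical positions per integer pixel."""
    return 4 * 4 * integer_frame_bytes(width, height, bytes_per_pixel)


if __name__ == "__main__":
    # Hypothetical 720x480 luma plane at 8 bits per pixel.
    print(integer_frame_bytes(720, 480))       # 345600
    print(interpolated_frame_bytes(720, 480))  # 5529600
```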
In the second technique, the integer pixels I are fetched from the memory and the interpolated pixels H, V, T and Q are computed when a block is needed for motion compensation. For motion estimation, the needed pixels are fetched and interpolation is performed “on-the-fly”. That is, for each motion vector considered, (i) the interpolated pixels are computed and then (ii) an error score, such as a sum of absolute differences, is computed between the interpolated block and the to-be-encoded block. Of all the motion vectors considered, the motion vector with the smallest “error” is selected. The second technique works well for a simple sub-pixel interpolation scheme. For example, MPEG-1 and MPEG-2 employ simple bi-linear sub-pixel interpolation, and only half-pixel, not quarter-pixel, interpolation is used. Some conventional media processors employ the simple, serial process of the second technique for computing error scores for sub-pixel motion vectors from the integer pixels I. In a single clock cycle, special purpose hardware is used to compute 64 sub-pixel positions and an error score between those interpolated values and another block of 64 pixels. For long sub-pixel filters, such as those used in H.264, the second technique is very inefficient. Computing 64 sub-pixel positions is much more complicated and time-consuming than computing the error between the interpolated pixels and other pixels. Therefore, the second technique can be slow, and much of the time the “error” hardware will be idle, waiting for the “interpolation” hardware to complete.
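The serial, on-the-fly search of the second technique can be sketched as follows. The function names are hypothetical, and the interpolation itself is passed in as a callable so the sketch stays independent of any particular filter. Steps (i) and (ii) run one after the other for each candidate, which is the source of the idle time noted above when step (i) is slow.

```python
def sum_of_absolute_differences(block_a, block_b):
    """Error score between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def best_motion_vector(target_block, candidates, interpolate):
    """Return (motion vector, error) with the smallest error.

    interpolate(mv) is an assumed callable that returns the interpolated
    reference block for motion vector mv.
    """
    best_mv, best_error = None, None
    for mv in candidates:
        predicted = interpolate(mv)                                      # step (i)
        error = sum_of_absolute_differences(predicted, target_block)    # step (ii)
        if best_error is None or error < best_error:
            best_mv, best_error = mv, error
    return best_mv, best_error
```

In this sketch a lookup table of precomputed blocks could stand in for `interpolate` during testing; a real encoder would apply the sub-pixel filters to the fetched integer pixels at that point.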