The present invention relates to digital video signal processing, and more particularly to devices and methods for video coding.
There are many applications for digital video communication and storage, and multiple international standards for video coding have been, and continue to be, developed. Low-bit-rate communications, such as video telephony and conferencing, led to the H.261 standard with bit rates as multiples of 64 kbps, while the MPEG-1 standard provides picture quality comparable to that of VHS videotape. Subsequently, the H.263, H.264/AVC, MPEG-2, and MPEG-4 standards have been promulgated.
At the core of all of these standards is the hybrid video coding technique of block motion compensation (prediction) plus transform coding of the prediction error. Block motion compensation removes temporal redundancy between successive pictures (frames or fields) by prediction from prior pictures, whereas transform coding removes spatial redundancy within each block of both the temporal and spatial prediction errors. FIGS. 2a-2b illustrate H.264/AVC functions, which include a deblocking filter within the motion compensation loop to limit artifacts created at block edges.
Traditional block motion compensation schemes basically assume that, between successive pictures, an object in a scene undergoes displacements in the x- and y-directions, and these displacements define the components of a motion vector. Thus an object in one picture can be predicted from the object in a prior picture by using the object's motion vector. Block motion compensation simply partitions a picture into blocks, treats each block as an object, and then finds for each block the motion vector locating the most-similar block in a prior picture (motion estimation). This simple assumption works satisfactorily in most practical cases, and thus block motion compensation has become the most widely used technique for temporal redundancy removal in video coding standards. Further, pictures coded without motion compensation are periodically inserted to avoid error propagation; blocks encoded without motion compensation are called intra-coded, and blocks encoded with motion compensation are called inter-coded.
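The block-matching search described above can be sketched as follows. This is an illustrative sketch only: the function names, the small block size, and the search-range parameter are hypothetical choices for demonstration, not values taken from any of the standards discussed.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equal-sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def motion_estimate(cur, ref, bx, by, bsize=4, srange=2):
    """Exhaustive search: find the motion vector (dx, dy) minimizing the
    SAD between the current block at (bx, by) and a displaced block in
    the reference picture. Returns (min SAD, dx, dy)."""
    cur_block = [row[bx:bx + bsize] for row in cur[by:by + bsize]]
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-srange, srange + 1):
        for dx in range(-srange, srange + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + bsize > w or y + bsize > h:
                continue  # candidate block falls outside the reference picture
            cand = [row[x:x + bsize] for row in ref[y:y + bsize]]
            cost = sad(cur_block, cand)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

When the current picture is simply a displaced copy of the reference, the search recovers the displacement exactly (zero SAD at the true motion vector), which is the idealized case the block-translation assumption describes.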
Block motion compensation methods typically decompose a picture into macroblocks, where each macroblock contains four 8×8 luminance (Y) blocks plus two 8×8 chrominance (Cb and Cr, or U and V) blocks, although other block sizes, such as 4×4, are also used in H.264/AVC. The residual (prediction error) block can then be encoded (i.e., block transformation, transform coefficient quantization, and entropy encoding). The transform converts the pixel values of a block from the spatial domain into a frequency domain for quantization; this exploits the decorrelation and energy-compaction properties of transforms such as the two-dimensional discrete cosine transform (DCT) or an integer transform approximating a DCT. For example, in MPEG and H.263, 8×8 blocks of DCT coefficients are quantized, scanned into a one-dimensional sequence, and coded by using variable length coding (VLC). H.264/AVC uses an integer approximation to a 4×4 DCT for each of sixteen 4×4 Y blocks and eight 4×4 chrominance blocks per macroblock. Thus an inter-coded block is encoded as its motion vector(s) plus the quantized transformed residual block. Each motion vector can in turn be coded as a predicted motion vector (derived from prior macroblocks in the same frame or from the co-located macroblock of a prior frame) plus a differential motion vector, typically with variable-length coding.
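The 4×4 integer transform mentioned above can be sketched as Y = Cf·X·Cf^T, where Cf is the H.264/AVC forward core-transform matrix. This sketch omits the per-coefficient normalizing scale factors, which the standard folds into the quantization step; the helper function names are illustrative.

```python
# H.264/AVC 4x4 forward core-transform matrix (integer approximation to a DCT).
CF = [[1,  1,  1,  1],
      [2,  1, -1, -2],
      [1, -1, -1,  1],
      [1, -2,  2, -1]]

def matmul(a, b):
    """4x4 integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(m):
    return [list(col) for col in zip(*m)]

def forward_transform_4x4(block):
    """Apply the core transform Y = CF * X * CF^T to a 4x4 residual block
    (normalizing scaling omitted; in the standard it is folded into
    quantization)."""
    return matmul(matmul(CF, block), transpose(CF))
```

For a flat residual block, all the energy compacts into the single DC coefficient Y[0][0], illustrating the energy-compaction property that makes subsequent quantization and entropy coding efficient.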
Motion estimation places a heavy load on an embedded processor architecture, as follows.

1) Computation requirements: The number of multiply/add operations required to find the best block match may be large. This is usually characterized as the number of sum-of-absolute-differences (SAD) calculations required to find the best match. In an exhaustive search, one SAD calculation is required for each location in the search area, which demands tremendous computation power. For example, video at D1 resolution (720×480 pixels) and 30 frames per second (30 fps) needs 41.4 million SAD calculations per second using exhaustive search.

2) Data bandwidth requirements: Typically, reference frames are stored in external memory, so motion estimation requires loading the reference search areas from external memory into internal memory. This includes fetching the reference region as well as the current macroblock. For example, D1 resolution video at 30 fps needs 50 Mbytes/second, assuming a 32×32 search region for a given macroblock.
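The figures above can be reproduced with simple arithmetic. The assumptions used here (16×16 macroblocks, one SAD per candidate position in a 32×32 search grid, and one byte fetched per luminance sample) are inferred for illustration rather than stated explicitly in the text.

```python
width, height, fps = 720, 480, 30              # D1 resolution at 30 fps
mb = 16                                        # assumed macroblock size

# Macroblocks processed per second: (720/16) * (480/16) * 30 = 40,500.
mbs_per_sec = (width // mb) * (height // mb) * fps

# 1) Computation: one SAD per candidate position, assuming a 32x32 grid
#    of candidates per macroblock -> about 41.5 million SADs/second,
#    matching the quoted 41.4 million figure after rounding.
sads_per_sec = mbs_per_sec * 32 * 32

# 2) Bandwidth: per macroblock, fetch a 32x32 reference region
#    (1024 bytes of luminance) plus the 16x16 current macroblock
#    (256 bytes) -> about 51.8 Mbytes/second, consistent with the
#    quoted figure of roughly 50 Mbytes/second.
bytes_per_sec = mbs_per_sec * (32 * 32 + 16 * 16)
```

These rough totals are for luminance alone and ignore any reuse of overlapping search regions between neighboring macroblocks, which practical implementations exploit to reduce external-memory traffic.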