1. Field of the Invention
The invention pertains to the field of video encoding and decoding. More particularly, the invention pertains to methods of video encoding and decoding employing motion compensation and devices adapted to execute such methods.
2. Description of the Related Technology
An MPEG-4 video decoder implements a block-based algorithm that exploits temporal and spatial redundancy in successive frames. A bitstream, or sequence of bits representing the coded video sequences, is received as input, and the bitstream is compliant with the ISO/IEC 14496-2 standard. The bitstream starts by identifying the visual object as a video object. This video object can be coded in multiple layers (scalability). One layer consists of Visual Object Planes (VOPs), which are time instances of a visual object (i.e., frames).
A decompressed VOP is represented by a group of MacroBlocks (MBs). Each MB contains six blocks of 8×8 pixels: 4 luminance (Y), 1 chrominance red (Cr) and 1 chrominance blue (Cb) blocks. FIG. 1 illustrates the macroblock structure in 4:2:0 format (the chrominance components are downsampled in horizontal and vertical direction).
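The macroblock layout described above can be sketched as a small data structure. This is only an illustrative sketch: the names `MacroBlock`, `empty_block`, `samples_per_macroblock` and `macroblock_count` are hypothetical, and rounding partial macroblocks up at the VOP border is an assumption not stated in the text.

```python
from dataclasses import dataclass, field
from typing import List

BLOCK = 8          # edge of one block, in pixels
MB_SIZE = 16       # edge of the luminance area covered by one macroblock

Block = List[List[int]]


def empty_block() -> Block:
    """Return an 8x8 block filled with zeros."""
    return [[0] * BLOCK for _ in range(BLOCK)]


@dataclass
class MacroBlock:
    """Hypothetical container for one 4:2:0 macroblock: 4 Y, 1 Cb, 1 Cr block."""
    y: List[Block] = field(default_factory=lambda: [empty_block() for _ in range(4)])
    cb: Block = field(default_factory=empty_block)
    cr: Block = field(default_factory=empty_block)


def samples_per_macroblock() -> int:
    """Six 8x8 blocks per macroblock: 4 luminance plus 2 chrominance."""
    return 6 * BLOCK * BLOCK


def macroblock_count(width: int, height: int) -> int:
    """Macroblocks needed to cover a VOP (assumption: partial MBs round up)."""
    return ((width + MB_SIZE - 1) // MB_SIZE) * ((height + MB_SIZE - 1) // MB_SIZE)
```

Under these assumptions, a 176×144 VOP is covered by 11 × 9 = 99 macroblocks, each carrying 384 samples.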
Two compression techniques are distinguished. In the intra case, the MB or VOP is coded by itself using an algorithm that reduces the spatial redundancy. In contrast, inter coding relates a macroblock of the current VOP to MBs of previously reconstructed VOPs and thereby reduces the temporal redundancy.
FIG. 2 is a block diagram of a simple profile video decoder, supporting rectangular intra coded (I) and predictive coded (P) VOPs. An I VOP (intra coded VOP) contains only independent texture information (only intra MBs). A P VOP (predictive coded VOP) is coded using motion compensated prediction from the previous P or I VOP and can contain intra or inter MBs.
Reconstructing a P VOP implies adding a motion compensated VOP and a texture decoded error VOP. In operation, the video decoder of FIG. 2 receives a bitstream, which is split into coded motion vector information and coded texture information by a demultiplexer. FIG. 2 illustrates performance of texture decoding of a complete VOP, motion compensation at VOP level, and reconstruction at VOP level by the decoder, each of which will be discussed in more detail hereinafter.
Note that all macroblocks must be intra refreshed periodically to avoid the accumulation of numerical errors. This intra refresh can be implemented asynchronously among macroblocks.
Motion Compensation
A video sequence typically has a high temporal correlation between similar locations in neighboring images (VOPs). Inter coding (or predictive coding) tracks the position of a macroblock from VOP to VOP to reduce the temporal redundancy. The motion estimation process tries to locate the corresponding macroblocks among VOPs. MPEG-4 only supports the translatory motion model.
The top left corner pixel coordinates (x, y) can be used to specify the location of a macroblock. The search for a matching block is restricted to a region around the original location of the MB in the current picture; this search area consists of at most 9 MBs. If (x+u, y+v) is identified as the location of the best matching block in the reference, the motion vector equals (u, v). In backward motion estimation, the reference VOP is situated in time before the current VOP, as opposed to forward motion estimation, where the reference VOP comes later in time.
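The search described above can be illustrated by an exhaustive block matching sketch. Several points are assumptions rather than part of the description: the sum of absolute differences (SAD) is used as the matching criterion, the search range of ±16 pixels is one reading of the 9 MB search area, candidates that fall outside the reference are simply skipped, and the names `sad` and `full_search` are hypothetical.

```python
def sad(cur, ref, x, y, u, v, n=16):
    """Sum of absolute differences between the current block at (x, y)
    and the reference block at (x+u, y+v). Frames are lists of rows."""
    total = 0
    for j in range(n):
        for i in range(n):
            total += abs(cur[y + j][x + i] - ref[y + v + j][x + u + i])
    return total


def full_search(cur, ref, x, y, n=16, search=16):
    """Try every integer displacement (u, v) within +/-search pixels and
    return ((u, v), cost) for the lowest SAD. Assumption: ties keep the
    first candidate found in scan order."""
    height, width = len(ref), len(ref[0])
    best = None
    for v in range(-search, search + 1):
        for u in range(-search, search + 1):
            # Skip candidates that are not fully inside the reference VOP.
            if x + u < 0 or y + v < 0 or x + u + n > width or y + v + n > height:
                continue
            cost = sad(cur, ref, x, y, u, v, n)
            if best is None or cost < best[0]:
                best = (cost, u, v)
    return (best[1], best[2]), best[0]
```

A practical encoder would normally replace the exhaustive loop with a faster search pattern; the sketch only shows how (u, v) relates the two block positions.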
As the true VOP-to-VOP displacements are unrelated to the sampling grid, a prediction at a finer resolution can improve the compression. MPEG-4 allows motion vectors with half pixel accuracy, estimated through interpolation of the reference VOP. Such vectors are called half pel motion vectors.
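Half pel prediction can be sketched as a bilinear interpolation of the reference VOP. The function name `sample_half_pel`, the use of coordinates expressed in half pixel units, and the round-half-up averaging are assumptions made for illustration; the standard defines its own exact rounding rule, and border handling is not shown.

```python
def sample_half_pel(ref, x2, y2):
    """Return the reference sample at position (x2 / 2, y2 / 2).

    x2 and y2 are non-negative coordinates in half pixel units, so an odd
    value addresses a position halfway between two integer samples.
    """
    x, frac_x = x2 >> 1, x2 & 1
    y, frac_y = y2 >> 1, y2 & 1
    a = ref[y][x]
    if not frac_x and not frac_y:
        return a                                   # integer position
    if frac_x and not frac_y:
        return (a + ref[y][x + 1] + 1) // 2        # horizontal half pel
    if frac_y and not frac_x:
        return (a + ref[y + 1][x] + 1) // 2        # vertical half pel
    # Diagonal half pel: average of the four surrounding samples.
    return (a + ref[y][x + 1] + ref[y + 1][x] + ref[y + 1][x + 1] + 2) // 4
```

A half pel motion vector (u, v) stored in half pixel units would then address the sample `sample_half_pel(ref, 2 * x + u, 2 * y + v)` for the pixel at (x, y).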
Typically, a macroblock of a P VOP is only inter coded if an acceptable match in the reference VOP was found by the motion estimation (else, it is intra coded). Motion compensation uses the motion vector to locate the related macroblock in the previously reconstructed VOP. This motion vector information is exploited for retrieving information of a previously reconstructed VOP, assumed to be available at the decoder already. The difference between the current macroblock MB(x, y, t) and the related macroblock MB(x+u, y+v, t−1) is the prediction error e(x, y, t), which can be coded using the texture algorithm:
$$e(x, y, t) = \mathrm{MB}(x, y, t) - \mathrm{MB}(x+u, y+v, t-1) \qquad (1)$$
Reconstructing an inter MB implies decoding of the motion vector, motion compensation, decoding the error, and adding the motion compensated and the error MB to obtain the reconstructed macroblock.
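The reconstruction of an inter MB can be sketched as follows for integer pel motion vectors. The names `motion_compensate` and `reconstruct_inter_mb` are hypothetical, and clipping the result to [0, 255] assumes 8-bit samples; neither is prescribed by the description above.

```python
def motion_compensate(ref, x, y, u, v, n=16):
    """Fetch the n x n block at (x+u, y+v) from the previously
    reconstructed VOP (integer pel motion vector only)."""
    return [[ref[y + v + j][x + u + i] for i in range(n)] for j in range(n)]


def reconstruct_inter_mb(ref, error, x, y, u, v, n=16, max_val=255):
    """Add the decoded prediction error to the motion compensated block.

    Assumption: samples are clipped to [0, max_val] after the addition.
    """
    pred = motion_compensate(ref, x, y, u, v, n)
    return [
        [min(max(pred[j][i] + error[j][i], 0), max_val) for i in range(n)]
        for j in range(n)
    ]
```

This is the decoder-side counterpart of the prediction error definition: the current block is recovered as the displaced reference block plus the decoded error.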
Texture Decoding Process
The texture decoding process (FIG. 2) is block-based and comprises five steps: Variable Length Decoding (VLD), inverse scan, inverse DC & AC prediction, inverse quantization and an Inverse Discrete Cosine Transform (IDCT).
The VLD algorithm extracts code words from Huffman tables, resulting in an 8×8 array of quantized DCT coefficients. Then, the inverse scan reorganizes the positions of those coefficients in the block. In case of an intra macroblock, inverse DC & AC prediction adds the prediction value of the surrounding blocks to the obtained value. This is followed by saturation in the range [−2048,2047]. Note that this saturation is unnecessary for an inter MB: because no DC & AC prediction is used, the inter MB DCT coefficients are immediately in the correct range.
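The inverse scan and the saturation step can be sketched as follows. The classic zigzag order is used here as an assumption; the description above does not say which scan pattern applies to a given block, and the names `zigzag_order`, `inverse_scan` and `saturate` are hypothetical.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zigzag order.

    Cells are grouped by anti-diagonal (row + col); odd diagonals run
    downwards (row ascending), even diagonals run upwards (col ascending).
    """
    cells = [(r, c) for r in range(n) for c in range(n)]
    return sorted(
        cells,
        key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )


def inverse_scan(coeffs, n=8):
    """Place a 1-D list of n*n decoded coefficients back into a 2-D block."""
    block = [[0] * n for _ in range(n)]
    for k, (r, c) in enumerate(zigzag_order(n)):
        block[r][c] = coeffs[k]
    return block


def saturate(value, lo=-2048, hi=2047):
    """Clamp a coefficient to [lo, hi]; defaults match the range above."""
    return min(max(value, lo), hi)
```

For an intra block, `saturate` would be applied after the DC & AC prediction values have been added to the scanned coefficients.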
Inverse quantization, basically a scalar multiplication by the quantizer step size, yields the reconstructed DCT (Discrete Cosine Transform) coefficients. These coefficients are saturated in the range $[-2^{\text{bitsPerPixel}+3},\ 2^{\text{bitsPerPixel}+3}-1]$. In the final step, the IDCT transforms the coefficients to the spatial domain and outputs the reconstructed block. These values are saturated in the range $[-2^{\text{bitsPerPixel}},\ 2^{\text{bitsPerPixel}}-1]$.
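The last two texture decoding steps can be sketched as follows. This is a simplified illustration under stated assumptions: inverse quantization is reduced to a plain multiplication by a single step size, the IDCT is a direct floating point evaluation of the textbook 8×8 formula rather than a fast or bit-exact implementation, and all function names are hypothetical.

```python
import math


def coefficient_range(bits_per_pixel=8):
    """Saturation range of the reconstructed DCT coefficients."""
    limit = 1 << (bits_per_pixel + 3)
    return -limit, limit - 1


def sample_range(bits_per_pixel=8):
    """Saturation range of the IDCT output values."""
    limit = 1 << bits_per_pixel
    return -limit, limit - 1


def inverse_quantize(block, step, bits_per_pixel=8):
    """Multiply each quantized coefficient by the step size, then saturate."""
    lo, hi = coefficient_range(bits_per_pixel)
    return [[min(max(q * step, lo), hi) for q in row] for row in block]


def idct_8x8(coeffs):
    """Direct 8x8 inverse DCT; coeffs[v][u] is indexed by (row, column)."""
    n = 8

    def scale(k):
        return math.sqrt(0.5) if k == 0 else 1.0

    out = [[0.0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            acc = 0.0
            for v in range(n):
                for u in range(n):
                    acc += (scale(u) * scale(v) * coeffs[v][u]
                            * math.cos((2 * x + 1) * u * math.pi / 16)
                            * math.cos((2 * y + 1) * v * math.pi / 16))
            out[y][x] = acc / 4
    return out


def decode_block(quantized, step, bits_per_pixel=8):
    """Inverse quantize, transform, round and saturate one 8x8 block."""
    lo, hi = sample_range(bits_per_pixel)
    spatial = idct_8x8(inverse_quantize(quantized, step, bits_per_pixel))
    return [[min(max(round(s), lo), hi) for s in row] for row in spatial]
```

With 8 bits per pixel, the two ranges evaluate to [−2048, 2047] for the coefficients and [−256, 255] for the output values.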
Thus, the decoded texture information comprises error texture information. The error texture information is added to the motion compensated previous VOP information, and the current VOP is thereby reconstructed.
Error Resilience
The use of variable length coding makes the (video) bitstreams particularly sensitive to channel errors. A loss of bits typically leads to an incorrect number of bits being VLC decoded and causes loss of synchronization. Moreover, the location where the error is detected is not the same as where the error occurs. Once an error occurs, all data until the next resynchronization point has to be discarded. The amount of lost data can be minimized through the use of error resilience tools: resynchronization markers, data partitioning, header extension and reversible variable length codes.
Optimization of video decoders using a MB based approach is discussed in the following references, each of which is hereby incorporated by reference in its entirety.
L. Nachtergaele, et al., “Low Power Data Transfer and Storage Exploration for H.263 Video Decoder System”, IEEE Journal on Selected Areas in Communications, Special issue on Very Low Bit-Rate Video Coding, Vol. 16, No. 1, pp. 120-129, January 1998.
L. Nachtergaele, et al., “System-Level power optimization of Video Codecs on Embedded Cores: a Systematic Approach”, Journal of VLSI Signal Processing, Kluwer, Vol. 18, No. 2, pp. 89-111, Boston, February 1998.
L. Nachtergaele, et al., “Power and speed-efficient code transformation of video compression algorithms for RISC processors”, to appear in Journal of VLSI Signal Processing, Kluwer, Vol. 27, pp. 161-169, Boston, February 2001.
It is the aim of the invention to provide a power consumption optimized video coder (encoder and decoder).