Described below are an apparatus and a method for encoding and decoding a video signal formed of video frames each including image blocks.
In video coding, similarities between images in a video sequence are exploited in order to reduce the size, and thereby the data rate, of the compressed bit stream. For video encoding, a temporal predictor is generated from previously encoded frames. This temporal predictor is subtracted from the current frame to provide a prediction error, which is further encoded using, for example, a discrete cosine transformation (DCT). The generated transform coefficients are quantized by a quantization unit, and an entropy encoder performs entropy coding of the quantized coefficients.
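The prediction, transformation and quantization steps described above can be illustrated by a minimal sketch in Python. It is not taken from any particular encoder: the function names `dct_1d`, `quantize` and `encode_row`, the use of a single one-dimensional row of samples, and the quantization step size of 4 are assumptions chosen for illustration only, and the entropy coding stage is omitted.

```python
import math


def dct_1d(samples):
    """Orthonormal DCT-II of a list of numbers (illustrative, not optimized)."""
    n = len(samples)
    coeffs = []
    for k in range(n):
        # Scale factors that make the transform orthonormal.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(s * math.cos(math.pi * (i + 0.5) * k / n)
                  for i, s in enumerate(samples))
        coeffs.append(scale * acc)
    return coeffs


def quantize(coeffs, step):
    """Uniform quantization: map each coefficient to an integer index."""
    return [round(c / step) for c in coeffs]


def encode_row(current, predictor, step=4.0):
    """Subtract the temporal predictor, transform and quantize the error."""
    residual = [c - p for c, p in zip(current, predictor)]
    return quantize(dct_1d(residual), step)
```

When the predictor matches the current samples exactly, all quantized coefficients are zero; when the prediction error is constant, only the first (DC) coefficient is non-zero. This is the reason why a better temporal predictor leads to fewer bits after entropy coding.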
FIG. 1 shows a block diagram of a known video encoding apparatus VENC using temporal prediction. A frame received at an input of the encoder is supplied to a motion estimation unit ME, which supplies a motion vector MV to a motion compensation unit MC. The motion compensation unit MC calculates a temporal predictor TP, which is supplied to a subtractor. The motion estimation unit ME and the motion compensation unit MC are connected on the input side to a frame buffer FB. This frame buffer FB can be a memory for storing a reference frame. The motion vector MV calculated by the motion estimation unit ME can also be supplied to a multiplexer MUX and transmitted via a communication channel to a decoder. Further, the known encoder as shown in FIG. 1 may include an inverse quantization unit Q−1, which reverses the indexing step of quantization and supplies a signal to an inverse transformation unit T−1, which performs an inverse transformation.
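The interplay of the motion estimation unit ME, the motion compensation unit MC and the frame buffer FB can be sketched as follows. The description of FIG. 1 does not specify how the motion vector is searched; the sketch assumes an exhaustive block-matching search with the sum of absolute differences (SAD) as the cost, which is one common choice. The function names `sad`, `estimate_motion` and `compensate`, and the representation of frames as lists of rows, are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block of the current frame
    and the block of the reference frame displaced by (dx, dy)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def estimate_motion(cur, ref, bx, by, size, search):
    """Full search over +/- `search` full pixels; returns the motion vector
    (dx, dy) with the lowest SAD (assumed role of unit ME)."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that would read outside the reference frame.
            if by + dy < 0 or by + dy + size > height:
                continue
            if bx + dx < 0 or bx + dx + size > width:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv


def compensate(ref, bx, by, mv, size):
    """Fetch the displaced block from the reference frame; this block is the
    temporal predictor for the current block (assumed role of unit MC)."""
    dx, dy = mv
    return [[ref[by + dy + y][bx + dx + x] for x in range(size)]
            for y in range(size)]
```

In this sketch the reference frame `ref` plays the role of the content of the frame buffer FB, and the block returned by `compensate` is what would be supplied to the subtractor.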
In current video coding standards like H.264/AVC, the motion vectors MV calculated by the motion estimation unit ME are determined for rectangular regions of the image, so-called image blocks. These image blocks can have different sizes, e.g. 4×4, 8×8, 16×16, 8×4 and 8×16 pixels. In order to calculate an accurate temporal predictor TP, a displacement with the accuracy of full pixels, i.e. the pixels to be encoded, is not sufficient, since a real motion cannot be captured accurately. Accordingly, a sub-pixel motion compensation, for example with an accuracy of half-pixel elements or half-pixels, improves the modeling of a translatory motion and thus generates a more accurate temporal predictor TP. The sub-pixel motion compensation thereby reduces the prediction error transformed by the transformation unit T and consequently the size and data rate of the encoded bit stream. Since pixel values at sub-pixel positions do not exist in the original video data stream, they are generated in a known encoding apparatus by interpolation. To calculate the pixel values at sub-pixel positions, interpolation is performed by using filtering operations. Different filter lengths can be used, such as a 2-tap bilinear filter or a 6-tap filter as defined in the H.264/AVC standard. These filters approximate the optimal interpolation process using the sinc function, defined by sinc(x) = sin(x)/x. The virtual high-resolution image provided in this way can then be used as a temporal prediction in the motion estimation and motion compensation process employed by the video encoder.
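The generation of half-pixel values by filtering can be illustrated with a short sketch, restricted to one row of pixels. The 6-tap coefficients (1, −5, 20, 20, −5, 1) with normalization by 32 are those of the H.264/AVC luma half-sample filter; the function names, the restriction to one dimension, and the replication of edge pixels at the row boundaries are assumptions made for illustration.

```python
def clip8(value):
    """Clip a value to the 8-bit sample range 0..255."""
    return max(0, min(255, value))


def half_pel_bilinear(row, i):
    """2-tap bilinear filter: half-pixel value between row[i] and row[i + 1],
    computed as the rounded average of the two neighbours."""
    return (row[i] + row[i + 1] + 1) >> 1


def half_pel_6tap(row, i):
    """6-tap filter (1, -5, 20, 20, -5, 1) / 32: half-pixel value between
    row[i] and row[i + 1], using three full pixels on each side."""
    taps = (1, -5, 20, 20, -5, 1)
    last = len(row) - 1
    acc = 0
    for k, tap in enumerate(taps):
        # Assumption: pixels outside the row are replaced by the edge pixel.
        j = min(max(i - 2 + k, 0), last)
        acc += tap * row[j]
    # Add 16 for rounding, divide by 32, then clip to the sample range.
    return clip8((acc + 16) >> 5)
```

Both filters reproduce a constant or linearly increasing row. Near a sharp edge the 6-tap filter overshoots before clipping, which reflects its closer approximation of the sinc function compared with the bilinear filter.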
However, a known encoding and decoding process using interpolation has the drawback that high spatial frequencies above the Nyquist frequency, which are not present in the original images, cannot be regenerated in this way.