Embodiments of the present invention relate to a hybrid video decoder. Further embodiments of the present invention relate to a hybrid video encoder, a data stream, a method for decoding a video and a method for encoding a video.
In conventional or state-of-the-art (hybrid) video coding, the components of a video frame are predicted either by motion-compensated prediction, using the reconstructed components of previous pictures, or by intra prediction, using previously reconstructed blocks of the same picture (see, for example, Thomas Wiegand, Gary J. Sullivan, Gisle Bjontegaard, and Ajay Luthra: Overview of the H.264/AVC Video Coding Standard, IEEE Trans. on Circuits and Systems for Video Technology, Vol. 13, No. 7, pp. 560-576, July 2003). For example, if a video frame is represented using the YUV color space, the components of the video frame are the three color components Y, U and V. The residual signal, i.e. the difference between the original components and the corresponding prediction signals, is usually coded using transform coding (a combination of a decorrelating transform, quantization of the transform coefficients, and entropy coding of the resulting quantization symbols). When a picture comprises multiple components (planes), the prediction can either be done separately for each component or be grouped by sharing the prediction information (plane grouping). Motion-compensated prediction can be done for sub-regions of a picture (see, for example, Thomas Wiegand, Markus Flierl, and Bernd Girod: Entropy-Constrained Design of Quadtree Video Coding Schemes, Proc. 6th IEE Intern. Conf. on Image Processing and its Applications, Dublin, Ireland, July 1997). Usually, the sub-regions are rectangular blocks of samples, but it is also conceptually possible to use the same motion parameters for an arbitrary set of samples. The motion parameters are included in the bitstream and transmitted to the decoder. It is possible to use arbitrary motion models. Commonly, the motion is modeled using a translational motion model, in which case a motion vector (2 parameters) specifying a displacement is transmitted for each region.
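The translational model with full-sample motion vectors described above can be illustrated by the following sketch. It is a minimal, hypothetical example (function and variable names are not taken from any standard): the prediction block is copied directly from the reconstructed reference picture at a position displaced by the motion vector, and the residual is what would subsequently be coded with transform coding.

```python
import numpy as np

def predict_block_translational(reference, top, left, mv_y, mv_x, block_h, block_w):
    """Motion-compensated prediction with a translational motion model and
    full-sample (integer) motion vector accuracy: the prediction samples are
    copied directly from the reconstructed reference picture, displaced by
    the motion vector (mv_y, mv_x). Border clipping is omitted here."""
    return reference[top + mv_y : top + mv_y + block_h,
                     left + mv_x : left + mv_x + block_w]

# Hypothetical 16x16 block at position (8, 8) of the current picture,
# predicted from a previously reconstructed picture with motion vector (1, -2).
reference = np.arange(64 * 64, dtype=np.int16).reshape(64, 64)
current_block = reference[9:25, 6:22] + 3  # original samples of the current block
prediction = predict_block_translational(reference, 8, 8, 1, -2, 16, 16)
residual = current_block - prediction      # this difference is transform coded
```

With a perfect translational match the residual is small and cheap to code; here it is a constant offset of 3.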
Other common motion models include the affine motion model (6 parameters) and 3-, 4-, and 8-parameter models. The motion parameters can be transmitted with arbitrary accuracy. For example, for the translational motion model, the motion vectors could be coded using full-sample accuracy or sub-sample accuracy (e.g. quarter-sample accuracy). In the first case, the prediction samples can be directly copied from the reconstructed pictures. In the case of sub-sample accurate motion vectors (or general motion parameters), the prediction samples are interpolated using the reconstructed samples. The state-of-the-art sub-sample generation methods for motion compensated prediction use FIR filtering. Recently, adaptive FIR filters (see, for example, Thomas Wedi: Adaptive Interpolation Filter for Motion Compensated Hybrid Video Coding, Proc. Picture Coding Symposium (PCS 2001), Seoul, Korea, April 2001) were proposed for improved motion compensated prediction. Any of the previously transmitted pictures can be used for motion compensation (see, for example, Thomas Wiegand, Xiaozheng Zhang, and Bernd Girod: Long-Term Memory Motion-Compensated Prediction, IEEE Trans. on Circuits and Systems for Video Technology, Vol. 9, No. 1, pp. 70-84, February 1999). If the reference picture is not fixed by high-level parameters, reference indices can be transmitted to identify the reference pictures used. It is also possible to modify the prediction signal using a weighting factor and an offset (often referred to as weighted prediction), or any other weighting function, to obtain the final prediction signal. Furthermore, several prediction signals can be combined to obtain the final prediction signal. This is often referred to as multi-hypothesis prediction (see, for example, Gary J. Sullivan: Multi-hypothesis motion compensation for low bit-rate video coding, Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, Vol. 5, 1993).
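The FIR-based sub-sample interpolation mentioned above can be sketched as follows for the one-dimensional half-sample case. The example uses the symmetric 6-tap filter (1, -5, 20, 20, -5, 1)/32, as specified for luma half-sample positions in H.264/AVC; the function name is hypothetical, and border padding, which a real codec requires, is omitted.

```python
import numpy as np

def half_sample_interpolate_1d(samples):
    """Half-sample interpolation with the 6-tap FIR filter
    (1, -5, 20, 20, -5, 1)/32 used for H.264/AVC luma half-pel positions.
    Returns the interpolated half-sample values between integer positions;
    only positions where all six taps fall inside the array are produced."""
    taps = np.array([1, -5, 20, 20, -5, 1], dtype=np.int32)
    s = samples.astype(np.int32)
    filtered = np.convolve(s, taps, mode="valid")  # 6-tap FIR filtering
    return (filtered + 16) >> 5                    # rounding, normalize by 32

# For a flat signal the interpolated half-sample values equal the signal level.
row = np.array([10, 10, 10, 10, 10, 10, 10], dtype=np.int16)
half_pels = half_sample_interpolate_1d(row)
```

Quarter-sample positions are then typically obtained by averaging neighboring full- and half-sample values; adaptive FIR schemes instead transmit the filter coefficients themselves.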
The combined prediction signal can, for example, be obtained as a weighted sum of different prediction signals. The individual prediction signals can stem from the same or from different upsampled reference pictures. If two prediction signals are combined, the multi-hypothesis prediction is also referred to as bi-prediction (as supported in B-slices of modern video coding standards). It is, however, also possible to use more than two hypotheses. The entropy coding of the quantized transform coefficients can be done, for example, by variable-length coding or (adaptive) arithmetic coding (see, for example, Detlev Marpe, Heiko Schwarz, and Thomas Wiegand: Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard, IEEE Trans. on Circuits and Systems for Video Technology, Vol. 13, No. 7, pp. 620-636, July 2003).
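The weighted combination of hypotheses can be sketched as below. This is a minimal, hypothetical illustration (the function name and weights are not taken from any standard): the final prediction signal is a weighted sum of the individual prediction signals plus an optional offset, and with two hypotheses weighted equally it reduces to ordinary bi-prediction.

```python
import numpy as np

def combine_hypotheses(predictions, weights, offset=0):
    """Multi-hypothesis prediction: the final prediction signal is a
    weighted sum of several individual prediction signals plus an offset
    (weighted prediction). With weights (0.5, 0.5) and offset 0 this is
    ordinary bi-prediction, i.e. the average (p0 + p1) / 2."""
    combined = sum(w * p.astype(np.float64) for w, p in zip(weights, predictions))
    return np.round(combined + offset).astype(np.int16)

p0 = np.full((4, 4), 100, dtype=np.int16)  # hypothesis from reference picture 0
p1 = np.full((4, 4), 110, dtype=np.int16)  # hypothesis from reference picture 1
bipred = combine_hypotheses([p0, p1], weights=[0.5, 0.5])
```

More than two hypotheses, unequal weights, or a nonzero offset fit the same formulation without changing the structure.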
However, in hybrid video coding, there is a desire to improve the compression efficiency of the transmitted information.