The invention relates to a method for video coding, a method for decoding, an encoder for video coding, and a decoding device.
Digital video data is generally compressed for storage or transmission in order to significantly reduce its enormous volume. Compression is achieved both by eliminating the signal redundancy contained in the video data and by removing irrelevant signal components that are not perceptible to the human eye. This is generally done with a hybrid coding method: the image to be encoded is first predicted temporally, and the remaining prediction error is then transformed to the frequency domain, e.g. by a discrete cosine transform, where it is quantized and encoded with a variable-length code. Finally, the motion information and the quantized spectral coefficients are transmitted.
The better the prediction of the next image information to be transmitted, the smaller the prediction error remaining after prediction and the lower the data rate then required for encoding this error. An object of video data compression is therefore to obtain an optimally precise prediction of the image to be encoded from the image information already transmitted.
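The link between prediction quality and data rate can be illustrated on a single block. The following is a minimal sketch, assuming an 8×8 block, an orthonormal 2-D DCT-II, and a uniform quantizer with a hypothetical step size of 16; the variable-length coding stage is omitted and the number of non-zero quantized levels is used only as a rough proxy for the bits that would be spent. All function names and sample values are illustrative, not taken from any particular codec.

```python
import math


def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of lists."""
    n = len(block)

    def scale(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = scale(u) * scale(v) * s
    return out


def quantize(coeffs, step):
    """Uniform scalar quantization: map each coefficient to an integer level."""
    return [[int(round(value / step)) for value in row] for row in coeffs]


def encode_block(current, prediction, step=16):
    """Hybrid coding of one block: prediction error -> DCT -> quantization.
    Entropy (variable-length) coding of the levels is not modelled here."""
    n = len(current)
    residual = [[current[i][j] - prediction[i][j] for j in range(n)] for i in range(n)]
    return quantize(dct_2d(residual), step)


def count_nonzero(levels):
    """Number of non-zero quantized levels (rough proxy for the bit cost)."""
    return sum(1 for row in levels for q in row if q != 0)


# Hypothetical 8x8 block containing a smooth gradient.
current = [[16 * i + 2 * j + 50 for j in range(8)] for i in range(8)]
good_prediction = [[v - 2 for v in row] for row in current]  # off by 2 everywhere
poor_prediction = [[128] * 8 for _ in range(8)]              # flat grey block

good_levels = encode_block(current, good_prediction)
poor_levels = encode_block(current, poor_prediction)
print(count_nonzero(good_levels), count_nonzero(poor_levels))
```

With the good prediction the residual is a constant, so only the DC coefficient survives quantization; the poor prediction leaves a ramp whose energy spreads over several coefficients, each of which would have to be coded.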
Image prediction has hitherto been performed by first subdividing the image into regular sections, typically square blocks of 8×8 or 16×16 pixels, and then determining a prediction for each of these blocks by motion compensation from the image information already known at the receiver. (Blocks of other sizes can also be used.) Such a procedure is illustrated in FIG. 1. Two basic prediction cases can be distinguished:
Unidirectional prediction: motion compensation is performed solely on the basis of the previously transmitted image, resulting in so-called “P-frames”.
Bidirectional prediction: the image is predicted by superimposing two images, one a past image and the other a future image, resulting in so-called “B-frames”. Note that both reference images have already been transmitted.
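The two prediction cases can be sketched for a single block as follows. This is an illustration only, assuming full-search block matching with the sum of absolute differences (SAD) as the matching criterion, a hypothetical 4×4 block size and ±2-pixel search range, and plain averaging as the "superimposition" for the bidirectional case. Real encoders use other block sizes, search strategies, and weightings.

```python
def sad(cur, ref, bx, by, dx, dy, bs):
    """Sum of absolute differences between the block at (bx, by) in cur
    and the block displaced by (dx, dy) in ref."""
    total = 0
    for i in range(bs):
        for j in range(bs):
            total += abs(cur[by + i][bx + j] - ref[by + dy + i][bx + dx + j])
    return total


def block_match(cur, ref, bx, by, bs=4, search=2):
    """Full-search motion estimation for one block (unidirectional, P-frame case).
    Returns the motion vector (dx, dy) with the smallest SAD."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidate blocks that would lie outside the reference image.
            if not (0 <= by + dy and by + dy + bs <= h
                    and 0 <= bx + dx and bx + dx + bs <= w):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, bs)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]


def mc_block(ref, bx, by, mv, bs=4):
    """Motion-compensated prediction: copy the displaced block out of ref."""
    dx, dy = mv
    return [[ref[by + dy + i][bx + dx + j] for j in range(bs)] for i in range(bs)]


def bidirectional_prediction(past, future, bx, by, mv_past, mv_future, bs=4):
    """B-frame case: average two motion-compensated predictions, one from a
    past and one from a future reference image."""
    p = mc_block(past, bx, by, mv_past, bs)
    f = mc_block(future, bx, by, mv_future, bs)
    return [[(p[i][j] + f[i][j]) / 2 for j in range(bs)] for i in range(bs)]


# Hypothetical 12x12 reference image and a current image whose content is
# displaced by (2, 1) relative to it (borders are clamped).
ref = [[x + 20 * y for x in range(12)] for y in range(12)]
cur = [[ref[min(y + 1, 11)][min(x + 2, 11)] for x in range(12)] for y in range(12)]

print(block_match(cur, ref, 4, 4))
```

For the block at (4, 4) the search finds the displacement at which the SAD is zero, i.e. the true motion of this synthetic example.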
Based on these two prediction cases, motion-compensated temporal filtering (MCTF) in the MSRA method yields five directional modes, as illustrated in FIG. 2 (Jizheng Xu, Ruiqin Xiong, Bo Feng, Gary Sullivan, MingChieh Lee, Feng Wu, Shipeng Li, “3D subband video coding using Barbell lifting”, ISO/IEC JTC1/SC29/WG11 MPEG 68th meeting, M10569/s05, Munich, March 2004).
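The principle of MCTF can be shown with a single Haar lifting stage on a pair of frames. This is a minimal sketch and not the Barbell lifting of the cited MSRA method, nor its five directional modes: frames are assumed to be 1-D lists, and motion is modelled by a hypothetical global shift `d` with circular boundaries (`warp`). The predict step forms a high-pass frame from the motion-compensated prediction error; the update step forms a low-pass frame; running the steps in reverse restores both frames exactly.

```python
def warp(frame, d):
    """Hypothetical motion model: shift a 1-D frame by d samples with a
    circular boundary, i.e. out[k] = frame[k - d]."""
    n = len(frame)
    return [frame[(k - d) % n] for k in range(n)]


def mctf_analyze(a, b, d):
    """One Haar lifting stage of MCTF on the frame pair (a, b) with motion d."""
    # Predict step: high-pass frame = b minus its motion-compensated prediction from a.
    h = [bk - pk for bk, pk in zip(b, warp(a, d))]
    # Update step: low-pass frame = a plus half of the inversely compensated high-pass.
    low = [ak + 0.5 * hk for ak, hk in zip(a, warp(h, -d))]
    return low, h


def mctf_synthesize(low, h, d):
    """Undo the lifting steps in reverse order (exact reconstruction)."""
    a = [lk - 0.5 * hk for lk, hk in zip(low, warp(h, -d))]
    b = [hk + pk for hk, pk in zip(h, warp(a, d))]
    return a, b


a = [10, 20, 30, 40, 50, 60, 70, 80]
b = warp(a, 2)                      # b is a pure translation of a
low, high = mctf_analyze(a, b, 2)
print(high)                         # correct motion leaves an all-zero high-pass frame
```

When the motion matches the content, the high-pass frame carries no energy and the low-pass frame equals the first frame; with imperfect motion the high-pass frame holds the residual texture, and synthesis still reconstructs both frames.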
MCTF-based scalable video coding is intended to deliver good video quality over a very wide range of bit rates. However, currently known MCTF algorithms give unacceptable results at reduced bit rates, because the texture (block information) makes up too small a share of the data compared with the motion information (block structures and motion vectors) of a video defined by an image sequence.
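The effect can be made concrete with simple arithmetic. The sketch below assumes that the motion information costs a fixed, non-scalable amount (a hypothetical 64 kbit/s) regardless of the target bit rate; the fraction of the budget left for texture then shrinks rapidly as the total rate drops. The figures are illustrative and are not measurements from any MCTF codec.

```python
def texture_share(total_kbps, motion_kbps):
    """Fraction of the bit budget left for texture when the motion
    information costs a fixed, non-scalable amount."""
    if total_kbps <= 0:
        raise ValueError("total bit rate must be positive")
    return max(0.0, (total_kbps - motion_kbps) / total_kbps)


# Hypothetical figures: motion data fixed at 64 kbit/s.
for rate in (1024, 256, 128, 80):
    print(rate, round(texture_share(rate, 64), 3))
```

At 1024 kbit/s almost the whole budget goes to texture, whereas at 128 kbit/s only half of it does, and at 80 kbit/s only a fifth.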
A scalable representation of the motion information is therefore required in order to achieve an optimal ratio between texture and motion data at any bit rate and resolution. To this end, Jizheng Xu, Ruiqin Xiong, Bo Feng, Gary Sullivan, MingChieh Lee, Feng Wu, Shipeng Li, “3D subband video coding using Barbell lifting”, ISO/IEC JTC1/SC29/WG11 MPEG 68th meeting, M10569/s05, Munich, March 2004, discloses a solution from MSRA (Microsoft Research Asia) that represents the current state of the art in MCTF algorithms.
The MSRA solution proposes representing the motion information in layers, i.e. resolving it into successively refined structures. In this way the MSRA method generally improves image quality at low bit rates.
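A layered motion representation can be sketched with two layers. This is a hypothetical scheme for illustration, not the MSRA layering itself: the base layer carries one vector per 16×16 block (here the integer mean of its four 8×8 vectors, an arbitrary choice), and the refinement layer carries each 8×8 vector's difference from its base vector. A decoder at a low bit rate can use the base layer alone; at higher rates it adds the refinement and recovers the full motion field.

```python
def split_motion_layers(fine):
    """Split a field of per-8x8-block motion vectors into two layers.

    fine[r][c] = (dx, dy); the grid must have an even number of rows and columns.
    Base layer: one vector per 2x2 group of 8x8 blocks (a 16x16 block), taken
    as the integer mean of the four vectors (hypothetical choice).
    Refinement layer: difference of each 8x8 vector from its base vector.
    """
    rows, cols = len(fine), len(fine[0])
    base = [[None] * (cols // 2) for _ in range(rows // 2)]
    for gr in range(rows // 2):
        for gc in range(cols // 2):
            group = [fine[2 * gr + i][2 * gc + j] for i in range(2) for j in range(2)]
            base[gr][gc] = (sum(v[0] for v in group) // 4,
                            sum(v[1] for v in group) // 4)
    refine = [[(fine[r][c][0] - base[r // 2][c // 2][0],
                fine[r][c][1] - base[r // 2][c // 2][1])
               for c in range(cols)] for r in range(rows)]
    return base, refine


def reconstruct_motion(base, refine, layers):
    """Rebuild the 8x8 motion field from the first `layers` layers (1 or 2)."""
    rows, cols = len(refine), len(refine[0])
    field = []
    for r in range(rows):
        row = []
        for c in range(cols):
            dx, dy = base[r // 2][c // 2]
            if layers >= 2:
                dx += refine[r][c][0]
                dy += refine[r][c][1]
            row.append((dx, dy))
        field.append(row)
    return field


# Hypothetical 2x4 field of 8x8-block motion vectors (two 16x16 blocks).
fine = [[(2, 1), (3, 1), (0, 0), (0, -1)],
        [(2, 0), (2, 2), (1, 0), (0, 0)]]
base, refine = split_motion_layers(fine)
coarse = reconstruct_motion(base, refine, layers=1)   # low bit rate
full = reconstruct_motion(base, refine, layers=2)     # high bit rate
```

Dropping the refinement layer saves the bits for the per-8×8 differences at the price of a coarser motion field.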
However, this solution has the disadvantage that it produces numerous shifts in the reconstructed image, caused by a mismatch between the motion information and the texture.
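The text does not spell out how the mismatch arises. One plausible mechanism, stated here as an assumption, is that the encoder computes the texture (prediction residual) against a prediction that uses the full-precision motion, while a decoder operating at a low bit rate receives only a truncated motion vector but adds the same texture. The minimal 1-D sketch below (hypothetical `shift` helper with border replication, a single edge, full motion 3 versus coarse motion 2) shows the resulting displacement of the edge in the reconstruction.

```python
def shift(frame, d):
    """Translate a 1-D frame by d samples, replicating the border samples."""
    n = len(frame)
    return [frame[min(max(k - d, 0), n - 1)] for k in range(n)]


reference = [0] * 8 + [100] * 8          # a single edge at position 8
current = shift(reference, 3)            # the edge has moved by 3 samples

# Encoder: the texture (residual) is computed with the full-precision motion.
fine_mv = 3
residual = [c - p for c, p in zip(current, shift(reference, fine_mv))]

# Low-rate decoder: only a truncated (coarse) motion vector is available,
# but the same texture is added back.
coarse_mv = 2
reconstruction = [p + r for p, r in zip(shift(reference, coarse_mv), residual)]

errors = [k for k in range(len(current)) if reconstruction[k] != current[k]]
print(errors)  # prints [10]
```

The reconstructed edge appears one sample early: the texture cannot compensate for motion it was not computed against, which shows up as a shift in the decoded image.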