As described by K. Hanke in “3D-Videocodierung” (3D video coding) on the website of the Institut für Nachrichtentechnik at Rheinisch-Westfälische Technische Hochschule Aachen, video encoding methods exploit specific signal properties to encode a succession of images efficiently. To do so, they exploit spatial and temporal dependencies between the individual images, or between the pixels of these images. In general, the better an image or video encoding method exploits these dependencies, the higher the compression factor it can achieve.
Current video encoding methods fall into two basic classes: hybrid encoding methods, such as those of the video coding standards ITU-T H.263 “Video Coding for Low Bitrate Communication”, February 1998, and ITU-T H.264 “Advanced Video Coding for Generic Audiovisual Services”, May 2003, and so-called three-dimensional frequency encoding approaches. Both classes encode the video signal, which consists of the succession of images, both spatially and temporally. Hybrid encoding methods, however, first apply a movement-compensated prediction in the temporal direction and then apply a two-dimensional transform, for example a two-dimensional discrete cosine transform (DCT), to the resulting difference image in order to remove the spatial correlation between adjacent pixels within that difference image.
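The two hybrid steps (movement-compensated prediction, then a 2-D transform of the difference image) can be sketched for a single block as follows. This is a minimal illustration under stated assumptions, not the procedure of either standard: the movement vector is assumed to be already known (no movement estimation is shown), pixels outside the reference are taken from the nearest border pixel, the transform is a floating-point orthonormal DCT-II, and quantization and entropy coding are omitted. The function names `predict`, `dct_2d` and `hybrid_encode_block` are hypothetical.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        out.append(scale * s)
    return out


def dct_2d(block):
    """Separable 2-D DCT-II: transform every row, then every column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to row order


def predict(reference, mv):
    """Movement-compensated prediction: pred[y][x] = reference[y - dy][x - dx].

    Coordinates are clamped to the image border (an assumed padding rule).
    """
    dy, dx = mv
    h, w = len(reference), len(reference[0])
    return [[reference[min(max(y - dy, 0), h - 1)][min(max(x - dx, 0), w - 1)]
             for x in range(w)] for y in range(h)]


def hybrid_encode_block(current, reference, mv):
    """Return the DCT coefficients of the difference image current - prediction."""
    pred = predict(reference, mv)
    diff = [[c - p for c, p in zip(cr, pr)] for cr, pr in zip(current, pred)]
    return dct_2d(diff)
```

For a block whose content has moved one pixel to the right, passing `mv=(0, 1)` yields a zero difference image and therefore all-zero coefficients, whereas `mv=(0, 0)` (no compensation) leaves nonzero coefficients that would still have to be coded.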
In contrast to hybrid encoding methods, three-dimensional frequency encoding approaches, such as movement-compensated, temporally filtered partial band encoding, do not perform a temporal prediction; instead they apply a “true” transform along the time axis in order to exploit the temporal correlation of consecutive images. In such partial band encoding, the succession of images is decomposed into a number of “temporal” frequency bands before the spatial two-dimensional decorrelation, for example into two bands, a high and a low frequency band, holding the temporal high-frequency and low-frequency image components respectively. When the spectrum is split in this way, the distribution of energy among these frequency bands depends heavily on the amount of movement in the video signal. If the observed video signal contains no moving or changing elements, all high-frequency “temporal spectrum” components are zero and the total energy is concentrated in the low-frequency partial band. Normally, however, a succession of images always shows some change over time, such as a local object displacement, a change in object size or a change of scene. This spreads the energy over a number of spectral coefficients and also produces high-frequency components.
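A two-band temporal decomposition of a pair of consecutive images can be sketched as below. The source does not name a filter; this sketch assumes the orthonormal two-tap Haar filter, one common choice, and treats each image as a flat list of pixel values. The names `haar_temporal`, `inverse_haar_temporal` and `energy` are hypothetical. The inverse is included to show that this is a true, invertible transform rather than a prediction.

```python
import math

S = 1 / math.sqrt(2)  # orthonormal Haar scaling factor


def haar_temporal(frame_a, frame_b):
    """Split two consecutive frames into a temporal low band and high band."""
    low = [S * (a + b) for a, b in zip(frame_a, frame_b)]
    high = [S * (b - a) for a, b in zip(frame_a, frame_b)]
    return low, high


def inverse_haar_temporal(low, high):
    """Recover both frames exactly from the two temporal bands."""
    frame_a = [S * (l - h) for l, h in zip(low, high)]
    frame_b = [S * (l + h) for l, h in zip(low, high)]
    return frame_a, frame_b


def energy(band):
    """Sum of squared values, used to compare how energy is distributed."""
    return sum(v * v for v in band)
```

For two identical frames the high band is exactly zero and the low band carries all of the energy (twice the energy of one frame, because the filter is orthonormal). If an object moves between the frames, for example `[0, 10, 0, 0]` followed by `[0, 0, 10, 0]`, energy appears in the high band.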
To reduce the spectral components in the temporal high-frequency band, and thus concentrate the energy in the temporal low-frequency band, movement estimation and movement compensation are applied to the images to be temporally filtered before the video signal is filtered into a number of “temporal” frequency bands.
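The effect of movement compensation before temporal filtering can be sketched with a single global displacement on one-dimensional frames. This is a simplified illustration under loudly stated assumptions: one integer displacement per frame found by a full search minimising the sum of absolute differences (SAD), edge pixels repeated at the border, and the Haar filter as the temporal filter. The names `shift`, `estimate_displacement` and `mc_haar` are hypothetical. Practical schemes estimate vectors per block in two dimensions and typically realize the filter in lifting form so that it stays invertible; this sketch does not attempt that.

```python
import math


def shift(frame, d):
    """Displace a 1-D frame by d pixels, repeating edge pixels (assumed border rule)."""
    n = len(frame)
    return [frame[min(max(i - d, 0), n - 1)] for i in range(n)]


def estimate_displacement(frame_a, frame_b, search_range=2):
    """Full search for the displacement d minimising SAD(shift(frame_a, d), frame_b)."""
    best_d, best_sad = 0, float("inf")
    for d in range(-search_range, search_range + 1):
        sad = sum(abs(b - a) for a, b in zip(shift(frame_a, d), frame_b))
        if sad < best_sad:  # strict '<' keeps the first best candidate on ties
            best_d, best_sad = d, sad
    return best_d


def mc_haar(frame_a, frame_b, search_range=2):
    """Align frame_a to frame_b, then split the pair into Haar low/high bands."""
    d = estimate_displacement(frame_a, frame_b, search_range)
    aligned_a = shift(frame_a, d)
    s = 1 / math.sqrt(2)
    low = [s * (a + b) for a, b in zip(aligned_a, frame_b)]
    high = [s * (b - a) for a, b in zip(aligned_a, frame_b)]
    return d, low, high
```

For an object that moves one pixel to the right, for example from `[0, 0, 10, 0, 0, 0]` to `[0, 0, 0, 10, 0, 0]`, the search finds `d == 1` and the compensated high band is zero, whereas filtering the same pair without compensation leaves energy in the high band.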
According to H. Schwarz, D. Marpe and T. Wiegand, Fraunhofer Institut für Telekommunikation, Heinrich-Hertz-Institut, “Scalable Extension of H.264/AVC”, ISO/IEC JTC1/SC29/WG11, MPEG04/M10569/S03, March 2004, movement-compensated, temporally filtered partial band encoding can also be used to create a scalable video data stream, enabling, for example, temporal, quality or spatial scalability. Section 3.2.4 of Schwarz et al. additionally presents a combined scalability, in which two different base qualities (L0, L1) are obtained with the hybrid encoding method. To achieve higher image qualities, additional scaled video data streams, such as L2, L3, L4 and/or L5, are included; Schwarz et al. create these additional scaled video data streams (L2, . . . , L5) by movement-compensated, temporally filtered partial band encoding. It is therefore known that a scalable video data stream can be created with a first encoding method based on movement-compensated predictive encoding and a second encoding method based on movement-compensated, temporally filtered partial band encoding.