Spatial scalability of video signals can be achieved either with critically sampled spatial wavelet schemes or with overcomplete spatial representations. Critically sampled schemes suffer from the shift-variance of their high bands, which makes efficient motion compensation challenging. Overcomplete representations, on the other hand, can be shift-invariant, thus permitting efficient motion compensation in the spatial sub-bands, but they have to be designed carefully to achieve high compression efficiency. This invention proposes an image processing method for decomposing two different spatial scales of the same image. The method minimizes the impact of the quantization noise on the reconstructed high-resolution video signal at the decoder.
Rate-distortion efficient coding of image sequences can be accomplished with motion-compensated temporal transforms as proposed in U.S. Pat. No. 6,381,276 and the corresponding academic publication “Three-dimensional lifting schemes for motion compensated video compression,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Salt Lake City, Utah, May 2001, vol. 3, pp. 1793-1796. Applying the temporal transform directly to the images of the sequence may be too limiting for the targeted scalability properties of video representations. In particular, desirable video coding schemes should provide efficient spatial scalability of the video signal. If a motion-compensated temporal transform is utilized, it is favorable to apply this transform to the spatial sub-bands of the input images. Such architectures achieve good spatial scalability but are burdened by a degradation in rate-distortion performance. This burden is rooted in the fact that spatial decompositions utilize either critically sampled representations or overcomplete representations of the spatial sub-bands. Critically sampled representations lack the property of shift-invariance, which seems to be crucial for efficient motion compensation. On the other hand, overcomplete representations can be shift-invariant, but rate-distortion efficient encoding is challenging.
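The shift-variance of a critically sampled high band can be illustrated with a minimal one-dimensional sketch. The Haar filter, the circular boundary extension, and the function names `haar_high_band` and `haar_high_band_undecimated` are assumptions chosen for illustration only; the text above does not prescribe a particular filter.

```python
import numpy as np

def haar_high_band(x):
    """Critically sampled Haar high band: pairwise difference, one output per two inputs."""
    x = np.asarray(x, dtype=float)
    return (x[0::2] - x[1::2]) / np.sqrt(2.0)

def haar_high_band_undecimated(x):
    """Overcomplete Haar high band: same filter, no down-sampling, circular extension."""
    x = np.asarray(x, dtype=float)
    return (x - np.roll(x, -1)) / np.sqrt(2.0)

# A short pulse and the same pulse displaced by one sample.
signal = np.array([0, 0, 1, 1, 0, 0, 0, 0], dtype=float)
shifted = np.roll(signal, 1)

# Critically sampled: the pulse edges fall inside sample pairs only after the shift,
# so the high band changes from all-zero to non-zero.
high = haar_high_band(signal)
high_shifted = haar_high_band(shifted)

# Overcomplete: the high band of the shifted signal is the shifted high band.
full = haar_high_band_undecimated(signal)
full_shifted = haar_high_band_undecimated(shifted)
```

In this sketch a one-sample displacement changes the critically sampled coefficients entirely, whereas the undecimated coefficients are merely displaced. The latter is the property that motion compensation in the sub-band domain relies on, at the cost of twice as many coefficients.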
This invention proposes a video coding scheme with spatial scalability properties that can be interpreted as an extension of the spatial scalability concept as it is known from, e.g., the video coding standard ITU-T Recommendation H.263: The pictures of the spatial base layer are spatially up-sampled in order to obtain pictures with the same spatial resolution as the pictures of the next spatial enhancement layer. These up-sampled pictures are used to predict the pictures of the next spatial enhancement layer. However, this spatial prediction is just one step in our inter-resolution decomposition, which also requires a spatial update step. The spatial update step provides the desired property that spatial prediction alone cannot provide.
The invented multiresolution representation for images is related to the Laplacian pyramid as proposed in the academic publication by P. J. Burt and E. H. Adelson, “The Laplacian pyramid as a compact image code,” IEEE Transactions on Communications, vol. 31, no. 4, pp. 532-540, April 1983. The basic idea of the Laplacian pyramid is the following: First, a coarse approximation of the original image is derived by low-pass filtering and down-sampling. Based on this coarse version, the original is predicted by up-sampling and filtering, and the difference is calculated as the prediction error. For the reconstruction, the signal is obtained by simply adding back the difference to the prediction from the coarse signal.
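The analysis and reconstruction steps of a single Laplacian pyramid level can be sketched as follows. This is a minimal illustration under stated assumptions, not the filters of the cited publication or of the invention: the low-pass filter is a 2x2 block average, the up-sampling is sample repetition, the image dimensions are assumed to be even, and the names `analyze`, `upsample`, and `synthesize` are hypothetical.

```python
import numpy as np

def upsample(coarse):
    """Predict the full-resolution image by repeating each coarse sample 2x2 (assumed interpolator)."""
    return np.repeat(np.repeat(coarse, 2, axis=0), 2, axis=1)

def analyze(image):
    """One pyramid level: return the coarse approximation and the prediction error."""
    image = np.asarray(image, dtype=float)
    h, w = image.shape  # both assumed even
    # Low-pass filter and down-sample by 2 in each dimension (2x2 block average).
    coarse = image.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    # Prediction error between the original and its prediction from the coarse version.
    difference = image - upsample(coarse)
    return coarse, difference

def synthesize(coarse, difference):
    """Reconstruct by adding the difference back to the prediction from the coarse signal."""
    return upsample(coarse) + difference

image = np.arange(16, dtype=float).reshape(4, 4)
coarse, difference = analyze(image)
reconstructed = synthesize(coarse, difference)
```

Without quantization the reconstruction is exact. Note that the coarse image and the difference image together hold more samples than the original (20 versus 16 in this example), which is why the Laplacian pyramid is an overcomplete representation.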