Multi-scale and multi-resolution representations of visual signals such as images and video are central to image processing and multimedia communications. They closely match the way that the human visual system processes information, and can easily capture salient features of signals at various resolutions. Moreover, multi-resolution algorithms offer computational advantages and usually have more robust performance. For example, as a scalable extension of the video coding standard H.264/MPEG-4 AVC, the SVC standard has achieved a significant improvement in coding efficiency, as well as in the degree of scalability, relative to the scalable profiles of previous video coding standards. The basic structure for supporting spatial scalability in this new standard is the well-known Laplacian Pyramid.
The Laplacian Pyramid (hereinafter “LP”), also called the Laplace Pyramid in the current literature, was introduced by P. J. Burt and E. H. Adelson in 1983 and is a fundamental tool in image/video processing and communication. It is intimately connected with resampling: every pair of up-sampling and down-sampling filters defines an LP, obtained by computing the detail (difference) signal at each step. Conversely, discarding the detail signal of an LP leaves a pair of up- and down-sampling filters. Traditionally, LPs have focused on resampling by a factor of 2, but the construction can be generalized to other ratios. In the most general setting, non-linear operators can be employed to compute the coarse approximation as well as the detail signals. The LP is one of the earliest multi-resolution signal decomposition schemes. It achieves a multi-scale representation of a signal as a coarse signal at lower resolution together with several detail signals at successively higher resolutions.
This is demonstrated in FIG. 1, where H(z) 14 is often called the Decimation Filter and G(z) 16 is often referred to as the Interpolation Filter. Such a representation implicitly uses over-sampling. Hence, in compression applications, it is normally replaced by sub-band coding or the wavelet transform, both of which are critically sampled decomposition schemes.
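By way of illustration only, one level of the decomposition of FIG. 1 can be sketched for a one-dimensional signal as follows. The function names lp_analyze and lp_synthesize, the 3-tap filters, the decimation factor of 2, and the zero-padded boundary handling are assumptions chosen for this sketch and are not taken from the cited references.

```python
import numpy as np

def lp_analyze(x, h, g, M=2):
    """One LP level (hypothetical helper).

    Coarse signal: filter x with the decimation filter h, keep every M-th sample.
    Detail signal: x minus the interpolation (filter g) of the coarse signal.
    """
    c = np.convolve(x, h, mode="same")[::M]      # low-pass, then down-sample
    up = np.zeros(len(x))
    up[::M] = c                                  # up-sample by zero insertion
    d = x - np.convolve(up, g, mode="same")      # prediction residual
    return c, d

def lp_synthesize(c, d, g, M=2):
    """Conventional reconstruction: interpolate the coarse signal, add the detail."""
    up = np.zeros(len(d))
    up[::M] = c
    return np.convolve(up, g, mode="same") + d

# Illustrative filters only (assumed, not prescribed by the text).
h = np.array([1.0, 2.0, 1.0]) / 4.0   # decimation filter, playing the role of H(z)
g = np.array([1.0, 2.0, 1.0]) / 2.0   # interpolation filter, playing the role of G(z)

x = np.arange(16, dtype=float)
c, d = lp_analyze(x, h, g)
x_rec = lp_synthesize(c, d, g)
```

For a length-16 input this sketch yields 8 coarse samples and 16 detail samples, that is, 24 coefficients for 16 input samples, which makes the over-sampling explicit. In the absence of coefficient noise, reconstruction is exact for any choice of h and g, because the prediction subtracted during analysis is added back during synthesis.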
The LP is the foundation for spatial scalability in numerous video coding standards, such as MPEG-2, MPEG-4, and the recent H.264 Scalable Video Coding (SVC) standard described in the September 2007 article entitled “Overview of the scalable extension of the H.264/MPEG-4 AVC video coding standard”, by H. Schwarz, D. Marpe, and T. Wiegand. The LP provides an over-complete representation of visual signals, which can capture salient features of signals at various resolutions. It is an implicitly over-sampled system, and can be characterized as an over-sampled filter bank (hereinafter “FB”) or frame. Because it is the inverse of an over-sampled analysis FB, the LP reconstruction has, besides the conventional reconstruction scheme depicted in FIG. 2, an infinite number of realizations that satisfy the perfect reconstruction (hereinafter “PR”) property. Despite the sampling redundancy, the LP still has advantages in some cases over the critically sampled wavelet scheme. In the LP, each pyramid level only down-samples the low-pass channel and generates one band-pass signal. Thus, the resulting signal does not suffer from the “scrambled” frequencies that normally exist in critically sampled schemes, where the high-pass channel is folded back into the low frequencies after sampling. Therefore, the LP enables further decomposition to be employed on its band-pass signals, which has led to some state-of-the-art multi-resolution image processing and analysis tools.
The LP decomposition framework provides a redundant representation and thus has multiple reconstruction methods. Given an LP representation, the original signal usually can be reconstructed simply by iteratively interpolating the coarse signal and adding the detail signals successively up to the final resolution. However, when the LP coefficients are corrupted with noise, such a reconstruction method can be shown to be suboptimal from a filter bank point of view. Treating the LP as a frame expansion, M. N. Do and M. Vetterli proposed in 2003 a frame-based pyramid reconstruction scheme, which has less error than the usual reconstruction method. They presented from frame theory a complete parameterization of all synthesis FBs that can yield PR for a given LP decomposition with a decimation factor M. Such a general LP reconstruction has M² + M free parameters. Moreover, they revealed that the traditional LP reconstruction is suboptimal, and proposed an efficient frame-based LP reconstruction scheme. However, such frame reconstruction approaches require the approximation filter and interpolation filter to be biorthogonal in order to achieve perfect reconstruction. Since a biorthogonal filter can cause significant aliasing in the down-sampled low-pass subband, it may not be advisable for spatially scalable video coding.
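The difference between the usual reconstruction and a frame-based one can be illustrated with a small matrix sketch. The orthonormal Haar filters, the signal length of 8, and all variable names below are assumptions chosen for illustration. The dual-frame operator is computed here with a generic pseudo-inverse, which is a stand-in for, and not a reproduction of, the efficient filter bank implementation of Do and Vetterli.

```python
import numpy as np

N, M = 8, 2  # toy signal length and decimation factor (assumed)

# Decimation operator H: orthonormal Haar averaging followed by down-sampling.
H = np.zeros((N // M, N))
for k in range(N // M):
    H[k, M * k : M * k + M] = 1.0 / np.sqrt(M)
G = H.T                      # interpolation operator (orthogonal case: G = H^T)
I = np.eye(N)

# Analysis operator of one LP level: x -> (c, d) with c = H x, d = x - G c.
A = np.vstack([H, I - G @ H])

# Usual reconstruction: x_hat = G c + d.
S_usual = np.hstack([G, I])

# Dual-frame (least-squares) reconstruction via the pseudo-inverse of A.
S_frame = np.linalg.pinv(A)

# Both are left inverses of A, i.e. both give perfect reconstruction.
pr_usual = np.allclose(S_usual @ A, I)
pr_frame = np.allclose(S_frame @ A, I)

# For white coefficient noise of variance s^2, the expected squared
# reconstruction error is s^2 times the squared Frobenius norm of S.
gain_usual = np.linalg.norm(S_usual) ** 2
gain_frame = np.linalg.norm(S_frame) ** 2

# In this orthogonal case the pseudo-inverse has the closed form
# x_hat = G c + (I - G H) d, i.e. x_hat = G (c - H d) + d.
closed_form = np.hstack([G, I - G @ H])
matches_closed_form = np.allclose(S_frame, closed_form)
```

In this toy case both operators reconstruct perfectly from clean coefficients, but the noise gain of the usual reconstruction is about 12 against about 8 for the dual-frame reconstruction, which is the sense in which the usual method is suboptimal. The closed form relies on H G being the identity, which is where the biorthogonality requirement noted above enters.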
To keep the same reconstruction scheme but overcome the bi-orthogonality limitation in the frame-based pyramid reconstruction, a method called the lifted pyramid was presented by M. Flierl and P. Vandergheynst in 2005 to improve scalable video coding efficiency. Therein, an additional lifting step is introduced into the LP decomposition so that the perfect reconstruction condition is satisfied for any choice of filters. When compared to the conventional LP, however, the low-resolution representation of the lifted pyramid has more significant high-frequency components and requires a larger bit rate because of the spatial update step in the decomposition. Thus, it is undesirable in the context of scalable video compression.
A similar modified LP scheme called Laplacian Pyramid with Update (hereinafter “LPU”) was presented by D. Santa-Cruz, J. Reichel, and F. Ziliani in 2005 to improve scalable coding efficiency. However, the LPU still needs to change the low-pass subband LP coefficients due to the spatial update step in the decomposition procedure. Hence, it has the same problem as the aforementioned lifted pyramid method. The present invention addresses the long-felt needs left unmet by these prior art attempts and presents novel methods that offer a variety of unanticipated benefits.
Accordingly, it is desirable to provide advanced methods for resampling and reconstruction within the pyramid representation framework for digital signals. Such signals may be contaminated by noise, either from quantization as in compression applications, from transmission errors as in communications applications, or from display-resolution limit adaptation as in multi-rate signal conversion. The methods of the present invention offer enhanced reconstruction.