In recent years, wavelet transform-based methods have been widely applied to still image compression. The most representative of these are Embedded Zerotree Wavelet (EZW), Set Partitioning In Hierarchical Trees (SPIHT), and Embedded Block Coding with Optimal Truncation (EBCOT). EBCOT is the core of JPEG2000, the new-generation still image compression standard, in which the wavelet transform has become an essential step.
However, these methods still have a defect: at high compression ratios, images recovered by wavelet-based compression readily exhibit obvious visual distortion. The reason is that human eyes, as the final receivers of images, have different sensitivities to different types of image distortion, whereas the performance of current image compression methods is evaluated by the Peak Signal-to-Noise Ratio (PSNR) or the Mean Square Error (MSE), neither of which takes this important factor into account.
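To make the PSNR and MSE criteria concrete, the following minimal Python sketch computes both for 8-bit grayscale images stored as lists of rows (the function names, the list-of-rows layout and the default peak value of 255 are illustrative assumptions). It also shows that a small error spread over every pixel and a larger error concentrated in a single pixel give exactly the same MSE and PSNR, even though the two distortions can look quite different to a viewer.

```python
import math

def mse(original, recovered):
    """Mean square error between two equally sized images given as lists of rows."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(original, recovered):
        for a, b in zip(row_a, row_b):
            total += (a - b) ** 2
            count += 1
    return total / count

def psnr(original, recovered, peak=255.0):
    """PSNR in dB, 10*log10(peak^2 / MSE); infinite for identical images."""
    err = mse(original, recovered)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / err)

original = [[10, 20], [30, 40]]
spread = [[11, 21], [31, 41]]   # error of 1 at every pixel
local = [[12, 20], [30, 40]]    # error of 2 at a single pixel

print(mse(original, spread), mse(original, local))      # 1.0 1.0
print(psnr(original, spread) == psnr(original, local))  # True
```

Because both metrics only accumulate squared differences, they cannot tell these two distortion patterns apart.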
FIG. 1 is a block diagram illustrating the basic principle of a typical current wavelet transform-based image compression method such as EBCOT or SPIHT. The compression process includes four steps: preprocessing, wavelet transform, quantization and coding. The preprocessing stage performs operations such as color space conversion and direct-current (DC) component removal. The wavelet transform reduces the spatial correlation of the image data and compacts its energy into the low-frequency part, which facilitates compression. The quantization step reduces the entropy of the image data, thereby improving the overall compression ratio. Finally, the quantized wavelet coefficients are entropy coded to obtain a compressed code stream.
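The preprocessing, transform and quantization stages of FIG. 1 can be sketched in Python roughly as follows. This is a toy illustration under stated assumptions, not EBCOT or SPIHT: it swaps in the orthonormal Haar wavelet for brevity (JPEG2000 itself uses the 5/3 and 9/7 filters), performs a single decomposition level, uses a plain rounding quantizer (JPEG2000 uses a dead-zone quantizer) with hypothetical per-sub-band step sizes, and stops at an empirical entropy estimate instead of a real entropy coder. All function names are hypothetical.

```python
import math
from collections import Counter

def remove_dc(image):
    """Preprocessing: subtract the mean (DC component) from every pixel."""
    mean = sum(sum(row) for row in image) / (len(image) * len(image[0]))
    return [[p - mean for p in row] for row in image]

def haar_1d(seq):
    """One level of the orthonormal Haar transform on an even-length sequence."""
    s = math.sqrt(2.0)
    low = [(seq[i] + seq[i + 1]) / s for i in range(0, len(seq), 2)]
    high = [(seq[i] - seq[i + 1]) / s for i in range(0, len(seq), 2)]
    return low, high

def transpose(mat):
    return [list(col) for col in zip(*mat)]

def haar_2d(image):
    """One decomposition level: filter the rows, then the columns.

    Names follow 'horizontal filter, vertical filter', so HL is high-pass
    horizontally and low-pass vertically.
    """
    row_low, row_high = [], []
    for row in image:
        low, high = haar_1d(row)
        row_low.append(low)
        row_high.append(high)

    def filter_columns(mat):
        lows, highs = [], []
        for col in transpose(mat):
            low, high = haar_1d(col)
            lows.append(low)
            highs.append(high)
        return transpose(lows), transpose(highs)

    ll, lh = filter_columns(row_low)
    hl, hh = filter_columns(row_high)
    return {"LL": ll, "LH": lh, "HL": hl, "HH": hh}

def energy(mat):
    """Sum of squared values; preserved by an orthonormal transform."""
    return sum(v * v for row in mat for v in row)

def quantize(mat, step):
    """Uniform scalar quantization: index of the nearest multiple of step."""
    return [[math.floor(v / step + 0.5) for v in row] for row in mat]

def entropy_bits(values):
    """Empirical Shannon entropy in bits per symbol."""
    counts = Counter(values)
    n = len(values)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

# A smooth 4x4 ramp stands in for a natural image region.
image = [[10 * r + c for c in range(4)] for r in range(4)]
bands = haar_2d(remove_dc(image))

total = sum(energy(b) for b in bands.values())
for name, band in bands.items():
    print(name, "energy share:", round(energy(band) / total, 3))

# Hypothetical step sizes: finer for LL, coarser for the detail bands.
steps = {"LL": 1.0, "LH": 3.0, "HL": 3.0, "HH": 3.0}
indices = [q for name, band in bands.items()
           for row in quantize(band, steps[name]) for q in row]
pixels = [p for row in image for p in row]
print("entropy of pixels (bits/symbol):", entropy_bits(pixels))
print("entropy of quantized indices (bits/symbol):", entropy_bits(indices))
```

On this smooth input most of the energy lands in the LL band, and the quantized indices have lower entropy than the raw pixels. Because the Haar transform is orthonormal and invertible, the only information discarded in this sketch is in `quantize`, which mirrors the observation that quantization is the only lossy stage of the pipeline.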
In the whole process of image compression and recovery shown in FIG. 1, quantization is the only operation that distorts the recovered image. A good quantization scheme should therefore incorporate human visual characteristics as far as possible: it should use different quantization step sizes for different coefficients, discarding unimportant information to which human eyes are insensitive and preserving important information to which they are sensitive, so that the quality of the recovered image is ensured while compression is achieved. Existing research indicates that the human visual characteristics relevant to images mainly include luminance adaptation, the masking effect and frequency characteristics. However, current image compression methods seldom consider human visual characteristics in the quantization stage; for example, neither SPIHT nor EZW considers them, and the few compression methods that do consider them do so insufficiently, mainly in the following respects.
(1) Only some human visual characteristics are considered. For example, Fixed Visual Weighting (FVW) in the commonly used JPEG 2000 standard computes, according to a human visual model, a sub-band weight table that is independent of the image content but depends on the viewing distance (the distance from the human eyes to the display); after the wavelet transform, the coefficients are multiplied directly by the corresponding weights and then coded in the regular way. This method exploits only the frequency-band characteristics of human vision and has no obvious effect in practical applications. Visual Progressive Coding (VIP) performs bit-plane scanning and coding in the regular way, but during optimal truncation it replaces the mean square error with a weighted mean square error based on visual weights, so as to minimize visual distortion in the compression process.
(2) Models obtained through visual experiments on spatial-domain images are not associated with wavelet coefficients, and thus cannot directly reflect the sensitivity of the human visual system to different wavelet coefficients. For example, the well-known Dynamic Contrast-based Quantization (DCQ) builds a contrast distortion model from the texture characteristics of the original image and iteratively adjusts the quantization step size of each sub-band according to the overall contrast distortion until the visual distortions of all sub-bands are equal, so as to achieve an optimal visual effect. Because the algorithm cannot directly associate wavelet coefficients with visual quantization values, it requires iterative computation, which increases complexity and limits practicability.
(3) Most visual models are established on a per-band basis. FVW, VIP and DCQ described above all define the visual quantization value per band. This has the advantage that the visual quantization values add little overhead to the code stream, but the obvious disadvantage of coarse granularity: such values cannot reflect the detailed features of images.
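The band-level weighting discussed in (1) and (3) can be sketched in Python as follows. This is a hedged illustration, not the JPEG 2000 procedure: the sub-band names, the weight values, and the specific weighted-MSE form (squared errors scaled by the squared band weight, which is what results if coefficients are scaled by the weight before quantization) are all hypothetical assumptions. `apply_band_weights` mimics the FVW idea of multiplying coefficients by a per-band weight, and `weighted_mse` mimics the VIP idea of replacing the MSE with a visually weighted one.

```python
def apply_band_weights(subbands, weights):
    """FVW-style step: multiply every coefficient of a band by that band's weight."""
    return {name: [weights[name] * c for c in coeffs]
            for name, coeffs in subbands.items()}

def weighted_mse(original, recovered, weights):
    """VIP-style distortion: each band's squared error scaled by its squared weight."""
    total = 0.0
    count = 0
    for name, coeffs in original.items():
        band_err = sum((a - b) ** 2 for a, b in zip(coeffs, recovered[name]))
        total += weights[name] ** 2 * band_err
        count += len(coeffs)
    return total / count

# Hypothetical weight table: one number per sub-band, regardless of image content.
weights = {"LL": 1.0, "HH": 0.5}
original = {"LL": [10.0, 20.0], "HH": [2.0, -4.0]}
recovered = {"LL": [10.0, 19.0], "HH": [0.0, 0.0]}

print(apply_band_weights(original, weights))  # {'LL': [10.0, 20.0], 'HH': [1.0, -2.0]}
print(weighted_mse(original, recovered, weights))                   # 1.5
print(weighted_mse(original, recovered, {"LL": 1.0, "HH": 1.0}))    # 5.25 (plain MSE)
```

Every coefficient in the HH band receives the same weight of 0.5, whether it lies in a smooth area or in a textured one; this single number per band is the coarse granularity described in (3).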