This invention relates to methods for coding images, and more particularly to image coding methods that use the visual optimization techniques of self-contrast masking and neighborhood masking.
Image compression codes image information so that less data is required to reconstruct the image. When compressed image information is transmitted, it needs less bandwidth than the uncompressed image. Compressing images is typically called image coding. Reconstructing the image from the compressed data is typically called decoding.
One goal of image compression is to remove statistical redundancy from the image data, because redundancy increases the required bandwidth. Compression techniques try either to minimize the distortion of the image at a given transmission bit rate, or to minimize the bit rate for a given allowable distortion target.
Another goal of image compression is to remove perceptual irrelevancy. Aspects of the image that the human visual system cannot detect are irrelevant, so spending bits to code them wastes resources and bandwidth. Compression schemes should therefore take the properties of the human visual system into account when optimizing the coding.
One common visual optimization strategy for compression uses the contrast sensitivity function of the visual system. Human eyes are less sensitive to high-frequency errors, so the high-frequency components of an image can be quantized more coarsely. DCT- and wavelet-based compression systems use this strategy widely, as demonstrated in U.S. Pat. No. 5,629,780, issued May 13, 1997; S. Daly, Application of a Noise-Adaptive Contrast Sensitivity Function to Image Data Compression, Optical Engineering, vol. 29, pp. 977-987, 1990; Watson, et al., Visibility of Wavelet Quantization Noise, IEEE Transactions on Image Processing, vol. 6, no. 8, pp. 1164-1175, 1997.
The advantages of this technique become less noticeable for lower-resolution displays and closer viewing distances. Under those conditions the contrast sensitivity function (CSF) curve tends to be flat, so the high-frequency content cannot be quantized more coarsely without affecting perception.
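In practice, the CSF strategy assigns a coarser quantization step to subbands that the eye is less sensitive to. The Python sketch below illustrates that idea under stated assumptions. The per-level weights in `CSF_WEIGHTS` and the names `csf_step`, `quantize`, and `dequantize` are hypothetical and do not come from any of the cited works.

```python
# Hypothetical CSF weights per wavelet decomposition level; level 1 is the
# finest (highest-frequency) level. A smaller weight means lower visual
# sensitivity, which permits a coarser quantization step.
CSF_WEIGHTS = {1: 0.4, 2: 0.7, 3: 1.0}


def csf_step(base_step: float, level: int) -> float:
    """Quantization step for every coefficient in a subband at `level`."""
    return base_step / CSF_WEIGHTS[level]


def quantize(coeff: float, step: float) -> int:
    # Uniform quantizer: nearest integer index (Python's round() breaks ties to even).
    return round(coeff / step)


def dequantize(index: int, step: float) -> float:
    return index * step


if __name__ == "__main__":
    for level in (1, 2, 3):
        step = csf_step(2.0, level)
        q = quantize(6.6, step)
        print(level, step, q, dequantize(q, step))
```

When the CSF is nearly flat, as the paragraph above describes, the weights approach one another and the per-level steps converge. In that case the strategy yields little gain.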
Another perceptual phenomenon is visual masking: image content acting as a background signal masks artifacts locally. For example, in the wavelet transform domain a large coefficient can tolerate a larger distortion than a small coefficient, because the large coefficient represents a stronger background signal that masks the visual distortion.
U.S. Pat. No. 5,136,377, issued Aug. 4, 1992, and U.S. Pat. No. 4,725,885, issued Feb. 16, 1988, show early work with this phenomenon. These approaches essentially scale the overall quantization step as a function of local image variance. They require overhead to notify the decoder which quantizer was used to encode each local block.
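The following sketch shows why variance-adaptive block quantization carries overhead. Each block's quantizer index depends on the block's original variance, which the decoder never sees, so the encoder must transmit that index. The quantizer table, the thresholds, and all function names are hypothetical. The sketch illustrates the general idea only and does not reproduce the method of either cited patent.

```python
import statistics

# Hypothetical quantizer table and variance thresholds (not from the cited patents).
QUANTIZER_STEPS = [1.0, 2.0, 4.0, 8.0]
VARIANCE_THRESHOLDS = [10.0, 100.0, 1000.0]


def choose_quantizer(block):
    """Index into QUANTIZER_STEPS: busier (higher-variance) blocks mask more
    error, so they get a coarser step."""
    var = statistics.pvariance(block)
    return sum(var > t for t in VARIANCE_THRESHOLDS)


def encode_blocks(blocks):
    coded = []
    for block in blocks:
        q = choose_quantizer(block)
        step = QUANTIZER_STEPS[q]
        # q is side information: the decoder cannot infer it, so it is sent per block.
        coded.append((q, [round(x / step) for x in block]))
    return coded


def decode_blocks(coded):
    return [[i * QUANTIZER_STEPS[q] for i in indices] for q, indices in coded]
```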
A variation on these techniques, found in U.S. Pat. No. 4,774,574, issued 1988, scales the individual coefficients in a zigzag scan of a DCT block as a function of the preceding coefficients. This avoids the overhead of specifying the quantizer. It exploits coefficient masking, in which low-frequency components mask high-frequency components. In wavelet applications, coefficient masking appears as intra-band masking. In DCT applications, the 'bands' are the individual coefficients, each of which has a narrow bandwidth. However, this approach has a potential problem: the nature of the DCT and of the zigzag scan does not allow accurate modeling of the masking effect.
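The overhead disappears because the step for each coefficient depends only on coefficients that the decoder has already reconstructed. The minimal sketch below works on a block already flattened into zigzag order. The scaling rule in `masked_step` and its `base_step` and `gain` parameters are hypothetical, chosen only to show the backward-adaptive structure. It is not the scaling function of U.S. Pat. No. 4,774,574.

```python
# Hypothetical masking rule; gain and base_step are illustrative assumptions.
def masked_step(prev_recon, base_step=2.0, gain=0.05):
    """Step for the next coefficient in zigzag order, driven by the mean
    magnitude of the already-reconstructed (lower-frequency) coefficients."""
    if not prev_recon:
        return base_step
    activity = sum(abs(v) for v in prev_recon) / len(prev_recon)
    return base_step * (1.0 + gain * activity)


def encode(zigzag_coeffs):
    indices, recon = [], []
    for c in zigzag_coeffs:
        step = masked_step(recon)
        q = round(c / step)
        indices.append(q)
        # Track the decoder's reconstruction, not the original values, so
        # both sides compute identical steps without any side information.
        recon.append(q * step)
    return indices


def decode(indices):
    recon = []
    for q in indices:
        recon.append(q * masked_step(recon))
    return recon
```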
It is now understood that the masking property of human vision primarily occurs within spatial frequency channels that are limited in radial frequency as well as orientation. This makes it possible to quantize more coarsely as a function of the activity in spatial frequency and spatial location. Nonuniform quantization can then utilize the visual masking effects instead of overtly adaptive techniques.
This approach has the advantage that the masking effects are approximately the same in each channel. Once the coefficients are normalized, the same masking procedure can be used in every channel without incurring any overhead. An example of this technique can be found in U.S. patent application Ser. No. 09/218,937, filed Dec. 22, 1998, and co-owned by the assignee of the present invention.
One method of exploiting this masking effect, hereinafter referred to as the self-contrast masking effect, passes the CSF-normalized transform coefficients through a nonlinear transducer function before uniform quantization is applied. The result is a nonuniform quantization of the original coefficients. The decoder applies the inverse transducer function between dequantization and the inverse wavelet transform. Another example of this type of technique is shown in U.S. Pat. No. 5,313,298, issued May 17, 1994, although it operates in the spatial domain rather than the frequency domain.
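The transducer approach can be sketched as follows. It assumes a sign-preserving power-law transducer with a hypothetical exponent `ALPHA` < 1 and coefficients that are already CSF-normalized. The text above does not specify the actual transducer function. Uniform quantization in the transducer domain produces reconstruction levels of q^(1/ALPHA) in the coefficient domain, so the levels spread farther apart as magnitude grows.

```python
import math

ALPHA = 0.6  # hypothetical compressive exponent, 0 < ALPHA < 1


def transducer(c: float) -> float:
    """Compressive nonlinearity applied to a CSF-normalized coefficient."""
    return math.copysign(abs(c) ** ALPHA, c)


def inverse_transducer(y: float) -> float:
    return math.copysign(abs(y) ** (1.0 / ALPHA), y)


def encode(c: float, step: float = 1.0) -> int:
    # Uniform quantization in the transducer domain.
    return round(transducer(c) / step)


def decode(q: int, step: float = 1.0) -> float:
    # Dequantize, then invert the transducer (before the inverse transform).
    return inverse_transducer(q * step)
```

Large coefficients therefore receive coarser effective quantization, in line with self-contrast masking, and no side information is needed.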
Another method of exploiting visual masking controls the contribution of individual code-blocks. It was proposed in the JPEG2000 context in High Performance Scalable Image Compression with EBCOT, by David Taubman, submitted to IEEE Transactions on Image Processing, March 1999, hereinafter referred to as Taubman. This approach takes advantage of the existing JPEG2000 verification model. It divides the coefficients in each wavelet subband into blocks of the same size, called code-blocks, and codes each code-block independently into an embedded bitstream. The embedded coding of each individual code-block does not take the visual masking effect into account.
However, in Taubman's post-compression rate-distortion optimization step, the distortion metric does take the visual masking effect into account. In this step, the sub-bitstreams from the code-blocks are assembled in a rate-distortion-optimized order to form the final bitstream. The modified metric thus controls the bit allocation among code-blocks, taking advantage of the visual masking effect.
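Allocating bits among code-blocks can be illustrated with a simplified Lagrangian search. This is not Taubman's exact algorithm. Each code-block offers candidate truncation points (rate, distortion), and the distortions are assumed to be already weighted by a visual masking factor. For a multiplier `lam`, each block independently keeps the point that minimizes D + lam·R, and a bisection on `lam` meets a total rate budget. All names are hypothetical.

```python
# Each code-block offers candidate truncation points (rate_bits, distortion),
# where the distortion is assumed to be already weighted by a visual masking
# factor. Index 0 is (0, D_max): "send nothing from this block".


def pick_truncation(points, lam):
    """Truncation point that minimizes D + lam * R for one code-block."""
    return min(range(len(points)), key=lambda i: points[i][1] + lam * points[i][0])


def allocate(blocks, lam):
    return [pick_truncation(points, lam) for points in blocks]


def total_rate(blocks, choice):
    return sum(points[i][0] for points, i in zip(blocks, choice))


def allocate_for_budget(blocks, budget, lo=0.0, hi=1e6, iters=50):
    """Bisect on lam so the total rate fits the budget.
    Assumes hi is large enough that allocate(blocks, hi) fits."""
    for _ in range(iters):
        lam = (lo + hi) / 2
        if total_rate(blocks, allocate(blocks, lam)) > budget:
            lo = lam  # too many bits: penalize rate more heavily
        else:
            hi = lam  # fits: try spending more bits
    return allocate(blocks, hi)
```

Because the masking weights affect only the distortion values, they change which truncation point each block keeps. They do not change the order of bits within a block.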
The distortion of each coefficient is weighted by a visual masking factor that is generally a function of the neighboring coefficients in the same subband. This will be referred to as spatially extensive masking, or neighborhood masking. It treats each coefficient value $V_i$ as though it were equal to $V_i'$, where

$$V_i' = \frac{V_i}{M_i}$$

and the masking strength function is

$$M_i = A \sum_{k\ \text{near}\ i} \sqrt{|V_k|}$$

with $A$ being the normalization factor.
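The two formulas above can be computed directly for one subband. This sketch assumes that "near" means a square window of hypothetical half-width `radius` centered on coefficient i, clipped at the subband border and including i itself. It also adds a small `eps` guard for all-zero neighborhoods. Neither assumption comes from the text.

```python
import math


def masking_factors(band, A=1.0, radius=1):
    """M_i = A * sum_{k near i} sqrt(|V_k|) for a 2-D subband (list of rows).
    'Near' is assumed here to mean the (2*radius+1)^2 window centered on i,
    clipped at the subband border, and including i itself."""
    h, w = len(band), len(band[0])
    M = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            s = 0.0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    s += math.sqrt(abs(band[yy][xx]))
            M[y][x] = A * s
    return M


def normalize(band, A=1.0, radius=1, eps=1e-6):
    """V_i' = V_i / M_i; eps guards against an all-zero neighborhood."""
    M = masking_factors(band, A, radius)
    return [[v / max(m, eps) for v, m in zip(row, mrow)]
            for row, mrow in zip(band, M)]
```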
The weakness of this approach is that it adjusts only the truncation point of each code-block. This is a spatially coarser adjustment than the sample-by-sample compensation offered by the approach of U.S. patent application Ser. No. 09/218,937, mentioned previously. The bitstream order within each code-block, which is usually no smaller than 32×32, does not take any visual masking effect into account.
In the article APIC: Adaptive Perceptual Image Coding Based on Subband Decomposition with Locally Adaptive Perceptual Weighting, published in Proceedings of the IEEE International Conference on Image Processing, pp. 37-40, 1997, Hontsch, et al., discuss a further technique for exploiting visual masking. The algorithm locally adapts the quantizer step size at each pixel according to an estimate of the local masking measure. The estimate is formed from the pixels already coded and from predictions of the pixels not yet coded. Because the current pixel is estimated from neighboring pixels that have already been coded, the method exploits self-contrast masking without any overhead.
However, this estimate may be inaccurate, because the transform coefficients are largely decorrelated, so neighboring coefficients predict the current coefficient poorly. The method also does not take advantage of spatially extensive, or neighborhood, masking.
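The backward-adaptive idea behind APIC can be sketched as follows. The step size at each sample is estimated only from already-reconstructed causal neighbors in raster order (left, up, up-left), so the decoder repeats the same estimate without side information. The neighbor set, the rule in `causal_step`, and its `base_step` and `gain` parameters are hypothetical. The sketch does not include the prediction of not-yet-coded pixels that the article also uses.

```python
# Hypothetical causal masking estimate; base_step and gain are illustrative.
def causal_step(recon, y, x, base_step=1.0, gain=0.1):
    """Step size at (y, x) estimated from already-reconstructed causal
    neighbors (left, up, up-left) in raster-scan order."""
    nbrs = []
    if x > 0:
        nbrs.append(recon[y][x - 1])
    if y > 0:
        nbrs.append(recon[y - 1][x])
    if x > 0 and y > 0:
        nbrs.append(recon[y - 1][x - 1])
    if not nbrs:
        return base_step
    estimate = sum(abs(v) for v in nbrs) / len(nbrs)
    return base_step * (1.0 + gain * estimate)


def encode(band):
    h, w = len(band), len(band[0])
    recon = [[0.0] * w for _ in range(h)]
    indices = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            step = causal_step(recon, y, x)
            indices[y][x] = round(band[y][x] / step)
            recon[y][x] = indices[y][x] * step
    return indices, recon


def decode(indices):
    h, w = len(indices), len(indices[0])
    recon = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            recon[y][x] = indices[y][x] * causal_step(recon, y, x)
    return recon
```

The weakness noted above is visible in this structure: the step for a sample depends on its neighbors' magnitudes, not its own. Decorrelated coefficients therefore make the estimate unreliable.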
Therefore, there is a need for a coding method that takes into account both the self-contrast masking and the neighborhood masking effects. It must take these effects into account without significantly increasing the overhead of the encoder or decoder.
One aspect of the invention is a method for compressing and decompressing image information. The method includes the steps of receiving initial image information at an encoder, and transforming the initial information using a linear transform to produce coefficients. These coefficients are then locally normalized with a neighborhood-masking factor, and then quantized and coded to produce a compressed bitstream. The compressed bitstream is decoded at a decoder using an inverse process.
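The steps of this method can be sketched as a small 1-D pipeline: a linear transform, local normalization by a neighborhood-masking factor, uniform quantization, and the inverse steps at the decoder. Several choices here are assumptions of the sketch and are not stated above. A one-level Haar transform stands in for the linear transform. Entropy coding is omitted, so the integer indices stand in for the bitstream. The masking factor in the hypothetical `causal_mask` is computed only from previously reconstructed coefficients of the same band, so the decoder can recompute it without side information.

```python
import math

INV_SQRT2 = 1.0 / math.sqrt(2.0)


def haar_forward(x):
    """One-level orthonormal 1-D Haar transform (even-length input)."""
    pairs = range(0, len(x), 2)
    low = [(x[i] + x[i + 1]) * INV_SQRT2 for i in pairs]
    high = [(x[i] - x[i + 1]) * INV_SQRT2 for i in pairs]
    return low, high


def haar_inverse(low, high):
    out = []
    for a, d in zip(low, high):
        out += [(a + d) * INV_SQRT2, (a - d) * INV_SQRT2]
    return out


def causal_mask(recon, i, A=0.5, width=2, floor=1.0):
    """Hypothetical neighborhood-masking factor built only from the `width`
    previously reconstructed coefficients of the same band, so the decoder
    can recompute it without side information. `floor` keeps M_i >= 1."""
    prev = recon[max(0, i - width):i]
    return max(floor, A * sum(math.sqrt(abs(v)) for v in prev))


def encode_band(band, step):
    indices, recon = [], []
    for i, v in enumerate(band):
        m = causal_mask(recon, i)
        q = round(v / m / step)  # local normalization, then uniform quantization
        indices.append(q)
        recon.append(q * step * m)  # mirror the decoder's reconstruction
    return indices


def decode_band(indices, step):
    recon = []
    for i, q in enumerate(indices):
        recon.append(q * step * causal_mask(recon, i))
    return recon


def compress(x, step=1.0):
    low, high = haar_forward(x)
    return encode_band(low, step), encode_band(high, step)


def decompress(code, step=1.0):
    low_idx, high_idx = code
    return haar_inverse(decode_band(low_idx, step), decode_band(high_idx, step))
```

The causal masking factor is one way to keep the decoder synchronized with the encoder. A real encoder would use a multilevel 2-D transform and an entropy coder.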
An alternative embodiment applies the neighborhood-masking weighting factor during encoding, after quantization, and uses self-masking-compensated coefficients prior to quantization. Either of these embodiments can be combined with the contrast sensitivity function and the local luminance sensitivity of the human visual system.