1. Field of the Invention
This invention relates to image compression, and more particularly to methods of adaptive or nonlinear compression optimized for the visual system.
2. Background of the Invention
The most common method of optimizing compression for the visual system is to transform the amplitudes of the image to a domain that is perceptually uniform. Since the visual system's gray scale behavior is approximately characterized by a cube-root front-end amplitude nonlinearity, the theory is to convert the image to an inverse domain, such as cubic, and then quantize. This technique forms part of nearly all video standards, with the exception that the power function of 3 is replaced by values around [...]. Compression methods do this as a consequence of compressing images represented in the video standards. The advantage of using this approach is so substantial that it is used in almost every compression method.
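As a rough illustration of quantizing in a perceptually uniform domain, the sketch below applies a cube-root mapping to linear luminance, quantizes uniformly, and undoes the mapping with a cubic on reconstruction. This is one reading of the passage rather than a procedure taken from any standard; the function names, the [0, 1] luminance range, and the 16-level default are assumptions made for the example.

```python
import numpy as np

def quantize_perceptual(luminance, levels=16, exponent=3.0):
    """Uniformly quantize linear luminance in [0, 1] after a cube-root mapping.

    The names, the [0, 1] range, and the default level count are illustrative
    assumptions, not values from the text.
    """
    lum = np.clip(np.asarray(luminance, dtype=float), 0.0, 1.0)
    encoded = lum ** (1.0 / exponent)            # forward nonlinearity (cube root)
    return np.round(encoded * (levels - 1)).astype(int)  # uniform quantization

def dequantize_perceptual(codes, levels=16, exponent=3.0):
    """Map integer codes back to linear luminance with the inverse (cubic) power."""
    encoded = np.asarray(codes, dtype=float) / (levels - 1)
    return encoded ** exponent

# Reconstruction levels are finely spaced in the darks and coarsely spaced
# in the brights, which is the intended effect of the nonlinearity.
recon_levels = dequantize_perceptual(np.arange(16))
step_sizes = np.diff(recon_levels)
```

With `exponent` set to a different value, the same two functions would model a power function other than 3.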
The second most common method to visually optimize compression is by utilizing models of the visual system to weight the accuracy of the different spatial frequencies. This relies on the visual system's varying sensitivity to 2D spatial frequencies. Various levels of visual models include the visual system's low-pass characteristics at high spatial frequencies, its orientation dependence, and its bandpass nature at very low frequencies. The contrast sensitivity function (CSF) of the visual system describes the visual response to 2D spatial frequencies, and it is usually mapped to the compression transform domain, and then used to linearly quantize the transformed coefficients. This has been done for the discrete cosine transform (DCT), vector quantizers (VQ), and wavelet transforms.
As the visual angle of the displayed pixel gets smaller, such as by increasing the displayed resolution or by increasing the viewing distance, the performance of this technique increases, becoming quite substantial for photographic resolution applications. This technique does not provide as much advantage to lower resolution displays, such as NTSC or VGA resolutions, especially when viewed at working distances (usually 1.5 picture heights), as opposed to entertainment distances (greater than 3 picture heights).
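The dependence on viewing distance can be made concrete with a small calculation. The sketch below estimates how many pixels fall within one degree of visual angle, and the corresponding highest representable spatial frequency in cycles per degree, for a display viewed at a distance given in picture heights. The function names are hypothetical, and the 480-line figure (the vertical resolution of 640x480 VGA) is chosen only as an example.

```python
import math

def pixels_per_degree(lines, distance_in_picture_heights):
    """Average number of pixels per degree of visual angle over the picture height."""
    # Angle subtended by the full picture height at the given viewing distance.
    half_angle = math.atan(0.5 / distance_in_picture_heights)
    angle_deg = 2.0 * math.degrees(half_angle)
    return lines / angle_deg

def nyquist_cycles_per_degree(lines, distance_in_picture_heights):
    """Highest spatial frequency the pixel grid can carry (two pixels per cycle)."""
    return pixels_per_degree(lines, distance_in_picture_heights) / 2.0

# Compare a working distance with two longer viewing distances.
for distance in (1.5, 3.0, 6.0):
    cpd = nyquist_cycles_per_degree(480, distance)
    print(f"{distance} picture heights: about {cpd:.1f} cycles/degree")
```

At the working distance the image spans a smaller range of visual frequencies, so there is less room for frequency weighting to distinguish between them.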
The third main area of visual optimization attempts to exploit the masking properties of the visual system, where visual sensitivity to distortions is reduced as the image content energy increases. This approach has advantages in that it can work in cases where the CSF does not provide much advantage. The most common of these cases is where the frequency sensitivity of the visual system does not change much over the digital frequencies present in an image. This corresponds to low resolution displays or close viewing distances. It also can help regulate bit-rate when entropy coders are used, or help keep a consistent image quality when rate control is used.
Early work in this area first tried scaling the overall quantization values as a function of local image variance, usually with DCT blocks, but these attempts have met with limited success because the DCT and block decompositions do not correspond well to the masking property of vision. Further, such adaptive methods require overhead to tell the decoder which quantizer was used to encode each block. One method, disclosed in U.S. Pat. No. 4,774,574, combines notions from adaptive differential pulse code modulation with masking in the DCT domain to obtain an adaptive quantizer without any overhead. Unfortunately, the nature of the DCT and the zigzag coefficient ordering did not allow for accurate modeling of the masking effect. In spite of this, the use of visual masking to guide adaptive quantization results in bit-rate reductions of 5-25%, depending on the image content.
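The overhead mentioned above can be seen in a minimal sketch of variance-driven adaptive quantization. For brevity it quantizes pixel blocks directly instead of DCT coefficients, and the step-size rule, the parameter values, and the function names are all assumptions made for illustration. The point is only that the encoder must send a per-block index (`side_info`) so the decoder knows which step size was used.

```python
import numpy as np

def adaptive_quantize(image, block=8, base_step=4.0, gain=0.5, max_index=3):
    """Quantize each block with a step that grows with local variance.

    Assumes the image height and width are multiples of `block`.
    Returns the quantized values and the per-block step indices (side information).
    """
    h, w = image.shape
    quantized = np.zeros((h, w), dtype=int)
    side_info = np.zeros((h // block, w // block), dtype=int)
    for by in range(h // block):
        for bx in range(w // block):
            rows = slice(by * block, (by + 1) * block)
            cols = slice(bx * block, (bx + 1) * block)
            tile = image[rows, cols].astype(float)
            # Hypothetical rule: busier blocks get a larger step-size index.
            index = min(max_index, int(gain * np.log2(1.0 + tile.std())))
            step = base_step * 2 ** index
            quantized[rows, cols] = np.round(tile / step)
            side_info[by, bx] = index
    return quantized, side_info

def adaptive_dequantize(quantized, side_info, block=8, base_step=4.0):
    """Rebuild the image; this is impossible without the transmitted side_info."""
    # Expand each block index to a full block of step sizes.
    steps = base_step * 2.0 ** np.kron(side_info, np.ones((block, block)))
    return quantized * steps
```

A scheme in which the decoder can infer the quantizer from already-decoded data avoids transmitting `side_info`.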
It is now known that the masking property of vision primarily occurs within spatial frequency channels that are each limited in radial frequency as well as orientation. The term channel refers to a collection of all mechanisms with the same spatial frequency. More recently, compression techniques that decompose an image into frequency bands analogous to the visual system frequency channels have proven more amenable to exploiting this vision property. The visual system is believed to decompose the image into localized mechanisms over spatial frequency and spatial location, and these mechanisms become less sensitive as their content is increased. This drop in sensitivity with the increase in content is what is referred to as the masking effect.
The masking effect makes it possible to quantize more coarsely as a function of the activity in that mechanism so that the visual masking effects are utilized by nonuniform quantization, as opposed to overtly adaptive techniques. Since these masking effects are approximately the same in each channel, once normalized, the same masking procedure could be used in each channel.
The Cortex transform decomposition set out in "Efficiency of a Model Human Image Code," by Watson (JOSA A, vol. 4, pp. 2401-2417), was designed to be as close as possible to the visual system's spatial frequency channels such that the transform coefficients were approximately equivalent to the visual mechanisms. One could then quantize each coefficient in direct accordance with the known masking functions of the visual system, resulting in adaptive quantization behavior without incurring any overhead. This is because the decoder would be designed to contain the masking function, and only one is needed since it can be applied equally to any coefficient.
A final area of optimization is in using the visual system's varying sensitivity to color distortions. However, most existing visual optimization strategies for color first extract the achromatic, or luminance, component from the color images.
In summary, then, a method is needed that performs compression with better visual optimization for a lower bit-rate than presently available. In addition, a need exists for the compression method to be less sensitive to image content.
One embodiment of the invention is a method for image compression and decompression with high quality versus bit-rate. The method includes a compression process with the steps of spatial frequency decomposition, frequency band classification of the decomposed image, application of a sigmoid nonlinearity and uniform quantization. The data is then encoded and transmitted. The decompression process performs bit decoding and applies an inverse sigmoid nonlinearity. This data then undergoes a frequency band and spatial frequency recomposition to produce the full bandwidth image.
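The core of the compression and decompression steps, a sigmoid nonlinearity followed by uniform quantization and its inverse, can be sketched for the coefficients of a single frequency band. The text here does not give the form of the sigmoid, so the sketch substitutes a hyperbolic tangent purely as a stand-in; the `scale` and `half_levels` parameters and the function names are likewise assumptions. Spatial frequency decomposition, band classification, and bit encoding are omitted.

```python
import numpy as np

def compress_band(coeffs, scale=32.0, half_levels=15):
    """Apply a sigmoid (tanh stand-in) and uniformly quantize to integer codes."""
    squashed = np.tanh(np.asarray(coeffs, dtype=float) / scale)  # range (-1, 1)
    return np.round(squashed * half_levels).astype(int)

def decompress_band(codes, scale=32.0, half_levels=15):
    """Invert the uniform quantizer, then apply the inverse sigmoid (arctanh)."""
    squashed = np.asarray(codes, dtype=float) / half_levels
    # Pull the extreme codes slightly inside (-1, 1) so arctanh stays finite.
    limit = 1.0 - 0.25 / half_levels
    squashed = np.clip(squashed, -limit, limit)
    return scale * np.arctanh(squashed)

# Small coefficients are reconstructed on a fine grid, large ones on a coarse
# grid, without any side information being sent to the decoder.
coeffs = np.array([-80.0, -10.0, -2.0, 0.0, 2.0, 10.0, 80.0])
codes = compress_band(coeffs)
reconstructed = decompress_band(codes)
```

Because the decoder holds the same fixed nonlinearity, the effective step size adapts to coefficient amplitude with no per-block signaling.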
It is an advantage of the invention that it improves the perceived image quality in applications where frequency-weighting techniques do not provide much gain, such as those with low display pixel resolution and those with close viewing distances.
It is a further advantage of the invention that it makes the image quality less variable with changes in image content.
It is a further advantage of the invention that it is adaptive, without adding processor overhead or image assessment computations.
It is a further advantage of the invention that the sigmoid nonlinearity serves to distribute the quantization error more in accord with the thresholds of the visual system, with the net effect of achieving high quality with lower bit-rate.