The invention relates to video and/or image compression and, more particularly, to context-based perceptual quantization methods and apparatus in a video and/or image compression system.
There have recently been many efforts to develop compression schemes for images and video to provide a very good quality of compressed images/video. The schemes can be classified into three categories: (i) a block-based transform coding approach; (ii) a predictive coding approach based on spatial prediction; and (iii) a wavelet transform coding approach. The block-based transform coding approach has been described in technical literature such as, for example, in: Draft of MPEG-2: Test Model 5, ISO/IEC JTC1/SC29/WG11, April 1993; Draft of ITU-T Recommendation H.263, ITU-T SG XV, December 1995; and A. N. Netravali and B. G. Haskell, Digital Pictures: Representation, Compression, and Standards, 2nd Ed., Plenum Press, 1995, the disclosures of which are incorporated herein by reference. Further, the predictive coding approach based on spatial prediction has been described in technical literature such as, for example, in: Lossless and Near-lossless Coding of Continuous Tone Still Images (JPEG-LS), ISO/IEC JTC1/SC 29/WG Jul. 1, 1997; M. J. Weinberger, J. J. Rissanen, and R. B. Arps, “Applications of Universal Context Modeling to Lossless Compression of Gray-scale Images,” IEEE Trans. Image Processing, vol. 5, no. 4, pp. 575-586, April 1996; and X. Wu and N. Memon, “Context-based, Adaptive, Lossless Image Coding,” IEEE Trans. Communications, vol. 45, no. 4, pp. 437-444, April 1997, the disclosures of which are incorporated herein by reference. Lastly, the wavelet transform coding approach has been described in the technical literature such as, for example, in: A. Said and W. A. Pearlman, “A New, Fast, and Efficient Image Codec Based On Set Partitioning in Hierarchical Trees,” IEEE Trans. Circuit and Systems for Video Technology, vol. 6, no. 3, pp. 243-249, June 1996; and N. D. Memon and K. Sayood, “Lossless Compression of Video Sequences,” IEEE Trans. Communications, vol. 44, no. 10, pp. 1340-1345, October 1996, the disclosures of which are incorporated herein by reference.
In certain applications, the transmission bandwidth or the storage capacity is often limited so that distortion-free transmission cannot be achieved. Further, it is well known that the quantization step size selected by an encoder has a substantial effect on the resultant bit rate output by the encoder. Specifically, a large quantization step size performs coarse quantization, reducing the bit rate and the resulting video/image quality. On the other hand, a small quantization step size performs finer quantization, which leads to a higher bit rate and higher resulting video/image quality. Thus, in conventional encoders there is an attempt to find a quantization step size that is large enough to restrain the bit rate, while still achieving the best possible resulting video/image quality. In general, there is an attempt to maintain consistent video quality throughout a video sequence, rather than having the video quality vary widely from frame to frame. In many applications of image and video compression, the human observer is the final judge of the quality of the compressed images. In such situations, it is important to design compression algorithms that attempt to improve the subjective quality of the compressed images/video by exploiting the perceptual insensitivity characteristics of the human visual system or HVS. This can be accomplished by coarser quantization of samples in areas where the incurred distortion is less perceptible to the HVS. This approach, called “perceptual quantization,” has been adopted in many compression schemes. Perceptual quantization has been described in the technical literature such as, for example, in: A. Puri and R. Aravind, “Motion-compensated Video Coding With Adaptive Perceptual Quantization,” IEEE Trans. Circuit and Systems for Video Technology, vol. 1, no. 4, December 1991; N. Jayant, J. Johnston, and R. Safranek, “Signal Compression Based On Models of Human Perception,” Proc. of IEEE, vol. 10, October 1993; R. J. Safranek, “A Comparison of the Coding Efficiency of Perceptual Models,” Proc. SPIE, vol. 2411, pp. 83-91, 1995; A. M. Eskicioglu and P. S. Fisher, “Image Quality Measures and Their Performance,” IEEE Trans. Communications, vol. 43, no. 12, pp. 2959-2965, December 1995; and H. H. Y. Tong and A. N. Venetsanopoulos, “A Perceptual Model For JPEG Applications Based On Block Classification, Texture Masking, and Luminance Masking,” Proc. IEEE International Conference in Image Processing, Chicago, Ill., October 1998, the disclosures of which are incorporated herein by reference.
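The trade-off between quantization step size and quality described above can be illustrated with a minimal sketch of uniform scalar quantization. The function names, the sample values, and the use of the largest index as a rough stand-in for bit rate are assumptions made for illustration only; they are not taken from the cited schemes.

```python
import math


def quantize(samples, step):
    """Map each sample to the index of the nearest multiple of `step`."""
    return [math.floor(s / step + 0.5) for s in samples]


def dequantize(indices, step):
    """Reconstruct approximate sample values from quantization indices."""
    return [i * step for i in indices]


def mean_squared_error(original, reconstructed):
    """Average squared difference, used here as a simple distortion measure."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


# Hypothetical 8-bit sample values.
samples = [3, 17, 42, 58, 61, 90, 127, 200]

fine_indices = quantize(samples, 4)     # small step: finer quantization
coarse_indices = quantize(samples, 32)  # large step: coarser quantization

fine_error = mean_squared_error(samples, dequantize(fine_indices, 4))
coarse_error = mean_squared_error(samples, dequantize(coarse_indices, 32))

# The coarse quantizer produces smaller indices (cheaper to code, as a rough
# proxy for bit rate) at the cost of a larger reconstruction error.
print(max(fine_indices), fine_error)
print(max(coarse_indices), coarse_error)
```

In this sketch the larger step size yields a narrower range of indices and a larger reconstruction error, which mirrors the bit rate versus quality behavior described above.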
However, these prior art schemes require sending overhead information pertaining to quantization step size to the decoder, since the samples used for selecting the step size are not available at the decoder. The burden of sending this overhead information can be extremely heavy, particularly when quantizer selection is performed on the basis of a small block. Thus, it would be highly advantageous to have a perceptual quantization scheme which does not require quantization-related overhead information to be transmitted to a decoder.
The present invention provides for context-based perceptual quantization of an image or video sequence wherein a quantization step size value is generated for a current block of an image or video sequence based only on previously reconstructed samples associated with the image or video sequence. Advantageously, an encoder employing the methodologies of the invention is not required to transmit quantization-related overhead information to a decoder.
In one aspect of the invention, a method of perceptually quantizing a block of at least one image includes generating a non-perceptibility of distortion value. The non-perceptibility of distortion value is calculated from one or more masking values, e.g., complexity, brightness, movement, etc., which themselves are respectively calculated from previously reconstructed samples associated with the at least one image. The reconstructed samples may form one or more sets that are used to calculate such masking effects and, thus, the non-perceptibility of distortion value. In one embodiment, a set of samples is in the form of a template. A template having only previously reconstructed samples is referred to as a causal template. The perceptual quantization method then generates a quantization step size value as a function of the non-perceptibility of distortion value for use in quantizing the block of the at least one image. In this manner, coarser quantization is performed on the image or video sequence when the one or more masking values indicate that incurred distortion is less likely to be perceived by an observer.
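The flow described in this aspect, from causal template to masking values to a non-perceptibility of distortion value to a quantization step size, might be sketched as follows. This is a minimal illustration under stated assumptions and not the claimed method: the function names, the choice of mean absolute deviation for complexity, the mean-based brightness measure, the normalization constants, the equal weights, and the linear mapping to a step size are all hypothetical. A movement masking value is omitted for brevity.

```python
def complexity_masking(template):
    """Hypothetical complexity measure: mean absolute deviation, scaled to [0, 1]."""
    mean = sum(template) / len(template)
    deviation = sum(abs(s - mean) for s in template) / len(template)
    return min(1.0, deviation / 64.0)  # 64.0 is an assumed normalization constant


def brightness_masking(template):
    """Hypothetical brightness measure: mean of 8-bit samples, scaled to [0, 1]."""
    return (sum(template) / len(template)) / 255.0


def non_perceptibility(template):
    """Combine the masking values; equal weights are an assumption."""
    return 0.5 * complexity_masking(template) + 0.5 * brightness_masking(template)


def quantization_step(template, base_step=8.0, max_gain=3.0):
    """Map the non-perceptibility value to a step size (assumed linear mapping).

    `template` holds only previously reconstructed samples (a causal template).
    With no causal samples available, fall back to the base step size.
    """
    if not template:
        return base_step
    return base_step * (1.0 + max_gain * non_perceptibility(template))


# Hypothetical causal templates drawn from previously reconstructed samples.
flat_dark = [0, 0, 0, 0]
textured = [0, 255, 0, 255]

print(quantization_step(flat_dark))  # smooth, dark context: base step size
print(quantization_step(textured))   # busy context: coarser step size
```

Under these assumptions, a busy neighborhood yields a larger step size than a flat one, so coarser quantization is applied where the masking values suggest distortion is less likely to be perceived.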
Since the quantization step size value is generated at an encoder according to the invention using sets or templates consisting only of previously reconstructed samples, it is to be appreciated that such sets or templates are also available at the decoder, which can therefore perform a similar quantization step size generation process. As a result, an encoder of the invention does not need to provide quantization-related information to the corresponding decoder, since the decoder can derive the information using the same causal sets or templates used at the encoder. Advantageously, transmission bandwidth and/or storage capacity is saved.
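The synchronization between encoder and decoder can be illustrated with a small one-dimensional sketch. Everything here is an assumption made for illustration: the function names, the four-sample causal window, the activity measure, and the mapping from activity to step size are hypothetical and stand in for whatever masking calculation an actual embodiment uses. The point shown is only that the decoder, given nothing but the quantization indices, arrives at the same step sizes as the encoder.

```python
import math


def step_from_context(reconstructed, base_step=4.0, window=4):
    """Derive a step size from the most recent reconstructed samples only."""
    context = reconstructed[-window:]
    if not context:
        return base_step  # nothing reconstructed yet: use the base step size
    mean = sum(context) / len(context)
    activity = sum(abs(s - mean) for s in context) / len(context)
    return base_step * (1.0 + min(activity, 32.0) / 16.0)  # assumed mapping


def encode(samples):
    """Return quantization indices plus the encoder's own reconstruction."""
    indices, reconstructed = [], []
    for sample in samples:
        step = step_from_context(reconstructed)
        index = math.floor(sample / step + 0.5)
        indices.append(index)
        reconstructed.append(index * step)  # what the decoder will also see
    return indices, reconstructed


def decode(indices):
    """Rebuild the samples from indices alone; no step sizes are received."""
    reconstructed = []
    for index in indices:
        step = step_from_context(reconstructed)
        reconstructed.append(index * step)
    return reconstructed


# Hypothetical input signal: a flat run followed by a busy run.
signal = [10, 10, 11, 10, 80, 20, 95, 15, 90, 30]
sent_indices, encoder_view = encode(signal)
decoder_view = decode(sent_indices)
print(decoder_view == encoder_view)
```

Because both sides compute the step size from the same previously reconstructed samples, only the indices need to be conveyed in this sketch.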