In embedded coding, the coding bitstream may be truncated and may be used over a wide range of bit rates. The viewing condition, or appearance, at a high bit rate will be substantially different from that at a low bit rate. Visual Progressive Coding (VPC) provides a mechanism and a method to adjust the viewing condition across the entire coding bit rate range so that a better subjective image may be obtained at every bit rate.
Visual weighting has proven to be an effective tool for improving the subjective quality of an encoded image. By allocating more bits to coefficients in the visually sensitive frequency bands and fewer bits to coefficients in the visually insensitive bands, visual weighting emphasizes those features which are more perceivable by the human eye, and improves the subjective quality of the image. Traditionally, visual weighting may be implemented in one of two ways: by multiplying/dividing the transform coefficients with a model of the contrast sensitivity function (CSF) of the visual system: $\tilde{f}_{ij} = f_{ij} \cdot w_i$ (1)
and then quantizing and entropy encoding the weighted coefficient $\tilde{f}_{ij}$, or by adjusting the quantization step size to the inverse of the CSF function: $x_{ij} = Q(f_{ij} / q_i), \quad q_i \propto 1 / w_i$ (2)
Implementations (1) and (2) are known as the fixed visual weighting scheme, where $f_{ij}$ and $\tilde{f}_{ij}$ are the transform coefficients without and with visual weighting, respectively, $x_{ij}$ is the quantized coefficient, i indexes the frequency band, and j is a position within band i. $q_i$ is the quantization step size associated with band i, and is adjusted to be inversely proportional to the weight; Q is a quantizer. $w_i$ is a weighting factor associated with the frequency component of the coefficients in band i and with the viewing condition. The weight $w_i$ may be derived from a contrast sensitivity function (CSF) model of the visual system and the distance from which the image is to be viewed. Many embedding schemes have no quantization operation; in such a case implementation (1) may be used. It is usually assumed that the visual weighting factor $w_i$ is fixed during the entire coding process, which is why such schemes are known as "fixed visual weighting". For schemes explicitly involving a quantization operation, such as JPEG, implementation (2) is simpler, and is widely adopted. Because the implementation of fixed visual weighting is rather simple, most of the existing research on visually optimized coding focuses on the derivation of the weighting factor $w_i$ from the viewing distance, as disclosed in the references cited herein.
To summarize, coding may be implemented as a two step operation: (A) transform and entropy coding; or as a three step operation: (B) transform, quantization and entropy coding. Method A is used for many embedded coders. Each type of coding requires its own implementation of fixed visual weighting: implementation (1) is used for method A, and implementation (2) is used for method B.
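The two implementations of fixed visual weighting can be illustrated with a minimal Python sketch. The band values, the weights, and the function names below are hypothetical and chosen only for illustration; in practice the weights would be derived from a CSF model and a viewing distance. The sketch assumes a uniform rounding quantizer, which the text does not specify.

```python
# Hypothetical per-band weights w_i; real values would come from a CSF model
# and a chosen viewing distance.

def weight_coefficients(bands, weights):
    """Implementation (1): scale each coefficient f_ij by its band weight w_i."""
    return [[f * w for f in band] for band, w in zip(bands, weights)]


def quantize_with_weighted_steps(bands, weights, base_step):
    """Implementation (2): quantize band i with step q_i = base_step / w_i."""
    quantized = []
    for band, w in zip(bands, weights):
        q = base_step / w  # step size inversely proportional to the weight
        quantized.append([round(f / q) for f in band])
    return quantized


# Three illustrative frequency bands, from low to high frequency.
bands = [[40.0, -12.0], [9.0, 3.0], [2.0, -1.0]]
weights = [1.0, 0.5, 0.25]

weighted = weight_coefficients(bands, weights)
indices = quantize_with_weighted_steps(bands, weights, base_step=2.0)

print(weighted)  # [[40.0, -12.0], [4.5, 1.5], [0.5, -0.25]]
print(indices)   # [[20, -6], [2, 1], [0, 0]]
```

With these values, quantizing the weighted coefficients from implementation (1) with the single step size 2.0 yields the same indices as implementation (2), which is why the two are regarded as alternative realizations of the same fixed weighting.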
One of the recent achievements in image coding is embedded coding. An embedded coder, such as the Embedded Zerotree Wavelet (EZW) coder, J. Shapiro, "Embedded image coding using zerotrees of wavelet coefficients", IEEE Trans. on Signal Processing, vol. 41, pp. 3445-3462, December 1993, has the ability to generate a coding bitstream which may be truncated in a subsequent processing step and which may still be decoded to reveal a visually perceptible image. The embedded coder has important applications in internet image browsing, image databases, digital cameras, etc.
Using internet image browsing as an example, with embedded coding, only one version of the compressed image need be stored in a central database. A user may first request only a small portion of the bitstream for each image, so that the user may quickly browse through a large number of images at low fidelity. When the image of interest is found, the user may then request the remainder of the bitstream and bring the image to full resolution and fidelity. The EZW technique encodes the image bitplane-by-bitplane, and within each bitplane, it uses a zerotree structure to group the insignificant coefficients and to encode them efficiently.
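The truncation property of a bitplane-by-bitplane bitstream can be illustrated with a minimal Python sketch. This is not EZW: it omits the zerotree grouping and the entropy coder, and simply emits raw magnitude bits from the most significant bitplane down, with a sign bit after a coefficient first becomes significant. The function names and the sample coefficients are hypothetical.

```python
def encode(coeffs, num_planes):
    """Emit bits plane by plane, most significant first; a sign bit follows
    the first nonzero magnitude bit of each coefficient."""
    bits = []
    significant = [False] * len(coeffs)
    for plane in range(num_planes - 1, -1, -1):
        for k, c in enumerate(coeffs):
            bit = (abs(c) >> plane) & 1
            bits.append(bit)
            if bit and not significant[k]:
                significant[k] = True
                bits.append(1 if c < 0 else 0)
    return bits


def decode(bits, count, num_planes):
    """Rebuild coefficients from however many bits are available."""
    mags = [0] * count
    negative = [False] * count
    significant = [False] * count
    stream = iter(bits)
    try:
        for plane in range(num_planes - 1, -1, -1):
            for k in range(count):
                if next(stream):
                    mags[k] |= 1 << plane
                    if not significant[k]:
                        significant[k] = True
                        # If the stream ends here, the sign defaults to positive.
                        negative[k] = bool(next(stream))
    except StopIteration:
        pass  # truncated stream: keep what was decoded so far
    return [-m if neg else m for m, neg in zip(mags, negative)]


coeffs = [13, -6, 3, 0]
bits = encode(coeffs, num_planes=4)

print(decode(bits, 4, 4))       # [13, -6, 3, 0]  full bitstream
print(decode(bits[:10], 4, 4))  # [12, -4, 0, 0]  two bitplanes only
print(decode(bits[:5], 4, 4))   # [8, 0, 0, 0]    one bitplane only
```

Every prefix of the bitstream decodes to a coarser approximation of the same coefficients, which is the behavior the browsing example relies on.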
There have been a number of other publications and patents in the area of embedded coding. One of the well known references in the field is D. Taubman and A. Zakhor, "Multirate 3-D subband coding of video", IEEE Trans. on Image Processing, Vol. 3, No. 5, September 1994, pp. 572-588, which describes an embedded coding approach called Layered Zero Coding (LZC). The scheme encodes the transform coefficients bitplane-by-bitplane with context adaptive arithmetic coding. It achieves better rate-distortion performance than EZW; however, no human visual characteristic is considered in the paper. In addition to its superior performance, the coding bitstream generated by LZC may be organized as progressive-by-quality or progressive-by-resolution, which provides additional flexibility for the embedding process.
Set Partitioning In Hierarchical Trees (SPIHT) is proposed by A. Said and W. Pearlman in "A new, fast and efficient image codec based on set partitioning in hierarchical trees", IEEE Trans. on Circuits and Systems for Video Technology, Vol. 6, No. 3, June 1996, pp. 243-250. SPIHT redefines the grouping of insignificant coefficients and achieves superior performance compared with EZW. Moreover, one mode of SPIHT eliminates the entropy coder, which makes the encoder and decoder very simple. Again, no human visual characteristic is considered.
H. Wang and C. J. Kuo, "A multi-threshold wavelet coder (MTWC) for high fidelity image", IEEE International Conference on Image Processing, 1997, discloses a scheme which provides an improvement over LZC by first encoding the wavelet coefficients with the largest threshold value. No human visual characteristics are considered in the scheme.
J. Li and S. Lei, "An embedded still image coder with rate-distortion optimization", SPIE: Visual Communication and Image Processing, volume 3309, pp. 36-47, San Jose, Calif., January 1998, discloses a rate-distortion optimized embedding coder (RDE), which optimizes the performance of the embedded coder by first encoding the coding units with the largest rate-distortion slope, i.e., the largest distortion decrease per coding bit spent. RDE provides a smooth rate-distortion curve and improves on the performance of SPIHT and LZC. Still, the human visual system is not considered in the scheme.
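The ordering principle described above, encoding first the units with the largest distortion decrease per bit, can be illustrated with a minimal Python sketch. The unit names, bit counts, and distortion values are hypothetical, and the slopes are computed directly from those given values; the sketch does not reproduce how the cited coder obtains its slopes.

```python
def order_by_rd_slope(units):
    """Sort coding units by distortion decrease per bit, largest first."""
    return sorted(units, key=lambda u: u["delta_d"] / u["bits"], reverse=True)


def truncate(ordered, budget):
    """Keep units in coding order until the next one would exceed the budget."""
    kept, used = [], 0
    for u in ordered:
        if used + u["bits"] > budget:
            break
        kept.append(u["name"])
        used += u["bits"]
    return kept


# Hypothetical coding units: bits spent and resulting distortion decrease.
units = [
    {"name": "A", "bits": 10, "delta_d": 90.0},  # slope 9.0
    {"name": "B", "bits": 20, "delta_d": 40.0},  # slope 2.0
    {"name": "C", "bits": 5, "delta_d": 60.0},   # slope 12.0
    {"name": "D", "bits": 8, "delta_d": 8.0},    # slope 1.0
]

ordered = order_by_rd_slope(units)
print([u["name"] for u in ordered])  # ['C', 'A', 'B', 'D']
print(truncate(ordered, budget=20))  # ['C', 'A']
```

Because the units with the steepest slopes come first, any truncation point keeps the bits that reduced distortion the most per bit spent.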
Jones, Daly, Gaborski and Rabbani, "Comparative study of wavelet and DCT decompositions with equivalent quantization and encoding strategies for medical images", SPIE V. 2431, Proceedings of Conference Medical Imaging, pp. 571-582, 1995, discloses techniques for calculating visual weights.
U.S. Pat. No. 5,426,512, to A. Watson, for "Image data compression having minimum perceptual error", describes a method which adapts or customizes the DCT quantization matrix according to the image being compressed. The method may only be used for fixed rate coding.
U.S. Pat. No. 5,629,780, to A. Watson, for "Image data compression having minimum perceptual error", describes a method wherein the quantization matrix is adjusted using visual masking by luminance and contrast and using an error pooling technique. It is used for compressing an image at a fixed viewing condition.
U.S. Pat. No. 4,780,761, to S. Daly et al., for "Digital image compression and transmission system visually weighted transform coefficients", discloses a system to quantize the transform coefficients according to a two-dimensional model of the sensitivity of the human visual system. The model of the human visual system is characterized as being less sensitive to diagonally oriented spatial frequencies than to horizontally or vertically oriented spatial frequencies, thereby achieving increased compression of the image. It is again for use in a fixed viewing condition.
U.S. Pat. No. 5,144,688, to A. Bovik et al., for "Method and apparatus for visual pattern image coding", describes a sub-band compression system. The image is separated into a plurality of sub-bands. A perceptual matrix is determined based on the properties of the sub-band filters, the quantizer error distribution, and the properties of the human visual system. This perceptual matrix is used to adjust the quantizer used in encoding each sub-band signal. Again, the teaching is directed towards a fixed viewing condition.
U.S. Pat. No. 4,939,645, to J. Hopkinson, for "Method and apparatus to reduce transform compression visual artifacts in medical images", describes a method for coding and decoding digital images by partitioning the images into blocks, and coding each image separately according to visually significant responses of the human eye. Coding is achieved by calculating and subtracting a mean intensity value from digital numbers within each block or partition and detecting visually perceivable edge locations within the resultant residual image block. If a visually perceivable edge is contained within the block, gradient magnitude and orientation at opposing sides of the edge within each edge block are calculated and appropriately coded. If no perceivable edge is contained within the block, the block is coded as a uniform intensity block. Decoding requires receiving coded mean intensity value, gradient magnitude and pattern code, and then decoding a combination of these three indicia to be arranged in an orientation substantially similar to the original digital image. The viewing condition is fixed.
U.S. Pat. No. 5,321,776, to J. Shapiro, for "Data compression system including successive approximation quantizer", presents a data processing system with successive refinement quantization and entropy coding to facilitate data compression. The generated compressed bitstream may be truncated at any time and still produce perceptible images. The bitstream is arranged to achieve progressive-by-quality, i.e., to minimize the mean square error at the point of truncation. Human visual characteristics are not considered in the scheme.
Fixed visual weighting may be easily incorporated in an embedded coder by multiplying/dividing the transform coefficients with a model of the contrast sensitivity function (CSF) of the visual system. However, in the case of an embedded coder, the coding bitstream may be truncated at some later time, and the viewing condition at different stages of embedding may be very different. At a low bit rate, the quality of the compressed image is poor and the detailed image features are not available. The image is usually viewed from a relatively far distance and the observer is more interested in the global features. As more and more bits are received, the image quality improves, and the observer may be interested in not only the global features but also the details of the image. The image is examined at a closer distance; it may also be subjected to image analysis, or even be blown up for examination, which is equivalent to decreasing the viewing distance. Thus, different viewing conditions are called for at different stages of the embedding.