The present invention relates to the lossy compression of still images and, more particularly, to an improved method for assigning bits to different spatial and frequency portions of the compressed image so as to maximise perceived visual quality.
Conventional image compression systems, such as that represented by the well-known baseline JPEG standard, suffer from a number of problems, of which the following three are notable.
1) They are unable to exploit visual masking and other properties of the Human Visual System (HVS) which vary spatially with image content. This is because the quantization parameters used by these algorithms are constant over the extent of the image. As a result, images cannot be compressed as efficiently as would be possible if visual masking were taken into account.
2) To achieve a target bit-rate or visual quality, the image must be compressed multiple times, while varying one or more of the quantization parameters in an iterative fashion. This is known as the rate-control problem and it enters into many practical image compression applications, including the compression of digital camera images and page compression to save memory within printers, scanners and other such peripheral devices.
3) The target bit-rate and desired viewing resolution must be known prior to compression. By contrast, for many applications, a scalable bit-stream is highly desirable. A scalable bit-stream is one which may be partially transmitted or decompressed so as to reconstruct the image with lower quality or at a lower resolution, such that the quality of the reconstructed image is comparable to that which would have been achieved if the relevant bit-rate and resolution were known when the image was compressed. Obviously, this is a desirable property for compressed image databases, which must allow remote clients access to the image at the resolution and bit-rate (i.e. download time) of their choice. Scalability is also a key requirement for robust transmission of images over noisy channels. The simplest and most commonly understood example of a scalable bit-stream is a so-called "progressive" bit-stream. A progressive bit-stream has the property that it can be truncated to any length and the quality of the reconstructed image should be comparable to that which could have been achieved if the image had been compressed to the truncated bit-rate from the outset. Scalable image compression clearly represents one way of achieving non-iterative rate-control and so addresses the concerns of item 2) above.
A number of solutions have been proposed to each of these problems. The APIC image compression system (Höntsch and Karam, "APIC: Adaptive Perceptual Image Coding Based on Sub-band Decomposition with Locally Adaptive Perceptual Weighting," International Conference on Image Processing, vol. 1, pp. 37-40, 1997) exploits visual masking in the Wavelet transform domain, through the use of an adaptive quantizer, which is driven by the causal neighbourhood of the sample being quantized, consisting of samples from the same sub-band. The approach has a number of drawbacks: it is inherently not scalable; iterative rate-control is required; and the masking effect must be estimated from a causal neighbourhood of the sample being quantized, in place of a symmetric neighbourhood which would model the HVS more accurately. On the other hand, a variety of solutions have been proposed to the second and third problems. Some of the more relevant examples are the SPIHT (A. Said and W. Pearlman, "A New, Fast and Efficient Image Codec based on Set Partitioning in Hierarchical Trees," IEEE Trans. on Circuits and Systems for Video Technology, vol. 6, no. 3, pp. 243-250, June 1996) and EBCOT (D. Taubman, "EBCOT: Embedded Block Coding with Optimised Truncation," ISO/IEC JTC 1/SC 29/WG1 N1020R, Oct. 21, 1998) image compression methods. These both produce highly scalable bit-streams and directly address the rate-control problem; however, they focus on minimising Mean Squared Error (MSE) between the original and reconstructed images, rather than minimising visual distortion. Some attempts have been made to exploit properties of the HVS within the context of SPIHT and other scalable compression frameworks; however, these approaches focus on spatially uniform properties such as the Contrast Sensitivity Function (CSF), and are unable to adapt spatially to exploit the important phenomenon of visual masking.
The compression system proposed by Mazzarri and Leonardi (A. Mazzarri and R. Leonardi, "Perceptual Embedded Image Coding using Wavelet Transforms," International Conference on Image Processing, vol. 1, pp. 586-589, 1995) is an example of this approach. Also worth mentioning here is the method proposed by Watson (A. B. Watson, "DCT Quantization Matrices Visually Optimized for Individual Images," Proceedings of the SPIE, vol. 1913, pp. 202-216, 1993) for optimising quantization tables in the baseline JPEG image compression system. Although this method is restricted to space invariant quantization parameters and non-scalable compression, by virtue of its reliance on the baseline JPEG compression standard, it does take visual masking and other properties of the HVS into account in designing a global set of quantization parameters. The visual models used in the current invention are closely related to those used by Watson and those used in APIC.
Embedded Block Coding
Embedded block coding is a method of partitioning samples from the frequency bands of a space-frequency representation of the image into a series of smaller blocks and coding the blocks such that the bit-stream in each block can be truncated at a length selected to provide a particular distortion level. To achieve embedded block coding, the image is first decomposed into a set of distinct frequency bands using a Wavelet transform, Wavelet packet transform, Discrete Cosine Transform, or any number of other space-frequency transforms which will be familiar to those skilled in the art. The basic idea is to further partition the samples in each band into smaller blocks, which we will denote by the symbols B_1, B_2, B_3, …. The particular band to which each of these blocks belongs is immaterial to the current discussion. The samples in each block are then coded independently, generating a progressive bit-stream for each block, B_i, which can be truncated to any of a set of distinct lengths, R_i^1, R_i^2, …, R_i^{N_i}, prior to decoding. Efficient block coding engines, which are able to produce a finely gradated set of truncation points, R_i^n, such that each truncated bit-stream represents an efficient coding of the small independent block of samples, B_i, have been introduced only recently as part of the EBCOT image compression system. A detailed discussion of the techniques involved in generating such embedded block bit-streams is unnecessary here, since the present invention does not rely upon the specific mechanism used to code each block of samples, but only upon the existence of an efficient, fine embedding for independently coded blocks of samples from each frequency band.
The motivation for considering embedded block coding is that each block may be independently truncated to any desired length in order to optimise the trade-off between the size of the overall compressed bit-stream representing the image and the distortion associated with the image which can be reconstructed from this bit-stream. In the simplest incarnation of the idea, each block bit-stream is truncated to one of the available lengths, R_i^n, in whatever manner is deemed most appropriate, after which the truncated bit-streams are concatenated in some pre-determined order, including sufficient auxiliary information to identify the truncation point, n_i, and length, R_i^{n_i}, associated with each block. Evidently, this provides an elegant solution to the rate-control problem described above. In more sophisticated incarnations of the idea, the overall compressed bit-stream might be organised into successively higher quality "layers", where each layer contains incremental contributions from the embedded bit-stream of each block, such that layers 1 through l together contain the initial R_i^{n_{i,l}} bytes from code-block B_i, for each l = 1, 2, 3, …. The truncation points, n_{i,l}, associated with each block and each layer may be independently selected, subject only to the requirement that n_{i,l} ≥ n_{i,l−1}, which is not restrictive in practice. The EBCOT image compression system provides a mechanism for efficiently embedding a large number of layers in a single compressed image bit-stream, thereby generating a highly scalable representation of the image. In addition to scalability, bit-streams generated in this way possess other important properties, including the ability to decompress arbitrary portions of only those code-blocks which are required to reconstruct a limited spatial region within the image. This is identified as a "random access" property.
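The layered organisation described above can be sketched in code. The following is a minimal illustration only, not part of any standard: it assumes hypothetical tables giving, for each block, the truncation point chosen for each layer and the truncated length R_i^n (in bytes) at each truncation point.

```python
def build_layers(block_streams, truncation_points, lengths):
    """Assemble quality layers from embedded block bit-streams.

    block_streams[i]     -- full embedded bit-stream (bytes) of block B_i
    truncation_points[i] -- truncation point n_{i,l} for each layer l
    lengths[i][n]        -- truncated length R_i^n in bytes (lengths[i][0] = 0)

    Layer l receives the incremental byte range of each block between its
    layer l-1 and layer l truncation points, tagged with auxiliary
    information (block index and contribution length).
    """
    num_layers = len(truncation_points[0])
    layers = []
    for l in range(num_layers):
        layer = []
        for i, stream in enumerate(block_streams):
            n_prev = truncation_points[i][l - 1] if l > 0 else 0
            n_cur = truncation_points[i][l]
            start, end = lengths[i][n_prev], lengths[i][n_cur]
            layer.append((i, end - start, stream[start:end]))
        layers.append(layer)
    return layers
```

Because each layer stores only increments, layers 1 through l together contain exactly the first R_i^{n_{i,l}} bytes of each block, as required.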
Embedded block coding also has important consequences for keeping implementation memory requirements down. This is because the space-frequency transform itself generally has localised memory requirements and the block coding process is also highly localised. Even though all blocks in the image (or at least a large fraction of them) must generally be considered to optimally select the truncation points, n_{i,l}, for each block, B_i, in each layer, l, these decisions may be made after the code-blocks have been compressed, so that the impact on implementation memory is limited to the compressed representation of each block via its embedded bit-stream, together with some summary information which might be used to assist in determining good truncation points. Together, this information is generally significantly smaller than the original image.
Rate-Distortion Optimisation
In considering rate-distortion optimisation we must consider methods for minimising overall image distortion subject to a constraint on the overall bit-rate, and for minimising bit-rate subject to a constraint on the overall image distortion. The optimisation task is greatly simplified by considering only "additive" distortion measures, where the overall distortion, D, may be written as a sum of independent contributions, D_i, from each of the code-blocks, B_i. Under these conditions, let D_i^n denote the contribution to the overall image distortion from code-block B_i when its embedded representation is truncated to length R_i^n. The objective, then, is to find the set of truncation points, n_i, which minimise

    D = Σ_i D_i^{n_i}    (equation 1)
subject to R ≤ Rmax, where Rmax is the bit-rate constraint and

    R = Σ_i R_i^{n_i}    (equation 2)

is the overall bit-rate.
It is common to use Mean Squared Error (MSE) as the distortion measure, primarily because MSE satisfies the additivity property in equation 1 reasonably well. Specifically, let w_{i,k} denote the basis function associated with sample s_i[k] of block B_i in the space-frequency transform, so that the original image may be recovered as

    Σ_i Σ_k w_{i,k} · s_i[k]
Now define

    D̂_i^n = ||w_i||² Σ_k (ŝ_i^n[k] − s_i[k])²    (equation 3)
where ŝ_i^n[k] denotes the distorted samples reconstructed from the bit-stream after truncating block B_i's embedded representation to length R_i^n, and ||w_i|| denotes the L2-norm of the basis functions associated with each of the samples in block B_i. (All basis functions, w_{i,k}, for block B_i are shifted versions of one another, since the block's samples all belong to the same frequency band of the space-frequency transform; consequently, they must all have the same L2-norm.) Then, setting D_i^n = D̂_i^n, it is not hard to show that the additivity requirement of equation 1 is satisfied with D denoting MSE, provided either the basis functions, w_{i,k}, are all orthogonal to one another, or the individual sample distortions, ŝ_i^n[k] − s_i[k], are uncorrelated. In practical applications, neither of these assumptions is likely to hold strictly, but the basis functions are often approximately orthogonal.
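Equation 3 amounts to a weighted sum of squared sample errors, scaled by the squared norm of the block's basis functions. A minimal sketch, assuming the sample arrays and the norm ||w_i|| are supplied by the surrounding system (the function name is illustrative):

```python
def block_distortion(s, s_hat, w_norm):
    """Distortion contribution of one code-block per equation 3:
    D_i^n = ||w_i||^2 * sum_k (s_hat[k] - s[k])^2.

    s      -- original sub-band samples of block B_i
    s_hat  -- samples reconstructed from the truncated bit-stream
    w_norm -- L2-norm ||w_i|| of the block's basis functions
    """
    return (w_norm ** 2) * sum((a - b) ** 2 for a, b in zip(s_hat, s))
```

Under the orthogonality assumption discussed above, summing this quantity over all blocks approximates the MSE of the reconstructed image.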
Now it is not hard to see that any set of truncation points, {n_{i,λ}}, which minimises

    (D_λ + λR_λ) = Σ_i (D_i^{n_{i,λ}} + λR_i^{n_{i,λ}})    (equation 4)
for some λ, is optimal in the sense that the distortion cannot be reduced without also increasing the overall bit-rate. Thus, if a value of λ can be found such that the truncation points which minimise equation 4 yield the target rate, R_λ = Rmax, exactly, then this set of truncation points must be an optimal solution to the rate-distortion optimisation problem. In general, however, it will not be possible to find a value of λ for which R_λ = Rmax, since there are only finitely many code-blocks, each with a finite number of available truncation points. Nevertheless, if the code-blocks are relatively small and each block offers a finely embedded bit-stream, it is sufficient in practice to find the smallest value of λ such that R_λ ≤ Rmax. Similarly, if one is interested in minimising the overall bit-rate subject to some constraint on the distortion, it is sufficient in practice to find the largest value of λ such that D_λ ≤ Dmax.
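Since R_λ is non-increasing in λ, the smallest λ with R_λ ≤ Rmax can be located by a simple bisection search. The sketch below assumes each block's summary information has already been reduced to (slope, cumulative length) pairs with strictly decreasing slopes; the function names and data layout are illustrative assumptions, not part of the EBCOT specification.

```python
def total_rate(blocks, lam):
    """R_lambda: total truncated length when each block is truncated at
    the last candidate point whose rate-distortion slope exceeds lam.

    blocks[i] -- list of (S, R) pairs for block B_i, with strictly
                 decreasing slopes S and cumulative lengths R.
    """
    rate = 0
    for points in blocks:
        chosen = 0  # truncating to nothing is always available
        for S, R in points:
            if S > lam:
                chosen = R  # n_{i,lam} = max{ j : S_i^j > lam }
        rate += chosen
    return rate

def find_lambda(blocks, r_max, lo=0.0, hi=1e12, iters=60):
    """Bisect for (approximately) the smallest lam with R_lam <= r_max."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total_rate(blocks, mid) <= r_max:
            hi = mid  # feasible: try a smaller lam
        else:
            lo = mid
    return hi
```

Each evaluation of R_λ is a pass over the stored summary information only; the code-block bit-streams themselves need not be revisited.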
It can be demonstrated that the determination of the truncation points, n_{i,λ}, which minimise the expression in equation 4 may be performed very efficiently, based on a small amount of summary information collected during the generation of each code-block's embedded bit-stream. It is clear that this minimisation problem separates into the independent minimisation of D_i^{n_{i,λ}} + λR_i^{n_{i,λ}} for each block, B_i. An obvious algorithm for finding each truncation point, n_{i,λ}, is as follows:
Initialize n_{i,λ} = 0 (i.e. no information included for the block)
For j = 1, 2, 3, …
Set ΔR_i^j = R_i^j − R_i^{n_{i,λ}} and ΔD_i^j = D_i^{n_{i,λ}} − D_i^j
If ΔD_i^j/ΔR_i^j > λ then set n_{i,λ} = j
Since this algorithm might need to be executed for many different values of λ, it makes sense first to identify the subset, N_i, of candidate truncation points. Let j_1 < j_2 < … be an enumeration of the elements of N_i and let the rate-distortion "slopes" for each element be given by S_i^{j_k} = ΔD_i^{j_k}/ΔR_i^{j_k}, where ΔR_i^{j_k} = R_i^{j_k} − R_i^{j_{k−1}} and ΔD_i^{j_k} = D_i^{j_{k−1}} − D_i^{j_k}. Evidently, the slopes must be strictly decreasing, for if S_i^{j_{k+1}} ≥ S_i^{j_k} then the truncation point, j_k, could never be selected by the above algorithm, regardless of the value of λ, and so N_i would not be the set of candidate truncation points. When restricted to the set, N_i, of truncation points whose slopes are strictly decreasing, the algorithm reduces to the trivial selection, n_{i,λ} = max{j_k ∈ N_i | S_i^{j_k} > λ}, so it is clear that strictly decreasing slope is a sufficient as well as a necessary condition for the set of candidate truncation points.
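The pruning implied by the strictly decreasing slope condition is a convex hull computation over the (rate, distortion) points. A minimal sketch, assuming the full tables of lengths R_i^n and distortions D_i^n are available (function and variable names are illustrative):

```python
def candidate_points(lengths, distortions):
    """Prune truncation points to the convex-hull set N_i with strictly
    decreasing rate-distortion slopes S = dD/dR.

    lengths[n], distortions[n] -- R_i^n and D_i^n for n = 0, 1, ...;
    point 0 is the empty bit-stream (R = 0, maximal distortion).
    Returns the surviving indices (excluding point 0).
    """
    hull = [0]  # indices of hull vertices; point 0 anchors the hull
    for j in range(1, len(lengths)):
        # Pop earlier points whose slope would not strictly decrease:
        # such points could never be selected for any lambda.
        while len(hull) > 1:
            k, p = hull[-1], hull[-2]
            s_k = (distortions[p] - distortions[k]) / (lengths[k] - lengths[p])
            s_j = (distortions[k] - distortions[j]) / (lengths[j] - lengths[k])
            if s_j >= s_k:
                hull.pop()
            else:
                break
        hull.append(j)
    return hull[1:]
```

The surviving points, with their slopes, are exactly the compact summary information referred to in the following paragraph.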
In a typical implementation of these rate-distortion optimisation ideas within the context of embedded block coding, the set N_i is determined using a conventional convex hull analysis, immediately after the bit-stream for B_i has been generated. The truncation lengths, R_i^{j_k}, and slopes, S_i^{j_k}, are stored in a compact form along with the embedded bit-stream until all code-blocks have been compressed, at which point the search for λ and {n_{i,λ}} which minimise distortion subject to a maximum bit-rate (or minimise bit-rate subject to a maximum distortion) proceeds. The search may be repeated for each bit-stream layer in the case of the more complex bit-stream organisations described previously.
According to a first aspect the present invention consists in a method of compressing a digital image including the steps of:
a) Decomposing the image into a set of distinct frequency bands using a space frequency transform;
b) Partitioning the samples in each frequency band into code blocks;
c) For each code-block, generating an embedded bit-stream to represent the contents of the respective code block;
d) Determining a rate-distortion optimal set of truncation points, n_{i,l}, for each code-block, B_i, and each quality layer, l (of which there may be only one), subject to a constraint on the overall bit-rate or distortion for the layer, in a manner which is sensitive to the masking property of the Human Visual System (HVS); and
e) Storing the embedded bit-streams for each code-block.
Preferably, in the compression method, the code-block truncation points are selected according to a rate-distortion optimisation criterion, using a distortion measure which is sensitive to masking in the HVS.
Preferably also, contributions to the distortion measure from each sample in a code block are weighted as a function of a neighbourhood of samples surrounding the respective sample. In the preferred embodiment the distortion measure is a weighted sum of the squared errors taken at each sample, and the weighting function is a function of the magnitudes of the samples in the respective neighbourhood of samples. To ease the computational burden, the weighting function may be held constant over a sub-block of samples, which is preferably selected to have dimensions no larger than the full size of the respective code block. The samples that are averaged by the weighting function are preferably taken only from within the sub-block, however in some embodiments the samples that are averaged may also include samples taken from outside the sub-block.
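A neighbourhood-based weighting of this kind might be sketched as follows. This is an illustrative assumption only: the window size, the exponent and the reciprocal-activity form of the weight are hypothetical choices for the sketch, not values prescribed by the invention.

```python
def masking_weights(samples, half_window=2, alpha=0.5, eps=1e-6):
    """Weight each sample's squared error by a decreasing function of the
    average magnitude of its neighbours (a simple visual-masking model).

    Larger local activity -> stronger masking -> smaller weight, so errors
    in visually busy regions contribute less to the distortion measure.
    half_window, alpha and eps are illustrative parameters only.
    """
    n = len(samples)
    weights = []
    for k in range(n):
        lo = max(0, k - half_window)
        hi = min(n, k + half_window + 1)
        activity = sum(abs(samples[j]) for j in range(lo, hi)) / (hi - lo)
        weights.append(1.0 / (eps + activity) ** alpha)
    return weights

def weighted_distortion(s, s_hat, weights):
    """Weighted sum of squared errors over a code-block's samples."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, s_hat, s))
```

Holding the weight constant over a sub-block, as described above, amounts to computing one such weight per sub-block rather than per sample.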
In the preferred embodiment, the method is performed by a coding engine using an algorithm which passes through the block multiple times for every bit-plane in the magnitude of the samples, starting with the most significant bit and working down to the least significant bit, the truncation points being identified with the completion of each coding pass. In the preferred embodiment, for each code-block, B_i, the size of the bit-stream, R_i^n, at each truncation point, n, and the change in visual distortion, ΔD_i^n, between truncation points n−1 and n are determined, and this information is supplied to a convex hull analysis system, which determines the set of truncation points, N_i = {n_1, n_2, …}, which are candidates for the rate-distortion optimisation algorithm, as well as the respective monotonically decreasing rate-distortion slopes, S_i^{n_j}. Preferably also, the summary information, N_i, R_i^n and S_i^n, is stored along with the embedded bit-streams for each code-block, the storing process taking place until sufficient information has been stored to enable truncation points to be determined for each code-block.
In the preferred embodiment, this information is saved until all code-blocks in the image have been compressed; however, memory constrained applications might choose to begin making truncation decisions before this point, subject to the available working memory.
In the preferred embodiment, the rate-distortion optimal set of truncation points, n_{i,l}, is determined for each code-block with a plurality of layers, each layer targeted to a distinct bit-rate or distortion level, with each layer targeting successively higher image quality such that the truncation points for each successive layer, l, satisfy n_{i,l} ≥ n_{i,l−1}, and the final scalable image bit-stream is formed by including R_i^{n_{i,l}} − R_i^{n_{i,l−1}} bytes from code-block B_i into layer l, along with respective auxiliary information to identify the number of bytes which have been included for each block and the relevant truncation points. Preferably also, the coding engine uses an algorithm which passes through the code-block multiple times for every bit-plane in the magnitude of the samples, starting with the most significant bit and working down to the least significant bit, the truncation points being identified with the completion of each coding pass. For each code-block, B_i, the size of the bit-stream, R_i^n, at each truncation point, n, and the change in visual distortion, ΔD_i^n, between truncation points n−1 and n are determined, and this information is supplied to the convex hull analysis system to determine the set of truncation points, N_i = {n_1, n_2, …}, which are candidates for the rate-distortion optimisation algorithm, as well as the monotonically decreasing rate-distortion slopes, S_i^{n_j}. The coding engine preferably uses the EBCOT algorithm as hereinbefore defined.
In the preferred embodiment, all of the code blocks have roughly the same rectangular size, regardless of the frequency band to which they belong, and this size is approximately in the range 32×32 to 64×64, where the smaller end of this size range is generally preferred. Also, in the preferred embodiment, the block partitioning operation is implemented incrementally, generating new code blocks and sending them to the block coding system as the relevant frequency band samples become available.
In certain embodiments of the invention the method may be applied to colour image compression in an opponent colour representation, in which case distortion from the chrominance channels is scaled differently to the distortion from the luminance channels prior to the application of the rate-distortion optimisation procedure. In such embodiments, the distortion measure is preferably modified to account for masking of chrominance artefacts by activity in the luminance channel. Preferably also, the distortion measure is modified to account for cross-channel masking between chrominance channels in these embodiments.
According to a second aspect the present invention consists in a method of decompressing a digital image from a compressed bit stream created by the method set out above, the decompression method including the steps of:
a) Unpacking the layered compressed bit-stream to recover the truncated embedded bit-streams corresponding to each code-block.
b) Decoding and assembling the code-blocks into a set of frequency bands.
c) Synthesising a reconstructed image from the frequency bands through the inverse transform.
Preferably in the decoding method, the blocks are decoded on demand, as the relevant frequency band samples are requested by the inverse transform. Preferably also, the synthesis operation proceeds incrementally, requesting frequency samples and using them to synthesise new image samples, as those image samples are requested by the application.
In various embodiments of the invention, the transform may be a Wavelet transform, a Wavelet packet transform, a Discrete Cosine Transform, or any number of other space-frequency transforms which will be familiar to those skilled in the art. In the preferred embodiment of the invention, a Wavelet transform is used, having the well-known Mallat decomposition structure. Also, in the preferred embodiment, the transform is implemented incrementally, producing new frequency band samples whenever new image samples become available, so as to minimise the amount of image or frequency band samples which must be buffered in working memory. Either or both systems may be physically implemented either in hardware or as a general purpose processor executing software instructions.