Several tutorials and overviews of still image compression methods are available: O. Egger et al., “High performance compression of visual information—A tutorial review—Part I: Still pictures,” Proc. IEEE, Vol. 87, No. 6, pp. 976-1011, June 1999; S. Wong et al., “Radiologic image compression—A review,” Proc. IEEE, Vol. 83, No. 2, pp. 194-219, February 1995; N. D. Memon et al., “Lossless image compression: A comparative study,” Proc. SPIE, Vol. 2418, pp. 8-20, 1995; and T. Q. Nguyen, “A tutorial on filter banks and wavelets,” University of Wisconsin, Madison, Wis. 53706, USA.
A good model of a natural image is based on a power spectrum proportional to f^-2, where f is the spatial frequency. This means that most of the energy is concentrated in the low-frequency regions. Therefore, a suitable partitioning of the frequency domain should be finer in low-frequency regions and coarser in high-frequency regions.
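As a rough numeric illustration of this model (a sketch, not taken from the cited literature; the frequency range, band edges and integration step are arbitrary choices), the following computes the share of the total energy of an f^-2 power spectrum falling into each octave band:

```python
# Energy of the model spectrum S(f) = f**-2, integrated per octave band.
# All numeric choices below are illustrative.

def band_energy(f_lo, f_hi, steps=100000):
    """Midpoint-rule integration of S(f) = f**-2 over [f_lo, f_hi]."""
    df = (f_hi - f_lo) / steps
    return sum((f_lo + (i + 0.5) * df) ** -2 * df for i in range(steps))

octaves = [(1.0, 2.0), (2.0, 4.0), (4.0, 8.0), (8.0, 16.0)]
total = band_energy(1.0, 16.0)
shares = [band_energy(lo, hi) / total for lo, hi in octaves]
```

Over this range the lowest octave alone carries more than half of the total energy (8/15 of it), and each higher octave carries half as much as the one below, which is why a fine partitioning of the low frequencies pays off.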
For most types of images, direct coding using an entropy coder does not achieve a satisfactory compression ratio, so some form of prior decomposition is necessary. The decomposition methods used for still image compression are predictive, block and subband transformations. Predictive methods are suited to lossless and low compression ratio applications. The main drawback of block transformation methods, like the discrete cosine transform (DCT), is blocking artifacts at high compression ratios, which are especially visible in image regions with low local variance. Unfortunately, the human visual system is very sensitive to this type of image distortion. Subband transformation methods are applicable to both lossless and lossy compression, while the only visible artifact at high compression ratios is the Gibbs phenomenon of linear filters, the so-called “ringing effect”, described in O. Egger et al., “Subband coding of images using asymmetrical filter banks,” IEEE Trans. Image Processing, Vol. 4, No. 4, pp. 478-485, April 1995. Due to the abundant literature on image compression, the background of this invention is limited to subband transformation.
The subband transformation coefficients are computed by recursively filtering first an input image and then the subsequent resulting images with a set of lowpass and highpass filters, and down-sampling the results. Each subband is separately coded with a bit rate that matches its visual importance. This leads to a visually pleasing image reconstruction and does not produce blocking artifacts. Subband encoding consists of the following four steps: (1) subband decomposition; (2) quantization; (3) probability estimation; and (4) entropy coding of the subbands. The decoding process performs the inverse steps in the reverse order.
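Step (1) can be sketched in one dimension with the simple averaging/differencing (Haar) filter pair, chosen here for brevity rather than taken from the cited references; the 2-D case applies the same split recursively to rows and columns:

```python
# One level of 1-D subband analysis/synthesis with the Haar filter pair.

def analyze(signal):
    """Lowpass (average) and highpass (difference) subbands, downsampled by 2."""
    low = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return low, high

def synthesize(low, high):
    """Inverse transform: interleave sums and differences of the subbands."""
    out = []
    for l, h in zip(low, high):
        out += [l + h, l - h]
    return out

x = [10, 12, 14, 200, 202, 204, 6, 8]
low, high = analyze(x)
assert synthesize(low, high) == x  # perfect reconstruction
```

The smooth runs of the input produce near-zero highpass coefficients, which is what the subsequent quantization and entropy coding steps exploit.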
The concept of subband transformation was first introduced for speech coding by R. E. Crochiere et al., “Digital coding of speech in subbands,” Bell Syst. Tech. J., Vol. 55, No. 8, pp. 1069-1085, October 1976; and U.S. Pat. No. 4,048,443 issued September 1977 to R. E. Crochiere et al. A non-perfect reconstruction filter bank with linear phase is the two-band quadrature mirror filter (QMF) bank, introduced by J. D. Johnston, “A filter family designed for use in quadrature mirror filter banks,” Proc. Int. Conf. Acoustics, Speech, Signal Processing (ICASSP), Denver, Colo., pp. 291-294, April 9-11, 1980.
The perfect reconstruction filter banks for one-dimensional (1-D) subband transformations were investigated by several authors, including: M. J. Smith et al., “A procedure for designing exact reconstruction filter banks for tree structured subband coders,” Proc. Int. Conf. Acoustics, Speech, Signal Processing (ICASSP), San Diego, Calif., pp. 27.1.1-27.1.4, March 1984; T. A. Ramstad, “Analysis/synthesis filter banks with critical sampling,” Proc. Int. Conf. Digital Signal Processing, Florence, Italy, September 1984; M. Vetterli, “Filter banks allowing perfect reconstruction,” Signal Processing, Vol. 10, No. 3, pp. 219-244, April 1986; M. J. Smith et al., “Exact reconstruction techniques for tree structured subband coders,” IEEE Trans. Acoustics, Speech, Signal Processing, Vol. 34, No. 3, pp. 434-441, June 1986; P. P. Vaidyanathan, “Theory and design of M-channel maximally decimated quadrature mirror filters with arbitrary M, having perfect reconstruction property,” IEEE Trans. Acoustics, Speech, Signal Processing, Vol. 35, No. 4, pp. 476-496, April 1987; P. P. Vaidyanathan, “Quadrature mirror filter bank, M-band extensions and perfect reconstruction technique,” IEEE Acoustics, Speech, Signal Processing Mag., Vol. 4, No. 7, pp. 1035-1037, July 1987; and M. Vetterli et al., “Perfect reconstruction FIR filter banks: Some properties and factorization,” IEEE Trans. Acoustics, Speech, Signal Processing, Vol. 37, No. 7, pp. 1057-1071, July 1989. A design technique leading to numerically perfect reconstruction filter banks was developed by Nayebi et al., “Time domain filter bank analysis: A new design theory,” IEEE Trans. Signal Processing, Vol. 40, No. 6, pp. 1412-1429, June 1992. However, such filters are relatively long and thus unsuitable for image-coding applications.
1-D subband transformation theory was extended to the two-dimensional (2-D) case by P. J. Burt et al., “The Laplacian pyramid as a compact image code,” IEEE Trans. Commun., Vol. 31, No. 4, pp. 532-540, April 1983; M. Vetterli, “Multi-dimensional subband coding: Some theory and algorithms,” Signal Processing, Vol. 6, No. 2, pp. 97-112, April 1984; J. Woods et al., “Subband coding of images,” IEEE Trans. Acoustics, Speech, Signal Processing, Vol. 34, No. 5, pp. 1278-1288, October 1986; U.S. Pat. No. 4,817,182 issued March 1989 to E. H. Adelson et al., which utilizes 2-D separable QMF banks; A. Zandi et al., “CREW lossless/lossy medical image compression,” Ricoh California Research Center, Menlo Park, Calif. 94025, USA, Sep. 12, 1995; and U.S. Pat. No. 6,195,465 issued February 2001 to A. Zandi et al.
State-of-the-art compression algorithms can be divided into three basic groups: single-pass, two-pass and multi-pass. Single-pass algorithms encode/decode an image using a single access to each transformation coefficient in memory, as disclosed in C. Chrysafis et al., “Efficient context-based entropy coding for lossy wavelet image compression,” Data Compression Conf., Snowbird, Utah, Mar. 25-27, 1997. These algorithms are usually limited to a prior statistical model with fixed parameters, which typically leads to a lower compression ratio than that achieved by other methods.
Two-pass algorithms encode/decode an image using two accesses to each transformation coefficient in memory. Therefore, they can use a prior statistical model with variable parameters, which leads to a better compression ratio than in the single-pass case. However, they need to store all transformation coefficients in memory in order to perform the second pass, which requires additional memory on the order of the size of an uncompressed input image.
Multi-pass algorithms encode/decode an image based on an implicitly defined static model (JPEG2000, SPIHT and EZW). JPEG2000 was described in C. Christopoulos et al., “The JPEG2000 still image coding system: An overview,” IEEE Trans. Consum. Electr., Vol. 46, No. 4, pp. 1103-1127, November 2000. The set partitioning in hierarchical trees (SPIHT) algorithm was disclosed in A. Said et al., “Image compression using the spatial-orientation tree,” Proc. IEEE Int. Symp. Circuits Systems, Chicago, Ill., pp. 279-282, May 1993; A. Said et al., “A new fast and efficient image codec based on set partitioning in hierarchical trees,” IEEE Trans. Circuits Syst. Video Tech., Vol. 6, No. 3, pp. 243-250, June 1996; and U.S. Pat. No. 5,764,807 issued June 1998 to W. A. Pearlman et al. Alphabet and group partitioning of transformation coefficients was disclosed in U.S. Pat. No. 5,959,560 issued September 1999 to A. Said et al. The Embedded Zerotree Wavelet (EZW) algorithm was described in J. M. Shapiro, “Embedded image coding using zerotrees of wavelet coefficients,” IEEE Trans. Signal Processing, Vol. 41, No. 12, pp. 3445-3462, December 1993. The EZW technique is based on: (1) partial ordering of the transformation coefficients by magnitude using a set of octavely decreasing thresholds; (2) transmission of the order by a subset partitioning algorithm that is duplicated at the decoder; (3) ordered bit plane transmission of refinement bits; and (4) utilization of the self-similarity of the transformation coefficients across different subbands. The Embedded Predictive Wavelet Image Coder (EPWIC), based on a conditional probability model in addition to EZW, was disclosed in R. W. Buccigrossi et al., “Progressive wavelet image coding based on a conditional probability model,” Proc. Int. Conf. Acoustics, Speech, Signal Processing (ICASSP), Munich, Germany, Vol. 4, pp. 2597-2600, Apr. 21-24, 1997; and E. P. Simoncelli et al., “Progressive wavelet image compression using linear inter-band magnitude prediction,” Proc. 4th Int. Conf. Image Processing, Santa Barbara, Calif., Oct. 26-29, 1997. All these methods store the complete image in memory and require a relatively large number of passes to encode/decode an image.
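Step (1) of the EZW technique, the partial ordering by magnitude under octavely decreasing thresholds, can be illustrated with a toy significance pass over a flat list of coefficients (the values are arbitrary; the zerotree structure of step (4) and the refinement bits of step (3) are omitted):

```python
# Significance passes with octavely decreasing thresholds (illustrative).
coeffs = [34, -21, 3, -2, 15, 0, 7, -9]
threshold = 32  # largest power of two not exceeding the maximum magnitude

passes, found = [], set()
while threshold >= 1:
    newly_significant = [i for i, c in enumerate(coeffs)
                         if abs(c) >= threshold and i not in found]
    found.update(newly_significant)
    passes.append((threshold, newly_significant))
    threshold //= 2
```

Each pass emits the positions that first become significant at the current threshold, so the largest coefficients are transmitted first and the bitstream can be truncated at any point.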
A number of authors have observed that subband transformation coefficients have highly non-Gaussian statistics, as reported in B. A. Olshausen et al., “Natural image statistics and efficient coding,” Network: Computation in Neural Systems, Vol. 7, No. 2, pp. 333-339, July 1996; R. W. Buccigrossi et al., “Image compression via joint statistical characterization in the wavelet domain,” GRASP Laboratory Technical Report #414, University of Pennsylvania, USA, May 30, 1997; E. P. Simoncelli et al., “Embedded wavelet image compression based on a joint probability model,” Proc. 4th Int. Conf. Image Processing, Santa Barbara, Calif., USA, Oct. 26-29, 1997; and R. W. Buccigrossi et al., “Image compression via joint statistical characterization in the wavelet domain,” IEEE Trans. Image Processing, Vol. 8, No. 12, pp. 1688-1701, December 1999.
The reason is the spatial structure of typical images, consisting of smooth regions interspersed with sharp edges. The smooth regions produce near-zero transformation coefficients, while the sharp edges produce large-amplitude transformation coefficients. The statistics of transformation coefficients can be modeled by a two-parameter “generalized Laplacian” density function, which peaks sharply at zero and has heavier tails than a Gaussian density function, as in S. G. Mallat, “A theory for multiresolution signal decomposition: The wavelet representation,” IEEE Trans. Pattern Analysis Machine Intelligence, Vol. 11, No. 7, pp. 674-693, July 1989; and E. P. Simoncelli et al., “Noise removal via Bayesian wavelet coring,” Proc. 3rd Int. Conf. Image Processing, Lausanne, Switzerland, Vol. 1, pp. 379-383, September 1996. Unfortunately, a two-pass algorithm is necessary for the calculation of the density function parameters. Furthermore, experimental results show significant disagreement between this density function and actual histograms at higher levels of the subband transformation. The lowpass subband contains almost entirely positive transformation coefficients, better described by a uniform density function.
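Such a density can be written as p(x) = exp(-|x/s|^p)/Z(s, p), where the scale s and the exponent p are the two parameters and Z normalizes the area to one; exponents below 1 give the sharp peak at zero and the heavy tails. A sketch, with the normalizing constant obtained numerically and all numeric choices illustrative:

```python
import math

def gen_laplacian(x, s, p, steps=20001, span=50.0):
    """Evaluate exp(-|x/s|**p) / Z, with Z computed by numerical integration
    over [-span, span] (the tails beyond are negligible for these values)."""
    dx = 2.0 * span / (steps - 1)
    z = sum(math.exp(-abs((-span + i * dx) / s) ** p) * dx
            for i in range(steps))
    return math.exp(-abs(x / s) ** p) / z
```

For s = 1 and p = 0.5 the exact normalizer is Z = 4, so the density peaks at 0.25; for p = 2 the same formula reduces to a Gaussian shape.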
A higher compression ratio can be achieved by defining symbols based on a context model, i.e. on the basis of neighboring transformation coefficients, analogously to text compression methods. An analysis of zero-tree and other wavelet coefficient context models was described in S. Todd et al., “Parameter reduction and context selection for compression of gray-scale images,” IBM J. Res. Develop., Vol. 29, No. 2, pp. 188-193, March 1985; V. R. Algazi et al., “Analysis based coding of image transform and subband coefficients,” SPIE Applications of Digital Image Processing XVIII, Vol. 2564, pp. 11-21, July 1995; S. D. Stearns, “Arithmetic coding in lossless waveform compression,” IEEE Trans. Signal Processing, Vol. 43, No. 8, pp. 1874-1879, August 1995; and U.S. Pat. No. 6,222,941 issued April 2001 to A. Zandi et al.
It is possible to find a bit code more efficient than a fixed-length code if the probability of occurrence of each particular symbol is known. Codeword assignment is usually done by variable-length coding, run-length coding, Huffman coding or arithmetic coding. Techniques for removing alphabetical redundancy mostly generate prefix codes, and mostly transform the messages into a bit string, assigning longer codes to less probable symbols, as in B. M. Oliver et al., “Efficient coding,” Bell Syst. Tech. J., Vol. 31, No. 4, pp. 724-750, July 1952; D. A. Huffman, “A method for the construction of minimum-redundancy codes,” Proc. IRE, Vol. 40, No. 9, pp. 1098-1101, September 1952; and E. N. Gilbert et al., “Variable length binary encodings,” Bell Syst. Tech. J., Vol. 38, No. 4, pp. 933-967, July 1959.
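A minimal sketch of Huffman code construction (illustrative, not any specific scheme from the cited references): the two least probable subtrees are repeatedly merged, so less probable symbols end up with longer prefix-free codewords:

```python
import heapq

def huffman_codes(freqs):
    """Map each symbol to a binary codeword; freqs maps symbol -> probability."""
    # heap entries: [weight, tiebreak, {symbol: partial codeword}]
    heap = [[w, i, {s: ""}] for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)   # two least probable subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, [w1 + w2, count, merged])
        count += 1
    return heap[0][2]

codes = huffman_codes({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125})
```

For these dyadic probabilities the codeword lengths (1, 2, 3 and 3 bits) exactly match the symbol information contents, so the code reaches the source entropy.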
The highest compression ratio is achieved by arithmetic coding, which can theoretically remove all redundant information from a digitized message, according to I. H. Witten et al., “Arithmetic coding for data compression,” Commun. ACM, Vol. 30, No. 6, pp. 520-540, June 1987; A. Moffat et al., “Arithmetic coding revisited,” Proc. Data Compression Conf., Snowbird, Utah, pp. 202-211, March 1995; and A. Moffat et al., “Arithmetic coding revisited,” ACM Trans. Inform. Syst., Vol. 16, No. 3, pp. 256-294, July 1998.
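The interval-narrowing idea behind arithmetic coding can be sketched with a toy floating-point coder (the symbols and probabilities are illustrative); practical coders, including the range coder, instead use fixed-precision integer arithmetic with renormalization:

```python
def cumulative(probs):
    """Map each symbol to its half-open subinterval of [0, 1)."""
    intervals, c = {}, 0.0
    for s, p in probs.items():
        intervals[s] = (c, c + p)
        c += p
    return intervals

def encode(message, probs):
    low, high = 0.0, 1.0
    iv = cumulative(probs)
    for s in message:  # narrow the interval by each symbol's probability
        span = high - low
        low, high = low + span * iv[s][0], low + span * iv[s][1]
    return (low + high) / 2  # any value inside the final interval

def decode(value, probs, length):
    iv = cumulative(probs)
    out = []
    for _ in range(length):
        for s, (lo, hi) in iv.items():
            if lo <= value < hi:
                out.append(s)
                value = (value - lo) / (hi - lo)  # rescale and continue
                break
    return "".join(out)

probs = {"a": 0.7, "b": 0.2, "c": 0.1}
assert decode(encode("aabac", probs), probs, 5) == "aabac"
```

The final interval width equals the product of the symbol probabilities, so the number of bits needed to pinpoint a value inside it approaches the entropy of the message.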
The arithmetic Q-coder is disclosed in J. L. Mitchell et al., “Software implementations of the Q-coder,” IBM J. Res. Develop., Vol. 32, No. 6, pp. 753-774, November 1988; W. B. Pennebaker et al., “An overview of the basic principles of the Q-coder adaptive binary arithmetic coder,” IBM J. Res. Develop., Vol. 32, No. 6, pp. 717-726, November 1988; and U.S. Pat. Nos. 4,933,883 issued June 1990 and 4,935,882 issued June 1990 to W. B. Pennebaker et al.
The arithmetic Z-coder is disclosed in L. Bottou et al., “The Z-coder adaptive binary coder,” Proc. Data Compression Conf., Snowbird, Utah, pp. 13-22, March 1998; and U.S. Pat. Nos. 6,188,334 issued February 2001, 6,225,925 issued May 2001, and 6,281,817 issued August 2001 to Y. Bengio et al.
However, this invention is based on the range coder disclosed in G. N. N. Martin, “Range encoding: an algorithm for removing redundancy from a digitised message,” Proc. Video & Data Recording Conf., Southampton, UK, Jul. 24-27, 1979.
Both the processing time and the memory size of state-of-the-art lossy image compression methods increase with the compression ratio. State-of-the-art microprocessors, signal processors and even microcontrollers have a small capacity of fast memory (general-purpose processor registers and internal or external cache memory) and a large capacity of several times slower memory (external system memory). This invention fits most or even all necessary temporary data into this fast memory, thus additionally achieving the fastest algorithm execution.
The common approach to decreasing the required memory size is to divide a large image into blocks and encode each block independently. The best state-of-the-art still image compression methods (JPEG2000, JPEG, etc.) and moving image compression methods (MPEG-4, MPEG-2, MPEG-1, etc.) are block based, as described in D. Santa-Cruz et al., “JPEG2000 still image coding versus other standards,” Proc. SPIE 45th annual meeting, Applications of Digital Image Processing XXIII, San Diego, Calif., Vol. 4115, pp. 446-454, Jul. 30-Aug. 4, 2000.
The JPEG2000 encoder first divides an input uncompressed image into non-overlapping blocks, then recursively subband transforms each block independently using the direct discrete wavelet transform (DWT), according to M. Boliek et al. (editors), “JPEG2000 Part I Final Draft International Standard,” (ISO/IEC FDIS15444-1), ISO/IEC JTC1/SC29/WG1 N1855, Aug. 18, 2000. The transformation coefficients are then quantized and entropy coded, before forming the output codestream. In the decoder, the input codestream is first entropy decoded, dequantized and recursively subband transformed into independent blocks using the inverse DWT, in order to produce the reconstructed image. However, tiling produces blocking artifacts at the boundaries between blocks. This drawback can be partially eliminated by framing, i.e. overlapping neighboring blocks by at least one pixel. Other serious drawbacks are quality degradation at higher compression ratios and a limited maximum acceptable compression ratio.
The JPEG2000 standard supports two filtering modes: convolution and lifting. In both modes, the signal is first extended periodically at both ends by half the filter length. Convolution-based filtering consists of performing a series of multiplications between the lowpass and highpass filter coefficients and the samples of the extended 1-D signal. Lifting-based filtering consists of a sequence of alternating steps, updating the odd sample values of the signal with a weighted sum of even sample values, and updating the even sample values with a weighted sum of odd sample values, according to W. Sweldens, “The lifting scheme: A custom-design construction of biorthogonal wavelets,” Appl. Comput. Harmonic Analysis, Vol. 3, No. 2, pp. 186-200, 1996; and W. Sweldens, “The lifting scheme: Construction of second generation wavelets,” SIAM J. Math. Anal., Vol. 29, No. 2, pp. 511-546, 1997.
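Lifting can be sketched with the reversible integer 5/3 filter of JPEG2000 Part I; for simplicity, the boundary handling below uses whole-sample symmetric extension folded into an index helper, and even-length signals are assumed:

```python
# One level of the 5/3 lifting transform: predict the odd samples from
# their even neighbors, then update the even samples from the new odd ones.

def dwt53(x):
    n, y = len(x), list(x)
    def at(i):  # whole-sample symmetric extension at the boundaries
        if i < 0:
            i = -i
        if i >= n:
            i = 2 * (n - 1) - i
        return y[i]
    for i in range(1, n, 2):          # predict: odd samples -> highpass
        y[i] -= (at(i - 1) + at(i + 1)) // 2
    for i in range(0, n, 2):          # update: even samples -> lowpass
        y[i] += (at(i - 1) + at(i + 1) + 2) // 4
    return y[0::2], y[1::2]

def idwt53(low, high):
    y = [0] * (len(low) + len(high))
    y[0::2], y[1::2] = low, high
    n = len(y)
    def at(i):
        if i < 0:
            i = -i
        if i >= n:
            i = 2 * (n - 1) - i
        return y[i]
    for i in range(0, n, 2):          # undo the update step
        y[i] -= (at(i - 1) + at(i + 1) + 2) // 4
    for i in range(1, n, 2):          # undo the predict step
        y[i] += (at(i - 1) + at(i + 1)) // 2
    return y

x = [10, 12, 14, 200, 202, 204, 6, 8]
assert idwt53(*dwt53(x)) == x  # lossless by construction
```

Because each lifting step adds or subtracts the same integer quantity in the forward and inverse directions, the transform is exactly invertible despite the rounding, which is what makes lossless subband coding possible.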
JPEG2000 utilizes the MQ arithmetic coder, which is similar to the QM coder adopted in the original JPEG standard described by G. K. Wallace, “The JPEG still picture compression standard,” IEEE Trans. Consum. Electron., Vol. 38, No. 1, pp. 18-34, February 1992; U.S. Pat. Nos. 5,059,976 issued October 1991; and 5,307,062 issued April 1994 to F. Ono et al.
The JPEG standard was described in “Digital compression and coding of continuous-tone still images,” Int. Org. Standardization ISO/IEC, JTC1 Committee Draft, JPEG 8-R8, 1990; and G. K. Wallace, “The JPEG still picture compression standard,” Commun. ACM, Vol. 34, No. 4, pp. 30-44, April 1991. The original image is divided into 8×8 blocks, which are separately transformed using the DCT. After transformation, the 64 transform coefficients are quantized with different quantization steps in order to account for the different importance of each transformation coefficient, using smaller quantization steps for low-frequency coefficients than for high-frequency coefficients. The transformation coefficients are then coded using either Huffman or arithmetic coding. The independent quantization of blocks causes a blocking effect. JPEG lossless compression does not use a transform, but a prediction for the removal of redundant information between neighboring pixels. The prediction error is coded by a Huffman code. The compression ratio is about 2:1 for natural images.
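The transform and quantization stages can be sketched for a single 8×8 block as follows; the frequency-dependent quantization rule here is illustrative and not the standard's quantization table:

```python
import math

N = 8
# Orthonormal 1-D DCT-II basis; row u is the u-th frequency.
C = [[math.sqrt((1.0 if u == 0 else 2.0) / N)
      * math.cos((2 * x + 1) * u * math.pi / (2 * N))
      for x in range(N)] for u in range(N)]

def dct2(block):
    """Separable 2-D DCT: 1-D transform along rows, then along columns."""
    rows = [[sum(C[u][x] * block[y][x] for x in range(N)) for u in range(N)]
            for y in range(N)]
    return [[sum(C[v][y] * rows[y][u] for y in range(N)) for u in range(N)]
            for v in range(N)]

def quantize(coeffs, base=8):
    """Divide by a step that grows with frequency, then round (illustrative)."""
    return [[round(coeffs[v][u] / (base + 2 * (u + v))) for u in range(N)]
            for v in range(N)]
```

A constant block yields a single nonzero (DC) coefficient, and on typical image blocks most quantized high-frequency coefficients become zero, which the subsequent entropy coding exploits.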
The MPEG-4 video compression standard is object based and was developed to compress sequences of images using interframe coding. However, its intraframe coding is a still image compression method very similar to JPEG. The bounding box of the object to be coded is divided into macroblocks of 16×16 pixels, each containing four blocks of 8×8 pixels for the luminance and two blocks of 8×8 pixels for the down-sampled chrominance. The DCT is performed separately on each of the blocks in a macroblock, and the coefficients are quantized, zigzag scanned, and entropy coded by run-length and Huffman methods.
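The zigzag scan serializes a quantized 8×8 block so that the low-frequency coefficients come first and the trailing high-frequency zeros cluster into long runs for the run-length coder; a sketch of the scan order:

```python
def zigzag_order(n=8):
    """Traverse the anti-diagonals of an n x n block, alternating direction."""
    order = []
    for s in range(2 * n - 1):  # s = row + column indexes an anti-diagonal
        diag = [(s - j, j) for j in range(s + 1) if s - j < n and j < n]
        order += diag if s % 2 == 0 else diag[::-1]
    return order

scan = zigzag_order()  # starts (0, 0), (0, 1), (1, 0), (2, 0), ...
```

Reading the quantized coefficients in this order typically produces a short prefix of nonzero values followed mostly by zeros, which run-length coding represents compactly.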