This invention relates generally to the field of image compression and in particular to an improved wavelet encoding and decoding of digital pictures.
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever. The following notice applies to the software and data as described below and in the drawing hereto: Copyright(copyright) 1998, Microsoft Corporation, All Rights Reserved.
Digital pictures are used in many applications, such as Web pages, CD-ROM encyclopedias, digital cameras, and others. In most cases is necessary to compress the pictures, in order for them to fit into a small amount of storage or to be downloaded in a short amount of time. For example, in a typical digital camera, pictures are taken at a resolution of 1024xc3x97768 picture elements (pixels), with a resolution of 12 to 24 bits per pixel. The raw data in each image is therefore around 1.2 to 2.5 megabytes. In order to fit several pictures in a computer diskette, for example, it is necessary to reduce the amount of data used by each picture. The larger the compression ration that is achieved, the more pictures will fit into a diskette or memory card and the. faster they can be transferred via bandwidth limited transmission medium such as telephone lines.
Image compression has been extensively studied over the past twenty years. The JPEG standard, defined by the JPEG (joint photographic experts group) committee of ISO (International Standards Organization), was defined in 1992 and is the most popular method of compressing digital pictures. In JPEG, small square blocks of pixels (of dimensions 8xc3x978) are mapped into the frequency domain by means of a discrete cosine transform (DCT). The DCT coefficients are quantized (divided by a scale factor and rounded to the nearest integer) and mapped to a one-dimensional vector via a fixed zigzag scan pattern. That vector is encoded via a combination of run-length and Huffinan encoding.
The independent processing of small 8xc3x978 blocks in JPEG is an advantage from an implementation viewpoint, especially in low-cost hardware. However, it also leads to the main problem with JPEG: blocking artifacts. Because the quantization errors from adjacent blocks are uncorrelated among blocks but correlated within the blocks, the boundaries of the 8xc3x978 blocks becomes visible in the reconstructed image due to the potential difference in encoding between adjacent blocks. Such artifacts are referred to as tiling or blocking artifacts which can be reduced (but not completely eliminated) by using transforms with overlapping basis functions.
An efficient way to remove the blocking artifacts is to replace the block DCT by a wavelet decomposition, which provides an efficient time-frequency representation. Very good compression performance can be obtained by quantizing and encoding wavelet coefficients.
Many wavelet-based image compression systems have been reported in the technical literature in the past few years. With wavelets it is possible to achieve compression ratios that typically range from 20% to 50% better than JPEG. More importantly, wavelet transforms lead to pictures that do not have the disturbing blocking artifacts of JPEG. Therefore, wavelet-based transforms are becoming increasingly popular. In fact, in the next revision of JPEG, named JPEG2000, all proposals under consideration use wavelets.
Some prior wavelet transforms decompose images into coefficients corresponding to 16 subbands. This results in a four by four matrix of subbands, referred to as a big block format, representing spectral decomposition and ordering of channels. The letters L and H are used to identifying low pass filtering and high pass filtering respectively for each subband. The first subband comprises LL and HL coefficients, where the first letter in each set correspond to horizontal filtering and the second corresponds to vertical filtering. Two stages are used in each subband filtering combination. The ordering corresponds to frequencies increasing from left to right and from bottom to top. This ordering is fixed to allow both encoding and decoding to function in a fixed manner. Quantization of the coefficients is then performed, followed by some form of compressive encoding of the coefficients, including adaptive Huffinan encoding or arithmetic encoding to further compress the image. These forms of encoding can be quite complex, including zero tree structures, which depend on the data types. These encoders are fairly complex, and many need to be modified for different images to be compressed, making them difficult to implement in hardware.
There is a need for a simple encoding technique that works on wavelet compression coefficients and similar finite alphabet data which may be implemented in either hardware or software.
Adaptive encoding is performed on signed integer data for which values of smaller magnitude are more likely to occur than those with higher magnitudes. The encoding is performed in bit planes, which allows for scalability in precision of reconstruction, from lossless (no errors) to various levels of approximated reconstruction. Unlike Huffinan encoding, code tables are not necessary because simple rules determine the code words from the input strings.
In one form, for each bit plane, the shortest code word (a single 0) is assigned to represent a run of the most likely input, zero, having length 2k, where k is a parameter that controls the codewords used to represent strings of quantized coefficients, seeking to minimize the number of bits spent in the codewords. K is adapted to increase when longer runs are encountered, and to decrease otherwise, e.g., when symbols different from those in the run are encountered. The encoding of the bit planes can be done with any efficient entropy encoder, such as an adaptive arithmetic encoder, but in one implementation, a new adaptive Run-length plus Golomb-Rice encoder is used.
By not requiring the use of data-dependent data structures such as zerotrees, or a separate list for set partitions in trees, hardware implementations are easier to build and software implementations may run faster.