The present invention relates to methods and apparatus for performing compression on image data, by wavelet transformation of the image data followed by quantization and encoding. In preferred embodiments, the invention is a method and apparatus for performing compression on image data (e.g., image data generated by a document scanner) in a manner allowing fast decompression, by wavelet transformation of the image data (in a manner imposing low memory requirements) followed by quantization and encoding.
It is well known to perform image compression on digital image data to generate a reduced set of compressed data from which each original image (determined by the uncompressed data) can be reconstructed without loss of essential features. An inverse transformation (decompression) can be applied to compressed image data (e.g., following transmission or storage of the compressed image data) to recover data indicative of each image determined by the original data (or a reasonable facsimile of each such image).
In color imaging devices, each pixel of an image is determined by three color component values (e.g., red, green, and blue values). The three sets of color component values that determine an image are typically processed separately. Digital data that determines a pixel of a color image comprises three color component words, each of which is a multi-bit digital word determining a color component sample (e.g., a red, green, or blue sample of an analog image representation).
Throughout the specification, including in the claims, “block” denotes an array of N×M samples (N columns and M rows of samples, where N and M are integers) of a given color component, and “word” denotes a multi-bit digital word that determines a color component sample (e.g., a blue sample of an analog image representation) or a coefficient generated by performing a transform on a set of color component samples (e.g., one of the coefficients generated by performing a discrete cosine transform or wavelet transform on a row or column of blue samples of an analog image representation).
It should be appreciated that throughout this disclosure, the orthogonal dimensions of a block of data are arbitrarily denoted as “rows” and “columns.” Thus, the rows and columns of a block can equally well be denoted as “columns” and “rows,” respectively. Similarly, a method in which a “horizontal” filtering operation is performed on rows of a block (to generate filtered data) and a “vertical” filtering operation is then performed on columns of the filtered data can equivalently be described as a method in which a “vertical” filtering operation is performed on columns of the same block (if the rows are relabeled as columns) to generate the same filtered data and a “horizontal” filtering operation is then performed on rows of the filtered data. Thus, a method comprising sequential horizontal and vertical filtering operations (each vertical filtering operation following a horizontal filtering operation) can equivalently be described as a method comprising sequential vertical and horizontal filtering operations (each horizontal filtering operation following a vertical filtering operation).
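The relabeling of rows and columns described above can be illustrated with a minimal sketch. The pairwise-average operation and the names `transpose`, `filter_rows`, and `filter_columns` are hypothetical stand-ins chosen for illustration and are not taken from this disclosure.

```python
def transpose(block):
    """Swap the roles of rows and columns of a rectangular block."""
    return [list(col) for col in zip(*block)]

def filter_rows(block):
    """Toy 'horizontal' operation: average adjacent pairs along each row."""
    return [[(row[2 * i] + row[2 * i + 1]) / 2 for i in range(len(row) // 2)]
            for row in block]

def filter_columns(block):
    """The same toy operation written directly along columns ('vertical')."""
    n_rows, n_cols = len(block), len(block[0])
    return [[(block[2 * i][j] + block[2 * i + 1][j]) / 2 for j in range(n_cols)]
            for i in range(n_rows // 2)]

block = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]

# Filtering columns directly gives the same data as relabeling rows as
# columns, filtering rows, and relabeling back.
direct = filter_columns(block)
relabeled = transpose(filter_rows(transpose(block)))
same = direct == relabeled
```

In this sketch `same` evaluates to true, which is the sense in which the two descriptions are interchangeable.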
Typical methods for performing lossy image compression on image data include three steps: an image transform step which generates transform coefficients by performing a transform on the image data (e.g., a discrete cosine transform or wavelet transform); followed by a quantization step which replaces each transform coefficient with a quantized coefficient comprising fewer bits on the average (e.g., a scalar quantization step in which each of the coefficients is divided by the quantization step size); and finally an entropy encoding step in which the quantized coefficients are replaced by code words (e.g., a Huffman encoding or arithmetic encoding operation, in which the quantized coefficients that occur more frequently are replaced by relatively small code words and the quantized coefficients that occur less frequently are replaced by relatively large code words).
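The quantization and entropy encoding steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: the step size of 8, the sample coefficient values, and the function names are hypothetical, and the Huffman routine only computes code word lengths rather than producing a bit stream.

```python
import heapq
from collections import Counter

def quantize(coeffs, step):
    """Scalar quantization: divide each coefficient by the step size and round."""
    return [round(c / step) for c in coeffs]

def dequantize(indices, step):
    """Inverse quantization: scale each quantized value back by the step size."""
    return [q * step for q in indices]

def huffman_code_lengths(symbols):
    """Return {symbol: code word length in bits} for a Huffman code."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    # Heap entries are (count, tie_breaker, symbols_in_subtree).
    heap = [(n, i, [s]) for i, (s, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freq, 0)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, s1 = heapq.heappop(heap)
        n2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1  # each merge adds one bit to every symbol below it
        heapq.heappush(heap, (n1 + n2, tie, s1 + s2))
        tie += 1
    return lengths

coeffs = [103.0, -47.5, 12.2, -3.9, 0.8, 2.1, -1.0, 0.3]  # hypothetical values
indices = quantize(coeffs, 8)
lengths = huffman_code_lengths(indices)
total_bits = sum(lengths[q] for q in indices)
reconstructed = dequantize(indices, 8)
```

In this example the most frequent quantized value (zero) receives the shortest code word, and each reconstructed coefficient differs from the original by at most half the step size.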
Decompression of compressed image data is the inverse of compression, and includes an initial decoding step (in which the entropy encoded code words that comprise the compressed data are decoded); followed by an inverse quantization step (in which inverse quantization is performed on the decoded data); and finally an inverse transform step (performed on the data resulting from the inverse quantization) which reconstructs the original image data.
With reference to FIG. 1, in a conventional three-stage wavelet transform, a “horizontal” wavelet transform is initially performed on each row (sometimes referred to as a “line”) of a block (typically an M×M block) of input image data (block 1 of FIG. 1) to convert each row into two vectors, zL and zH, each comprising M/2 coefficients (coefficient words). All the vectors zL together define coefficient block “L” (having M rows and M/2 columns of coefficients) which is indicative of relatively low spatial frequency information, and the vectors zH together define a coefficient block “H” (having M rows and M/2 columns of coefficients) which is indicative of relatively high spatial frequency information. This horizontal wavelet transform is equivalent to passing the input block 1 through a “high pass” transform filter 2 and a “low pass” transform filter 4, passing the output of filter 2 through decimation filter 3 (in which it undergoes decimation which reduces its sampling frequency by a factor of two) to generate block “H”, and passing the output of filter 4 through decimation filter 5 (in which it undergoes decimation which reduces its sampling frequency by a factor of two) to generate block “L.”
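The filter-then-decimate structure of the horizontal stage can be sketched as follows. The two-tap averaging and differencing filters (Haar-like) are an assumption made for brevity and are not the filters of FIG. 1; the names `fir_filter`, `decimate`, `LOW`, and `HIGH` are likewise hypothetical.

```python
LOW = (0.5, 0.5)    # assumed low pass taps (stand-in for filter 4)
HIGH = (0.5, -0.5)  # assumed high pass taps (stand-in for filter 2)

def fir_filter(line, taps):
    """y[n] = sum over k of taps[k] * x[n + k]; the last sample is repeated past the end."""
    n = len(line)
    return [sum(t * line[min(i + k, n - 1)] for k, t in enumerate(taps))
            for i in range(n)]

def decimate(line):
    """Keep every second sample, halving the sampling frequency."""
    return line[::2]

def horizontal_stage(block):
    """Convert each row into a low-frequency vector zL and a high-frequency vector zH."""
    L = [decimate(fir_filter(row, LOW)) for row in block]
    H = [decimate(fir_filter(row, HIGH)) for row in block]
    return L, H

block = [[10, 12, 20, 20],
         [8, 8, 0, 4],
         [1, 3, 5, 7],
         [2, 2, 2, 2]]
L, H = horizontal_stage(block)
# Each of L and H has 4 rows and 2 columns (M rows and M/2 columns for M = 4).
```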
Then, another wavelet transform (a “vertical” wavelet transform) is performed on each column of block “L” (each column comprising M coefficients indicative of relatively low spatial frequency information). Since the “vertical” wavelet transform filters columns rather than rows, it requires that block “L” has been stored in a memory and read out from the memory on a column by column basis to perform the vertical transform. Each column of block “L” (sometimes referred to as a “line”) is converted into two vectors, zLL and zLH, each comprising M/2 coefficients. All the vectors zLL together define coefficient block “LL” (having M/2 rows and M/2 columns of coefficients, and indicative of the relatively low spatial frequency information of block “L”), and the vectors zLH together define coefficient block “LH” (having M/2 rows and M/2 columns of coefficients, and indicative of the relatively high spatial frequency information of block “L”). This vertical wavelet transform is equivalent to passing block “L” through a “high pass” transform filter 6 and a “low pass” transform filter 8, passing the output of filter 6 through decimation filter 7 (in which it undergoes decimation which reduces its sampling frequency by a factor of two) to generate block “LH”, and passing the output of filter 8 through decimation filter 9 (in which it undergoes decimation which reduces its sampling frequency by a factor of two) to generate block “LL.”
Then, another wavelet transform (a second “horizontal” wavelet transform) is performed on each row of block LL. Since this horizontal wavelet transform filters rows rather than columns, it requires that block LL has been stored in memory and read out from memory on a row by row basis to perform the filtering. Each row of block LL is converted into two vectors, zLLL and zLLH, each comprising M/4 coefficients. All the vectors zLLL together define coefficient block “LLL” (having M/2 rows and M/4 columns of coefficients, and indicative of the relatively low spatial frequency information of block LL), and the vectors zLLH together define coefficient block “LLH” (having M/2 rows and M/4 columns of coefficients, and indicative of the relatively high spatial frequency information of block LL). This horizontal wavelet transform is equivalent to passing block LL through a “high pass” transform filter 10 and a “low pass” transform filter 12, passing the output of filter 10 through decimation filter 11 (in which it undergoes decimation which reduces its sampling frequency by a factor of two) to generate block “LLH”, and passing the output of filter 12 through decimation filter 13 (in which it undergoes decimation which reduces its sampling frequency by a factor of two) to generate block “LLL.”
If an additional vertical wavelet transform is performed on block LLL, block LLL is transformed into a block “LLLL” (having M/4 rows and M/4 columns of coefficients, and indicative of the relatively low spatial frequency information of block LLL) and a block “LLLH” (having M/4 rows and M/4 columns of coefficients, and indicative of the relatively high spatial frequency information of block LLL).
Thus, a conventional four-stage wavelet transform method (whose first three stages are those described with reference to FIG. 1, and whose final stage is a second vertical wavelet transform) transforms the original image data block (e.g., block 1) into five coefficient blocks. FIG. 2 is a diagram representing these five coefficient blocks (labeled H, LH, LLH, LLLH, and LLLL in FIG. 2). Block H comprises coefficients that are indicative of those features of the original image having the highest spatial frequencies, and block LLLL comprises coefficients that are indicative of those features of the original image having the lowest spatial frequencies. Block H is generated during the first (horizontal) wavelet transform, block LH is generated during the second (vertical) wavelet transform, block LLH is generated during the third (horizontal) wavelet transform, and blocks LLLH and LLLL are generated during the fourth (vertical) wavelet transform.
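The bookkeeping of the four alternating stages can be checked with a minimal sketch. The pairwise average and difference split is a Haar-like stand-in assumed for illustration, and all function names are hypothetical; only the block labels H, LH, LLH, LLLH, and LLLL come from the description above.

```python
def split(line):
    """Stand-in one-level analysis of an even-length line: (low, high) halves."""
    half = len(line) // 2
    low = [(line[2 * i] + line[2 * i + 1]) / 2 for i in range(half)]
    high = [(line[2 * i] - line[2 * i + 1]) / 2 for i in range(half)]
    return low, high

def transpose(block):
    return [list(col) for col in zip(*block)]

def horizontal(block):
    """Split every row; returns (low block, high block)."""
    pairs = [split(row) for row in block]
    return [p[0] for p in pairs], [p[1] for p in pairs]

def vertical(block):
    """Split every column by transposing, splitting rows, and transposing back."""
    low, high = horizontal(transpose(block))
    return transpose(low), transpose(high)

def four_stage(block):
    L, H = horizontal(block)      # stage 1 (horizontal)
    LL, LH = vertical(L)          # stage 2 (vertical)
    LLL, LLH = horizontal(LL)     # stage 3 (horizontal)
    LLLL, LLLH = vertical(LLL)    # stage 4 (vertical)
    return {"H": H, "LH": LH, "LLH": LLH, "LLLH": LLLH, "LLLL": LLLL}

def shape(block):
    return (len(block), len(block[0]))

image = [[r * 8 + c for c in range(8)] for r in range(8)]  # hypothetical 8x8 block
shapes = {name: shape(b) for name, b in four_stage(image).items()}
total = sum(rows * cols for rows, cols in shapes.values())
```

For an 8×8 input the five output blocks together hold 64 coefficients, the same number of words as the input block.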
In order to perform a complete image compression operation on the original image data, the coefficients comprising the blocks of FIG. 2 are quantized and then subjected to entropy encoding (as noted above).
The number of wavelet transform stages included in a conventional image compression operation depends on the degree of compression that is desired, with more transform stages resulting in greater compression.
Conventional image compression, in which a multi-stage wavelet transform is performed on an N×M block of image data (in which each of the “N” rows comprises “M” words and each of the “M” columns comprises “N” words), is expensive to implement since a large buffer memory is required to implement each of its vertical transform stages that follows a horizontal transform stage (and each of its horizontal transform stages that follows a vertical transform stage). If the first stage is a horizontal stage, the first stage produces two N×M/2 blocks of coefficients, the second stage is a vertical stage which operates on columns (each comprising N words) of one of the N×M/2 blocks, the third stage is a horizontal stage which operates on rows (each comprising M/2 words) of an N/2×M/2 block produced in the second stage, and so on. Thus, even if the vertical and horizontal stages are performed recursively (using one horizontal transform circuit, one vertical transform circuit, and a memory to which both transform circuits can write and from which both transform circuits can read), the memory must have the capacity to store at least an N×M/2 block. In accordance with the present invention, image compression (in which a multi-stage wavelet transform is performed on an N×M block of image data) can be performed using a smaller memory than would be required to implement an equivalent conventional image compression operation on the same input image data block.
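The memory requirement of the conventional alternating scheme can be tallied with a short sketch. The 512×512 block size and the function name `buffered_blocks` are hypothetical and serve only to put numbers on the dimensions given above.

```python
def buffered_blocks(n_rows, m_cols, stages):
    """Return the (rows, cols) of the low-frequency block produced by each stage.

    Stages alternate horizontal, vertical, horizontal, ... starting with a
    horizontal stage, as in the conventional method described above.
    """
    sizes = []
    rows, cols = n_rows, m_cols
    for i in range(stages):
        if i % 2 == 0:
            cols //= 2   # a horizontal stage halves the number of columns
        else:
            rows //= 2   # a vertical stage halves the number of rows
        sizes.append((rows, cols))
    return sizes

sizes = buffered_blocks(512, 512, 4)
# Every block except the last must be held in memory so the next stage can
# read it in the other orientation; the largest of these sets the capacity.
peak_words = max(rows * cols for rows, cols in sizes[:-1])
```

For this hypothetical block the largest buffered block is the N×M/2 output of the first stage, or 131072 words.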
An important aspect of the invention is a method and apparatus for performing a multi-stage wavelet transform on an Nxc3x97M block of image data, using a smaller memory than would be required to implement an equivalent conventional multi-stage wavelet transform on the same image data block. Another aspect of the invention is a method and apparatus for performing compression on such a block of image data, including by performing a multi-stage wavelet transform on the block, quantizing coefficients resulting from the multi-stage wavelet transform, and performing entropy encoding on the quantized coefficients.
In some preferred embodiments, the input image data is image data that has been generated by a document scanner, and the input image data is compressed in a manner allowing fast decompression (preferably with the inverse wavelet transform performed using integer operations), by a compression method that performs multi-stage wavelet transformation on the input image data (in a manner imposing low memory requirements). Preferably, fast decompression is made possible by employing simple entropy encoding (and optionally also by employing integer operations to perform the wavelet transform) in the compression operation. Preferably, the input image data is compressed with good quality, in the sense that the peak signal to noise ratio (“PSNR”) of the compressed data is at least substantially equal to 40 dB. In some implementations, each color component of the compressed data has a bit rate in the range from 2 to 3 bits per pixel (where each pixel of the input data comprises three 8-bit color component words).
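The PSNR figure mentioned above can be computed as sketched below. This assumes the usual definition, in which PSNR compares the original samples with the samples reconstructed after decompression and the peak value is 255 for 8-bit color component words; the sample values and the function name are hypothetical.

```python
import math

def psnr(original, reconstructed, peak=255):
    """Peak signal to noise ratio in dB between two equal-length sample lists."""
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical data: no noise
    return 10 * math.log10(peak ** 2 / mse)

original = [100, 150, 200, 250]        # hypothetical 8-bit samples
reconstructed = [101, 149, 202, 248]   # hypothetical decompressed samples
quality_db = psnr(original, reconstructed)
meets_target = quality_db >= 40
```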
Some embodiments of the inventive method for compressing image data include the steps of: operating a first circuit to perform at least two consecutive horizontal wavelet transform stages on a block of the image data, quantizing and entropy encoding at least a first block of the resulting coefficients which are indicative of relatively high spatial frequency information (“high frequency coefficients”) and writing to a memory a second block of the resulting coefficients which are indicative of relatively low spatial frequency information (“low frequency coefficients”), reading columns of the low frequency coefficients from the memory and operating a second circuit to perform at least one vertical wavelet transform stage (or two or more consecutive vertical wavelet transform stages) on the low frequency coefficients read from the memory, and quantizing and entropy encoding at least some of the resulting coefficients.
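The ordering of operations in such a method can be sketched in software as follows. This is a simplified model under stated assumptions, not the circuits themselves: the sum and difference split is a Haar-like stand-in for the wavelet filter, the `encode` callback stands in for the quantizer and entropy encoder, and the labels passed to it are hypothetical.

```python
def split(line):
    """Stand-in integer analysis of an even-length line: (low, high) halves."""
    half = len(line) // 2
    low = [line[2 * i] + line[2 * i + 1] for i in range(half)]
    high = [line[2 * i] - line[2 * i + 1] for i in range(half)]
    return low, high

def compress_block(block, encode):
    """Two horizontal stages per row, then one vertical stage on buffered columns.

    Returns the number of words held in the buffer memory.
    """
    memory = []
    # First circuit: each row is processed as it arrives; high-frequency
    # coefficients go straight to the encoder and are never buffered.
    for row in block:
        low1, high1 = split(row)     # first horizontal stage
        low2, high2 = split(low1)    # second horizontal stage
        encode("high_stage1", high1)
        encode("high_stage2", high2)
        memory.append(low2)          # only the low-frequency quarter is stored
    # Second circuit: read the stored coefficients column by column.
    for j in range(len(memory[0])):
        column = [stored_row[j] for stored_row in memory]
        low, high = split(column)    # vertical stage
        encode("vertical_high", high)
        encode("vertical_low", low)
    return len(memory) * len(memory[0])

emitted = []
image = [[r * 8 + c for c in range(8)] for r in range(8)]  # hypothetical 8x8 block
words_buffered = compress_block(image, lambda label, data: emitted.append((label, data)))
coefficients_out = sum(len(data) for _, data in emitted)
```

In this model the buffer holds N×M/4 words (16 words for the 8×8 example), compared with the N×M/2 words needed when a vertical stage directly follows the first horizontal stage.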
The image data compression apparatus of the invention includes a random access memory (RAM), and a first circuit and a second circuit coupled to the memory. The first circuit is configured to perform at least two consecutive horizontal wavelet transform stages on a block of image data, quantize and entropy encode at least a first block of the resulting coefficients which are indicative of relatively high spatial frequency information (“high frequency coefficients”), and write to the memory a second block of the resulting coefficients which are indicative of relatively low spatial frequency information (“low frequency coefficients”). In preferred embodiments, the second circuit is configured to read columns of the low frequency coefficients from the memory, to perform at least one vertical wavelet transform stage (i.e., one vertical wavelet transform stage, or two or more consecutive vertical wavelet transform stages) on the low frequency coefficients, and to quantize and entropy encode at least some of the resulting coefficients.
In other implementations of the inventive apparatus, the first and second circuits are configured to operate recursively on data (with any number of cycles), in the sense that one of the circuits (during each of the cycles) operates on a subset of the data generated by the other (after such other circuit has written the subset to the memory). For example, in some recursive implementations, the first circuit performs two consecutive horizontal wavelet transform stages (in a first cycle), the second circuit then (in a second cycle) performs at least one vertical wavelet transform stage (on data generated in the first cycle), and the first circuit then (in a third cycle) performs at least one horizontal wavelet transform stage (e.g., two consecutive horizontal wavelet transform stages) on data generated in the second cycle. In one class of embodiments, the second circuit performs at least one vertical wavelet transform stage on the low frequency coefficients, quantizes and entropy encodes at least a first set of the resulting coefficients which are indicative of relatively high spatial frequency information (regarding the low frequency coefficients), and writes to the memory a second set of the resulting coefficients which are indicative of relatively low spatial frequency information (regarding the low frequency coefficients). In the latter embodiments, the first circuit is configured to read from the memory (on a row by row basis) the second set of coefficients, to perform at least one horizontal wavelet transform stage on the second set of coefficients, and to quantize and entropy encode at least some of the resulting coefficients.
In some preferred embodiments of the inventive method and apparatus, each wavelet transform is a 3-5 wavelet transform.
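An integer wavelet transform of this general kind can be sketched as follows. This is an assumption-laden illustration: it takes “3-5” to refer to a biorthogonal filter pair with 3-tap and 5-tap filters, and it uses the reversible integer lifting form of the 5/3 transform known from JPEG 2000 as a stand-in, which may differ from the transform of the preferred embodiments. The function names are hypothetical.

```python
def forward_53(x):
    """One-level integer 5/3 lifting on an even-length list: returns (low, high)."""
    half = len(x) // 2
    even, odd = x[0::2], x[1::2]
    # Predict step: each odd sample minus the floor-average of its even neighbors
    # (the right neighbor is mirrored at the end of the line).
    d = []
    for i in range(half):
        right = even[i + 1] if i + 1 < half else even[i]
        d.append(odd[i] - (even[i] + right) // 2)
    # Update step: each even sample plus a rounded quarter of neighboring details
    # (the left detail is mirrored at the start of the line).
    s = []
    for i in range(half):
        left = d[i - 1] if i > 0 else d[0]
        s.append(even[i] + (left + d[i] + 2) // 4)
    return s, d

def inverse_53(s, d):
    """Undo forward_53 exactly, using only integer operations."""
    half = len(s)
    even = []
    for i in range(half):
        left = d[i - 1] if i > 0 else d[0]
        even.append(s[i] - (left + d[i] + 2) // 4)
    x = []
    for i in range(half):
        right = even[i + 1] if i + 1 < half else even[i]
        x.append(even[i])
        x.append(d[i] + (even[i] + right) // 2)
    return x

line = [10, 12, 14, 13, 9, 7, 7, 8]  # hypothetical samples
low, high = forward_53(line)
restored = inverse_53(low, high)
```

Because each lifting step is undone by subtracting exactly what was added, `restored` equals `line`, so the inverse transform needs only integer additions and shifts.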
As explained above, in the present disclosure, the terms “horizontal” and “vertical” are arbitrary in the sense that “horizontal” and “vertical” operations are performed respectively on “rows” and “columns” of data, and first and second orthogonal dimensions of a block of data are arbitrarily denoted respectively as “rows” and “columns” or “columns” and “rows.”