Data compression is required in data handling processes where too much data is present for practical applications using the data. Commonly, compression is used in communication links, where the time to transmit is long or where bandwidth is limited. Another use for compression is in data storage, where the amount of media space on which the data is stored can be substantially reduced with compression. Yet another application is a digital copier, where intermediate storage is needed for collation, reprint or any other digital copier function. Generally speaking, scanned images, i.e., electronic representations of hard copy documents, are commonly large, and thus are desirable candidates for compression.
Many different compression techniques exist, and many are proprietary to individual users. However, standards are desirable whenever intercommunication between devices will be practiced. Particularly with the advent of multimedia communication, where formerly dissimilar devices are required to communicate, a common standard will be required. An example is the current desirability of FAX machines to be able to communicate with printers. Currently, compression standards are generally distinct for different devices.
Three major schemes for image compression are currently being studied by international standardization groups. A first, for facsimile type image transmission, which is primarily binary, is under study by the JBIG (Joint Bi-level Image Experts Group) committee. A second, for TV and film, is being worked on by the MPEG (Moving Picture Experts Group). For non-moving general images, i.e., still images which are more general than the ones covered by JBIG, the JPEG (Joint Photographic Experts Group) is seeking to develop a device independent compression standard, using an adaptive discrete cosine transform scheme.
ADCT (Adaptive Discrete Cosine Transform, described for example, by W. H. Chen and C. H. Smith, in "Adaptive Coding of Monochrome and Color Images", IEEE Trans. Comm., Vol. COM-25, pp. 1285-1292, November 1977), as the method disseminated by the JPEG committee will be called in this application, is a lossy system which reduces data redundancies based on pixel to pixel correlations. Generally, in images, on a pixel to pixel basis, an image does not change very much. An image therefore has what is known as "natural spatial correlation". In natural scenes, correlation is generalized, but not exact. Noise makes each pixel somewhat different from its neighbors.
Generally, as shown in FIG. 1, the process of compression requires a tile memory 10 storing an M×M tile of the image. We will use square tiles in the description based on the JPEG recommendations, but it has to be noted that the inventive method can be performed with any form of tiling. From the portion of the image stored in tile memory, the discrete cosine transform (DCT), a frequency space representation of the image, is formed at transformer 12. Hardware implementations are available, such as the C-Cube Microsystems CL550A JPEG image compression processor, which operates in either the compression or the decompression mode according to the proposed JPEG standard. A divisor/quantization device 14 is used with a set of values referred to as a Q-Table, stored in a Q table memory 16, so that each DCT value is divided by a distinct Q table value, returning the integer portion of the result as the quantized DCT value. A Huffman encoder 20 statistically encodes the quantized DCT values to generate the compressed image that is output for storage, transmission, etc.
The current ADCT compression method divides an image into M×M pixel blocks, where M=8. The selection of M=8 is a compromise: the larger the block, the higher the compression ratio obtainable. However, a larger block is also more likely to have non-correlated pixels within the block, thereby reducing the compression ratio. If the block were smaller, greater correlation within the block might be achieved, but less overall compression would be achieved. Particularly within a document image, edges of the image are more likely to be encountered within an 8×8 block than would be the case for a scene forming a natural image. Thus, the assumption of spatial correlation fails to some extent. A major problem addressed by the present invention, as will become more apparent hereinbelow, is that the assumptions of the ADCT proposal work well for photographs containing continuous tones and many levels of gray pixels, but often work poorly for the reproduction of document images, which have significant high frequency components and many high contrast edges.
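The tiling step described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `tile_image` is hypothetical, the image is taken to be a list of rows of gray levels, and its dimensions are assumed to be exact multiples of M (padding of partial blocks is not addressed here).

```python
def tile_image(image, m=8):
    """Split a 2-D list of gray levels into m x m tiles, in raster order."""
    height, width = len(image), len(image[0])
    if height % m or width % m:
        # Assumption of this sketch: no padding logic for partial tiles.
        raise ValueError("image dimensions must be multiples of m")
    tiles = []
    for top in range(0, height, m):
        for left in range(0, width, m):
            # Each tile is a list of m rows, each holding m pixels.
            tiles.append([row[left:left + m] for row in image[top:top + m]])
    return tiles


# Illustrative 16x16 test image: four 8x8 tiles result.
image = [[(x + y) % 256 for x in range(16)] for y in range(16)]
tiles = tile_image(image)
```

Each tile would then be handed to the transform stage independently, which is why artifacts can later appear at tile boundaries.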
Compression schemes tend to use a set of basis functions to utilize the intra block correlations. Basis functions define the data as a projection onto a set of orthogonal functions on an interval. ADCT uses cosine functions as the basis functions and the Discrete Cosine Transform (DCT) as the projection step. In the first step of the ADCT standard, the image is tiled into 8×8 blocks. Within each block, a set of 64 DCT coefficients is determined for the pixels in the block. The DCT coefficients represent the coefficients of each cosine term of the discrete cosine transform of the 8×8 block.
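As a rough illustration of the projection onto cosine basis functions, the following sketch computes a two-dimensional DCT of a square block directly from its definition. It is a slow reference form, not an implementation taken from the cited hardware; the name `dct_2d` is hypothetical, an orthonormal scaling is assumed, and the level shift that JPEG applies to pixel values before the transform is omitted.

```python
import math


def dct_2d(block):
    """Direct (unoptimized) 2-D DCT-II of an n x n block, orthonormal scaling."""
    n = len(block)

    def c(k):
        # Normalization factor for the k-th cosine basis function.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):          # vertical frequency index
        for v in range(n):      # horizontal frequency index
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = c(u) * c(v) * total
    return coeffs


# A perfectly flat block has no spatial variation, so only the
# top-left (DC) coefficient is non-zero: 8 * 100 = 800 up to rounding error.
flat = [[100] * 8 for _ in range(8)]
flat_coeffs = dct_2d(flat)
```

For an 8×8 block this scaling coincides with the usual 1/4·C(u)·C(v) factor, so the 64 values returned correspond to the 64 coefficients mentioned above.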
Referring now to FIG. 2A, an array of 64 gray level values representing 64 pixels in an 8×8 block of the image is shown. This 8×8 block is transformed according to the JPEG ADCT specifications giving the DCT coefficients shown in FIG. 2B. These coefficients still completely describe the image data of FIG. 2A, but in general larger values will now cluster at the top left corner in the low spatial frequency region. Simultaneously, in the vast majority of images as the frequency of the image increases, the coefficient values in the lower right hand portion of the grid tend towards zero.
Generally, the human eye tends to see low frequencies in an image best. At higher frequencies, changes from amplitude to amplitude are unnoticeable, unless such changes occur at extremely high contrast. This is a well known effect of the Human Visual System and extensively documented, see e.g. "Visual Performance and Image Coding" by P. Roetling, Proceedings of the S.I.D. 17/2 pp. 111-114 (1976). The ADCT method makes use of the fact that small amplitude changes at high frequencies can be generally ignored.
The next step in the ADCT method involves the use of a quantization or Q-matrix. The Q-matrix shown in FIG. 2C is a standard JPEG-suggested matrix for compression, but ADCT as well as the proposed inventive method can also operate using other Q-matrices (or Q-Tables). The Q-table values may be multiplied uniformly by a factor selected to increase or decrease compression. The matrix incorporates the effect that lower frequencies are roughly more important than high frequencies by introducing generally larger quantization steps, i.e. larger entries, for larger frequencies. However, the table also attempts to internally construct some desirable variations from the general assumption. Accordingly, the values in the table do vary with frequency, where the exact variation might be a function of the human visual system, of the document type expected, i.e.: photo, text, graphic, etc., or of some other application dependent parameter. Each of the DCT values from FIG. 2B is divided by a corresponding Q-matrix value from FIG. 2C giving quantized DCT (QDCT) values by way of:
$$\mathrm{QDCT}[m][n] = \mathrm{INT}\left\{ \frac{\mathrm{DCT}[m][n]}{\text{Q-Table}[m][n]} + \frac{1}{2} \right\}$$
where INT{A} denotes the integer part of A.
The remainder from the division process is discarded, resulting in a loss of data. Here and in the following we use the term division to describe the process detailed in ADCT including the methods for handling round-off. Furthermore, since the Q values in the lower right hand portion of the table tend to be high, most of the values in that area go to zero, unless there were extremely high amplitudes of the image at the higher frequencies.
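The quantization rule can be sketched as below. Several things here are assumptions for illustration only: the function name `quantize`, the 2×2 sample values (the figures referenced in the text are not reproduced), and the reading of INT{} as a floor, which makes the "+1/2" term round negative coefficients to the nearest integer as well. The optional `scale` argument stands in for the uniform factor by which the Q-Table may be multiplied.

```python
import math


def quantize(dct, q_table, scale=1.0):
    """QDCT[m][n] = INT{ DCT[m][n] / (scale * Q[m][n]) + 1/2 }.

    INT{} is taken as floor here (an assumption), so the result is the
    nearest integer for both positive and negative coefficients.
    """
    return [[math.floor(d / (q * scale) + 0.5) for d, q in zip(d_row, q_row)]
            for d_row, q_row in zip(dct, q_table)]


# Illustrative 2x2 corner of a coefficient block and of a Q-Table.
dct = [[-415.4, 30.2],
       [-2.7, 1.0]]
q_table = [[16, 11],
           [12, 12]]

qdct = quantize(dct, q_table)                   # [[-26, 3], [0, 0]]
qdct_coarse = quantize(dct, q_table, scale=2.0)  # [[-13, 1], [0, 0]]
```

Doubling the table entries halves the magnitude of the surviving values and drives more of them to zero, which is how a larger factor increases compression at the cost of fidelity.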
After deriving the quantized set of DCT values, shown in FIG. 2D, the coefficients are arranged in the order of a space filling zigzag curve and a statistical encoding method, such as the Huffman process, is used to generate the transmitted signal. This statistical coding is performed in a lossless way and the only loss introduced in the compression is the one generated by the quantization of the DCT coefficients using the Q-Table.
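The zigzag ordering can be illustrated with a short sketch. The helper names `zigzag_indices` and `zigzag_scan` are hypothetical; the traversal walks the anti-diagonals of the block, alternating direction, so that low frequency coefficients come first and the mostly zero high frequency coefficients are grouped at the end of the sequence, where a statistical coder can represent them compactly.

```python
def zigzag_indices(n=8):
    """Return (row, col) pairs of an n x n block in zigzag order."""
    order = []
    for s in range(2 * n - 1):                      # s = row + col of a diagonal
        diagonal = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        # Even diagonals are walked from bottom-left to top-right,
        # odd diagonals from top-right to bottom-left.
        if s % 2 == 0:
            diagonal.reverse()
        order.extend(diagonal)
    return order


def zigzag_scan(block):
    """Flatten a square block into a 1-D list following the zigzag order."""
    return [block[r][c] for r, c in zigzag_indices(len(block))]


# Illustrative 3x3 block of quantized values with zeros at high frequencies.
small = [[9, 5, 0],
         [4, 0, 0],
         [1, 0, 0]]
sequence = zigzag_scan(small)   # [9, 5, 4, 1, 0, 0, 0, 0, 0]
```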
ADCT transforms are well known, and hardware exists to perform the transform on image data, e.g., U.S. Pat. No. 5,049,991 to Nihara, U.S. Pat. No. 5,001,559 to Gonzales et al., and U.S. Pat. No. 4,999,705 to Puri. The primary thrust of these particular patents, however, is natural picture images, and not document images.
To decompress the now-compressed image, and with reference to FIG. 1, a series of functions or steps are followed to reverse the process described. The Huffman encoding is removed at decoder 50. The image signal now represents the quantized DCT coefficients, which are multiplied at signal multiplier 52 by the Q table values in memory 54 in a process inverse to the compression process. At inverse transformer 56, the inverse transform of the discrete cosine transform is derived, and the output image in the spatial domain is stored at image buffer 58.
In the described decompression method, Huffman encoding is removed to obtain the quantized DCT coefficient set. Each member of the set is multiplied by a Q-Table value, resulting in the DCT coefficients shown in FIG. 3A, obtained from the data of FIGS. 2C and 2D by way of:
$$\mathrm{DCT}[m][n] = \mathrm{QDCT}[m][n] \times \text{Q-Table}[m][n]$$
However, the result shown in FIG. 3A is not the original set of DCT coefficients shown in FIG. 2B, because the remainders calculated for the original quantization of the DCT coefficients with the Q-Table in the compression process have been lost. In a standard ADCT decompression process, the inverse discrete cosine transform of the set of DCT coefficients is derived to obtain image values shown in FIG. 3B. Comparison of FIG. 3B with FIG. 2A shows the difference.
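A small numeric sketch may help show why the multiplication does not restore the original coefficients. The 2×2 values are illustrative only and are not the data of the figures; `dequantize` is a hypothetical helper name. The quantized values are those obtained by rounding each original coefficient divided by its table entry to the nearest integer.

```python
def dequantize(qdct, q_table):
    """DCT[m][n] = QDCT[m][n] x Q-Table[m][n]."""
    return [[k * q for k, q in zip(k_row, q_row)]
            for k_row, q_row in zip(qdct, q_table)]


# Illustrative values (assumptions, not taken from FIG. 2B to FIG. 3A).
original_dct = [[-415.4, 30.2],
                [-2.7, 1.0]]
q_table = [[16, 11],
           [12, 12]]
qdct = [[-26, 3],
        [0, 0]]          # original_dct / q_table, rounded to nearest

recovered = dequantize(qdct, q_table)      # [[-416, 33], [0, 0]]

# The lost remainder per coefficient; its magnitude is at most half
# of the corresponding quantization step.
residual = [[o - r for o, r in zip(o_row, r_row)]
            for o_row, r_row in zip(original_dct, recovered)]
```

The residual is bounded by half a quantization step per coefficient, but its exact value is unknown to the decoder, which is the information loss referred to in the text.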
The described process does not work to reproduce the best images. Clearly, it cannot reproduce the original image, since data within the image was discarded in the compression-quantization step. Failures are noted wherever strong edges, commonly present in text, appear. Particularly, at such edges "ringing artifacts", or in some references "mosquito noise", are noted. These problems occur in text, graphics and halftones, components very common in document images. In addition to mosquito noise or ringing artifacts, a blocking artifact often appears in image areas with slowly varying grays, where each M×M block that formed the basis of the compression calculation becomes visible. In either case, a problem has occurred.
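The origin of ringing at a strong edge can be illustrated in one dimension. The sketch below is a deliberate simplification and not the ADCT process itself: instead of dividing by a Q-Table, it simply zeroes the four highest frequency coefficients of an 8-sample step edge, as a crude stand-in for coarse quantization of high frequencies. The helper names `dct_1d` and `idct_1d` are hypothetical.

```python
import math


def _scale(u, n):
    # Orthonormal DCT scaling factor for frequency index u.
    return math.sqrt((1.0 if u == 0 else 2.0) / n)


def dct_1d(samples):
    """Orthonormal 1-D DCT-II of a list of samples."""
    n = len(samples)
    return [_scale(u, n) * sum(s * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                               for x, s in enumerate(samples))
            for u in range(n)]


def idct_1d(coeffs):
    """Inverse of dct_1d."""
    n = len(coeffs)
    return [sum(_scale(u, n) * c * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                for u, c in enumerate(coeffs))
            for x in range(n)]


# A sharp black-to-white edge, as found at the border of a text stroke.
edge = [0, 0, 0, 0, 255, 255, 255, 255]
coeffs = dct_1d(edge)

# Crude stand-in for coarse quantization: drop the high frequency half.
truncated = coeffs[:4] + [0.0] * 4
ringing = idct_1d(truncated)

# Deviation from the original edge; it oscillates in the flat regions.
errors = [r - e for r, e in zip(ringing, edge)]
```

In this sketch the reconstructed samples on the flat sides of the edge oscillate around the original level and even leave the 0 to 255 range, which is the one-dimensional analogue of the ringing seen around text.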
In order to remove the artifacts noted, two methods of attacking the problem have been attempted. In a first method, the decompressed image is post processed, i.e., after the image has been fully decompressed, an attempt is made to improve the image. Of course, such processing can never go back to the original image, because that image has been lost. Such processes are demonstrated in the article "Reduction of Blocking Effects in Image Coding" by Reeve, III et al., Optical Engineering, January/February, 1984, Vol. 23, No. 1, p. 34, and "Linear Filtering for Reducing Blocking Effects in Orthogonal Transform Image Coding", by C. Avril et al., Journal of Electronic Imaging, April 1992, Vol. 1(2), pp. 183-191. However, this post-processing of the image leads to a reconstruction that could not have been the real source image and subsequent compression/decompression steps as they are possible in electronic imaging applications will lead to potentially larger and larger deviations between reconstruction and original.
Another approach to the problem is through an iterative decoding process using the known bandlimit of the data. In this method, using the compressed form of the image again, different blocks, perhaps 32×32, are used to decode the image. In one example, "Iterative Procedures for Reduction of Blocking Effects in Transform Image Coding", by R. Rosenholtz et al., SPIE, Vol. 1452, Image Processing Algorithms and Techniques II, (1991), pp. 116-126, a method of blurring the overall image was considered, with the hope that such blurring would tend to smooth out the block artifacts noted above.
U.S. patent application Ser. No. 07/956,128 to Eschbach (presented at the Annual Conference of the German Society for Applied Optics, 1993, Wetzlar FRG) provided a method of improving the appearance of a decompressed document image while maintaining fidelity with an original document image from which it is derived, wherein for compression, an original document image is divided into blocks of pixels, the blocks of pixels are changed into blocks of transform coefficients by a forward transform coding operation using a frequency space transform compression operation, the transform coefficients are subsequently quantized with a lossy quantization process in which each transform coefficient is divided by a quantizing value from a quantization table and the integer portion of a result is used as a quantized transform coefficient, and the blocks of quantized transform coefficients are encoded with a lossless encoding method, the method including the decompression steps of:
a) receiving the encoded quantized transform coefficient blocks for the original image;
b) removing any lossless encoding of the quantized transform coefficient blocks for the original image;
c) multiplying each quantized transform coefficient in a block by a corresponding quantizing value from the quantization table to obtain a block of received transform coefficients;
d) recovering the image by applying an inverse transform operation to the received transform coefficients;
e) with a selected filter, reducing high frequency noise appearing in the recovered image as a result of the lossy quantization process, while preserving edges, whereby the appearance of the recovered image is rendered more visually appealing;
f) changing the filtered recovered image into blocks of new transform coefficients by the forward transform coding operation using the frequency space transform operation;
g) comparing each block of new transform coefficients to a corresponding block of received transform coefficients and the selected quantization table, to determine whether the filtered recovered image is derivable from the original image; and
h) upon the determination, transferring the filtered recovered image to an output.
Step g) may include the additional steps of: 1) determining that the block of new transform coefficients is not derivable from the original image; 2) altering individual new transform coefficients, so that a block of altered new transform coefficients is derivable from the original image; and 3) recovering the image from the blocks of altered new transform coefficients. FIG. 4 provides a functional block diagram of the method described in that reference.
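One plausible reading of the comparison and alteration in step g) is sketched below. It is an interpretation, not the procedure of the cited application: a new coefficient is treated as derivable from the original image if it lies within half a quantization step of the received coefficient, since any such value would have been quantized to the same integer, and a coefficient outside that range is moved to the nearest bound. The function names and sample values are hypothetical.

```python
def is_derivable(new_coeffs, received_coeffs, q_table):
    """True if every new coefficient is within half a step of the received one."""
    return all(abs(new - rec) <= q / 2
               for new_row, rec_row, q_row in zip(new_coeffs, received_coeffs, q_table)
               for new, rec, q in zip(new_row, rec_row, q_row))


def clamp_to_quantization_interval(new_coeffs, received_coeffs, q_table):
    """Move each new coefficient into [received - Q/2, received + Q/2].

    Simplification: both interval ends are treated as allowed, although
    strict rounding would exclude one of them.
    """
    adjusted = []
    for new_row, rec_row, q_row in zip(new_coeffs, received_coeffs, q_table):
        adjusted.append([min(max(new, rec - q / 2), rec + q / 2)
                         for new, rec, q in zip(new_row, rec_row, q_row)])
    return adjusted


# Illustrative 2x2 corner: received coefficients, table, and the
# coefficients obtained after filtering and re-transforming.
received = [[-416, 33], [0, 0]]
q_table = [[16, 11], [12, 12]]
filtered = [[-410.0, 40.0], [-7.5, 2.0]]

ok_before = is_derivable(filtered, received, q_table)       # False
adjusted = clamp_to_quantization_interval(filtered, received, q_table)
ok_after = is_derivable(adjusted, received, q_table)        # True
```

Under this reading, the adjusted block differs from the filtered one only where the filter moved a coefficient further than the quantizer could have, so the result remains consistent with the compressed data.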
Experience with the above method taught that the process of DCT coefficient adjustment was computationally expensive, and still produced some ringing artifacts.
All of the references cited herein above are incorporated by reference for their teachings.