1. Field of the Invention
The present invention relates generally to pattern matching based compression of digital document images and more specifically to a system for obtaining a minimal number of equivalence classes or model bitmaps for perceptually lossless bilevel compression.
2. Antecedent History
There is strong motivation for efficient compression of digital document images. A typical page scanned at 300 dots per inch generates over 1 megabyte of information if stored in raw bilevel format. If this image were transferred over a typical modem (56 kbit/s), with no noise, it would take well over 2 minutes to transfer a single document page; a 30 page document would take at least an hour to transfer. In addition, a CD-ROM with a capacity of roughly 600 megabytes could store only about 600 document pages. Because of the tremendous number of documents transmitted and stored on a daily basis, efficient modes of compression are essential.
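The figures above can be checked with a short calculation. The sketch below assumes a US Letter page (8.5 by 11 inches), one bit per pixel, and decimal megabytes; none of these assumptions is stated in the text, and the function names are illustrative only.

```python
# Back-of-the-envelope check of the raw storage and transfer figures.
# Assumptions (not from the text): US Letter page, 1 bit per pixel,
# 1 megabyte = 10**6 bytes, modem delivers its full nominal rate.

DPI = 300
PAGE_WIDTH_IN = 8.5
PAGE_HEIGHT_IN = 11.0
MODEM_BITS_PER_SECOND = 56_000
CD_ROM_BYTES = 600 * 10**6


def raw_page_bytes(dpi=DPI, width_in=PAGE_WIDTH_IN, height_in=PAGE_HEIGHT_IN):
    """Size in bytes of an uncompressed bilevel page."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels / 8  # one bit per pixel


def transfer_seconds(n_bytes, bits_per_second=MODEM_BITS_PER_SECOND):
    """Time to send n_bytes over a noiseless link."""
    return n_bytes * 8 / bits_per_second


page = raw_page_bytes()
print(f"raw page: {page / 10**6:.2f} MB")
print(f"one page: {transfer_seconds(page) / 60:.1f} min")
print(f"30 pages: {transfer_seconds(30 * page) / 60:.0f} min")
print(f"pages per CD-ROM: {int(CD_ROM_BYTES // page)}")
```

Under these assumptions a page is about 1.05 megabytes, takes about 2.5 minutes to send, and a 600 megabyte disc holds somewhat fewer than 600 raw pages, which agrees with the rounded figures given above.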
Bilevel image compression methods have been typically classified as either lossless or lossy. In lossless compression, the compressed data stream has all the information necessary to reconstruct an exact copy of the original image. Lossless compression techniques typically utilize either run-length encoding or arithmetic encoding to remove redundancy from the data stream. CCITT Group 3 and CCITT Group 4 are among the most widespread compression techniques that use run-length encoding. Using the CCITT Group 4 compression standard, a typical 300 dpi image is compressed into roughly 75 Kbytes, with a compression ratio of approximately 14 to 1 when compared to the raw bilevel image.
More recently, JBIG (Joint Bi-Level Image Experts Group) was introduced as a standard for lossless bilevel image compression. The JBIG standard uses an approximate binary arithmetic coder, known as the QM-coder. The improved compression ratio for JBIG images results from a good context model (i.e., in non-progressive JBIG mode, a context of 10 previously visited pixels), coupled with an arithmetic encoder that can approximate the entropy bound, given the probability of each context.
When compared to a raw scanned file, the 14:1 compression ratio of CCITT 4 is impressive, but the ASCII text that comprises the scanned document can typically be compressed into a file of less than 5K bytes, which is a further 15:1 reduction in file size when compared to a typical 300 dpi bilevel, CCITT 4 document at 75K bytes.
For a multipage scanned document, where the byte cost of a given font bitmap is amortized against all its references in the document, an additional compression ratio of 15:1 is often still possible with respect to CCITT 4 using a pattern matching format, where each distinct model bitmap is stored or transmitted only once. This class of compression technique is also known as “pattern matching based compression”. A thorough review of the development of pattern matching based compression, dating back to 1974, is found in the PhD thesis of Qin Zhang, entitled “Document Image Compression Via Pattern Matching”, Dartmouth College, 1997, incorporated herein by reference.
To optimize the compression rates achievable with pattern matching based compression, it is essential to develop a reliable method for finding a minimal set of bitmaps that represent the fonts of a scanned document.
For pattern matching based compression systems, the most effective compression rates, which closely resemble the rates obtained for the OCRed text version of the same document, are achieved in lossy mode. If the same underlying symbol in the prescanned document appears multiple times on the same page, almost inevitably, each such appearance in the digital image results in a distinct bitmap. When these distinct bitmaps are “collapsed” to a single representative model bitmap, some loss of information inevitably occurs. Generally, lossy pattern matching compression methods have attempted to ensure that the bitmaps that are replaced are humanly indistinguishable from the original bitmaps, and such methods are often referred to as perceptually lossless.
One of the fundamental problems in any pattern matching based compression system is the risk of mismatching one bitmap with another. As long as the modifications that result from lossy compression are restricted to noise removal, there is no perceptual degradation in the digital document; however, even a single false substitution in the image can result in perceptually noticeable changes in the document appearance. For perceptually lossless compression to be a realistic alternative to lossless compression, very precise shape distance measures need to be derived that accurately measure human perceptual differences between two distinct bitmaps.
Pattern matching based compression techniques have heretofore assumed, as input, bilevel images that had already been converted from gray level through some thresholding process. Such systems typically do not attempt to model or characterize the digitization process. It is important, however, to characterize the thresholding process for several reasons. A fundamental reason to attempt to mathematically (or otherwise) model the digitization process is that this model can then be used to characterize how different bitmaps in the digitally scanned document may (or may not) be digitizations of the same underlying prototype. Another important reason is that certain steps in processing the image, such as thresholding, can be optimized.
Image data generated by a scanner is often captured as gray level images, where each pixel has a value in the range of 0-255. The rate of compression of the image data has been determined to be a function of optimization of the gray level threshold value for conversion to a bilevel input. Pursuant to the invention, a gray level threshold is selected for converting the gray level input to a bilevel input which minimizes the number of equivalence classes or model bitmaps. The optimum threshold is that which minimizes the weak connectivity of the document image, wherein weak connectivity comprises a checkerboard pattern found in a 2×2 array or neighborhood of pixels. The gray level threshold value which minimizes weak connectivity has been found to optimize the document compression rate.
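The weak connectivity count for one candidate threshold can be sketched as follows. This is an illustrative reading only: the convention that a pixel is "on" when its gray value is at or above the threshold, the function names, and the explicit candidate range are all assumptions. Note that a threshold which turns the whole page a single colour trivially yields zero checkerboards, so the search is presumably confined to a plausible range of thresholds.

```python
def weak_connectivity(gray, threshold):
    """Count 2x2 checkerboards after thresholding.

    gray is a list of rows of gray values (0-255). A pixel is treated as
    'on' when gray >= threshold (an assumed convention). A checkerboard is
    a 2x2 neighborhood whose diagonal pixels agree, whose anti-diagonal
    pixels agree, and whose two diagonals differ.
    """
    count = 0
    for r in range(len(gray) - 1):
        for c in range(len(gray[0]) - 1):
            top_left = gray[r][c] >= threshold
            top_right = gray[r][c + 1] >= threshold
            bottom_left = gray[r + 1][c] >= threshold
            bottom_right = gray[r + 1][c + 1] >= threshold
            if (top_left == bottom_right
                    and top_right == bottom_left
                    and top_left != top_right):
                count += 1
    return count


def best_threshold(gray, candidates):
    """Brute force: evaluate every candidate threshold separately."""
    return min(candidates, key=lambda t: weak_connectivity(gray, t))


# Tiny example: one 2x2 neighborhood that is a checkerboard at threshold 100.
image = [[10, 200],
         [200, 10]]
print(weak_connectivity(image, 100))  # 1
```

Evaluating each candidate this way costs one full traversal of the image per threshold.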
A further aspect of the invention deals with a technique for determining the optimum gray level threshold for conversion from gray level to bilevel which involves only a single pass over the image rather than individually ascertaining weak connectivity for each gray level threshold value. The array of pixels of the document is traversed in a single pass, examining successive 2×2 matrices or neighborhoods, incrementing a plus register for the gray level value at which a checkerboard pattern first appears and incrementing a minus register for the gray level value at which the checkerboard pattern no longer exists.
After the image has been completely traversed, the total number of weak connectivity checkerboards for each gray level threshold value is calculated from the values stored in the plus register less the values stored in the minus register. The optimal threshold value for the document is then determined and selected as the gray level value having the minimal weak connectivity.
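One way to read the plus and minus registers is as a difference array that is accumulated into a histogram after the traversal. The sketch below follows that reading; the convention that a pixel is "on" when its gray value is at or above the threshold, the running-sum step, and the function name are assumptions rather than details given above.

```python
def weak_connectivity_histogram(gray, levels=256):
    """Checkerboard count for every threshold, from one image traversal.

    gray is a list of rows of integer gray values in range(levels).
    A pixel is treated as 'on' when gray >= threshold (assumed convention).
    A 2x2 neighborhood is a checkerboard exactly for the thresholds that
    separate its two diagonals, i.e. for first <= threshold <= last.
    """
    plus = [0] * (levels + 1)   # thresholds at which a checkerboard appears
    minus = [0] * (levels + 1)  # thresholds at which it no longer exists
    for r in range(len(gray) - 1):
        for c in range(len(gray[0]) - 1):
            diag = (gray[r][c], gray[r + 1][c + 1])
            anti = (gray[r][c + 1], gray[r + 1][c])
            if max(anti) < min(diag):
                first, last = max(anti) + 1, min(diag)
            elif max(diag) < min(anti):
                first, last = max(diag) + 1, min(anti)
            else:
                continue  # never a checkerboard at any threshold
            plus[first] += 1
            minus[last + 1] += 1

    # Running sum of (plus - minus) gives the count at each threshold.
    histogram = []
    running = 0
    for t in range(levels):
        running += plus[t] - minus[t]
        histogram.append(running)
    return histogram


image = [[10, 200],
         [200, 10]]
hist = weak_connectivity_histogram(image)
print(hist[100])  # 1: the neighborhood is a checkerboard at threshold 100
```

The traversal touches each neighborhood once regardless of the number of gray levels, and the minimum can then be read from `hist` over whatever range of thresholds is considered acceptable.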
A further feature of the invention relates to determining a distance measure between two discrete shapes which is consistent with human shape perception and capable of modeling scanner digitization, where error is primarily introduced through image quantization. This improved distance measure comprises a variation of the standard Hausdorff measure by limiting such measure to a single quadrant, which results in a lower mismatch rate.
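For reference, the standard Hausdorff measure that this variation starts from can be sketched as below. The single-quadrant restriction itself is not detailed in this passage and is not implemented here. The use of Chebyshev (chessboard) distance between pixels and the helper names are assumptions chosen for illustration.

```python
def pixels(bitmap):
    """Set of (x, y) coordinates of the 'on' pixels in a bitmap."""
    return {(x, y)
            for y, row in enumerate(bitmap)
            for x, value in enumerate(row)
            if value}


def directed_hausdorff(a, b):
    """Largest distance from any point of a to its nearest point of b.

    Both sets must be non-empty. Distance is Chebyshev (an assumption).
    """
    return max(
        min(max(abs(px - qx), abs(py - qy)) for qx, qy in b)
        for px, py in a
    )


def hausdorff(a, b):
    """Standard (symmetric) Hausdorff distance between two point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


glyph_a = [[1, 1, 1],
           [1, 0, 0]]
glyph_b = [[1, 1, 1],
           [1, 1, 0]]
print(hausdorff(pixels(glyph_a), pixels(glyph_b)))  # 1
```

The two glyphs differ by a single boundary pixel, so the distance is 1. The standard measure treats such a deviation the same way in every direction, which is the behaviour the single-quadrant variation is said to tighten.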
From the foregoing compendium, it will be appreciated that it is an aspect of the present invention to provide a perceptually lossless image compression system of the general character described which is not subject to the disadvantages of the antecedent history aforementioned.
A feature of the present invention is to provide a perceptually lossless image compression system of the general character described which minimizes the number of model bitmaps or equivalence classes through the selection of a gray level threshold for bilevel conversion of a gray level image.
A consideration of the present invention is to provide a perceptually lossless image compression system of the general character described which optimizes image compression through the selection of a gray level threshold value which results in the lowest weak connectivity in the document.
Another aspect of the present invention is to provide a perceptually lossless image compression system of the general character described which includes a technique for rapidly generating a weak connectivity threshold histogram of a document.
A further feature of the present invention is to provide a perceptually lossless image compression system of the general character described which includes a technique for comparison of glyphs which accepts Hausdorff measure deviation within but a single quadrant.