Conventional data compression methods analyze the pixels of a file to determine which glyphs, each composed of a set of turned-on pixels (e.g., black pixels), match one another. They then associate the matched glyphs (alternatively referred to as nodes, connected components, cc's, or bitmaps) with a common model. When comparing two matching functions (also referred to herein as “matchers”), one can be described as tighter if it is stricter with respect to which bitmaps it allows to be matched, while the other can be described as looser if it is more lenient in this regard. For example, a Hausdorff matcher requires that every black pixel (e.g., at position {x, y}) on a first bitmap find a corresponding black pixel, within a pixel distance of one, on a second bitmap to which the first is compared. A Rank 95 Hausdorff matcher requires only that 95% of the black pixels find a corresponding black pixel within a distance of one. A Rank 95 Hausdorff matcher is therefore looser than a Hausdorff matcher. A Quadrant Hausdorff matcher requires that every black pixel find a corresponding black pixel within a distance of one in the same quadrant direction. (See U.S. Pat. No. 6,748,115, FIGS. 9 & 10, for an illustration.) Every black pixel in the first bitmap that does not find a matching pixel in the exact same position of the second bitmap must find a matching pixel in the same quadrant. For example, if all such pixels find a match in the top right quadrant, the bitmaps match. However, if any such pixel needs to find support in a different quadrant, such as the bottom left, the match is not allowed. A Quadrant Hausdorff matcher is tighter than a Hausdorff matcher because it imposes an additional directional constraint. Applying a looser matcher may result in fewer models overall but increases the likelihood of a mismatch; applying a tighter matcher may result in more models overall but decreases the likelihood of a mismatch.
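The three matchers described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the implementation of any particular JBIG2 encoder or of the referenced patent: bitmaps are represented as sets of (x, y) black-pixel positions, "distance of one" is taken to mean the 3x3 neighbourhood, the match is tested in one direction only (first bitmap against second), and a quadrant is assumed to consist of the three neighbours lying in one diagonal direction. All function names are hypothetical.

```python
from itertools import product

# Assumption: "within a distance of one" means the 3x3 neighbourhood.
NEIGHBOURS = list(product((-1, 0, 1), repeat=2))

def supported_count(a, b):
    """Count black pixels of a that have a black pixel of b within distance one."""
    return sum(any((x + dx, y + dy) in b for dx, dy in NEIGHBOURS) for x, y in a)

def hausdorff_match(a, b):
    """Every black pixel of a must be supported by b."""
    return supported_count(a, b) == len(a)

def rank_hausdorff_match(a, b, rank_percent=95):
    """At least rank_percent of the black pixels of a must be supported by b."""
    return supported_count(a, b) * 100 >= rank_percent * len(a)

def quadrant_hausdorff_match(a, b):
    """Pixels of a with no exact-position match in b must all find support
    in one shared quadrant (assumed: the three neighbours in one diagonal
    direction, e.g. right, up, and up-right)."""
    unmatched = [p for p in a if p not in b]
    for sx, sy in product((-1, 1), repeat=2):
        offsets = [(sx, 0), (0, sy), (sx, sy)]
        if all(any((x + dx, y + dy) in b for dx, dy in offsets)
               for x, y in unmatched):
            return True
    return False

# A vertical bar shifted one pixel to the right: all pixels move the same way.
bar = {(1, y) for y in range(5)}
shifted_bar = {(2, y) for y in range(5)}
print(hausdorff_match(bar, shifted_bar))           # True
print(quadrant_hausdorff_match(bar, shifted_bar))  # True

# Two pixels that need support in opposite quadrants.
first = {(1, 1), (5, 5)}
second = {(2, 2), (4, 4)}
print(hausdorff_match(first, second))              # True
print(quadrant_hausdorff_match(first, second))     # False

# 19 of 20 pixels supported: passes Rank 95 but not plain Hausdorff.
row = {(x, 0) for x in range(20)}
short_row = {(x, 0) for x in range(18)}
print(hausdorff_match(row, short_row))             # False
print(rank_hausdorff_match(row, short_row))        # True
```

Under these assumptions, any pair accepted by the quadrant matcher is also accepted by the plain Hausdorff matcher, and any pair accepted by the plain Hausdorff matcher is also accepted by the Rank 95 matcher, which mirrors the tighter-to-looser ordering described in the text.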
Most JBIG2 implementations use a range of matchers, depending on the properties of the bitmaps being compared. These properties may include the height, width, area, number of holes, and mean stroke thickness of the bitmaps. If the bitmaps are large and have a wide mean stroke thickness, they can generally be safely matched with a loose matcher such as a Rank 95 Hausdorff. If the bitmaps are smaller and thinner, they are more likely to need a tighter matcher such as a Quadrant Hausdorff.
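A selection rule of this kind might be sketched as follows. The thresholds, the function name, and the choice of a plain Hausdorff matcher for intermediate cases are all hypothetical, chosen purely for illustration; actual implementations may use different properties and cutoffs.

```python
def choose_matcher(width, height, mean_stroke_thickness,
                   large_side=40, thick_stroke=4.0):
    """Pick a matcher name from bitmap properties.

    large_side and thick_stroke are hypothetical thresholds (in pixels).
    """
    is_large = min(width, height) >= large_side
    is_thick = mean_stroke_thickness >= thick_stroke
    if is_large and is_thick:
        # Large, thick glyphs tolerate a loose matcher.
        return "rank95_hausdorff"
    if not is_large and not is_thick:
        # Small, thin glyphs need a tight matcher.
        return "quadrant_hausdorff"
    # Assumption: intermediate cases fall back to the plain matcher.
    return "hausdorff"

print(choose_matcher(60, 80, 6.0))
print(choose_matcher(10, 12, 1.5))
```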
However, a general tradeoff in the field of data compression is rate versus distortion: the higher the compression rate, the greater the amount of distortion. The study of this tradeoff, referred to herein as “rate distortion theory,” is a major branch of information theory. It addresses the problem of determining the minimal amount of entropy (information) R that should be communicated over a channel so that the source (input signal) can be approximately reconstructed at the receiver (output signal) without exceeding a given distortion D.
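For reference, the problem stated above is commonly written as the rate-distortion function. The symbols X (source), \hat{X} (reconstruction), d (a per-sample distortion measure), and I (mutual information) are notation introduced here and do not appear in the text above.

```latex
R(D) \;=\; \min_{p(\hat{x} \mid x)\,:\; \mathbb{E}\left[d(X, \hat{X})\right] \le D} I(X; \hat{X})
```

The minimum is taken over all conditional distributions of the reconstruction given the source whose expected distortion does not exceed D.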
Rate distortion theory, created by Claude Shannon in his foundational work on information theory, gives theoretical bounds for how much compression can be achieved using lossy data compression methods. Many of the existing audio, speech, image, and video compression techniques have transforms, quantization, and bit-rate allocation procedures that capitalize on the general curve of the rate-distortion functions.
In rate distortion theory, the rate is the number of bits per data sample to be stored or transmitted. The notion of distortion is a subject of ongoing discussion. In the simplest case (which is also the one used most often), the distortion is defined as the variance, or mean squared error, of the difference between the input and output signals. However, since most lossy compression techniques operate on data (e.g., music, pictures, video) that will be perceived by humans, the distortion measure preferably should include some aspects of human perception. Audio compression perceptual models, and therefore perceptual distortion measures, are relatively well developed and routinely used in compression techniques such as MP3, but they are often not easy to include in rate distortion theory, i.e., the degree of distortion is difficult to calculate when perception models are used. In image and video compression, human perception models are less well developed, and their inclusion is mostly limited to the JPEG and MPEG weighting (quantization) matrices.
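As an illustration of the simplest distortion measure, the mean squared error between an input signal and its reconstruction over N samples can be written as follows. The symbols x_i (input samples), \hat{x}_i (output samples), and N are notation assumed here.

```latex
D_{\mathrm{MSE}} \;=\; \frac{1}{N} \sum_{i=1}^{N} \left( x_i - \hat{x}_i \right)^2
```

This quantity equals the variance of the difference signal when the mean of the difference is zero.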
Shannon's rate distortion theory notwithstanding, there does not seem to be an inherent tradeoff between rate and distortion. For example, in the lossless data compression domain, studies have shown that human entropy for English language text is about 1 bit per character (bpc); i.e., the probability that a human correctly guesses the next character corresponds to a probability for which only about 1 bit per character would be required for the encoding. Traditional lossless text compression based on Lempel-Ziv methods (e.g., zip) achieves a compression rate of about 2 bpc. Newer lossless text compression methods that utilize techniques including arithmetic encoding and Markov models (of order statistics), such as PPMD, achieve a compression rate of approximately 1.5 bpc, which is closer to, but still higher than, the human entropy rate of roughly 1.1 bpc. These improvements in lossless text compression rates, achieved over a 20-year period, did not come at the cost of greater data distortion, since both zip and prediction-by-partial-matching (PPM) methods are entirely lossless. To the contrary, PPMD is a better model for text than the traditional lossless text compression methods; better modeling yields a better representation or understanding of the data and, consequently, achieves lower entropy.
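The bits-per-character figure used above can be measured for any lossless compressor. The following Python sketch uses only standard-library codecs: zlib (DEFLATE, a Lempel-Ziv method of the same family as zip) and lzma. Neither is PPMD, which is not in the standard library, so lzma merely stands in as a second lossless compressor for comparison. The helper name and the sample text are assumptions made for illustration, and results on short or repetitive inputs are not representative of the figures quoted for English text.

```python
import lzma
import zlib

def bits_per_character(text, compress):
    """Return compressed size in bits divided by the number of characters.

    text must be non-empty; compress is any bytes -> bytes function.
    """
    data = text.encode("utf-8")
    return 8 * len(compress(data)) / len(text)

# Hypothetical sample; a large English corpus is needed for meaningful numbers.
sample = "the quick brown fox jumps over the lazy dog. " * 200

for name, codec in (("zlib", zlib.compress), ("lzma", lzma.compress)):
    print(name, round(bits_per_character(sample, codec), 3), "bpc")

# Both codecs are lossless: decompression restores the input exactly.
raw = sample.encode("utf-8")
assert zlib.decompress(zlib.compress(raw)) == raw
assert lzma.decompress(lzma.compress(raw)) == raw
```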
Modeling human perception is a very effective tool for efficient data compression. Although modeling human perception has long been very important in areas like computer vision, it has generally played only a tangential role in data compression. Human perception models have evolved over time and seem to be highly effective. Rather than operating at a sensor level (e.g., pixels), human perception operates at an object level. This has the positive effect of separating signal from noise, as well as greatly reducing the amount of information that needs to be retained.
For example, a color image scan at 300 dots per inch (dpi) might typically involve 300*300 (pixels per square inch) * 8.5*11 (paper size in inches) * 24 (bits per pixel), or approximately 202 million bits of information (202 Mb). A standard compression method would involve finding ways to save this information, i.e., 202 million pieces of information, with a minimal file size and minimal distortion (variance) between the input and output signals. Nowhere in typical image compression algorithms is “scene understanding” an essential, or even important, component. Human perception works very differently: first there is an understanding of the general scene (e.g., indoor, outdoor, document, invoice document, etc.). Once the scene is understood on some basic level (e.g., frame instantiation as defined by Patrick Winston), the image is grouped or segmented into objects, of which there are very few compared to the number of pixels and corresponding bits. For example, if a color image has over 200 million bits of information, a human may typically perceive far fewer objects, perhaps fewer than one hundred. For a bitonal image scan, a typical compression algorithm (e.g., CCITT4) views the problem as storing roughly 8 million pieces of data (pixels or bits), while a system using human perception models, referred to herein as a human system, perceives the problem as understanding 500-2000 character symbols or connected components.
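The arithmetic behind these figures can be checked with a short Python sketch. The function name is hypothetical; the inputs are the values given in the text (300 dpi, 8.5 by 11 inch page, 24 bits per pixel for color, and 1 bit per pixel for bitonal).

```python
def scan_bits(dpi, width_in, height_in, bits_per_pixel):
    """Return the raw size in bits of a scanned page."""
    pixels = dpi * dpi * width_in * height_in
    return round(pixels * bits_per_pixel)

color_bits = scan_bits(300, 8.5, 11, 24)
bitonal_bits = scan_bits(300, 8.5, 11, 1)

print(color_bits)    # 201960000, i.e. about 202 million bits
print(bitonal_bits)  # 8415000, i.e. about 8 million pixels

# Pixels per perceived symbol for a bitonal page with 500 to 2000 symbols.
print(bitonal_bits / 2000, bitonal_bits / 500)  # 4207.5 16830.0
```

On these numbers, a bitonal page holds several thousand pixels for every character symbol or connected component, which illustrates the gap between the pixel-level and object-level views of the same page.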
Perceptually lossless image compression is about image understanding. The fundamental tenet of perceptually lossless image compression is to model human perception as closely as possible. Human perception models for speech, image, and video are typically much more advanced than the standard computer models used in compression and in other fields that do not attempt to model human perception. As such, perceptually lossless compression can achieve much lower bit rates (i.e., higher compression), ideally with no perceptual distortion. On the other hand, a disadvantage of such perceptual methods is that they are domain specific, so that different techniques are used, for example, in image compression, speech compression, and video compression.