1. Field of the Invention
The present invention relates generally to data processing and, more particularly, to data filtering and data compression for compound document pages including tristimulus spatial coordinate color image data.
2. Description of Related Art
Raster-based printers use a coding technique which codes each picture element, commonly called a “pixel,” of alphanumeric character text or a computer graphic into a digital data format. A “compound document” includes both text and graphics, for example, an advertising page having both text and photographs. Data compression is used to reduce a data set for storage and transfer. Compressed raster data is output by a computer for decompression and printing by a hard copy apparatus such as a laser printer or ink-jet printer, facsimile machine, or the like. Reductions in the amount of total data needed to transfer a complete page data set compensate for limitations in input/output (“I/O”) data rates and I/O buffer sizes, particularly in a limited memory, hard copy apparatus that receives such raster-based data. With raster data, the goal is to reduce the quantity of data transferred without affecting the visual quality characteristics of the document page. The following descriptions assume knowledge of an average person skilled in the art of both raster-based printing and data compression techniques. As used herein, the term “image data” refers to photographs or other digitally scanned, or otherwise produced, sophisticated graphics.
Computerized systems that utilize loss-less compression techniques generally do not perform well on image data. While computationally achieving a 100:1 compression on text and business graphics (line art, bar charts, and the like) data, these complex algorithms usually achieve less than a 2:1 compression of image data. As a corollary, while image data can be compressed effectively with a “lossy” algorithm without significantly affecting perceptible image quality (e.g., the JPEG industry standard for photographs, which has the disadvantage of being relatively slow in and of itself), data compression solutions that rely solely on lossy algorithms visibly degrade text data (such as by leaving visual artifacts), even at relatively low levels of compression. Moreover, lossy compression techniques do not achieve the desirable high compression ratios. Still further, the advantages of JPEG-like compression over other techniques are reduced when compressing image data that have been scaled using a pixel-replication scaling algorithm common to rasterized compound documents (e.g., 150 dot-per-inch (“dpi”) image data scaled up to a resolution of 300-dpi or 600-dpi).
Solutions that use a mix of lossy and loss-less data compression are often slow and complex. For example, text and image data are sometimes separated to different channels, one containing the images using a lossy compression technique, like JPEG, and the other using a loss-less compression technique for text and simple business graphics. This separation of data into individual channels can be slow and the results are dependent on the architecture of the rasterization engine that initially rasterized the compound document. Moreover, the use of a lossy algorithm sometimes requires custom decompression hardware to achieve acceptable data processing speeds, which adds to the cost of a hard copy product. Again, the advantages of a JPEG-type algorithm are still reduced for images that have been scaled. Moreover, the relatively slow nature of JPEG is not improved even when compressing high resolution pixel replicated image data.
Thus, there is a need for a fast, raster-based, data compression technique for the transmission of compound documents, particularly useful for hard copy printing.
In its basic aspects, the present invention provides a method for filtering an image data subset of a page description data set, including the steps of: receiving a set of page description data including at least one image data subset; filtering image data of the image data subset by comparing adjacent pixels and coalescing adjacent pixels having substantially identical color values into pixel blocks wherein each of the pixel blocks is a plurality of pixels described by pixel block size, location in the image data subset, and an average of the substantially identical color values of the adjacent pixels.
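As a rough illustration of the coalescing step described above, the following Python sketch scans a single row of pixels and merges neighboring pixels of nearly the same color into blocks recorded by location, size, and average color. The names `coalesce_row`, `close_enough`, and `tolerance`, the RGB representation, the per-channel difference test, and the one-dimensional scan are all assumptions made for illustration; the text does not specify them.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]      # hypothetical (R, G, B) triple
Block = Tuple[int, int, Pixel]    # (start index, length, average color)


def close_enough(a: Pixel, b: Pixel, tolerance: int) -> bool:
    """Assumed similarity test: every channel differs by at most `tolerance`."""
    return all(abs(x - y) <= tolerance for x, y in zip(a, b))


def coalesce_row(row: List[Pixel], tolerance: int = 4) -> List[Block]:
    """Merge runs of adjacent, substantially identical pixels into blocks."""
    blocks: List[Block] = []
    start = 0
    while start < len(row):
        end = start + 1
        # Extend the run while each next pixel is close to its left neighbor.
        while end < len(row) and close_enough(row[end - 1], row[end], tolerance):
            end += 1
        run = row[start:end]
        # Integer average of each channel over the run.
        avg = tuple(sum(p[c] for p in run) // len(run) for c in range(3))
        blocks.append((start, len(run), avg))
        start = end
    return blocks


row = [(200, 10, 10), (202, 11, 9), (201, 10, 10), (20, 20, 250), (21, 22, 249)]
print(coalesce_row(row))
# [(0, 3, (201, 10, 9)), (3, 2, (20, 21, 249))]
```

Comparing each pixel only with its immediate neighbor keeps the sketch short, but it lets color drift slowly across a long run; a stricter variant might compare each candidate against the running average instead.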
In another basic aspect, the present invention provides a method for filtering a data set of image raster data in the form of color space coordinate values for individual pixels, including the steps of: a) choosing a current pixel for filtering; b) comparing the current pixel to adjacent pixels; c) determining when adjacent pixels have a substantially identical color value; d) when the adjacent pixels do not have a substantially identical color value, choosing a new current pixel for filtering and returning to step b); e) when the adjacent pixels have a substantially identical color value, averaging the adjacent pixels and forming a pixel block therefrom having a single color space coordinate value therefor; f) comparing adjacent pixel blocks; g) when adjacent pixel blocks have a substantially identical color value, averaging the adjacent pixel blocks and forming a pixel super-block therefrom having a single color space coordinate value therefor; h) repeating steps b) through g) for the entire data set until either no substantially identical color value pixels or pixel blocks or pixel super-blocks are adjacently located or until a predetermined size pixel block or super-block of a predetermined grid size of pixels is created; and i) when adjacent pixel blocks do not have a substantially identical color value, choosing a new current pixel for filtering and returning to step b). For each pixel block comparison in a current series of comparing steps, the difference error value is reduced based on predetermined parameters.
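The block and super-block steps above can be sketched as a simplified, grid-aligned merge. This is only one possible reading: the function `filter_image`, the 2x2 grouping, the per-channel spread used as the difference error value, and the parameters `tolerance`, `decay`, and `max_block` are hypothetical choices, not values taken from the text.

```python
from typing import List, Tuple

Color = Tuple[int, int, int]
Image = List[List[Color]]


def max_diff(colors: List[Color]) -> int:
    """Largest per-channel spread among the colors (assumed error metric)."""
    return max(max(c[k] for c in colors) - min(c[k] for c in colors)
               for k in range(3))


def average(colors: List[Color]) -> Color:
    n = len(colors)
    return tuple(sum(c[k] for c in colors) // n for k in range(3))


def filter_image(img: Image, tolerance: int = 8, decay: float = 0.5,
                 max_block: int = 4) -> Image:
    """Coalesce 2x2 pixel groups into blocks, then blocks into super-blocks."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    # Top-left corners of blocks of the current size that carry one color.
    uniform = {(y, x) for y in range(h) for x in range(w)}
    size, tol = 1, float(tolerance)
    while size < max_block:
        step = size * 2
        merged = set()
        for y in range(0, h - step + 1, step):
            for x in range(0, w - step + 1, step):
                quads = [(y, x), (y, x + size),
                         (y + size, x), (y + size, x + size)]
                if not all(q in uniform for q in quads):
                    continue
                colors = [out[qy][qx] for qy, qx in quads]
                if max_diff(colors) <= tol:
                    avg = average(colors)
                    for yy in range(y, y + step):
                        for xx in range(x, x + step):
                            out[yy][xx] = avg
                    merged.add((y, x))
        if not merged:
            break          # nothing left to coalesce
        uniform, size = merged, step
        tol *= decay       # tighten the tolerance as blocks grow


    return out


flat = [[(100, 100, 100)] * 4 for _ in range(4)]
flat[0][0] = (104, 100, 100)                     # one slightly noisy pixel
edge = [[(0, 0, 0)] * 2 + [(255, 255, 255)] * 2 for _ in range(4)]
print(filter_image(flat)[0][0], filter_image(edge) == edge)
```

In this sketch the noisy flat region collapses to a single color, while the hard black-to-white edge is left untouched because the spread across it exceeds the tightened tolerance.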
In another basic aspect, the present invention provides a computer algorithm for filtering an image data set, including the steps of: operating on a predetermined number of rows of pixels of said image data set by comparing and coalescing the individual pixels into rectangular blocks of pixels such that each of the rectangular blocks has a single color space coordinate identifier wherein block sizes of a programmable predetermined size block are constructed and each of the rectangular blocks is complete when a color difference error value between adjacent blocks exceeds a programmable, variable, predetermined threshold such that a filtered image data set is formed from rectangular blocks of pixels; and replacing the image data set with the filtered image data set.
In still another basic aspect, the present invention provides a data compression method for compound document data, including the steps of: receiving a set of page description data representing a compound document page; extracting image data from the set of page description data; filtering the image data and outputting a filtered image data set; restoring the filtered image data set to the set of page description data; rasterizing the set of page description data having the filtered image data set and outputting a set of rasterized page description data; and compressing the rasterized page description data and outputting a set of compressed rasterized page description data. The image data is reduced from individual pixels to pixel blocks representing groups of adjacent pixels having substantially identical color values.
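To make the filter-then-compress ordering concrete, the following Python sketch applies a simple run-averaging filter to a noisy single-channel image and then runs a general-purpose loss-less compressor over the result. Everything here is an assumption for illustration: `filter_row` and `compress_page` are hypothetical helpers, `zlib` merely stands in for whatever loss-less compressor is actually used, and the extraction, restoration, and rasterization steps are omitted.

```python
import random
import zlib


def filter_row(row: bytes, tolerance: int) -> bytes:
    """Replace each run of near-identical neighbors with the run's average."""
    out = bytearray()
    start = 0
    while start < len(row):
        end = start + 1
        while end < len(row) and abs(row[end] - row[end - 1]) <= tolerance:
            end += 1
        avg = sum(row[start:end]) // (end - start)
        out.extend([avg] * (end - start))
        start = end
    return bytes(out)


def compress_page(rows, tolerance=None) -> bytes:
    """Optionally filter the image rows, then apply a loss-less compressor."""
    if tolerance is not None:
        rows = [filter_row(r, tolerance) for r in rows]
    return zlib.compress(b"".join(rows))


rng = random.Random(0)
# Hypothetical 64x64 single-channel image: a flat region with slight noise.
rows = [bytes(100 + rng.randint(-3, 3) for _ in range(64)) for _ in range(64)]

raw_size = len(compress_page(rows))
filtered_size = len(compress_page(rows, tolerance=8))
print(raw_size, filtered_size)   # the filtered page compresses to fewer bytes
```

The filter itself discards a small amount of color detail, but the compressor that follows is loss-less, so text and line art that pass through unfiltered are reproduced exactly.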
In a further basic aspect, the present invention provides a computer memory having an image data filtering program including: means for receiving a set of page description data representing a compound document page; means for extracting image data from the set of page description data; means for filtering the image data and outputting a filtered image data set; means for restoring the filtered image data set to the set of page description data; means for rasterizing the set of page description data having the filtered image data set; and means for outputting a set of rasterized page description data.
In yet another basic aspect, the present invention provides a computerized method for enhancing compressibility of a compound document single page data set, including the steps of: extracting pixel image data from the data set; filtering the pixel image data such that image regions of substantially the same color are in a compression enhanced format; recombining the image data set to form a data compressible enhanced format compound document single page data set; rasterizing the data compressible enhanced format compound document data set; and running a data compression process on the data compressible enhanced format compound document page data set. The step of filtering includes the steps of: comparing pairs of pixels; averaging representative color data of the pairs of pixels if respective pixel image data are close enough in value so as to minimally affect print quality such that pixel blocks are formed set to a single color value for enhancing compressibility; and averaging pixel blocks with neighboring blocks to create larger blocks until a predetermined super-block size is reached or until a color error tolerance is reached wherein as super-block area grows, the color error tolerance is reduced.
It is an advantage of the present invention that it provides data compression for documents with a mix of text, image data, and business graphics which can be compressed and decompressed quickly with high compression ratios.
It is an advantage of the present invention that it provides a near loss-less data compression and decompression.
It is an advantage of the present invention that it provides a data compression enhancement technique that can be tuned to trade image quality with compression ratio.
It is a further advantage of the present invention that it increases compression ratios for high resolution image data with substantially no perceptible image quality changes.
It is an advantage of the present invention that text and graphics portions of a compound document are compressed in a loss-less or near loss-less manner with high compression ratios.
It is a further advantage of the present invention that it is effective on images that have been scaled to a higher resolution through pixel replication.
It is another advantage of the present invention that no data separation between images and text or computer graphics is required during data compression and decompression.
It is another advantage of the present invention that it can be implemented in software.
It is yet another advantage of the present invention that software implementation enables faster implementation.
It is yet another advantage of the present invention that it has lower computational complexity which provides fast data compression and decompression.
It is another advantage of the present invention that it specifies an intermediate format which can convert from any host format to any format within a hard copy apparatus.