This invention relates generally to compression and decompression of data, and more particularly to determining different types of structures that may exist in rasterized data and selectively applying appropriate compression schemes thereto.
In a display-oriented environment, pictorial data is presented in a two-dimensional page representation. A page is typically composed by a user on a workstation with the aid of a desktop publishing application. The page may contain text, line art (also called "graphic") and image (e.g., photo) objects and is usually output by the desktop publishing application in the form of a page description file as specified by a page description language (PDL). Before a page can be rendered by a rendering device such as a printer or a display screen, the data must be presented to the rendering device in the form of a rasterized page. The conversion to a rasterized form is accomplished by a PDL interpreter specific to the PDL used.
A rasterized page is a digital representation of a page by means of a two-dimensional array of pixels, with each pixel assuming a particular color. The color has a range depending on the number of bits assigned to each pixel, with a larger number of bits producing a higher color resolution (color depth). In printer applications, it is expedient to classify the colors into four components corresponding to four basic inks: cyan (C), magenta (M), yellow (Y), and black (K). For example, commercial applications typically have a color resolution obtained by assigning 8 bits (one byte) of storage to each color component, so that each pixel has 4 bytes associated with it. This produces 2^32, or approximately 4 billion, ink combinations.
Printers, particularly laser printers, typically have a print engine that prints at a constant rate. Raster data must be fed to the print engine at a rate commensurate with the output rate or else a printer overrun error will occur. At the very least, the print engine cannot be made to wait for raster data in the course of outputting a page. Thus, to reconcile the mismatch between the input data rate and the print engine output rate, a print buffer (also referred to as a frame buffer) is employed to hold at least one rasterized page at a time.
The two-dimensional nature of a rasterized page results in the memory needed to store the page increasing as the square of the resolution and in proportion to the product of the linear dimensions of the page. For example, for a modest printer resolution such as 400 dpi (dots per inch, i.e., 157 dots per cm) applied to a page 8.5 inches by 11 inches (i.e., 21.6 × 27.9 cm) in size, at 4 bytes per pixel, the memory required for a page amounts to as much as 60 Mbytes (megabytes). With the high cost of memory, this amount of memory could easily cost more than the sum of all other parts of a laser printer, and such a buffer would not be commercially or economically viable.
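The 60 Mbyte figure can be checked with a short calculation. The following is only an illustrative sketch: the function name `raster_page_bytes` is hypothetical, and the default of 4 bytes per pixel is an assumption taken from the 8-bit CMYK example given earlier.

```python
def raster_page_bytes(dpi, width_in, height_in, bytes_per_pixel=4):
    """Uncompressed size, in bytes, of one rasterized page.

    bytes_per_pixel=4 assumes one byte for each of the C, M, Y and K
    components (an assumption matching the CMYK example in the text).
    """
    width_px = round(dpi * width_in)    # pixels across the page
    height_px = round(dpi * height_in)  # pixels down the page
    return width_px * height_px * bytes_per_pixel


size_400 = raster_page_bytes(400, 8.5, 11)
size_800 = raster_page_bytes(800, 8.5, 11)

print(size_400)             # 59840000 bytes, i.e. about 60 Mbytes
print(size_800 / size_400)  # 4.0: doubling the resolution quadruples the memory
```

The second printed value illustrates the quadratic dependence on resolution: the pixel count grows along both page dimensions at once.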
One common solution for minimizing the size of the print buffer is to compress the raster data before storing it in the buffer. Once one or more pages of compressed raster data have been stored, they can be decompressed at a controlled rate appropriate for the print engine.
U.S. Pat. No. 5,479,587 discloses a print buffer minimization method in which the raster data is compressed by trying different compression procedures with increasing compression ratios until the raster data is compressed sufficiently to fit in a given print buffer. Each time, a compression procedure with a higher compression ratio is selected from a predefined repertoire of such procedures, ranging from lossless ones such as run-length encoding to lossy ones. Generally, lossless encoding is efficient on text and line art data while lossy encoding is effective on image data. However, this method may produce poor print quality when the nature of the raster page calls for lossy compression in order to achieve a predetermined compression ratio. This is because a single selected compression procedure is applied uniformly across each strip of the page. When the strip contains image data as well as text or line art data, the lossy compression procedure will generally blur the sharp lines that usually delineate text or line art data, or may introduce undesirable artifacts.
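The general idea of stepping through a repertoire of procedures until a strip fits can be sketched as follows. This is a toy illustration, not the method of the cited patent: the byte-pair run-length format, the bit-depth quantization used as the lossy step, and the names `rle_encode`, `quantize`, `PROCEDURES` and `compress_to_fit` are all assumptions made for the example.

```python
def rle_encode(data: bytes) -> bytes:
    """Lossless run-length encoding as (count, value) byte pairs."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        # Extend the run while the value repeats (a count must fit in a byte).
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)


def quantize(data: bytes, bits: int) -> bytes:
    """Lossy step: keep only the top `bits` bits of each 8-bit sample."""
    mask = (0xFF << (8 - bits)) & 0xFF
    return bytes(b & mask for b in data)


# Hypothetical repertoire, ordered from lossless to progressively lossier.
PROCEDURES = [
    ("rle", rle_encode),
    ("quantize-4bit+rle", lambda d: rle_encode(quantize(d, 4))),
    ("quantize-2bit+rle", lambda d: rle_encode(quantize(d, 2))),
]


def compress_to_fit(strip: bytes, budget: int):
    """Return (name, data) for the first procedure whose output fits."""
    for name, proc in PROCEDURES:
        packed = proc(strip)
        if len(packed) <= budget:
            return name, packed
    raise ValueError("strip does not fit the budget with any procedure")


flat_strip = bytes([200] * 100)   # uniform region, as in line art
ramp_strip = bytes(range(64))     # smooth gradient, as in image data

print(compress_to_fit(flat_strip, 16)[0])  # rle
print(compress_to_fit(ramp_strip, 16)[0])  # quantize-4bit+rle
```

Because the chosen procedure is applied to the whole strip, a strip mixing the two kinds of data would have its text or line art pixels quantized along with the image pixels, which is the drawback described above.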
European Patent Publication No. 0597571 discloses a method in which the types of objects in a page are first extracted and the boundary of each object determined before rasterization. Appropriate compression procedures are selectively applied to each type of object. In this way, lossless compression procedures may optimally be applied to text or line art objects while lossy compression procedures may be applied to image objects. Essentially, the method operates at the display list level, which is an intermediate form between the page description file and the rasterized page. Objects and their types are determined by parsing the high-level, implicitly object-defining commands of the PDL in the display list. This requires knowledge of the particular brand and version of PDL commands as well as of how to reconstruct a given object from these implicit manifestations. In any case, it appears that only the simplest boundaries, such as those of objects enclosed in rectangular blocks, are practically determinable from such deciphering at the display list level.
In general, the display list is interpreted by a specific PDL interpreter to generate raster data in page representation. The interpretation process may be likened to a "black box" in which the display list is input at one end and the raster data comes out at the other end. Once the data is rasterized, it is in the form of an array of pixels or a bit map, and there are no longer any explicit and well-defined objects to which individual compression procedures can be applied.