The print data pipeline of a printer performs a number of operations upon print data which enters the pipeline in preparation for printing. These operations include: print data compression, print data decompression, color space conversion, and halftoning. The type of operation performed and the specific order in which the operations will be performed can vary depending upon the type of print data which enters the pipeline, the capabilities of the print engine, and the memory available in the printer. The types of print data which may enter the pipeline include: text, line art, images, and graphics. In prior art pipeline implementations, the various processing operations are performed by a processor under the control of firmware. Depending upon the type of print data entering the pipeline, a number of possible firmware routines are executed, as necessary, to complete the aforementioned operations.
The specifics of the print data compression operation performed depend upon the type of print data which enters the pipeline. For example, with certain types of print data, such as image print data, print data compression routines which result in some loss of information are acceptable. With these types of print data, the decrease in the quality of the printed output is not perceptible. Compression routines which result in a loss of information not perceptible in the printed output are referred to as "visually lossless" systems. However, for other types of print data, such as text and line art, it is important, for the quality of the printed output, that the print data compression routines employed do not result in the loss of information.
Data compression/decompression systems, which are known in the art, encode a stream of digital data signals into compressed digital code signals and decode the compressed digital code signals back into the original data. Data compression refers to any process that attempts to convert data in a given format into an alternative format requiring less space than the original. The objective of data compression systems is to effect a savings in the amount of storage required to hold a given body of digital information. When that digital information is a digital representation of an image or text, data compression systems are divided into two general types: lossy, and lossless.
The lossless systems have what is referred to as reciprocity. In order for the data compression system to possess the property of reciprocity, it must be possible to re-expand or decode the compressed data back into its original form without any alteration or loss of information. The decoded and original data must be identical and indistinguishable with respect to each other. Thus, the property of reciprocity is synonymous with that of strict noiselessness used in information theory.
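The reciprocity property can be illustrated with a short round-trip check. The sketch below is only an illustration: it uses Python's standard `zlib` module as a stand-in lossless compressor, and the helper name `is_reciprocal` and the sample data are assumptions that do not appear in this description.

```python
import zlib


def is_reciprocal(data: bytes) -> bool:
    """Return True if compressing and then decompressing reproduces the input exactly."""
    # zlib is a stand-in lossless codec chosen for illustration only.
    return zlib.decompress(zlib.compress(data)) == data


# Hypothetical sample standing in for text print data.
sample = b"text and line art must survive compression unchanged " * 8
print(is_reciprocal(sample))
```

A lossy system would fail this byte-for-byte comparison by design, which is why it is reserved for data such as images.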
Some applications do not require strict adherence to the property of reciprocity. As stated above, one such application is the processing of image data. Because the human eye is relatively insensitive to small amounts of noise, some alteration or loss of information during the compression and decompression process is acceptable. This loss of information gives the lossy data compression systems their name.
An important criterion in the design of data compression systems is the compression effectiveness, which is characterized by the compression ratio. The compression ratio is the size of the data in uncompressed form divided by its size in compressed form. In order for data to be compressible, the data must contain redundancy. Compression effectiveness is determined by how effectively the compression procedure uses the redundancy in the input data. In typical computer stored data, redundancy occurs both in the non-uniform usage of individual symbols, for example digits, bytes, or characters, and in the frequent recurrence of symbol sequences, such as common words, blank record fields and the like.
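As a small worked illustration of the compression ratio defined above, the following sketch computes the ratio for a block of highly redundant data. The function name `compression_ratio`, the sample record, and the use of Python's `zlib` as the compressor are all assumptions made for the example.

```python
import zlib


def compression_ratio(uncompressed_size: int, compressed_size: int) -> float:
    """Uncompressed size divided by compressed size, as defined in the text."""
    if uncompressed_size < 0 or compressed_size <= 0:
        raise ValueError("sizes must be non-negative, and compressed size must be positive")
    return uncompressed_size / compressed_size


# Hypothetical record with blank fields, i.e. repeated symbol sequences.
record = b"NAME:          ADDR:          " * 64
packed = zlib.compress(record)  # stand-in compressor, not the patent's method
ratio = compression_ratio(len(record), len(packed))
print(f"{len(record)} bytes -> {len(packed)} bytes, ratio {ratio:.1f}:1")
```

A ratio above 1 indicates that the procedure found and exploited redundancy; a ratio below 1 indicates that the output grew.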
The data compression system should provide sufficient performance with respect to the data rates provided by and accepted by the printer. The rate at which data can be compressed is determined by the input data processing rate of the compression system. Sufficient performance is necessary to maintain the data rates achieved and to prevent interruption of printing because processed data is not available. Thus, the data compression and decompression system must have enough data bandwidth so as not to adversely affect the overall system.
Typically, the performance of data compression and decompression systems is limited by the computations necessary to compress and decompress and by the speed of the system components, such as random access memory and the like, utilized to store statistical data and guide the compression process. This is particularly true when the compression and decompression systems are implemented in firmware, wherein firmware guides a general purpose central processing unit to perform the data compression and decompression process. In such a system, performance for a compression device is characterized by the number of processor cycles required per input character during compression. The fewer the number of cycles, the higher the performance. The firmware solutions are limited by the speed of the firmware compression and decompression because firmware takes several central processing unit cycles to decompress each byte. Thus, firmware implementations have generally been tailored to accept lower compression ratios in order to increase decompression speed.
General purpose data compression procedures are known in the prior art; three relevant procedures being the Huffman method, the Tunstall method and the Lempel-Ziv method. One of the first general purpose data compression procedures developed is the Huffman method. Briefly described, the Huffman method maps fixed length segments of symbols into variable length words. The Tunstall method, which maps variable length segments of symbols into fixed length binary words, is complementary to the Huffman procedure. Like the Huffman procedure, the Tunstall procedure requires a foreknowledge of the source data probabilities. This foreknowledge requirement can be satisfied to some degree by utilizing an adaptive version which accumulates the statistics during processing of the data.
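The Huffman mapping from fixed length symbols to variable length words can be sketched as follows. This is a minimal, non-adaptive illustration that treats each byte as one fixed length symbol and derives the probabilities from the data itself; the function name `huffman_code` and the sample input are assumptions.

```python
import heapq
from collections import Counter
from typing import Dict


def huffman_code(data: bytes) -> Dict[int, str]:
    """Build a prefix-free code: frequent byte values receive shorter bit strings."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code word.
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, unique tie-breaker, {symbol: partial code}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merge the two lightest subtrees, prefixing one bit to every code in each.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


data = b"aaaaaaaabbbc"  # hypothetical input with skewed symbol frequencies
codes = huffman_code(data)
encoded_bits = "".join(codes[b] for b in data)
print(len(encoded_bits), "bits encoded versus", 8 * len(data), "bits raw")
```

The sketch requires the symbol frequencies before coding begins, which is the foreknowledge requirement noted above; an adaptive variant would update the table while the data is processed.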
The Lempel-Ziv procedure maps variable length segments of symbols into variable length binary words. It is asymptotically optimal when there are no constraints on the input or output segments. In this procedure the input data string is parsed into adaptively grown segments. Each of the segments consists of an exact copy of an earlier portion of the input string suffixed by one new symbol from the input data. The copy which is to be made is the longest possible and is not constrained to coincide with an earlier parsed segment. The code word which represents the segment in the output contains information consisting of a pointer to where the earlier copied portion begins, the length of the copy, and the new symbol. Additional teaching for the Lempel-Ziv data compression technique can be found in U.S. Pat. No. 4,558,302, incorporated herein by reference.
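The parsing rule described above can be sketched in a few lines. The following is an illustrative, brute-force version only: the names `lz_parse` and `lz_unparse` are assumptions, the code words are kept as Python tuples rather than packed into binary words, and the copy is restricted to text that lies wholly before the current position as a simplification.

```python
from typing import List, Tuple

CodeWord = Tuple[int, int, int]  # (pointer to earlier copy, copy length, new symbol)


def lz_parse(data: bytes) -> List[CodeWord]:
    """Parse data into segments of 'longest earlier copy + one new symbol'."""
    out: List[CodeWord] = []
    i, n = 0, len(data)
    while i < n:
        best_pos, best_len = 0, 0
        max_len = n - i - 1  # reserve one symbol to act as the new symbol
        for start in range(i):
            length = 0
            while (length < max_len and start + length < i
                   and data[start + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_pos, best_len = start, length
        out.append((best_pos, best_len, data[i + best_len]))
        i += best_len + 1
    return out


def lz_unparse(codewords: List[CodeWord]) -> bytes:
    """Rebuild the input by replaying each copy and appending its new symbol."""
    buf = bytearray()
    for pos, length, sym in codewords:
        buf.extend(buf[pos:pos + length])
        buf.append(sym)
    return bytes(buf)


data = b"abababababa"  # hypothetical input
words = lz_parse(data)
print(words)
print(lz_unparse(words) == data)
```

The quadratic search is chosen for clarity; a practical implementation would index earlier text rather than rescan it for every segment.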
While the aforementioned data compression procedures are good general purpose lossless procedures, some specific types of redundancy may be compressed using other methods. One such lossless method, commonly known as run length encoding (RLE), is well suited for graphical data. With RLE, sequences of individual characters can be encoded as a count field plus an identifier of the repeated character. Typically, two characters are needed to mark each character run, so that this encoding would not be used for runs of two or fewer characters. However, when dealing with a graphical image represented in digital data form, there can be large runs of the same character in any given line, making RLE an effective compression procedure for such information.
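A minimal sketch of the count-plus-character encoding follows. For simplicity it encodes every run, including runs of one or two characters, as a (count, value) byte pair and caps each count at 255; a practical scheme would leave short runs unencoded as noted above. The names `rle_encode` and `rle_decode` and the sample raster row are assumptions.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode each run of identical bytes as a (count, value) pair, count <= 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and run < 255 and data[i + run] == data[i]:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """Expand (count, value) pairs back into the original byte sequence."""
    out = bytearray()
    for k in range(0, len(encoded), 2):
        out += bytes([encoded[k + 1]]) * encoded[k]
    return bytes(out)


# Hypothetical raster row: a long white run followed by a short black run.
row = b"\x00" * 10 + b"\xff" * 3
packed = rle_encode(row)
print(len(row), "bytes ->", len(packed), "bytes")
```

Because this simplified form spends two bytes on every run, data without repeated characters doubles in size, which is the reason short runs are normally excluded.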
All of the aforementioned data compression procedures are highly dependent upon redundancy in the data to achieve significant compression ratios. One significant disadvantage with these procedures is that, with certain types of data, the compressed output may actually be larger than the input because the input data lacks any specific redundancy. In the art of printing, such "incompressible" data is easily generated.
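The expansion of low-redundancy data can be observed directly. The sketch below is an illustration under stated assumptions: Python's `zlib` (a Lempel-Ziv and Huffman based codec) stands in for the procedures discussed, and seeded pseudo-random bytes stand in for "incompressible" print data.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the example is repeatable

# Hypothetical inputs: pseudo-random bytes have essentially no redundancy,
# while a block of zero bytes is almost entirely redundant.
noisy = bytes(rng.getrandbits(8) for _ in range(4096))
flat = bytes(4096)

noisy_packed = zlib.compress(noisy)
flat_packed = zlib.compress(flat)

print("noisy:", len(noisy), "->", len(noisy_packed), "bytes")
print("flat: ", len(flat), "->", len(flat_packed), "bytes")
```

The noisy block comes out slightly larger than it went in because the codec must still add its framing overhead, while the flat block shrinks to a small fraction of its size.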
Certain types of images are classified as either "ordered dither" or "error diffused". An ordered dither image (also called "cluster") is a half-tone image that includes half-tone gray representations throughout the page. Such images generally reflect substantial data redundancy and lend themselves to lossless techniques of data encoding such as those described above. However, error diffused images (also called "dispersed") exhibit little redundancy in their data and require different methods of compression. Print data representing photographic images provides another example of low redundancy print data. As a result, a single data compression scheme is no longer sufficient to enable a page printer to handle all image data.
In U.S. Pat. No. 5,479,587 entitled "Page Printer Having Adaptive Data Compression For Memory Minimization", issued to Cambell et al., assigned to the same assignee as this application and incorporated herein by reference, a page printer steps through various compression techniques in an attempt to accommodate a limited memory size that is less than that required for a full page of printed data. In that patent, when an image is unprintable because of low memory conditions, a "mode-M" compression technique is used first. Using this technique, an attempt is made to compress the block using run length encoding for each row and by encoding delta changes that occur from row to row within the block. If the "mode-M" compression technique does not provide a high enough compression ratio to allow printing of the page, a second attempt is made using an LZW type compression. Finally, if the LZW based compression technique does not obtain a high enough compression ratio to allow printing of the page, a lossy compression procedure is used.
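The stepping behavior described above, in which progressively more aggressive techniques are tried until the block fits in memory, can be sketched as a simple cascade. This is not the implementation of the referenced patent: the three stages below are stand-ins built on Python's `zlib`, and the names `compress_to_fit`, `STAGES`, and `lossy_stand_in`, as well as the byte budgets, are assumptions made for illustration.

```python
import random
import zlib
from typing import Callable, List, Optional, Tuple

Stage = Tuple[str, Callable[[bytes], bytes]]


def lossy_stand_in(block: bytes) -> bytes:
    """Discard the low 4 bits of every byte, then compress (information is lost)."""
    return zlib.compress(bytes(b & 0xF0 for b in block), 9)


# Stand-ins only: real mode-M and LZW coders are not reproduced here.
STAGES: List[Stage] = [
    ("mode-M stand-in", lambda b: zlib.compress(b, 1)),
    ("LZW stand-in", lambda b: zlib.compress(b, 9)),
    ("lossy stand-in", lossy_stand_in),
]


def compress_to_fit(block: bytes, budget: int,
                    stages: List[Stage] = STAGES) -> Optional[Tuple[str, bytes]]:
    """Try each stage in order; return the first result that fits within budget."""
    for name, compress in stages:
        packed = compress(block)
        if len(packed) <= budget:
            return name, packed
    return None  # even the lossy stage could not meet the budget


rng = random.Random(1)
blank = bytes(1024)                                            # highly redundant block
grainy = bytes(0x80 | rng.getrandbits(4) for _ in range(2048)) # noise in the low bits only

print(compress_to_fit(blank, budget=64)[0])
print(compress_to_fit(grainy, budget=256)[0])
```

Ordering the stages from lossless to lossy means information is sacrificed only when no lossless attempt meets the memory budget.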
In the processing of raster print data, a variety of operations can be performed on the raster print data prior to generating the printed page. Such operations as data compression, color space conversion, and halftoning are included in the operations which may be performed prior to generating the printed page. It is frequently the case that, in the processing of raster print data, various sections of the page would be optimally processed by employing different types of data compression, color space conversion, and halftoning operations. A recurring problem confronted in optimally processing raster print data has been the partitioning of the raster print data forming the page so that the various raster print data processing operations may be optimally performed on the appropriate sections of the page.
Consider, for example, the amount of memory required to store the raster print data corresponding to a page. As printers increase in the density of dot placement (dots per inch), add gray scale capability (using a number of bits per pixel to define a gray scale level), and include color printing capability (requiring additional bits per pixel over monochrome printing), the memory required to store the data used to print a page can reach thirty-two times the memory required for a monochrome printer of the same resolution. To allow color printers to use a more reasonable memory size, data compression techniques are generally used to reduce the memory requirements. However, different types of raster print data are each optimally compressed using different compression techniques. For example, for raster print data corresponding to sections of the page containing images, the optimal combination of compression ratio and print quality is achieved by employing lossy compression techniques. However, for raster print data corresponding to sections of the page containing text, the optimal combination of compression ratio and print quality is achieved by employing lossless compression techniques. A need exists for a data processing pipeline which will allow application of the optimal type of data processing operation to the processing of each data element.
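The thirty-two-fold figure can be reproduced with simple arithmetic. The numbers in the sketch below are assumptions chosen for illustration and are not taken from this description: a letter size page (8.5 by 11 inches) at 600 dots per inch, 1 bit per pixel for monochrome, and four color planes at 8 bits each (32 bits per pixel) for color. The function name `page_bytes` is likewise hypothetical.

```python
def page_bytes(dpi: int, width_in: float, height_in: float, bits_per_pixel: int) -> int:
    """Bytes needed to hold one uncompressed raster page."""
    pixels = round(dpi * width_in) * round(dpi * height_in)
    return (pixels * bits_per_pixel + 7) // 8  # round up to whole bytes


# Assumed page geometry and pixel depths (illustrative only).
mono = page_bytes(600, 8.5, 11, bits_per_pixel=1)
color = page_bytes(600, 8.5, 11, bits_per_pixel=4 * 8)

print(f"monochrome: {mono:,} bytes")
print(f"color:      {color:,} bytes")
print(f"ratio:      {color // mono}x")
```

Under these assumptions the monochrome page needs roughly 4 megabytes and the color page roughly 135 megabytes, which illustrates why compression is applied before the page is stored.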