1. Field of the Invention
This invention generally relates to the encoding of digital data for efficient transmission or storage. The encoding scheme may be used in conjunction with a printer or like device.
2. Description of the Prior Art
Data compression is commonly used in digital systems to reduce the quantity of bits required to represent digital data. Specifically, data compression may be employed to store digital data using less memory space, or to transmit digital data more quickly. Data compression is especially prevalent in the processing of digital images due to the large volume of information involved.
Digital images typically consist of bilevel or multilevel data, or a combination of both. Each pixel value within an image containing bilevel data may be represented using one bit, which may assume a value of 1 or 0. For example, graphics, text, and half-tone images may be represented using bilevel data. Multilevel data (or gray-level data) consists of pixels which may assume more than two values. Accordingly, each gray-level pixel usually requires a plurality of bits for its representation. For example, three bits may be used to represent the gray level of a pixel. As used herein, the term "bilevel text" refers to text data and text-like data, such as graphical art. The term "bilevel image" refers to bilevel half-toned image data and half-tone-like image data.
A number of compression techniques exist in the art to condense images. Among the simplest techniques for compressing bilevel images is White Block Skipping. In this technique, each image is broken into a series of blocks of M×N pixels. If all of the pixels in a block are white, the block is coded by 0. If the block contains at least one black pixel, the block is coded by 1. In the former case, only a 0 is transmitted, whereas in the latter case, the entire block of values has to be transmitted as well. A somewhat more complex version of this scheme is discussed by M. Kunt in "Source coding of X-ray Pictures," IEEE Trans. on Biomedical Engineering, vol. BME-25, no. 2, March 1978, p. 124. Here, an M×N block of 0s is coded as a 0, while a block of 1s is coded as 11. Blocks comprising a combination of 1s and 0s are coded as 10, followed by the specific entries in the block.
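By way of illustration only, the two block-coding schemes described above may be sketched in Python as follows. The function names, the row-major flattening of each block, the convention that 0 denotes a white pixel, and the assumption that the image dimensions are exact multiples of M and N are not taken from the cited reference; they are chosen here solely for the sake of the example.

```python
def split_blocks(image, m, n):
    """Split an image (list of rows of 0/1 pixels) into flattened blocks.

    Assumption: each block spans m rows and n columns, and the image
    dimensions are exact multiples of m and n.
    """
    blocks = []
    for r in range(0, len(image), m):
        for c in range(0, len(image[0]), n):
            blocks.append([image[r + i][c + j] for i in range(m) for j in range(n)])
    return blocks


def wbs_encode(blocks):
    """Basic White Block Skipping: 0 for an all-white block, else 1 + pixels."""
    out = []
    for block in blocks:
        if any(block):          # at least one black (1) pixel
            out.append(1)
            out.extend(block)   # the whole block must be transmitted
        else:
            out.append(0)       # all-white block collapses to a single bit
    return out


def kunt_encode(blocks):
    """Variant described above: 0 = all 0s, 11 = all 1s, 10 + pixels = mixed."""
    out = []
    for block in blocks:
        if not any(block):
            out.append(0)
        elif all(block):
            out.extend([1, 1])
        else:
            out.extend([1, 0])
            out.extend(block)
    return out
```

For a hypothetical 2×6 image divided into three 2×2 blocks (one all white, one all black, one mixed), the basic scheme in this sketch emits 11 bits and the variant emits 9 bits, compared with 12 bits for the raw pixels.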
Another common technique for compressing bilevel images is run length coding. In this technique, a string of 0s bracketed by two 1s is coded by specifying the length of the string of 0s, as opposed to transmitting every 0 bit. This technique is best suited for coding of bilevel text and graphics in which large runs of white space (coded as 0s) are expected. In such a circumstance, the probability of encountering a 0 is nearly unity.
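A minimal Python sketch of run length coding as described above is given below for illustration. It records the length of the run of 0s preceding each 1, plus one final entry for any 0s that follow the last 1. The function names and the handling of the trailing run are assumptions made for this example, and the run lengths are left as plain integers rather than being packed into a bit stream.

```python
def rle_encode(bits):
    """Return the length of the run of 0s before each 1, plus the trailing run."""
    runs = []
    count = 0
    for b in bits:
        if b == 0:
            count += 1
        else:
            runs.append(count)  # a 1 terminates the current run of 0s
            count = 0
    runs.append(count)          # 0s remaining after the last 1 (may be 0)
    return runs


def rle_decode(runs):
    """Invert rle_encode: rebuild the original bit sequence."""
    bits = []
    for length in runs[:-1]:
        bits.extend([0] * length)
        bits.append(1)
    bits.extend([0] * runs[-1])
    return bits
```

Under these assumptions, the 12-bit sequence 000100000110 is represented by the four run lengths 3, 5, 0, and 1, and decoding those run lengths restores the original sequence.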
Run length coding has been improved by another well-known technique, referred to as Huffman coding. In this technique, codes are assigned to different run lengths using a binary tree in such a manner that the most-frequently encountered run lengths are assigned the shortest codes. In practice, Huffman coding typically employs a look-up table storing the previously encountered run lengths and their corresponding codes. In the case of a match between a newly encountered run length and a previously encountered run length, the corresponding code is substituted for the run length in the output data stream. Further compression may be achieved by transmitting a pointer to the appropriate entry in the look-up table, instead of the Huffman code. This technique, referred to as Lempel-Ziv compression, is exemplified in U.S. Pat. No. 4,464,650.
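The binary-tree code assignment mentioned above may be sketched in Python as follows. This is the textbook Huffman construction applied to a list of run lengths, offered only as an illustration; it is not the method of the cited patent, it does not implement Lempel-Ziv compression, and the function name and the string representation of the codes are assumptions of this example.

```python
import heapq
from collections import Counter


def huffman_codes(run_lengths):
    """Map each distinct run length to a prefix-free bit string.

    More frequent run lengths receive codes that are no longer than
    those of less frequent ones.
    """
    freq = Counter(run_lengths)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}  # a lone symbol still needs one bit
    # Heap entries: (count, tie_breaker, {symbol: partial_code}).
    # The unique tie_breaker keeps the dicts from ever being compared.
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, left = heapq.heappop(heap)    # two least frequent subtrees
        c2, _, right = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in left.items()}
        merged.update({s: "1" + code for s, code in right.items()})
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]
```

For the hypothetical run lengths 3, 3, 3, 3, 5, 5, 0, this sketch assigns a one-bit code to the run length 3 and two-bit codes to the run lengths 5 and 0, so the seven run lengths occupy 10 bits in total.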
In most instances, the above-identified compression techniques are specifically tailored to process a particular type of image, such as a bilevel image. In fact, these compression techniques perform well so long as they are fed the images which they were designed to handle. Yet significant problems occur when a data compression technique designed to process a particular type of image is fed a "foreign" type of image. This problem may arise, for example, when a raw image document contains a mixture of different types of images.
For instance, if the image contains a mixture of bilevel text and multilevel image data, the prior art may process the entire image as a multilevel image for the sake of simplicity. The resultant coding may be needlessly lengthy, especially if significant portions of the image comprise bilevel information (which optimally may be represented using one bit for each pixel).
In the case of images containing a mixture of bilevel text and bilevel half-tone image data, applying a single compression algorithm may prove equally inefficient. For instance, run length coding will typically provide efficient coding of the bilevel text portion, but not the half-tone portion. Run length coding is ill-equipped to handle the short run lengths prevalent in half-tone images. Huffman coding may also be inappropriate for compressing half-tone images. Notably, as discussed above, Huffman encoding employs a table containing entries representative of the most frequently encountered image data, sometimes derived from a series of test documents. When the encoding algorithm encounters a "foreign" image, the encoding may produce an output stream which actually contains more bits than the original raw image data. The document may in fact be expanded, rather than compressed.
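The expansion effect may be illustrated for run length coding with a short Python sketch. It assumes, purely for the sake of the example, that every run length is stored in a fixed-width field of 8 bits and that a half-tone row can be approximated by strictly alternating pixels; neither assumption is drawn from any particular prior art system.

```python
def rle_size_bits(bits, field_width=8):
    """Size in bits of a run length encoding using fixed-width fields.

    One field is emitted per 1 (holding the preceding run of 0s) plus
    one field for the trailing run. field_width is an assumed parameter.
    """
    limit = (1 << field_width) - 1
    fields = 0
    run = 0
    for b in bits:
        if b == 0:
            run += 1
            if run > limit:
                raise ValueError("run too long for the assumed field width")
        else:
            fields += 1
            run = 0
    fields += 1  # trailing run after the last 1
    return fields * field_width


text_row = [0] * 200 + [1] + [0] * 55   # text-like row: mostly white
halftone_row = [0, 1] * 128             # half-tone-like row: alternating pixels
```

Both hypothetical rows contain 256 raw bits. Under the stated assumptions, the text-like row is coded in 16 bits, whereas the half-tone-like row requires 1032 bits, roughly four times the size of the raw data.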