The present invention relates to data compression, and more specifically to a two-dimensional compression method and system for compressing bi-level bit-mapped images.
Rasterization is the process of converting data, such as ASCII text, into what is known as a bit-mapped image, which is a sequential collection of bits representing an image to be displayed on a computer screen. Each bit in a bit-mapped image corresponds to one pixel location on the screen, and each horizontal line of bit values in a bit-mapped image is known as a rasterline. Output devices capable of reproducing a bit-mapped image, such as laser printers for example, are known as bi-level devices because these devices are capable of producing only two levels of gray at a single pixel location: white (or light gray) to represent paper, and black (or dark gray) to represent ink on the paper. The bit values in bi-level bit-mapped images are either 0, to display white, or 1, to display black.
Bit-mapped images are stored in a contiguous piece of computer memory called a frame buffer. The frame buffer is an array containing one memory bit for each pixel that the raster device is capable of printing. For example, for a 1024 × 1024 pixel image, the frame buffer requires 1,048,576 bits of memory. Although the cost of memory has decreased, it continues to add significantly to the total cost of an output processing device, such as a conventional printer for instance. A conventional printer is equipped with sufficient memory and processing power to rasterize incoming data within the printer. Because of the added memory and processing power, a conventional printer is generally expensive. By reducing the size of the frame buffer, however, the total cost of a conventional printer may also be reduced.
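By way of illustration only, the frame buffer arithmetic above can be sketched in Python. The function name `frame_buffer_bits` and its parameters are hypothetical and are introduced here only for the example.

```python
def frame_buffer_bits(width_px, height_px, bits_per_pixel=1):
    """Return the number of memory bits needed to hold one image.

    A bi-level image uses one bit per pixel, which is the default here.
    """
    return width_px * height_px * bits_per_pixel


bits = frame_buffer_bits(1024, 1024)
print(bits)       # 1048576 bits
print(bits // 8)  # 131072 bytes
```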
One method for reducing the size of the frame buffer is to compress an incoming bit-mapped image inside the printer before storing the image in the frame buffer. In this method, the incoming bit-mapped image is first compressed by what is known as an encoder, and then stored in a smaller version of the frame buffer. Before the data in the frame buffer is printed, the data is decompressed by what is known as a decoder. The encoder and decoder are implemented in either software or hardware within the printer. In prior methods where the printer includes a hardware implementation of the encoder/decoder, the cost of the encoder/decoder is generally the same as the cost of the frame buffer memory that is saved. Therefore, such methods fail to reduce the total cost of the printer.
Another type of printer, which is much less expensive than a conventional printer, is referred to as a "dumb" printer. In a dumb printer, most, if not all, of the rasterization of an incoming image is performed by a host device, such as a personal computer (PC). After a bit-mapped image is rasterized by the PC, the PC sends the bit-mapped image to the dumb printer via the PC's parallel port, and the dumb printer is only responsible for printing the bit-mapped image. A dumb printer is less expensive than a conventional printer because of the savings realized from using less memory and a smaller microprocessor (if one is used at all).
Since one bit corresponding to each pixel location in a bit-mapped image must be sent from the PC to the printer, one major issue concerning the use of a dumb printer is throughput between the PC and the printer. Since a bit-mapped image is sent over a parallel port to the printer, the speed at which data is printed can be bound by the speed of the input/output devices of the PC. One method for increasing the throughput between the PC and the printer is to reduce, by compression, the amount of data that must pass through the parallel port of the PC. This may be accomplished by using an encoder to compress bit-mapped images in the PC, rather than the printer, and using a hardware-implemented decoder in the printer to decompress the images before the images are printed. In one method, the encoder is implemented as an add-on board that is inserted in the PC. This method has not been widely used due to the disadvantages associated with the use of add-on PC boards. Furthermore, the printer containing the decoder is incompatible with PCs that do not contain the add-on encoder board.
Besides implementation issues associated with compression methods, the following aspects of compression methods must also be examined: the speed at which a compression method compresses data; the compression ratio, which is the size of the compressed data compared with the size of the original data; and the complexity of the compression method. Many well-known compression methods exist. However, each method usually performs well either with byte-oriented data, such as ASCII text, or with bit-mapped images.
For byte-oriented data, the Lempel-Ziv algorithms "LZ1" and "LZ2", and Huffman coding compression methods are widely used. The LZ1 and LZ2 algorithms assign fixed-length codes to variable-size input strings. LZ1 and LZ2 are used in the International Telegraph and Telephone Consultative Committee (CCITT) V.42bis data compression standard for use in switched network modems. LZ1 and LZ2 are also used for data storage, e.g., in tape drives and hard disk drives.
Huffman coding is another method for compressing data in which individual elements found in the data are assigned a code based on the relative frequency of the elements, such that the most frequently occurring elements are assigned a code with the smallest number of bits. Usually, Huffman coding is used to compress text, with the coding based on letter frequency. A drawback to Huffman coding is that it requires two passes over the data: a first pass to generate statistics and to create a table containing the assigned codes (which is later used for decompression), and a second pass to encode the data using that table.
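As a minimal sketch of the two-pass behavior described above, the following Python example first counts symbol frequencies and builds a code table, and then encodes the data with that table. The function names `huffman_code_table`, `huffman_encode`, and `huffman_decode` are hypothetical, and the bit strings are kept as text of "0" and "1" characters purely for readability.

```python
import heapq
from collections import Counter


def huffman_code_table(data):
    """Pass 1: count symbol frequencies and derive a prefix-code table."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: code_so_far}).
    # The unique tie_breaker keeps the dicts from ever being compared.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w0, _, t0 = heapq.heappop(heap)  # two lightest subtrees
        w1, _, t1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t0.items()}
        merged.update({s: "1" + c for s, c in t1.items()})
        heapq.heappush(heap, (w0 + w1, next_id, merged))
        next_id += 1
    return heap[0][2]


def huffman_encode(data, table):
    """Pass 2: replace each symbol by its assigned code."""
    return "".join(table[s] for s in data)


def huffman_decode(bits, table):
    """Decode by matching prefixes against the stored table."""
    inverse = {code: sym for sym, code in table.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


text = "aaaabbc"
table = huffman_code_table(text)
encoded = huffman_encode(text, table)
print(len(encoded))  # 10 bits, versus 56 bits at 8 bits per character
```

The table must accompany the compressed data, since the decoder cannot rebuild it without the statistics gathered in the first pass.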
For bit-mapped images, compression methods known as run-length coding and the CCITT G3 and G4 algorithms are widely used. Run-length coding is a compression scheme that has long been used for facsimile and photo transmission to reduce the amount of data in a bit-mapped image. Run-length coding eliminates repetitive sequences of equal pixel values in each horizontal rasterline by partitioning each rasterline into a series of runs of pixels that have the same value. When images are made up of a few long runs, run-length coding can substantially reduce the amount of memory needed to store images. However, as the average run length decreases, the image storage size increases rapidly.
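The effect of run length on storage can be illustrated with a short Python sketch. The (value, length) pair representation and the names `rle_encode` and `rle_decode` are assumptions made for this example; practical facsimile coders use more compact run representations.

```python
from itertools import groupby


def rle_encode(row):
    """Partition a rasterline into (pixel_value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(row)]


def rle_decode(runs):
    """Expand (pixel_value, run_length) pairs back into a rasterline."""
    row = []
    for value, length in runs:
        row.extend([value] * length)
    return row


long_runs = [0] * 10 + [1] * 4 + [0] * 2
print(rle_encode(long_runs))         # [(0, 10), (1, 4), (0, 2)]

alternating = [0, 1] * 8
print(len(rle_encode(alternating)))  # 16 runs for only 16 pixels
```

The first rasterline collapses to three runs, whereas the alternating rasterline needs one run per pixel, so the run-length form is larger than the original bits.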
Both CCITT G3 and G4 were designed mainly for telecommunication, where CCITT G3 is a one-dimensional compression algorithm and CCITT G4 is a two-dimensional compression algorithm. In one-dimensional compression methods, each row of a bit-mapped image is compressed independently, while in two-dimensional compression methods, each row of the bit-mapped image is compressed as a function of the data contained in adjacent rows. CCITT G3 and G4 provide adequate compression ratios for bit-mapped images, but because both G3 and G4 operate on individual bits rather than bytes, the CCITT G3 and G4 algorithms are, in general, quite complex and slow.
Due to the disadvantages of the compression methods described above, a well-known compression scheme for bit-mapped images was developed specifically for printers. This compression scheme is referred to as delta-row compression. In delta-row compression, each rasterline in a bit-mapped image is compressed by identifying a section of bytes in a row that is different from the preceding row. The section of bytes that differs from the preceding row is called delta data. A rasterline is then decompressed by a printer by using the immediately preceding row, which is called the reference row, and the delta data. The reference row is changed as indicated by the delta data to recreate a new row. This new decompressed row is then printed and becomes the new reference row.
In delta-row compression, a compressed rasterline is output as a sequence of command bytes and the delta (replacement) bytes. Each command contains the following: 1) the number of bytes to replace in the reference row, 2) the relative offset from the last unchanged byte in the reference row where the replacement bytes are to be positioned, and 3) the replacement bytes themselves. The command byte typically consists of eight bits, 0-7, where the upper three bits identify the number of replacement bytes and the lower five bits identify the offset. For example, assume a command byte contains the following data:
010 00111 11111111
The first three bits are the number of bytes to replace in the reference row (two), the next five bits indicate the offset (seven), and the following two bytes are replacement bytes. Thus, the replacement bytes will replace bytes 7 and 8 in the reference row when the new row is created.
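The command format just described can be sketched in Python as follows. This is an illustrative sketch only, not the format of any particular printer language. It assumes that the three-bit count is stored directly (so at most seven bytes per command), that the five-bit offset is measured from the byte following the previous replacement (or from the start of the row), and that a gap of more than 31 unchanged bytes is bridged by rewriting one unchanged byte. The names `delta_row_encode` and `delta_row_decode` are hypothetical.

```python
MAX_COUNT = 7    # largest value of the three-bit count field (assumption)
MAX_OFFSET = 31  # largest value of the five-bit offset field


def delta_row_encode(reference, row):
    """Encode `row` as commands that patch `reference` (equal lengths)."""
    out = bytearray()
    n = len(row)
    pos = 0  # first byte not yet covered by a command
    while True:
        i = pos
        while i < n and row[i] == reference[i]:
            i += 1                      # skip unchanged bytes
        if i == n:
            break                       # nothing further differs
        while i - pos > MAX_OFFSET:
            # Gap too large for five bits: rewrite one unchanged byte.
            bridge = pos + MAX_OFFSET
            out.append((1 << 5) | MAX_OFFSET)
            out.append(row[bridge])
            pos = bridge + 1
        j = i + 1
        while j < n and j - i < MAX_COUNT and row[j] != reference[j]:
            j += 1                      # extend the run of changed bytes
        out.append(((j - i) << 5) | (i - pos))  # command byte
        out += row[i:j]                         # replacement bytes
        pos = j
    return bytes(out)


def delta_row_decode(reference, delta):
    """Apply the commands in `delta` to a copy of the reference row."""
    row = bytearray(reference)
    pos = 0
    k = 0
    while k < len(delta):
        count, offset = delta[k] >> 5, delta[k] & 0x1F
        k += 1
        pos += offset
        row[pos:pos + count] = delta[k:k + count]
        k += count
        pos += count
    return bytes(row)


reference = bytes(16)                  # sixteen zero bytes
command = bytes([0b01000111, 0xFF, 0xFF])
new_row = delta_row_decode(reference, command)
print(new_row[7], new_row[8])          # 255 255
```

Under these assumptions, the command byte 010 00111 followed by two replacement bytes of all ones replaces bytes 7 and 8 of the reference row, and encoding the resulting row against the same reference reproduces the same three bytes.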
Although delta-row compression is an improvement over other forms of compression when used with a printer control language, delta-row compression uses a fixed format and therefore introduces unnecessary overhead. For example, in delta-row compression, if a row is completely different from the reference row, then the entire row must be transmitted. Also, where the first byte of a row is different from the first byte of the reference row, the five-bit offset field is still used to indicate a relative offset of zero bytes. Furthermore, delta-row compression uses a fixed bit field to represent the number of bytes to replace, and a separate replacement byte is included in the command even where the replacement bytes are identical.
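The overhead for a completely different row can be estimated with a short calculation. The sketch below assumes that the three-bit count field limits each command to seven replacement bytes. That limit is an assumption of this example, as is the hypothetical name `worst_case_delta_row_size`.

```python
import math


def worst_case_delta_row_size(row_bytes, max_count=7):
    """Bytes sent when every byte of the row differs from the reference.

    Each command carries one command byte plus up to `max_count`
    replacement bytes, so the whole row is sent plus one command
    byte per group.
    """
    commands = math.ceil(row_bytes / max_count)
    return row_bytes + commands


# A 1024-pixel bi-level rasterline occupies 128 bytes.
print(worst_case_delta_row_size(128))  # 147
```

Under this assumption, a 128-byte row that differs everywhere from its reference row is transmitted as 147 bytes, which is larger than the uncompressed row.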