1. Field of the Invention
The invention relates generally to the manipulation of compressed data. More specifically, the invention relates to apparatus and methods for encoding independently decodable pieces of compressed data to accommodate reordering of the pieces before or after such manipulation. Embodiments of the present invention are thought to be particularly advantageous for high speed printing applications where the compressed data stream represents images and there are requirements to minimize storage, allow parallel decoding, and achieve high throughput.
2. Description of the Related Art
The purpose of data compression is to represent source data with less data in order to save storage costs or transmission time and costs. Data compression is regarded as “lossless” if the reconstructed data exactly matches the source data. However, in applications where some loss is acceptable, such as pictures, approximating the source data with the reconstructed data, rather than reproducing it exactly, achieves more effective compression. This is called “lossy” compression.
The two basic components of data compression systems are the encoder and the decoder. The encoder compresses the source data (the original digital information) and generates a compressed data stream. The compressed data stream may be either stored or transmitted, but at some point is fed to the decoder. The decoder recreates or reconstructs the data from the compressed data stream. In general, a data compression encoding system can be broken into two basic parts: an encoder model and an entropy encoder. (Some like to have a third part, e.g., a statistical model, which, for simplicity, is currently included in the entropy encoder.) The encoder model generates a sequence of descriptors that is an abstract representation of the data. The entropy encoder takes these descriptors, converts them into symbols, and compresses the symbols taking advantage of the statistics to form compressed data. Similarly to the encoder, the decoder can be broken into basic parts that have an inverse function relative to the parts of the encoder.
Generally, for lossy compression, it is the encoder model and the decoder model that introduce the loss. Generally, the entropy encoding and decoding are lossless. In such cases, lossless transcoding becomes possible. The compressed data is entropy decoded to intermediate data, which may or may not be identical to the descriptors, and then the intermediate data can be fed into a different entropy encoder to create a different compressed data stream. Examples of entropy encoders are Huffman encoders and arithmetic encoders. Converting between these methods is an example of transcoding. For a more detailed discussion of Huffman variable-length coding see: M. Rabbani and P. W. Jones, Digital Image Compression Techniques, Tutorial Texts in Optical Engineering, Vol. TT7, SPIE Optical Engineering Press, Bellingham, Wash. 1991; and W. B. Pennebaker and J. L. Mitchell, JPEG: Still Image Data Compression Standard, Van Nostrand Reinhold, N.Y. (1993).
Without loss of generality, most examples herein will be presented with respect to images as the source data. In the context of high speed printing, data compression is key to lessening the amount of data transmission. As a practical matter, large-scale digital color printing has been unaffordable until recently for most applications. Consequently, the field of image processing has not yet had to address many of the problems associated with efficiently processing the absolute torrent of data required by continuous-tone color images.
An image, in the context of this application, is an electronic representation of a picture as an array of raster data. Image data can be generated by a computer program, or formed by electronically scanning such items as illustrations, drawings, photographs, and signatures. For monochrome images each sample in the array of raster data has an intensity value. For traditional facsimile images only two values are allowed (black or white) and so 1 bit per sample is sufficient. For continuous-tone pictures 8 bits per sample is more common. Some applications such as medical images require higher precision and use 12 bits per sample. Color images use three or four values for each position in the raster array. Typically scanners and displays use red, green, and blue (RGB) as primary colors or components. Printers often use cyan, magenta, and yellow (CMY) inks or toners. Sometimes a fourth black ink or toner is added (CMYK). The components may be interleaved (RGB,RGB,RGB) or separated into multiple arrays (RRR,GGG,BBB).
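By way of illustration only, the difference between interleaved and separated component storage can be sketched as follows. The function names `deinterleave` and `interleave` and the sample values are hypothetical and chosen for this example.

```python
def deinterleave(samples, components=3):
    """Split interleaved samples (RGB,RGB,...) into one array per component."""
    return [samples[i::components] for i in range(components)]


def interleave(planes):
    """Merge per-component arrays (RRR,GGG,BBB) back into interleaved order."""
    return [sample for pixel in zip(*planes) for sample in pixel]


# Two RGB pixels stored in interleaved order (illustrative values).
interleaved = [10, 20, 30, 11, 21, 31]
planes = deinterleave(interleaved)
restored = interleave(planes)
```

Both layouts hold the same samples; only the order in which a decoder encounters them differs.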
The size of digital images is rapidly increasing. In the late 1970s, 8.5 inch×11-inch facsimile images were standardized at 1728 picture elements (pels) per line by about 1100 lines (nominally 200 pels/inch by 100 lines/inch). A finer resolution of 200 lines/inch was also allowed. Now, many digital printers print binary images at 600 pels/inch by 600 lines/inch, an order of magnitude increase in data. Some of today's printers are supporting 1200 pels/inch by 1200 lines/inch.
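The growth in raw data can be checked with simple arithmetic. The sketch below assumes an 8.5 inch by 11 inch page and the nominal resolutions quoted above; the helper name `page_bits` is hypothetical.

```python
def page_bits(width_in, height_in, pels_per_inch, lines_per_inch, bits_per_sample=1):
    """Uncompressed size, in bits, of a page raster at the given resolution."""
    pels_per_line = round(width_in * pels_per_inch)
    lines = round(height_in * lines_per_inch)
    return pels_per_line * lines * bits_per_sample


fax_standard = page_bits(8.5, 11, 200, 100)   # nominal standard-resolution facsimile
printer_600 = page_bits(8.5, 11, 600, 600)    # 600 x 600 binary printer page
printer_1200 = page_bits(8.5, 11, 1200, 1200)

ratio_600 = printer_600 / fax_standard        # 18.0
ratio_1200 = printer_1200 / fax_standard      # 72.0
```

Under these assumptions a 600 pels/inch binary page carries 18 times the raw data of a standard-resolution facsimile page, consistent with the order of magnitude increase noted above.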
Data compression was essential to making digital facsimile practical. Two international facsimile data compression standards were developed. The CCITT Group 3 (G3) facsimile machines were made for the public phone network and originally expected errors during transmission. Group 4 (G4) facsimile machines were intended for data networks and assumed that transmissions were error-free.
The G3 standard Modified Huffman (MH) algorithm coded each binary line one-dimensionally (i.e., independently) with unique end-of-line (EOL) codes separating each line. Transmission errors were detected when the decoded lines did not match the expected line length. Since the compression process could be restarted on the next line by searching for the EOL, only one line was corrupted. The G3 Modified READ (MR) algorithm encoded some lines two-dimensionally (i.e., with reference to the previous line) and could not be restarted except at the one-dimensional lines. A tag bit after the EOL code indicated whether the next line was coded one-dimensionally or two-dimensionally. The G3 standard required that at least every other line be coded with MH (and thus restartable after errors) for the standard resolution and at least every fourth line be coded with MH for the finer resolution. In the absence of transmission errors, the G3 algorithms are lossless (i.e., the decoder's output image exactly matches the encoder's input image).
G4 machines used the Modified Modified READ (MMR) data compression algorithm. MMR assumed an all white history line before the top image line and encoded all lines two-dimensionally (i.e., with reference to the history line) using the same two-dimensional codes as MR. The EOL codes were not used because transmission was assumed to be error-free. The G4 algorithm was soon extensively used in the error-free computer environments too.
IBM MMR was derived from the G3 MR algorithm before the G4 MMR standard was established. It is defined in K. L. Anderson, F. C. Mintzer, G. Goertzel, J. L. Mitchell, K. S. Pennington, and W. B. Pennebaker, “Binary Image Manipulation Algorithms in the Image View Facility,” IBM J. Res. Develop., vol. 31, 16–31 (1987). The first line is coded exactly like the first line of G3 MR starting the compressed data with an EOL with a 1-D tag. The first line is 1-D coded without the need for a history line. An EOL with a tag follows the first compressed line of data. If the tag indicates that 2-D coding follows, an arbitrary number of lines are encoded the same as G4 MMR. EOLs with 1-D or 2-D tags are allowed to be encoded at the start of any line. After the 1-D tag, the line is compressed with MH and must be followed by another EOL. The IBM MMR compressed data terminates with six EOLs with 1-D tags just like G3 MR.
Detailed examples of such data compression and decompression algorithms are given in: J. L. Mitchell, K. L. Anderson, and G. Goertzel, “Method for Encoding and Decoding a Digital Image,” U.S. Pat. No. 4,725,815 issued Feb. 16, 1988; and J. L. Mitchell, K. L. Anderson, and G. Goertzel, “Method for Encoding and Decoding a Digital Image,” U.S. Pat. No. 4,888,645 issued Dec. 19, 1989. The use of intermediate data called “run ends” is disclosed in the above. Rather than the individual bits representing the source image, the encoder model converts each raster line into a sequence of numbers that represent the distance from the left edge of the last pel in each run. An example of raster to run end conversion is given in: K. L. Anderson, G. Goertzel, and J. L. Mitchell, “A Method for Converting a Bit Map of an Image to a Run Length or Run End Representation,” U.S. Pat. No. 4,610,027 issued Sep. 2, 1986. The entropy encoder takes the run ends and converts them into Huffman codes according to the appropriate standard.
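A minimal sketch of a run end representation follows. It is not the method of the cited patents; it assumes, for illustration, that runs alternate white, black, white, and so on, beginning with a (possibly zero-length) white run, and that each run end is the count of pels from the left edge through the last pel of the run. The function names are hypothetical.

```python
def raster_to_run_ends(line):
    """Convert a list of pels (0 = white, 1 = black) into run ends.

    Assumption: the first run is white, so a line that begins with a
    black pel gets a zero-length white run (run end 0) first.
    """
    run_ends = []
    color = 0  # color of the run currently being scanned
    for position, pel in enumerate(line):
        if pel != color:
            run_ends.append(position)  # the current run ended just before this pel
            color = pel
    run_ends.append(len(line))  # the final run ends at the right edge
    return run_ends


def run_ends_to_raster(run_ends):
    """Rebuild the pels from run ends produced by raster_to_run_ends."""
    line, color, start = [], 0, 0
    for end in run_ends:
        line.extend([color] * (end - start))
        start = end
        color ^= 1  # runs alternate between white and black
    return line


line = [0, 0, 1, 1, 1, 0, 1, 0]
run_ends = raster_to_run_ends(line)      # [2, 5, 6, 7, 8]
round_trip = run_ends_to_raster(run_ends)
```

Because the conversion is exactly invertible, run ends can serve as lossless intermediate data between the raster and the entropy coder.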
Adaptive Bilevel Image Compression (ABIC) is a lossless binary arithmetic coding algorithm. For more details, refer to R. B. Arps, T. K. Truong, D. J. Lu, R. C. Pasco, and T. D. Friedman, “A multi-purpose VLSI chip for adaptive data compression of bilevel images,” IBM J. Res. Develop., Vol. 32, 775–795 (1988) and to G. G. Langdon Jr., J. L. Mitchell, W. B. Pennebaker, and J. J. Rissanen, entitled “Arithmetic Coding Encoder and Decoder System”, U.S. Pat. No. 4,905,297 issued Feb. 27, 1990.
ABIC encodes an image as one compressed data stream. Unlike the G3 MH, G3 MR, or the Joint Bi-Level Image Experts Group (JBIG) encoding algorithm, an ABIC compressed image has no markers or EOL codes. Decoding must proceed sequentially from the upper left pel to the bottom right pel in raster scan order.
The Joint Bi-Level Image Experts Group (JBIG) standardized the next generation facsimile data compression technique based on arithmetic coding. JBIG-1 allows for byte-aligned markers to separate the compressed data and identify stripes of a predetermined number of lines. The SDNORM marker (0xFF02) restarts the arithmetic coder A-register and C-register for the next stripe but keeps the probability estimates and uses the previous line(s) as history. The compressed data also starts byte-aligned. The SDRST marker (0xFF03) starts coding as if this next line were a new image. Thus, the SDRST identifies an independently decodable piece of compressed data that follows. The SDRST is required at the end of bit planes. This allows the compressed data to be shuffled without decoding to convert between stripes organized by bit plane and stripes organized by full width rectangles of multiple-bit image. The markers also allow the compressed data of a progressively encoded JBIG-1 image to be shuffled from compressed data organized by resolution layer to compressed data organized by full-width regions of the image. Stripes must occur at a fixed number of complete lines except at the bottom of the image. Consequently, there is no ability to reenter the compressed data stream within a line. Stripe boundaries are selected at the encoder and are maintained for all resolution levels.
Compression of continuous-tone color images can be lossless, but the 24-(three 8-bit component colors) to 32-(four 8-bit component colors) fold increase in the data makes lossy compression often more practical. The JPEG (Joint Photographic Experts Group) and MPEG (Moving Picture Experts Group) standards are examples of lossy data compression standards. See generally: W. B. Pennebaker et al.; J. L. Mitchell, W. B. Pennebaker, C. E. Fogg, and D. J. LeGall, MPEG Video Compression Standard, Chapman & Hall, N.Y. (1997); B. G. Haskell, A. Puri, and A. Netravali, Digital Video Compression Standard, An Introduction to MPEG-2, Mitchell & Pennebaker, Editors, Chapman & Hall, N.Y. (1997); and K. R. Rao and J. J. Hwang, Techniques and Standards for Image, Video, and Audio Coding, Prentice Hall PTR, Upper Saddle River, N.J. (1996).
Both the MPEG and JPEG standards employ transform coding. Each color component is divided up into 8×8 blocks. The forward Discrete Cosine Transform (FDCT) of each block is performed. The transform coefficients are then quantized. This step introduces the largest loss as the quantized coefficients are rounded to integers. Then a lossless entropy coding technique (Huffman coding for MPEG and baseline JPEG or also arithmetic coding for JPEG) encodes the quantized integers. The decoder performs the entropy decoding to recover the quantized coefficients. Then an inverse quantization (or dequantization) step multiplies each coefficient by its quantization value. The dequantized coefficients are fed to an inverse Discrete Cosine Transform (IDCT) to reconstruct the 8×8 samples in the block.
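The loss introduced by quantization can be illustrated with a few coefficients. The sketch below is not taken from either standard: the coefficient values and quantization values are made up for the example, and rounding half away from zero is an assumption of this sketch.

```python
import math


def round_half_away(value):
    """Round to the nearest integer, with halves rounded away from zero."""
    magnitude = math.floor(abs(value) + 0.5)
    return int(magnitude) if value >= 0 else -int(magnitude)


def quantize(coeffs, qvalues):
    """Divide each transform coefficient by its quantization value and round."""
    return [round_half_away(c / q) for c, q in zip(coeffs, qvalues)]


def dequantize(levels, qvalues):
    """Multiply each quantized integer by its quantization value."""
    return [level * q for level, q in zip(levels, qvalues)]


coeffs = [-415.0, 30.2, -61.7, 27.0]   # hypothetical DCT coefficients
qvalues = [16, 11, 10, 16]             # hypothetical quantization values

levels = quantize(coeffs, qvalues)             # integers passed to the entropy coder
reconstructed = dequantize(levels, qvalues)    # values passed to the IDCT
```

The reconstructed coefficients differ from the originals by at most half a quantization value each. Because the entropy coder sees only the integers in `levels`, everything downstream of quantization can remain lossless.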
JPEG employs the concept of a “minimum coded unit” (MCU), which refers to a group of one or more DCT blocks in lossy coding and samples in lossless coding. In JPEG, entropy coding is always performed on a complete MCU. MPEG has a similar concept and calls it a macroblock.
JPEG images contain byte-aligned “markers.” A marker is a unique code that can be located by scanning the compressed data stream in which it is embedded. In JPEG, markers are unique byte-aligned codes that can be used to identify the location and purpose of header and entropy-coded segments within an image. A marker comprises a marker code that identifies the function of the particular segment it precedes and a prefix, i.e., 0xFF. Most JPEG markers are followed by length fields and communicate header information. A few markers such as the Restart Markers indicate where the image data can be restarted and thus independently encoded or decoded. The Restart Markers include in their three least significant bits a modulo eight counter. This enables detection of any corruption of a previous Restart Marker as part of an error recovery mechanism.
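As a rough sketch of how Restart Markers can be located without decoding, the following scans an entropy-coded segment for the byte pairs 0xFFD0 through 0xFFD7 and checks the modulo eight counter. It assumes the input contains only entropy-coded data, in which a literal 0xFF data byte is followed by a stuffed 0x00; the function names and the sample bytes are hypothetical.

```python
def find_restart_markers(data):
    """Return (offset, counter) for each Restart Marker (0xFFD0..0xFFD7)."""
    found = []
    i = 0
    while i + 1 < len(data):
        if data[i] == 0xFF and 0xD0 <= data[i + 1] <= 0xD7:
            found.append((i, data[i + 1] & 0x07))  # low three bits hold the counter
            i += 2
        else:
            i += 1  # ordinary data, or 0xFF followed by a stuffed 0x00
    return found


def counters_in_sequence(markers):
    """True if the counters run 0, 1, ..., 7, 0, ... with none missing."""
    return all(counter == index % 8 for index, (_, counter) in enumerate(markers))


segment = bytes([0x12, 0xFF, 0x00, 0x34, 0xFF, 0xD0,
                 0x56, 0xFF, 0xD1, 0x78, 0xFF, 0xD2])
markers = find_restart_markers(segment)

damaged = bytes([0x12, 0xFF, 0xD0, 0x56, 0x78, 0xFF, 0xD2])
damaged_markers = find_restart_markers(damaged)
```

A gap in the counter sequence, as in `damaged`, indicates that a Restart Marker was lost or corrupted. The offsets returned mark the points at which independent decoding can begin.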
The MPEG video frames often depend upon previous or future frames. At the start of independent frames (I-frames) the decoding is restartable. The MPEG syntax has unique 32-bit byte-aligned start codes. The sequence header, group-of-pictures header, picture header, and slice header all have their own start code. MPEG I-frames are also restartable at the slice headers.
An overview of audio coding with some details regarding MPEG audio is given in Section 10.3 entitled “Audio coding” of Rao et al.
Various text compression algorithms are described in T. C. Bell, J. G. Cleary, and I. H. Witten, Text Compression, Prentice Hall PTR, Englewood Cliffs, N.J., (1990). Lempel-Ziv (LZ) compression techniques tend to split into two types: LZ1 and LZ2. LZ1 type compression keeps a history of the last N bytes and tries to code the data as a pointer into the history buffer followed by the number of matching bytes. Otherwise, it signals that it is going to send the raw data. LZ2 type compression, e.g., LZW, constructs a dictionary of patterns already encountered in the data. After each repeat of a pattern, a new pattern consisting of the old pattern extended by one character is added to the dictionary.
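The dictionary growth of an LZ2 type coder can be sketched with a textbook form of LZW. This is a simplified illustration, not any particular product's format: codes are kept as plain integers rather than packed into bits, and the dictionary is allowed to grow without limit.

```python
def lzw_encode(data):
    """Encode bytes as a list of integer codes using a growing dictionary."""
    dictionary = {bytes([i]): i for i in range(256)}  # all single bytes
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate               # keep extending the known pattern
        else:
            codes.append(dictionary[current])
            dictionary[candidate] = next_code  # old pattern plus one character
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decode(codes):
    """Rebuild the bytes by reconstructing the same dictionary on the fly."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            entry = previous + previous[:1]   # pattern defined by this very step
        else:
            raise ValueError("invalid LZW code")
        output.append(entry)
        dictionary[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(output)


codes = lzw_encode(b"ABABABA")   # [65, 66, 256, 258]
decoded = lzw_decode(codes)
```

Seven input bytes become four codes because the patterns "AB" and "ABA" enter the dictionary as soon as they have been seen once.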
Images need to be transferred to a print server in compressed form; and after manipulation by the print server, the print server often needs to recompress the manipulated image. This is particularly important if the printer controller has only hardware decoding and the bandwidth between the print server and the printer controller is inadequate.
While JPEG has a structure that supports independently decodable pieces of compressed data between Restart Markers, going all the way to the real domain is time consuming and wasteful. The markers are very helpful to enable parallel encoding and/or decoding starting at marker boundaries. It should be noted that the encoder can only place Restart Markers at fixed intervals in the source data and consequently the Restart Markers may not be optimally placed for the decoder. To change the placement of the Restart Markers would require at least a transcoding of the compressed data stream.
An example of arbitrary parallel decoding is given in S. T. Klein and Y. Wiseman, Parallel Huffman Decoding, Proceedings: Data Compression Conference, pp. 383–392, Snowbird, Utah, (2000), (hereinafter “Klein et al.”). Klein et al. describe a parallel algorithm for decoding a Huffman encoded image that exploits the tendency of Huffman codes to resynchronize quickly. When more than one processor is available, the compressed data stream can be split into pieces, and each processor can be assigned one piece of the compressed data stream for decompression. Klein et al. suggest letting each processor overflow beyond its assigned piece into the next piece until its results synchronize with the processor that has been assigned to the next piece of compressed data. Once synchronization has been detected, the processor can stop or be assigned another piece. Synchronization is detected because each processor has saved the index of the last bit in each codeword. The processor of the previous piece can examine this list. Importantly, the assigned pieces do not necessarily begin at codeword boundaries. Therefore, the first codes are expected to be erroneously decoded until the Huffman property of self-synchronization occurs. When applied to JPEG compressed images, the position and the DC predictors of the partially decoded image are guessed and subsequently corrected after synchronization has been established. Further, the approach suggested by Klein et al. has no additional information available as they try to reenter the compressed data stream at arbitrary boundaries. Consequently, correct display of the image must wait until all synchronization points have been established.
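The self-synchronization property can be illustrated with a toy prefix code. The sketch below is not the algorithm of the cited paper; the four-symbol code, the message, and the function names are assumptions made for this example. One decoder starts at the true beginning of the bit string, a second starts in the middle of a codeword, and synchronization is detected by comparing the saved index of the last bit of each codeword.

```python
CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}  # toy prefix code
DECODE = {bits: symbol for symbol, bits in CODE.items()}


def encode(message):
    return "".join(CODE[symbol] for symbol in message)


def decode_from(bits, start):
    """Decode from bit index `start`; return (symbol, index of last bit) pairs."""
    decoded, word = [], ""
    for index in range(start, len(bits)):
        word += bits[index]
        if word in DECODE:
            decoded.append((DECODE[word], index))
            word = ""
    return decoded


def sync_point(reference, speculative):
    """First codeword end index shared by both decoders, or None."""
    reference_ends = {end for _, end in reference}
    for _, end in speculative:
        if end in reference_ends:
            return end
    return None


bits = encode("abacadabca")
reference = decode_from(bits, 0)      # decoder starting at a true codeword boundary
speculative = decode_from(bits, 9)    # decoder starting inside the codeword for "d"
sync = sync_point(reference, speculative)

after_reference = [s for s, end in reference if end > sync]
after_speculative = [s for s, end in speculative if end > sync]
```

In this run the speculative decoder emits one wrong symbol before its codeword boundaries coincide with the reference at bit index 11, after which both decoders produce identical output. Real Huffman tables may take longer to resynchronize, and nothing guarantees how soon.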
One very common image manipulation operation is rotation by a multiple of 90°. For example, images are often rotated to accommodate a particular page orientation, e.g., landscape or portrait, and/or to accommodate particular user-specified job attributes, such as impositioning. While current print servers typically support various forms of image manipulation, these manipulations may introduce multi-generation losses and make inefficient use of processing resources. As discussed further below, image rotation processes must typically accumulate the whole image in memory or on disk before the rotated image can be output. When rotating images, often the first pixels read by the rotation process contain some of the last pixels to be output.
Examples of prior art rotation of binary images in the real domain are: K. L. Anderson, F. C. Mintzer, G. Goertzel, and J. L. Mitchell, entitled “Method for Rotating a Binary Image”, U.S. Pat. No. 4,627,020 issued Dec. 2, 1986; and D. R. Pruett, G. Goertzel, and G. R. Tompson (sic), entitled “Method for Rotating a Binary Image,” U.S. Pat. No. 4,837,845 issued Jun. 6, 1989. A disadvantage of working in the real domain is that the entire image has to be decompressed, temporarily buffered, rotated, and then recompressed. This is costly in terms of both storage and time.
The above patents disclose storing the full image plus a much smaller temporary buffer. When the full image cannot be in contiguous storage, the method disclosed in K. L. Anderson and J. L. Mitchell, “System for Rotating Binary Images,” U.S. Pat. No. 4,658,430 issued Apr. 14, 1987 may be used. However, it still needs sufficient storage for the entire source image plus an additional buffer.
To avoid the time to decode all the way to the raster image, rotation on run end data is disclosed in K. L. Anderson, “Fast Algorithm for Rotating an Image in Run End Form,” IBM Technical Disclosure Bulletin, Vol. 32 no. 6B pp. 299–302 (1989). For typewritten text documents this run end data is significantly smaller than the source data. It has the disadvantage, however, that in the worst case there is a 16 to 1 expansion for alternating single pel runs since each run end is saved in 16 bits.
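The worst-case expansion can be verified with a short calculation. The sketch assumes a 1728-pel line, one run end per run, and 16 bits per run end as stated above; the function name and the alternating white-first run convention are assumptions of this example.

```python
def run_end_storage_bits(line, bits_per_run_end=16):
    """Bits needed to store one line as run ends (runs alternate, white first)."""
    count = 0
    color = 0
    for pel in line:
        if pel != color:
            count += 1   # a run ended just before this pel
            color = pel
    count += 1           # the final run ends at the right edge
    return count * bits_per_run_end


width = 1728
alternating = [i % 2 for i in range(width)]   # single-pel runs: worst case
blank = [0] * width                           # one long white run: best case

worst_ratio = run_end_storage_bits(alternating) / width   # 16.0
blank_ratio = run_end_storage_bits(blank) / width         # 16 / 1728
```

An alternating line needs 1728 run ends, or 27,648 bits, against 1728 bits of raster, giving the 16 to 1 expansion. A blank line needs a single 16-bit run end.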
Rotation of continuous-tone images is less complicated because the pixels are generally on byte boundaries. On the other hand, the images are likely to be 8 to 24 times larger. In addition, decoding to the real domain and, after rotation, reencoding is CPU intensive. An additional complexity is that lossy decoding and then reencoding has a multi-generation problem, namely that the recompressed data does not match the previously compressed data.
The aforementioned approach is limited in that it converts between the transform domain and the real domain to prepare for the rotation process. W. B. Pennebaker, I. R. Finlay, J. L. Mitchell, K. L. Anderson, P. J. Sementilli, Jr. entitled “Intermediate Format for Representing Transform Data” Japanese patent JA02698034 issued Sep. 19, 1997 discloses lossless rotation of JPEG images in the transform domain. The entropy-decoded DCT transform coefficients are transposed within a block, some signs are changed, and the blocks must still be reordered. While CPU cycles have been drastically cut by avoiding the dequantization, IDCT, FDCT, and requantization, as above, the intermediate data could potentially expand (e.g., 2 to 1) and the buffering requirements are still large.
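The transpose-and-sign-change operation within a block can be checked numerically. The sketch below is not the method of the cited patent; it assumes the standard 8×8 DCT definition, a clockwise rotation by 90°, and blocks indexed as `block[y][x]`. It confirms that rotating the samples and then transforming gives the same coefficients as transforming and then transposing with the odd horizontal frequencies negated.

```python
import math

N = 8  # block size used by JPEG


def dct2(block):
    """Forward 2-D DCT of an N x N block indexed as block[y][x]."""
    def scale(k):
        return math.sqrt(0.5) if k == 0 else 1.0

    coeffs = [[0.0] * N for _ in range(N)]
    for v in range(N):
        for u in range(N):
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[v][u] = (2.0 / N) * scale(u) * scale(v) * total
    return coeffs


def rotate90_samples(block):
    """Rotate the samples clockwise: the left column becomes the top row."""
    return [[block[N - 1 - x][y] for x in range(N)] for y in range(N)]


def rotate90_coeffs(coeffs):
    """Same rotation applied to coefficients: transpose, negate odd columns."""
    return [[(-1) ** u * coeffs[u][v] for u in range(N)] for v in range(N)]


block = [[(7 * x + 13 * y + x * y * y) % 256 for x in range(N)] for y in range(N)]
via_samples = dct2(rotate90_samples(block))
via_coeffs = rotate90_coeffs(dct2(block))
max_diff = max(abs(via_samples[v][u] - via_coeffs[v][u])
               for v in range(N) for u in range(N))
```

Because the coefficient-domain operation only moves values and flips signs, it is exact when applied to quantized integers, provided the quantization table is transposed to match. The rotated blocks must still be reordered within the image, as noted above.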