1. Technical Field
The invention is related to the encoding and subsequent decoding of bi-level images, and more particularly to a system and process for encoding and decoding bi-level images that uses two context-based adaptive modules: 1) an adaptive predictor controlled by low-resolution probability estimates that is used to map the original pixels explicitly into prediction error pixels, and 2) a backward-adaptive Run-Length-Rice (RLR) coder that encodes the prediction error pixels.
2. Background Art
Bi-level images are quite common in digital document processing, because they offer the potential for a compact representation of black-and-white documents containing texts and drawings. In such images, the picture elements (pixels) can be seen as coming from a binary source (e.g., white="0" and black="1"). Since they usually contain a lot of white space and repeated ink patterns, one basic approach to efficiently encode such images is to scan them in raster order, e.g., from top to bottom and left to right, and encode each pixel via adaptive arithmetic coding (AC), whose state (or probability table) is controlled by a context formed by the values of the pixels in a small template enclosing previously encoded pixels [1]. That idea is the basis of most modern bi-level image compression systems.
Facsimile images are usually transmitted using the older CCITT standards T.4 and T.6, which are commonly referred to as Group 3 (G3) and Group 4 (G4), respectively. G3 typically encodes images with a modified Huffman (MH) code (i.e., Huffman coding on runs of black or white pixels), and G4 uses modified modified READ (MMR) coding. MH and MMR are not as efficient as context-adaptive AC, but are simpler to implement. Over time, G3 and G4 evolved to include encoding via JBIG (also known as recommendation T.82). JBIG uses context-adaptive AC, with adaptive templates and the efficient QM binary arithmetic encoder [2]. The JBIG-2 standard extends JBIG by including pattern matching for text and halftone data, as well as soft pattern matching (SPM) [3] for lossy encoding. The JB2 encoder [4] is also based on SPM, but uses the Z-coder for binary encoding. JBIG, JBIG-2 and JB2 can provide a significant improvement in compression performance over G4.
Although arithmetic coding is usually the choice when high compression performance is desired, comparable performance can be achieved by appropriate refinements to run-length (RL) coders. The Z-coder and the adaptive TRL coder are examples of efficient RL variants.
It is noted that in this background section and in the remainder of the specification, the description refers to various individual publications identified by a numeric designator contained within a pair of brackets. For example, such a reference may be identified by reciting, "reference [1]" or simply "[1]". A listing of the publications corresponding to each designator can be found at the end of the Detailed Description section.
The present invention is directed at a new bi-level image encoding and decoding system and process that does not use arithmetic coding, but whose performance is close to that of state-of-the-art coders such as JBIG, JBIG-2, and JB2. In general, the present bi-level coder (BLC) uses two context-based adaptive modules: 1) an adaptive predictor controlled by low-resolution probability estimates that is used to map the original pixels explicitly into prediction error pixels, and 2) a backward-adaptive Run-Length-Rice (RLR) coder that encodes the prediction error pixels. That is contrary to the usual approach, where the context-dependent probability estimate controls both pixel prediction and adaptive entropy coding. Due to its simplicity, in many applications BLC may be a better choice than other current coders.
The bi-level image compression encoding begins with a pixel prediction and prediction error generation procedure. Pixel prediction generally entails predicting the value of a pixel (e.g., either 0 or 1) based on surrounding pixels. More particularly, pixel prediction is accomplished by computing context-dependent probability estimates. A context is essentially a neighborhood of previously encoded pixels forming a pattern referred to as a template. Any standard template can be adopted for the purposes of the present invention. The context can be viewed as a vector list of a prescribed number of pixel values in raster order. These values form a binary word that uniquely identifies the context. This binary word is referred to as a context index.
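The formation of a context index can be sketched as follows. This is only an illustration: the four-pixel template, the function name `context_index`, and the choice to treat pixels outside the image as white are assumptions made for the example, since the text allows any standard template.

```python
# Hypothetical 4-pixel causal template, given as (row, column) offsets:
# upper-left, upper, upper-right, and left neighbors of the current pixel.
TEMPLATE = [(-1, -1), (-1, 0), (-1, 1), (0, -1)]


def context_index(image, row, col, template=TEMPLATE):
    """Pack the template pixels around (row, col) into an integer.

    `image` is a list of rows of 0/1 values. Pixels that fall outside the
    image are treated as white (0); that boundary rule is an assumption.
    """
    height, width = len(image), len(image[0])
    index = 0
    for d_row, d_col in template:
        r, c = row + d_row, col + d_col
        inside = 0 <= r < height and 0 <= c < width
        bit = image[r][c] if inside else 0
        index = (index << 1) | bit  # append the pixel to the binary word
    return index
```

With a template of n pixels there are 2^n possible context indexes, which is the number of entries any per-context table needs.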
The context-dependent probability estimates are computed by first creating and initializing a pixel probability table. This is accomplished by assigning an initial probability to each of the possible context indexes. Preferably, the initial probability would be 0.5 (i.e., an equal possibility that the pixel associated with the context index is black or white). The probability value is however scaled to prevent any round-off problems between the encoder and decoder. Preferably, the scaling is done by choosing an integer number representing a probability of 100% that a pixel is white. For example, in tested embodiments of the present invention the number "8" was employed. Thus, the scaled probability representing the aforementioned initial value would be "4".
For each pixel in raster order, the context index associated with the pattern of previously encoded pixels is identified and the scaled probability read from the table. If the probability is 0.5 or above (i.e., a scaled probability of 4 or above in the example given above), then the pixel under consideration is predicted to be white and assigned the appropriate binary value (e.g., preferably a "0" pixel value); otherwise, it is predicted to be black. Note that the first time each context index is encountered, the prediction will always be a white pixel since a scaled probability of 4 was initially assigned to each context index in the table. The scaled probability value is then adjusted by increasing it by a prescribed amount (e.g., by one) if the pixel just predicted was deemed to be white, or decreasing it by a prescribed amount (e.g., by one) if the pixel was predicted to be black. The result of the scaled probability adjustment operation is truncated to zero if it falls below zero, and to the maximum scaled probability value minus one if it falls above that value. Thus, the probabilities will vary depending on the image being encoded and what pixel location is being predicted. This is referred to as backward adaptive pixel prediction, since the decoder can perform the same adjustments to the probability estimates without the need for explicit context probability information to be sent to the decoder.
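A minimal sketch of the scaled probability table and its adjustment is given below, using the example values of 8 and 4 from the text. The function names are illustrative. The sketch also assumes that the adjustment is driven by the actual value of the pixel, which the decoder has available once the pixel is reconstructed; that is one reading of the adjustment rule rather than something the text states outright.

```python
SCALE = 8             # integer standing for a 100% probability of white
INITIAL = SCALE // 2  # scaled probability 4, i.e., a probability of 0.5


def make_table(num_contexts):
    """Create the pixel probability table, one entry per context index."""
    return [INITIAL] * num_contexts


def predict(table, ctx):
    """Predict white (0) if the scaled probability is one half or more."""
    return 0 if table[ctx] >= SCALE // 2 else 1


def update(table, ctx, pixel):
    """Adjust the scaled probability by one and clamp it to [0, SCALE - 1].

    ASSUMPTION: `pixel` is the actual pixel value (0 = white, 1 = black).
    """
    step = 1 if pixel == 0 else -1
    table[ctx] = min(SCALE - 1, max(0, table[ctx] + step))
```

Because only integer additions and comparisons are involved, an encoder and a decoder running this code on the same pixel sequence stay in step without exchanging any probability information.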
The prediction error is computed next. Essentially, the prediction error is computed by comparing the predicted pixel value of either black or white for each pixel in the bi-level image to the actual pixel. Then, only data concerning those predictions that are incorrect need be transmitted. In most cases, the prediction will be correct, so a considerable savings in the amount of data can be realized. This works because, as will be described later, the decoder performs the same prediction process and will get the same results including the errors. Thus, all the decoder needs to know is which of the predicted pixel values are in error so they can be changed from white to black or black to white, as the case may be, to reconstruct the image. The prediction error is specifically computed using a binary technique such that the actual value of each pixel in the image is compared to its predicted value using exclusive OR logic. Thus, if the actual pixel value matches the predicted value (e.g., both are 0's or both are 1's), then a "0" is assigned to that pixel location as part of a so-called prediction error image. However, if the actual pixel value is different from the predicted value, then a "1" is assigned to the associated pixel location in the prediction error image.
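The exclusive OR comparison can be illustrated with a few pixels. The function name and the sample values are made up for the example; in the actual coder each prediction depends on previously coded pixels rather than being given as a ready-made list.

```python
def prediction_errors(actual, predicted):
    """Return 0 where the prediction was right and 1 where it was wrong."""
    return [a ^ p for a, p in zip(actual, predicted)]


actual = [0, 0, 1, 0, 1]
predicted = [0, 0, 0, 0, 1]
errors = prediction_errors(actual, predicted)  # only the third pixel differs
```

Applying the same exclusive OR to the errors and the predictions gives back the actual pixels, which is why the decoder only needs the error values.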
The next phase of the bi-level image encoding involves the use of a context-dependent, backward-adaptive, Run-Length-Rice (RLR) coding procedure. As it has been found that the predicted value will usually match the actual value, the prediction error image is composed mostly of 0's. This makes the prediction error image particularly amenable to further compression, thus allowing even less information to be transmitted. To encode the prediction error image, it is preferred the RLR encoding technique be used. In general, an RLR coder is a variable-to-variable length entropy coder in which uninterrupted runs of 2^k zeros are represented by a codeword formed by a single "0", and partial runs of r zeros (r less than 2^k) followed by a 1 are represented by a codeword formed by a 1 followed by the k-bit binary word representation of r. The variable k defines the maximum run length of zeros that can occur in the prediction error image before a codeword is transmitted. Adjusting this variable controls the efficiency of the coding operation. The preferred technique is to employ a backward-adaptive approach for adjusting k. This approach involves choosing an initial value for k and then adjusting it up or down in increments based on whether a "0" codeword is generated or a "1+k-bit binary word" code is generated. The RLR encoding technique according to the present invention is also made dependent on the previously described contexts. Specifically, an encoding table is established which assigns a k variable to each context index. The encoding table is updated to reflect the changes to the k values that may occur during the encoding of the bi-level image, as will be explained next.
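The codeword rule can be sketched with a fixed value of k, leaving out both the adaptation of k and the dependence on contexts. The function name is illustrative, and codewords are returned as strings of "0" and "1" characters purely for readability.

```python
def rlr_encode(bits, k):
    """Encode a sequence of 0/1 values with a fixed-k Run-Length-Rice code.

    A complete run of 2**k zeros yields the codeword "0". A partial run of
    r zeros ended by a 1 yields "1" followed by r written as k bits.
    A trailing partial run with no terminating 1 is not flushed here; how
    the end of the data is handled is outside the scope of this sketch.
    """
    full_run = 1 << k
    codewords = []
    run = 0
    for bit in bits:
        if bit == 0:
            run += 1
            if run == full_run:
                codewords.append("0")
                run = 0
        else:
            suffix = format(run, "0{}b".format(k)) if k > 0 else ""
            codewords.append("1" + suffix)
            run = 0
    return codewords
```

For example, with k = 2 the twelve values 0 0 0 0 0 0 1 0 0 0 0 1 become the four codewords "0", "110", "0", "100", i.e., eight bits in total.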
The aforementioned context-dependent, backward adaptive, RLR encoding technique involves first initializing the aforementioned encoding table by setting the k value associated with each context index to a prescribed initial value (e.g., k=2). In addition, a scaled version of the k variable designated as the Rice parameter k′ is assigned to each context. For example, a simple scaling factor could be multiplied by the current k value to produce the current k′ value, which would be greater than the k value.
When a prediction error value is established for a pixel location, the present RLR coder identifies the context index associated with that pixel location as determined in the prediction error determination process described earlier. The k value currently assigned to that context index is then read from the encoding table. In the case where the pixel location under consideration is the first pixel in raster order in the image (i.e., the upper left hand corner pixel), the associated k value read from the table is used to calculate the run length, where the run length is preferably equal to 2^k. This run length represents the number of consecutive "0" prediction error values in raster order that must exist in order to generate a "0" codeword. When the next prediction error value is computed, it is determined whether it is a "1" or a "0". If it is a "0", then it is determined if this value is in the "middle" of the previously computed run length under consideration or whether it represents the end of this run length. If it does not represent the end of a run, no codeword is generated. If, however, the prediction error value does represent the end of a run, then a "0" codeword is transmitted. Of course, in order to know whether a prediction error value represents the end of the current run length, the present RLR coder must keep track of how many "0" values have been encountered. This is preferably done by also including run counters in the encoding table. Specifically, a separate run counter would be assigned to each context index. In one embodiment the run counter would initially be set to the computed run length value. Then, each time a "0" is encountered as discussed above, including the first in the series, the counter is decremented by one.
When the counter reaches zero, the prediction error value currently being processed is deemed to be the end of the current run length. If, on the other hand, a prediction error value of "1" is encountered at any time during a run, then the present RLR coder generates a "1+k-bit binary word" code where the k-bit binary word represents the number of "0's" encountered in the current run prior to encountering the "1". The number of "0's" encountered can be easily determined using the aforementioned run counter assigned to the context index associated with the pixel location where the run began. Once a codeword has been generated, whether it is a "0" or a "1+k-bit binary word", the very next prediction error value that is generated is used to start another run. This is accomplished as it was for the first pixel location by identifying the context index associated with the pixel location of the prediction error value and repeating the foregoing process.
In addition, every time a codeword is generated, the k-value associated with the run that resulted in the codeword is adjusted. This is preferably accomplished as follows. If the codeword generated was a "0", then the parameter k′ is increased by a prescribed amount. Conversely, if the codeword was not a "0", then the parameter k′ is decreased by a prescribed amount. This prescribed amount can vary, if desired, depending on the current value of k′. The new k value is computed by dividing the new k′ value by the aforementioned scaling factor. The new value for k′ is then stored in the encoding table in place of the previous value. By adjusting k′ by integer steps, it is possible to achieve a fine adjustment of the RLR parameter k, which is necessary for optimal encoding performance, while keeping only integer arithmetic, which is necessary to allow the decoder to precisely track the k adjustment steps.
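The adjustment of k′ and the derivation of k from it can be sketched as follows. The scaling factor of 16, the step sizes of 4, and the lower bound of zero are hypothetical values chosen for the example; the text leaves the prescribed amounts open and allows them to depend on the current value of k′.

```python
SCALE_FACTOR = 16  # hypothetical scaling factor between k and k'
STEP_UP = 4        # hypothetical increase of k' after a "0" codeword
STEP_DOWN = 4      # hypothetical decrease of k' after a "1+k-bit" codeword


def adapt(k_scaled, zero_codeword):
    """Return the updated (k', k) pair after one codeword is generated.

    Only integer arithmetic is used, so a decoder applying the same rule
    to the same codewords follows exactly the same sequence of k values.
    """
    if zero_codeword:
        k_scaled += STEP_UP
    else:
        k_scaled = max(0, k_scaled - STEP_DOWN)  # assumption: floor at zero
    return k_scaled, k_scaled // SCALE_FACTOR
```

Starting from k′ = 32 (k = 2) with these values, it takes four consecutive "0" codewords for k to rise to 3, whereas a single "1+k-bit" codeword lowers k to 1. A practical coder would presumably also cap k′ from above, which is not shown.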
The process for decoding a bi-level image encoded as described above is for the most part just the reverse of the coding process. Specifically, the decoding process first involves receiving the bitstream generated by the encoder and processing it using what will be referred to as a context-dependent, backward-adaptive, Run-Length-Rice (RLR) decoding technique. The present RLR decoder processes each codeword in the incoming bitstream in the order of its arrival. Essentially, if the received codeword is of the "1+k-bit binary word" form, the decoder assigns the designated number of 0's to each consecutive pixel location of a prediction error "image", in raster order, beginning with the first pixel location (e.g., the upper left hand corner pixel location) if the codeword is the first codeword received, or beginning just after the last assigned location for any successive codeword. It then assigns a "1" to the next consecutive pixel location. However, if a "0" codeword is received, the decoder preferably assigns 2^k 0's, each to respective consecutive pixel locations of the image, in raster order, beginning with the first pixel location if the codeword is the first codeword received or beginning just after the last assigned location for any successive codeword. Of course, while the number of 0's designated by the binary word is straightforward, the decoder must know the value of k in order to designate the correct number of 0's when a "0" codeword is received. Essentially, to accomplish this task it is first recognized that the prediction error value of the first pixel location will always be a 0 as a result of the encoding process. Thus, the decoder assigns a 0 to that location and sends the value to an integrator. At the same time, the decoder starts the same pixel prediction process as was used by the encoder.
Specifically, as the predicted value of the first pixel location will always be white (e.g., a 0), the decoder predicts this pixel to be a 0 and sends it to the integrator as well. The integrator is simply an exclusive OR process. Thus, if the retrieved prediction error value is a 0 (i.e., no error) and the predicted pixel value generated by the decoder is a 0, then the result is a 0 which is designated as the actual pixel value and assigned to the pixel location under consideration. Likewise, if the prediction error value is a 0 and the predicted pixel value is a 1, a 1 is generated and assigned to the pixel location. If, however, the recovered prediction error is a 1, the predicted pixel value generated by the decoder is flipped such that a 0 is changed to a 1 and a 1 is changed to a 0. This flipped value is then designated as the actual pixel value of the pixel location under consideration.
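The expansion of codewords into prediction error values can be sketched for a fixed k. This is a simplification: the function name is illustrative, the bitstream is given as a string of "0" and "1" characters and is assumed to contain only complete codewords, and the per-context lookup and adjustment of k are left out.

```python
def rlr_decode(bitstream, k):
    """Expand fixed-k RLR codewords into a list of prediction error values."""
    errors = []
    pos = 0
    while pos < len(bitstream):
        if bitstream[pos] == "0":
            # A "0" codeword stands for a complete run of 2**k zeros.
            errors.extend([0] * (1 << k))
            pos += 1
        else:
            # A "1" is followed by k bits giving the number of zeros that
            # preceded the erroneous prediction.
            run = int(bitstream[pos + 1:pos + 1 + k], 2) if k > 0 else 0
            errors.extend([0] * run)
            errors.append(1)
            pos += 1 + k
    return errors
```

With k = 2, the eight bits "01100100" expand to six 0's, a 1, four 0's, and a final 1. Each recovered value is then combined with the decoder's own prediction by the exclusive OR integrator to give the actual pixel.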
Once the first prediction error value and first predicted pixel value have been processed (which will always produce a 0 or white pixel in the recovered bi-level image), the process continues as follows. In association with generating the first predicted pixel value, the decoder also identifies the context associated with the pixel location. In the case of the first pixel location this context will always be all 0's and so the context index would also be a 0. The RLR decoder constructs a decoding table that matches the previously discussed encoding table. Namely, an initial k value (if included) is assigned to each possible context index, as is a k′ value and a run counter value. These initial values are by design the same as those used to construct the encoding table. The decoder takes the context index provided to it as a result of the pixel prediction process and uses this to identify the appropriate k value. Thus, if the codeword under consideration is a 0, the decoder knows the run length. As such the decoder simply assigns 0's to the appropriate number of consecutive pixel locations that form the prediction error image. In the case of the first pixel location, if the first codeword is a 0, a number of consecutive pixel locations based on the initial value of k and starting with the first location would be assigned a 0. In addition, the k value associated with the context index of the pixel location at the beginning of the run length is adjusted just as it was in the encoding process. Thus, the decoding table will always match the encoding table as it existed when the codeword currently being processed was generated. As the prediction error values are generated in the above manner, they are fed to the integrator, which is also receiving the predicted pixel values for the associated pixel locations that continue to be generated using the same method as was used to generate them in the encoding process.
This is possible because each consecutive actual pixel value reconstructed is made available to the process, and the process uses the same pixel probability table scheme as the encoder.
Once the first incoming codeword has been processed as above, the decoder retrieves the next received codeword and processes it in the same way, except that the context employed is that associated with the next consecutive un-reconstructed pixel location in the bi-level image being generated (which corresponds to the next un-assigned pixel in the so-called prediction error image being generated by the decoder). This is repeated over and over until the entire image has been reconstructed.
In addition to the just described benefits, other advantages of the present invention will become apparent from the detailed description which follows hereinafter when taken in conjunction with the drawing figures which accompany it.