1. Field of the Invention
The present invention relates to an apparatus and method of managing a dictionary composed of different symbols in a text image coding and decoding system, and more particularly, to an apparatus and method of dynamically caching symbols to manage a dictionary memory in a pattern matching based coding and decoding system.
2. Description of the Related Art
Generally, the JBIG2 standard, described in ITU-T T.88, defines a compression method for bi-level images, that is, images consisting of a single rectangular bit plane, with each pixel taking on one of just two possible colors. The JBIG2 standard defines only the requirements for decoding a compliant bit stream corresponding to the images. It does not prescribe a specific encoder design, and thus leaves room for many optimizations that can improve the quality, compression, or speed of the encoder.
In addition, the JBIG2 standard is the first international standard that provides for lossy, lossless, and lossy-to-lossless compression of the bi-level images, and supports multi-page images and model-based coding for text and halftones. The JBIG2 standard also permits compression ratios that are three to eight times better than previous standards, such as G3, G4, and JBIG1.
A bi-level document may contain one or more pages, and each page may contain text data, halftone data, and other data, such as line art or noise, as an input bi-level image. The JBIG2 encoder is expected to segment the input bi-level image into different regions, usually three regions, and to code each region separately using a different coding method. FIG. 1 is a view illustrating a composite image as an input image to be decomposed into three regions in the JBIG2 standard, such as a text region, a halftone region, and a generic region.
The JBIG2 standard provides for encoding the text region of the input image using a symbol dictionary. The input image is split into horizontal stripes. Each stripe is scanned in raster order to extract connected components (symbols). Each extracted symbol is compared with the reference symbols in the symbol dictionary. If a match is found between the symbol of the input image and a reference symbol of the symbol dictionary, the symbol of the input image is encoded using the following information: its location offset relative to its preceding symbols, a dictionary index pointing to its best-matching reference symbol in the symbol dictionary, and a refinement of an encoded bitmap of the symbol of the input image. If no match is found, a new symbol corresponding to the unmatched symbol of the input image is added to the symbol dictionary. This pattern matching based coding system is known as Pattern Matching and Substitution (PM&S).
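The matching loop described above can be sketched as follows. This is a minimal illustration only, not the JBIG2 encoding procedure: the bitmap representation, the pixel-mismatch criterion, the `max_mismatch` threshold, and all function names are assumptions chosen for the example, and the refinement coding step is omitted.

```python
from typing import List, Optional, Tuple

# Illustrative representation (an assumption): a bitmap is a tuple of rows,
# each row a string of '0'/'1' characters.
Bitmap = Tuple[str, ...]


def mismatch(a: Bitmap, b: Bitmap) -> Optional[int]:
    """Count differing pixels, or return None when the sizes differ."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return None
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def best_match(symbol: Bitmap, dictionary: List[Bitmap],
               max_mismatch: int = 1) -> Optional[int]:
    """Linear search for the reference symbol with the fewest mismatches."""
    best_index, best_score = None, max_mismatch + 1
    for index, reference in enumerate(dictionary):
        score = mismatch(symbol, reference)
        if score is not None and score < best_score:
            best_index, best_score = index, score
    return best_index


def encode_symbols(symbols, dictionary: List[Bitmap], max_mismatch: int = 1):
    """symbols: list of (x, y, bitmap) in raster order.

    Returns (kind, dictionary index, dx, dy) records, where the offsets are
    relative to the preceding symbol. Unmatched symbols are appended to the
    dictionary.
    """
    records = []
    prev_x, prev_y = 0, 0
    for x, y, bitmap in symbols:
        index = best_match(bitmap, dictionary, max_mismatch)
        kind = "match"
        if index is None:
            dictionary.append(bitmap)
            index = len(dictionary) - 1
            kind = "new"
        records.append((kind, index, x - prev_x, y - prev_y))
        prev_x, prev_y = x, y
    return records


# Hypothetical 3x2 symbols: A_NOISY differs from A in one pixel.
A = ("010", "111")
A_NOISY = ("010", "110")
B = ("111", "000")

dictionary: List[Bitmap] = []
records = encode_symbols([(0, 0, A), (5, 0, B), (9, 0, A_NOISY)], dictionary)
```

In this sketch the third symbol is coded as a reference to dictionary entry 0 rather than as a new entry, which is the substitution that gives PM&S its compression gain.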
There exist several methods for the design of the symbol dictionary. The simplest is the Independent Dictionary method, which creates a completely separate dictionary for each encoded stripe, without taking into account the symbols that were used to encode a previous stripe. This method provides a poor compression ratio since a new dictionary must be encoded for each stripe. It is important to note that the symbols may be repeated from one stripe to another, so that the same symbols are stored in several dictionaries. This repetition has two disadvantages. First, encoding these additional symbols increases the computation time. Second, retransmitting these redundant symbols in the independent dictionaries increases the overall bit rate of the encoding system.
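The redundancy of the Independent Dictionary method can be illustrated with a small sketch. Symbols are represented here by single-character labels and matched exactly, which is a simplifying assumption; the function name and the sample stripes are hypothetical.

```python
def independent_dictionaries(stripes):
    """Build one fresh dictionary per stripe, ignoring all earlier stripes."""
    # dict.fromkeys removes duplicates within a stripe while keeping order.
    return [list(dict.fromkeys(stripe)) for stripe in stripes]


# Hypothetical stripes; each label stands for one symbol bitmap.
stripes = [["t", "h", "e", "t"], ["t", "e", "a"], ["a", "t"]]

dictionaries = independent_dictionaries(stripes)

# Number of dictionary entries that must be encoded over all stripes.
total_transmitted = sum(len(d) for d in dictionaries)

# Number of distinct symbols actually present in the document.
distinct = len({symbol for stripe in stripes for symbol in stripe})

redundant = total_transmitted - distinct
```

For these sample stripes, eight dictionary entries are encoded although only four distinct symbols occur, so half of the transmitted entries are repeats.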
The second is the Global Dictionary method, which takes advantage of the fact that the symbols are repeated. This method uses a single dictionary to encode an entire multi-page document. Therefore, the Global Dictionary contains all the symbols necessary to encode all the stripes in the document. This method produces a high compression ratio. However, it is not feasible when there are memory limitations on the encoder or decoder. Moreover, the Global Dictionary method also increases the computation time because the symbol matching process requires a linear search through a much larger dictionary.
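A minimal sketch of the Global Dictionary method follows, again assuming exact matching on single-character symbol labels. The comparison counter is added only to make the cost of the linear search visible; the function name and sample stripes are hypothetical.

```python
def global_dictionary(stripes):
    """Encode all stripes against one shared dictionary.

    Returns the final dictionary and the number of symbol comparisons
    performed by the linear search.
    """
    dictionary = []
    comparisons = 0
    for stripe in stripes:
        for symbol in stripe:
            for reference in dictionary:
                comparisons += 1
                if reference == symbol:
                    break  # match found; nothing is added
            else:
                # No reference matched (or the dictionary was empty).
                dictionary.append(symbol)
    return dictionary, comparisons


# Hypothetical stripes; each label stands for one symbol bitmap.
stripes = [["t", "h", "e", "t"], ["t", "e", "a"], ["a", "t"]]

dictionary, comparisons = global_dictionary(stripes)
```

Each distinct symbol is stored only once, but the dictionary never shrinks, so both the memory footprint and the per-symbol search cost grow with the document.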
The third is the Local Dictionary method, which has been proposed by Ye and Cosman in the PhD dissertation “Text Image Compression Based on Pattern Matching”, Yan Ye, 2002. The Local Dictionary method takes advantage of the fact that symbols may be repeated within consecutive stripes, and works as follows. At each stripe, the symbols from a previous dictionary that will not be used to encode a current stripe are removed, and the symbols that appear in the current stripe but did not find a match in the previous dictionary are added. The Local Dictionary method has the disadvantage that some symbols that are already stored in the dictionary and may be used in processing the next stripe are discarded. Typically, the compression ratio of the Local Dictionary method is higher than that of the Independent Dictionary method but much lower than that of the Global Dictionary method.
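The per-stripe update of the Local Dictionary method, as described above, can be sketched as follows. This is an illustration under simplifying assumptions (exact matching on single-character symbol labels, hypothetical function name and sample stripes), not the implementation from the cited dissertation.

```python
def local_dictionaries(stripes):
    """Return, for each stripe, the symbols kept from the previous
    dictionary and the symbols newly added for the current stripe."""
    previous = []
    history = []
    for stripe in stripes:
        needed = list(dict.fromkeys(stripe))  # distinct symbols, in order
        # Remove symbols that the current stripe does not use.
        kept = [s for s in previous if s in needed]
        # Add symbols that found no match in the previous dictionary.
        added = [s for s in needed if s not in previous]
        history.append((kept, added))
        previous = kept + added
    return history


# Hypothetical stripes; "h" is absent from the second stripe and
# reappears in the third.
stripes = [["t", "h", "e"], ["t", "a"], ["h", "a"]]

history = local_dictionaries(stripes)

# Total number of dictionary entries that had to be encoded.
total_added = sum(len(added) for _, added in history)
```

In this example the symbol "h" is discarded while processing the second stripe and must be encoded again for the third, so five entries are encoded for four distinct symbols.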