1. Field of the Invention
The present general inventive concept relates to an apparatus and method of binary image compression and, more particularly, to an apparatus and method of binary image compression in which a compression efficiency of symbol-based encoding of an input binary image is calculated, and a symbol-based encoding operation or a bit-based encoding operation is selectively performed based on the calculated compression efficiency, so that the input binary image is compressed efficiently according to its characteristics.
2. Description of the Related Art
A binary image includes components such as text, which can be expressed by symbols, and pictures, which cannot be expressed by symbols. JBIG2 (Joint Bi-level Image Experts Group version 2), prescribed in ITU-T (International Telecommunication Union Telecommunication Standardization Sector) Recommendation T.88, compresses the components of a binary image that can be expressed by symbols using a symbol matching-based encoding operation, and compresses the other components of the binary image, which cannot be expressed by symbols, using context-based arithmetic encoding or halftone encoding operations.
Data compressed using the different encoding methods is transmitted in units of segments. In particular, components compressed using a symbol-based encoding operation are expressed by a symbol dictionary segment and a symbol region segment. In the symbol dictionary segment, the bitmaps of symbols used repeatedly in the binary image are compressed using Modified Modified READ (MMR) encoding or arithmetic encoding operations, and information about each of the symbols (such as a width and a height) is compressed using Huffman encoding or arithmetic encoding operations. In the symbol region segment, a position of each of the symbols included in the binary image and an index of each of the symbols in a symbol dictionary are compressed using Huffman encoding or arithmetic encoding operations.
The symbol dictionary segment is constructed as follows. A newly extracted symbol is matched against the symbols already registered in a symbol dictionary. If a matching symbol exists in the symbol dictionary, the newly extracted symbol is encoded using the index of the matching symbol. If no matching symbol exists in the symbol dictionary, the newly extracted symbol is added to the symbol dictionary and is encoded using the index newly assigned to it.
The newly extracted symbol is matched sequentially against the registered symbols of the symbol dictionary, yielding a matching score for each, to determine whether any of the registered symbols matches the newly extracted symbol. The matching may be performed using, for example, a first-match method or a best-match method. The first-match method selects as the matching symbol the first registered symbol whose matching score is below a threshold value. The best-match method selects as the matching symbol the registered symbol having the best matching score among all of the registered symbols of the symbol dictionary.
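The dictionary construction and the two matching methods described above can be sketched as follows. This is an illustrative sketch only, not the method prescribed by JBIG2 or by any particular encoder. It assumes that the matching score measures mismatch (lower is better), which is consistent with a match being declared when the score is below a threshold value. The score itself (fraction of differing pixels), the names `mismatch_score`, `first_match`, `best_match`, and `build_dictionary`, and the use of a threshold to reject a poor best match are all assumptions introduced for illustration.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A symbol bitmap is modeled as a tuple of rows of 0/1 pixels (an assumption).
Bitmap = Tuple[Tuple[int, ...], ...]


def mismatch_score(a: Bitmap, b: Bitmap) -> float:
    """Hypothetical score: fraction of differing pixels; infinite if sizes differ."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return float("inf")
    total = sum(len(row) for row in a)
    diff = sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return diff / total if total else 0.0


def first_match(symbol: Bitmap, dictionary: Sequence[Bitmap],
                threshold: float) -> Optional[int]:
    """Return the index of the first registered symbol scoring below the threshold."""
    for index, registered in enumerate(dictionary):
        if mismatch_score(symbol, registered) < threshold:
            return index
    return None


def best_match(symbol: Bitmap, dictionary: Sequence[Bitmap],
               threshold: float) -> Optional[int]:
    """Return the index of the lowest-scoring registered symbol.

    Rejecting a best score that is not below the threshold is an assumption,
    needed so that a dissimilar symbol can be registered as a new entry.
    """
    best_index, best_score = None, threshold
    for index, registered in enumerate(dictionary):
        score = mismatch_score(symbol, registered)
        if score < best_score:
            best_index, best_score = index, score
    return best_index


Matcher = Callable[[Bitmap, Sequence[Bitmap], float], Optional[int]]


def build_dictionary(symbols: Sequence[Bitmap], matcher: Matcher,
                     threshold: float) -> Tuple[List[Bitmap], List[int]]:
    """Register unmatched symbols and return the dictionary and per-symbol indices."""
    dictionary: List[Bitmap] = []
    indices: List[int] = []
    for symbol in symbols:
        index = matcher(symbol, dictionary, threshold)
        if index is None:
            # No matching symbol exists: register the symbol and use its new index.
            dictionary.append(symbol)
            index = len(dictionary) - 1
        indices.append(index)
    return dictionary, indices


a = ((1, 1), (1, 1))
b = ((1, 1), (1, 0))  # differs from a in one of four pixels
c = ((0, 0), (0, 0))
dictionary, indices = build_dictionary([a, b, c, a], first_match, threshold=0.3)
print(len(dictionary), indices)  # prints: 2 [0, 0, 1, 0]
```

The two methods can disagree: for a dictionary holding `b` followed by `a`, the first-match method stops at `b` when matching `a` with a threshold of 0.3, whereas the best-match method examines every registered symbol and selects `a` itself.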
Since symbol-based encoding operations exploit the redundancy of symbols in a binary image, they include many additional sub-operations, such as symbol extraction and symbol dictionary construction sub-operations, which result in a high complexity of the compression. Nevertheless, symbol-based encoding is widely used because of the high compression efficiency it achieves when the redundancy of the symbols in the binary image is high.
However, for a binary image having a low redundancy of symbols, bit-based encoding can yield a higher compression efficiency than symbol-based encoding. For example, when 10 symbols are extracted from an input binary image and only 1 symbol is registered in a symbol dictionary, every symbol of the input binary image can be expressed using that single registered symbol, resulting in a high compression efficiency. On the other hand, when 10 symbols are extracted and 10 symbols are registered in the symbol dictionary, the extracted symbols have a low similarity to one another, so that the registered symbols are rarely referred to more than once, which degrades the efficiency of symbol-based compression.
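The two cases above can be expressed numerically as a ratio of registered symbols to extracted symbols. The following is a minimal sketch under stated assumptions: the ratio is only one possible indicator of symbol redundancy and is not asserted to be the compression efficiency calculation of the present general inventive concept, and the names `registration_ratio` and `choose_encoding` and the cutoff `max_ratio` of 0.5 are hypothetical values chosen for illustration.

```python
def registration_ratio(num_extracted: int, num_registered: int) -> float:
    """Ratio of registered to extracted symbols; a low ratio means high symbol reuse."""
    if num_extracted <= 0:
        raise ValueError("at least one symbol must be extracted")
    if not 0 <= num_registered <= num_extracted:
        raise ValueError("registered symbols cannot exceed extracted symbols")
    return num_registered / num_extracted


def choose_encoding(num_extracted: int, num_registered: int,
                    max_ratio: float = 0.5) -> str:
    """Pick symbol-based encoding only when the ratio is at most max_ratio.

    max_ratio is a hypothetical cutoff, not a value taken from the text.
    """
    ratio = registration_ratio(num_extracted, num_registered)
    return "symbol" if ratio <= max_ratio else "bit"


print(registration_ratio(10, 1), choose_encoding(10, 1))    # prints: 0.1 symbol
print(registration_ratio(10, 10), choose_encoding(10, 10))  # prints: 1.0 bit
```

With 10 extracted symbols and 1 registered symbol the ratio is 0.1, indicating heavy reuse of the dictionary, whereas with 10 registered symbols the ratio is 1.0, indicating that no registered symbol is reused.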