When a document such as a ledger sheet is read as digital data by a scanner, the digital data may be saved as a monochrome image or a gray-scale image to reduce the amount of data. Recently, however, the number of color documents has increased with the spread of color printers, so that saving documents as color images is increasingly required.
A color image has a larger amount of data than a monochrome image or a gray-scale image. Therefore, a color image may be compressed before being saved.
The JPEG (Joint Photographic Experts Group) format is well known as a means of compressing a color image at a high compression ratio. However, when a document image containing characters is compressed at a high compression ratio using the JPEG format, the edge portions of the characters become blurred due to block noise, so that legibility becomes poor.
A known technique to address this problem is to subtract colors from an original image, that is, to reduce the number of colors it uses, when the original image is compressed. When a document is scanned by a scanner, the number of colors may become huge due to quantization error and positional shift. Therefore, a color subtraction technique that decreases the number of colors is effective for decreasing the amount of data when the color image is compressed. In the color subtraction technique, the number of colors to be used is determined by performing either a Hough transform or a principal-component analysis on a frequency distribution in a color space. In addition, linear distributions of colors in the color space are acquired. The acquired distributions are classified into several clusters. Then, the colors of the respective clusters are used to perform color subtraction.
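As a rough illustration of the final step described above, the following Python sketch maps each scanned pixel to the nearest of a few representative colors. It is a minimal sketch under stated assumptions and not the method of the known technique: the representative colors in `palette` are simply given rather than derived by a Hough transform or a principal-component analysis, nearest-color matching in RGB is a swapped-in simplification of the cluster classification, and all function names and pixel values are hypothetical.

```python
def nearest_color(pixel, palette):
    """Return the palette color closest to `pixel` (squared Euclidean distance in RGB)."""
    return min(palette, key=lambda c: sum((p - q) ** 2 for p, q in zip(pixel, c)))


def subtract_colors(pixels, palette):
    """Replace every pixel by its nearest representative color."""
    return [nearest_color(p, palette) for p in pixels]


def count_colors(pixels):
    """Number of distinct colors in a list of RGB tuples."""
    return len(set(pixels))


# Hypothetical scanned pixels: paper white, black ink, and red ink,
# each slightly perturbed as if by quantization error and positional shift.
scanned = [
    (250, 252, 249), (255, 255, 255),  # background
    (12, 8, 15), (3, 0, 6),            # black characters
    (200, 20, 25), (190, 30, 18),      # red ruled lines
]

# Assumed representative colors, one per cluster (not computed here).
palette = [(255, 255, 255), (0, 0, 0), (200, 0, 0)]

reduced = subtract_colors(scanned, palette)
```

In this toy input the six distinct scanned colors collapse to the three palette colors, which is the reduction that makes the subsequent compression more effective.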
Furthermore, there is a technique to extract only the elements written by hand from a ledger sheet on which a form to be filled in is printed in advance. When the ledger sheet is scanned by a scanner and the scanned image data is transmitted using a communication means, transmitting the whole of the scanned image data is costly because the amount of data is huge. In the case of a ledger sheet, a person may fill in the preprinted form by hand.
Since the image data printed in advance (hereinafter referred to as “preprint data”) is known, the elements written by hand are the important part. Therefore, high efficiency can be achieved by extracting only the handwritten elements from a scanned image and transmitting those elements. A receiver can reconstruct image data that is the same as the scanned image data at the transmitter by synthesizing the received elements with the preprint data that the receiver holds in advance.
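The extraction and reconstruction described above can be sketched as follows. This is only an illustrative sketch: it assumes that the scanned image and the preprint data are perfectly aligned gray-scale grids of the same size with no scanning noise, and it treats every pixel that differs from the preprint data as a handwritten element. The function names and the sparse dictionary representation are hypothetical and are not taken from the technique itself.

```python
def extract_handwriting(scanned, preprint):
    """Transmitter side: keep only pixels that differ from the preprint data.

    Returns a sparse dict {(row, col): value}.
    Assumes both images are aligned lists of rows with identical dimensions.
    """
    elements = {}
    for r, (scan_row, pre_row) in enumerate(zip(scanned, preprint)):
        for c, (s, p) in enumerate(zip(scan_row, pre_row)):
            if s != p:
                elements[(r, c)] = s
    return elements


def reconstruct(preprint, elements):
    """Receiver side: overlay the received elements on the held preprint data."""
    image = [row[:] for row in preprint]  # copy so the preprint data is unchanged
    for (r, c), value in elements.items():
        image[r][c] = value
    return image


# Hypothetical 3x4 gray-scale form: 255 = paper, 0 = preprinted ruled line.
preprint = [
    [0, 0, 0, 0],
    [255, 255, 255, 255],
    [0, 0, 0, 0],
]

# The same form after two pixels were filled in by hand (value 40).
scanned = [
    [0, 0, 0, 0],
    [255, 40, 40, 255],
    [0, 0, 0, 0],
]

elements = extract_handwriting(scanned, preprint)  # only this is transmitted
restored = reconstruct(preprint, elements)
```

Here only two of the twelve pixels need to be transmitted, and the receiver recovers the full scanned image from them together with the preprint data.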
Images of the edge portions of characters, however, are likely to have colors different from the color of the ink actually used, due to color shift that occurs at the time of scanning and the like. For example, the edge portions may have an intermediate color due to the influence of both the ink color and the background color. The technique described above does not define any processing for colors deviating from the linear distribution, and therefore cannot handle the intermediate color appropriately.
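The intermediate color at a character edge can be illustrated with a simple mixing model. The sketch below assumes, purely for illustration, that a scanner pixel straddling an edge records a linear blend of the ink color and the background color in proportion to how much of the pixel the ink covers. Real scanners may behave differently, and the function name, the percentage parameter, and the color values are all hypothetical.

```python
def edge_pixel_color(ink, background, ink_coverage_pct):
    """Blend ink and background for a pixel partly covered by ink.

    `ink_coverage_pct` is the assumed share of the pixel area covered by ink
    (0 to 100). Integer arithmetic keeps the result deterministic.
    """
    return tuple(
        (i * ink_coverage_pct + b * (100 - ink_coverage_pct)) // 100
        for i, b in zip(ink, background)
    )


ink = (0, 0, 255)             # hypothetical blue ink
background = (255, 255, 255)  # white paper

interior = edge_pixel_color(ink, background, 100)  # fully inside the stroke
edge = edge_pixel_color(ink, background, 50)       # half ink, half paper
```

Under this model the interior pixel keeps the ink color, while the edge pixel takes on a pale blue that matches neither the ink nor the paper, which is the kind of intermediate color that needs dedicated handling.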
A document image of a ledger sheet or the like sometimes has a particular field intentionally filled with halftone dots so as to be colored with an intermediate color. In addition, in some cases, characters are printed over the halftone dots using an ink of the same color as that of the halftone dots. In this case, the following problem may arise. If color subtraction processing is performed on the document image, the characters and the halftone dots may be recognized as having the same color, so that the characters become difficult to read. Furthermore, if characters, ruled lines, and other images are printed on a ledger sheet in similar colors, the colors in an image of the ledger sheet are difficult to classify. For example, when an image is read from a document on which characters or ruled lines are printed in red in advance and to which a vermilion seal impression is subsequently added, it is difficult to classify the red of the characters or ruled lines and the vermilion into different color clusters.