Field of the Invention
The present disclosure generally relates to image processing and, more particularly, to an image processing apparatus, an image processing method, and a storage medium storing a program for determining a character within an image.
Description of the Related Art
In recent years, an increasing number of color documents have become available because of the widespread use of color printers and color scanners, for example. There are more opportunities to capture such documents by scanning, store them as electronic files, and transmit them to a third party over the Internet, for example. However, because storing full-color data directly may impose a large load on an apparatus and on communication lines, such data may need to be compressed to reduce its data amount.
Conventional methods for compressing a color image include, for example, a method of converting the color image to a binary image having a pseudo-gray scale by using an error diffusion method or the like and compressing the binary image, a method of compressing the color image in JPEG, and a method of converting the color image to 8-bit palette colors and applying ZIP compression or LZW compression.
According to Japanese Patent Laid-Open No. 2002-077633, a character region contained in an input image is detected, and the detected character region is converted to a binary image, MMR compressed (binary reversible compression), and stored in a file along with color information on the characters therein. Furthermore, an image in which the character region of the input image is filled with a surrounding color is reduced in resolution, JPEG compressed (non-reversible compression), and stored in the file as a background image. A file compressed by this method may provide the character region in high quality while achieving a higher compression rate.
According to Japanese Patent Laid-Open No. 2002-077633, in order to detect a character region, whether each set of black pixels in a binary image acquired by binarizing an input image possibly corresponds to a character is determined based on the size (width or height) of the set of black pixels and on whether sets of black pixels having an approximately equal size exist close to each other.
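The size-and-proximity determination described above may be illustrated by the following minimal Python sketch. It is not taken from the cited publication. The function names, the use of 8-connectivity to form sets of black pixels, and all threshold values (`min_size`, `max_size`, `size_tol`, `max_gap`) are assumptions chosen only for illustration.

```python
from collections import deque


def black_pixel_sets(binary):
    """Return bounding boxes (x0, y0, x1, y1) of 8-connected sets of black (1) pixels.

    `binary` is a list of rows; 1 = black, 0 = white (assumed representation).
    """
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] != 1 or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(x, y)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < w and 0 <= ny < h
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes


def character_candidates(boxes, min_size=2, max_size=40, size_tol=0.5, max_gap=10):
    """Keep boxes of plausible size that have a similar-sized box nearby.

    All thresholds are illustrative assumptions, not values from the source.
    """
    def size(b):
        # Larger of width and height of the bounding box.
        return max(b[2] - b[0] + 1, b[3] - b[1] + 1)

    def gap(a, b):
        # Axis-wise separation between two boxes (0 if they overlap on that axis).
        dx = max(a[0] - b[2], b[0] - a[2], 0)
        dy = max(a[1] - b[3], b[1] - a[3], 0)
        return max(dx, dy)

    result = []
    for i, a in enumerate(boxes):
        if not (min_size <= size(a) <= max_size):
            continue
        for j, b in enumerate(boxes):
            if i != j and abs(size(a) - size(b)) <= size_tol * size(a) and gap(a, b) <= max_gap:
                result.append(a)
                break
    return result


# Two 3x3 blocks close together and one isolated 3x3 block.
demo = [[0] * 30 for _ in range(12)]
for bx, by in ((1, 1), (6, 1), (25, 8)):
    for yy in range(by, by + 3):
        for xx in range(bx, bx + 3):
            demo[yy][xx] = 1

demo_boxes = black_pixel_sets(demo)
demo_candidates = character_candidates(demo_boxes, max_gap=5)
print(demo_candidates)  # [(1, 1, 3, 3), (6, 1, 8, 3)]
```

In this example the two adjacent blocks are retained as character candidates, while the isolated block is rejected because no set of black pixels of approximately equal size exists close to it.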
However, when a method that performs the region determination based on a binary image, such as the one disclosed in Japanese Patent Laid-Open No. 2002-077633, is applied to an input image in which a character and a background are difficult to separate by simple binarization, the pixels included in the character may be difficult to identify. For example, when simple binarization is performed on a black character over a white background (a character image having a large density difference between the character and the background), the background pixels and the character pixels may be separated easily. In contrast, when binarization is performed on a black character over a dark background (a character image having a small density difference between the character and the background), the background pixels and the character pixels are difficult to separate. In particular, performing binarization on a character over a high-density background with a threshold value lower than the density of the background may result in a binary image in which the character is degraded in black, that is, the character and the background around it both become black pixels. In this case, when the size of the high-density background region is approximately equal to the size of a character, the region in which the background and the character are degraded in black as a result of the binarization may be wrongly determined to be a character pixel part. For example, when a document in which a part of a character string is marked with a thick marker pen is scanned and the scanned image is binarized, the entire part marked with the marker pen may sometimes turn black. When the size of the part marked with the marker pen is close to the character size, all the pixels of the marked part, having been degraded in black as a result of the binarization, may be handled as one character. In other words, all black pixels in a region degraded in black as a result of binarization may possibly be handled as pixels of a character.
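The effect described above may be illustrated by the following minimal Python sketch. The density scale (0 for white, 255 for black), the 5x5 patch with a single vertical stroke, the background densities of 10 and 180, and the threshold value of 128 are all assumptions chosen only for illustration.

```python
def binarize(density, threshold):
    """Simple binarization: a pixel whose density (0 = white, 255 = black,
    an assumed scale) is at or above the threshold becomes black (1)."""
    return [[1 if d >= threshold else 0 for d in row] for row in density]


def patch(background, stroke=255):
    """Hypothetical 5x5 patch: one vertical character stroke in the middle
    column, drawn over a uniform background density."""
    return [[stroke if x == 2 else background for x in range(5)] for _ in range(5)]


def black_count(binary):
    return sum(sum(row) for row in binary)


# Black stroke over a nearly white background: large density difference.
light = binarize(patch(background=10), threshold=128)
# Black stroke over a dark (e.g. marker-pen) background whose density
# exceeds the threshold: small density difference.
dark = binarize(patch(background=180), threshold=128)

print(black_count(light))  # 5  -> only the stroke pixels are black
print(black_count(dark))   # 25 -> the whole patch is degraded in black
```

With the light background only the five stroke pixels become black, whereas with the dark background all 25 pixels become black, so the stroke can no longer be distinguished from the background in the binary image.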