A character extraction technology is used in a character recognition system, such as an optical character reading apparatus, for processing an image document. The character extraction technology is also used in an image editing system to delete a character from a graphic image. In the present invention, the term “character” includes alphabetic letters, Arabic numerals, Roman numerals, Kana characters, Kanji or Chinese characters, and Arabic characters.
Japanese Laid-Open Patent Publication No. 08-123901 discloses a character extraction and recognition device. The device has a color image input device, a color space converting device, a color space dividing device, an image data to binary data converting device, a character extraction device, and a character recognition device. In the character extraction and recognition device, the input color image is divided into a plurality of color ranges, and characters are extracted using the divided color ranges. The publication, however, fails to disclose a method for simultaneously extracting characters in plural colors.
A related application (U.S. Patent Application Serial Number ) by the assignee of the current application has disclosed a system for simultaneously extracting characters in multiple colors from color image data. The color components, such as red, green and blue, of the image data are simultaneously processed to generate bi-level color component data. As long as the character colors and the background color have a sharp contrast, the above-described binarization is able to extract the characters.
Color documents and color visual media such as color printed matter, color photocopies and web pages on the Internet have become more widely used, and the use of colors has been extended. For example, web pages on the Internet are filled with characters in various colors on backgrounds that are also in various colors.
Referring to FIG. 1, a conventional method fails to extract characters of a certain color from a background of a certain color for character recognition. A sample image has three rows of characters, including a first row having red characters on a black background. The second and third rows have dark characters on a white background. This sample image data is divided into color components such as red, green and blue, and then each of the color component data is binarized or processed into bi-level color component data. The bi-level color component data generally reveals sufficient contrast and is conducive to generating minimal circumscribing rectangles around character text. The results are merged back into single image data for character recognition, where characters are recognized within the circumscribing rectangles. However, the bi-level color component data fails to reveal the characters in the first character row. Since the first character row contains red characters on a black background, there is not sufficient contrast in the bi-level color component data.
Similarly, referring to FIG. 2, a conventional method also fails to extract an image of a certain color from surroundings of a certain color for image recognition. A sample image has three image portions, including a mountain having a red portion with black adjacent surroundings. The second and third image portions have dark images on a white background. This sample image data is divided into color components such as red, green and blue, and then each of the color component data is binarized or processed into bi-level color component data. The bi-level color component data generally reveals sufficient contrast and is conducive to generating minimal circumscribing rectangles around an image portion. The results are merged back into single image data for image recognition, where images are recognized within the circumscribing rectangles according to a predetermined method. However, the bi-level color component data fails to separate the mountain from the surroundings. Since the first image portion contains a red mountain on a black background, there is not sufficient contrast in the bi-level color component data.
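The failure described with reference to FIG. 1 and FIG. 2 can be illustrated by a minimal sketch. It is not the conventional method itself. It assumes a fixed global threshold of 128, a dark red of (120, 0, 0), and a toy image one pixel high per row; the function names, the colors and the threshold are all hypothetical and are chosen only to show how each bi-level color component can lose the red-on-black row while keeping the dark-on-white rows.

```python
# Hypothetical colors and threshold; none of these values come from the source.
BLACK = (0, 0, 0)
WHITE = (255, 255, 255)
DARK_RED = (120, 0, 0)
THRESHOLD = 128  # assumed fixed global threshold


def make_row(fg, bg, width=9):
    """Build one row of pixels with a 'character' pixel at every third position."""
    return [fg if x % 3 == 1 else bg for x in range(width)]


def split_channels(image):
    """Divide the image into three color component planes (R, G, B)."""
    return [[[px[c] for px in row] for row in image] for c in range(3)]


def binarize(plane, threshold=THRESHOLD):
    """Convert a plane to bi-level data: 1 = dark pixel, 0 = light pixel."""
    return [[1 if value < threshold else 0 for value in row] for row in plane]


def row_has_contrast(bilevel_row):
    """A row is separable only if it contains both bi-level values."""
    return len(set(bilevel_row)) > 1


image = [
    make_row(DARK_RED, BLACK),  # row 1: red characters on a black background
    make_row(BLACK, WHITE),     # row 2: dark characters on a white background
    make_row(BLACK, WHITE),     # row 3: dark characters on a white background
]

bilevel = [binarize(plane) for plane in split_channels(image)]

# contrast[r] lists, for image row r, whether the R, G and B bi-level planes
# each distinguish the character pixels from the background pixels.
contrast = [
    [row_has_contrast(plane[r]) for plane in bilevel]
    for r in range(len(image))
]

for number, flags in enumerate(contrast, start=1):
    print(f"row {number}: contrast in (R, G, B) = {flags}")
```

Under these assumptions, every pixel of the first row falls below the threshold in all three components, so no component distinguishes the red characters from the black background, whereas the second and third rows remain separable in every component.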
Accordingly, the demand for extracting color characters from a colored background or a graphic image is increasing. In particular, the extraction of characters from a background whose color is similar to that of the characters is desired.