This application claims benefit of priority to Japanese Patent Application No. 11-010969 filed Jan. 19, 1999, the entire disclosure of which is incorporated by reference herein.
1. Field of the Invention
The present invention relates to character recognition, in particular to a method, a computer readable medium and an apparatus for extracting characters from color image data.
2. Discussion of the Background
Character extraction technology provides preprocessing of an image document in a character recognition system, for example, in an optical character reading apparatus. Character extraction technology is also used in image editing systems, for example, to delete characters within a graphic image. In the present invention, the term “character” includes alphabetic letters, Arabic numerals, Roman numerals, Kana characters, Kanji or Chinese characters, Arabic characters, etc.
As a character extraction method, Japanese Laid-Open Patent Publication No. 08123901 describes a character extraction and recognition device. The device has a color image input device, a color space converting device, a color space dividing device, an image data to binary data converting device, a character extraction device, and a character recognition device. In the character extraction and recognition device, the input color image data is divided into a plurality of color ranges, and characters are extracted using the divided color ranges. However, the publication does not disclose a method for simultaneously extracting characters of plural colors.
Use of color documents and color visual media, such as color printed matter, color photocopies, and printouts of Internet web pages, is increasing. For example, web pages on the Internet are filled with various types of characters in various colors on various types of backgrounds (e.g., colored, patterned, pictorial, graphic image backgrounds, etc.). Accordingly, demand is increasing for extracting color characters on a white or colored background, including a background that contains a graphic image. Demand is also increasing for extracting white or relatively light color characters on a relatively dark background.
The present invention has been made in view of the above-discussed and other problems, and one objective is to overcome these problems with the background apparatuses and methods. Accordingly, one object of the present invention is to provide a novel method, computer program product and apparatus for extracting characters from color image data that can simultaneously extract characters having a plurality of colors.
Another object of the present invention is to provide a novel method, computer program product and apparatus for extracting characters from color image data that can extract a plurality of white or relatively light color characters on a relatively dark color background.
To achieve these and other objects, the present invention provides a novel method, computer program product and apparatus for extracting characters from color image data that include inputting color image data; separating the input color image data into a plurality of color component data; and converting each of the plurality of color component data into a plurality of bi-level color component data, respectively. Other functions include circumscribing rectangles around linked pixels having identical bi-level values in the plurality of bi-level color component data, respectively; selecting the circumscribed rectangles in the plurality of bi-level color component data, respectively, based on the sizes of the circumscribed rectangles; merging the bi-level color component data inside the selected circumscribed rectangles; and outputting the merged bi-level image data.
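By way of illustration only, the sequence of functions summarized above can be sketched in Python as follows. This is a minimal sketch under stated assumptions and is not the disclosed implementation: the color components are assumed to be the R, G and B planes, the bi-level conversion is assumed to be a fixed threshold of 128 that marks dark pixels as 1, linked pixels are assumed to be 8-connected, rectangle selection is assumed to keep rectangles whose longer side lies between `min_side` and `max_side`, and merging is assumed to be a logical OR. All function names and parameter values are hypothetical.

```python
from collections import deque


def split_channels(image):
    """Separate an RGB image (rows of (r, g, b) tuples) into three component planes."""
    return [[[px[c] for px in row] for row in image] for c in range(3)]


def binarize(plane, threshold=128):
    """Bi-level conversion (assumed rule): 1 where the component is darker than threshold."""
    return [[1 if v < threshold else 0 for v in row] for row in plane]


def bounding_rects(binary):
    """Circumscribe a rectangle around each group of 8-connected 1-pixels.

    Returns inclusive (top, left, bottom, right) tuples.
    """
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    rects = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] != 1 or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:  # breadth-first walk over the linked pixels
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            rects.append((top, left, bottom, right))
    return rects


def select_by_size(rects, min_side=2, max_side=8):
    """Keep rectangles whose longer side is within [min_side, max_side] (assumed rule)."""
    kept = []
    for top, left, bottom, right in rects:
        longer = max(bottom - top + 1, right - left + 1)
        if min_side <= longer <= max_side:
            kept.append((top, left, bottom, right))
    return kept


def extract_characters(image, threshold=128, min_side=2, max_side=8):
    """Run the steps on every component plane and OR the selected regions together."""
    h, w = len(image), len(image[0])
    merged = [[0] * w for _ in range(h)]
    for plane in split_channels(image):
        binary = binarize(plane, threshold)
        rects = select_by_size(bounding_rects(binary), min_side, max_side)
        for top, left, bottom, right in rects:
            for y in range(top, bottom + 1):
                for x in range(left, right + 1):
                    merged[y][x] |= binary[y][x]
    return merged
```

Under these assumptions, a red character on a white background appears as dark pixels in the G and B planes, and a blue character appears in the R and G planes, so both reach the merged output in a single pass. A long rule line or a large graphic region yields a rectangle outside the size range and is discarded.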