1. Field of the Invention
The present invention relates to a device, a program product and a system for image processing. In particular, the invention relates to a device, a program product and a system for executing a process to judge whether character images in input image data should be converted into character code data.
2. Description of the Related Art
The latest image recognition devices can recognize character images as character codes with extremely high accuracy as long as the document is in good condition for scanning (for example, if the document uses a single font type). However, if the quality of the characters on the document is poor, or if the layout of characters or the like on the document is complicated, the accuracy of recognizing character images as character codes drops substantially and recognition errors occur more frequently.
To cope with this problem, systems are known in which the entire image data obtained by scanning the document is stored as a backup. However, the volume of a file containing the backup image data naturally becomes substantially larger than the input image data.
Therefore, other systems have been proposed in which only the character images that have a high probability of recognition errors are output as character image data (e.g., bitmap type image data) without being converted into character code data. The probability of recognition errors is judged from information on the certainty of recognizing character codes, which is measured by checking similarity with prerecorded standard character patterns and the like.
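The certainty-based selection described above can be sketched as follows. This is only an illustrative sketch and is not taken from any of the systems referred to: the `RecognizedChar` structure, the `select_output` function, the normalization of the certainty to a value between 0 and 1, and the threshold of 0.8 are all assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class RecognizedChar:
    """One recognition result (hypothetical structure, for illustration only)."""
    code: str         # character code proposed by the recognizer
    certainty: float  # assumed similarity score, normalized to [0, 1]
    bitmap: bytes     # cropped character image data


def select_output(chars, threshold=0.8):
    """Emit character code data when the certainty is high enough,
    otherwise keep the original character image data.

    The threshold value is an assumption, not a value from the source.
    """
    output = []
    for ch in chars:
        if ch.certainty >= threshold:
            output.append(("code", ch.code))
        else:
            output.append(("image", ch.bitmap))
    return output
```

In such a scheme the decision for each character image depends on a single certainty value, so a graphic image that happens to resemble a standard character pattern would still be converted into character code data.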
However, the information on the certainty of recognizing character codes alone is not sufficient to judge accurately which character images in the image data have been correctly recognized as character codes and which have been mistakenly recognized, so that it is difficult to remove mistakenly recognized character images completely.
The types of documents to be scanned have become quite diversified in recent years, many of them being color documents or documents with complex layouts. As a result, it has become increasingly difficult to extract from image data the character areas where character images exist. Therefore, scanning a document such as the one shown in FIG. 1, where non-character graphics are embedded in character areas, may result in the output of character code data obtained by mistakenly recognizing character codes from graphic images, i.e., non-character images in the obtained image data, as shown in FIG. 2. The symbol E1 in the drawing represents character code data obtained from character codes mistakenly recognized from graphic images, and the symbol E2 represents character code data obtained from character codes mistakenly recognized from character images. Moreover, there are cases where the certainty of recognizing character codes when graphic images are converted into the character code data E1 is not much different from that for true character images. Therefore, it is impossible to remove the character code data obtained by mistakenly recognizing graphic images if the judgment on whether to convert into character code data is based only on the information on the certainty of recognizing character codes.
Unexamined Publication No. JP-A-8-55185 proposed a technology of extracting character areas by checking the positional relation with neighboring character images in the stage of extracting the character areas where character images exist in the image data. U.S. Pat. No. 5,949,906 proposed a technology of first extracting character candidate areas, which can be candidates for character areas, by checking the positional relation with neighboring character images, and then reconstructing character areas based on those character images in the character candidate areas for which character codes can be recognized. However, both of these technologies merely extract character areas from the image data.