Some conventional scanner apparatuses apply OCR processing to original image data obtained by scanning and embed the recognition result in the original image data as transparent character data, thereby producing an electronic document that makes the scan result more convenient to use and, in particular, easier to retrieve. Through such processing, transparent character data (text data) is embedded in the original image data scanned as an image, and the transparent character data enables text retrieval.
When a user views such an electronic document by displaying or printing it, only the scanned original image data is visible, so the embedded transparent character data does not disturb the user. At the same time, because the transparent character data obtained by OCR conversion is embedded in the original image data, retrieval processing using the character data is possible.
As a technology related to character recognition of image data by OCR processing, Japanese Laid-Open Patent Publication No. 10-232904, for example, discloses a technology in which a predetermined format is printed on a ledger sheet, characters to be recognized are written on the sheet, and a stop mark is written at the end of the range that requires recognition processing. The character recognition apparatus performs character recognition processing only in the range up to the position of the stop mark, and thus avoids unnecessary processing of blank parts.
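As a rough illustration of the stop-mark idea described above, the following Python sketch walks through ledger fields in reading order and stops at the first stop mark. The `STOP_MARK` token, the list-of-strings representation of the fields, and the function name `fields_to_recognize` are all assumptions made for illustration; none of them is taken from the cited publication.

```python
from typing import List

# Hypothetical stand-in for the stop mark written on the ledger sheet.
STOP_MARK = "<STOP>"


def fields_to_recognize(fields: List[str]) -> List[str]:
    """Return only the fields that precede the first stop mark.

    Fields after the stop mark (typically blank) are never passed to
    recognition, which is the saving the stop mark is meant to provide.
    If no stop mark is present, every field is returned.
    """
    selected = []
    for field in fields:
        if field == STOP_MARK:
            break  # end of the range that requires recognition
        selected.append(field)
    return selected
```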
As described above, embedding transparent character data in original image data is convenient because it improves retrieval performance for the original image data. On the other hand, because the transparent character data is embedded in the original image data without the user being aware of it, it may cause information leakage that the user did not intend. For example, suppose that, in order to protect information, the user performs masking processing on a part of the original image data of an electronic document in which transparent character data is embedded. If the transparent character data is not deleted at the time of masking, the contents of the masked part can leak from the embedded transparent character data.
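The leakage scenario above can be sketched with a minimal Python model of one page. This is only an illustration under stated assumptions: the `Word` and `Page` classes, the rectangle representation `(x0, y0, x1, y1)`, and the functions `mask_image_only`, `mask_and_delete_text`, and `search` are hypothetical names that do not correspond to any real document format or library.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical rectangle type: (x0, y0, x1, y1) in page coordinates.
Box = Tuple[int, int, int, int]


@dataclass
class Word:
    text: str  # transparent character data produced by OCR
    box: Box   # where the word sits on the scanned image


@dataclass
class Page:
    words: List[Word] = field(default_factory=list)  # invisible text layer
    masks: List[Box] = field(default_factory=list)   # regions blacked out on the image


def overlaps(a: Box, b: Box) -> bool:
    """True if the two rectangles share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def mask_image_only(page: Page, region: Box) -> None:
    """Mask the visible image but leave the text layer untouched (leaky)."""
    page.masks.append(region)


def mask_and_delete_text(page: Page, region: Box) -> None:
    """Mask the image and also drop transparent text under the mask."""
    page.masks.append(region)
    page.words = [w for w in page.words if not overlaps(w.box, region)]


def search(page: Page, keyword: str) -> bool:
    """Keyword search over the transparent text layer only."""
    return any(keyword.lower() in w.text.lower() for w in page.words)
```

In this model, a word covered by `mask_image_only` is hidden from a viewer but is still found by `search`, whereas `mask_and_delete_text` removes it from both the visible page and the search results.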
Additionally, because transparent character data is used mainly for keyword searches, it readily becomes a target of keyword search systems. Accordingly, when a malicious searcher attempts to obtain confidential information such as personal information, content that could not be searched if it were mere original image data becomes easy to search because of the transparent character data.
Japanese Laid-Open Patent Publication No. 10-232904 attempts to suppress unnecessary character recognition processing by having a stop mark written on the original, but it does not address the problem of information leakage caused by transparent character data embedded in original image data as described above.