1. Field of the Invention
The present invention relates to an image processing apparatus and method that make it easy to replace a portion of image data by performing character recognition on character strings contained in character images of an image original.
2. Description of the Related Art
For image data of an image original comprising a plurality of pages, there has long been a demand to replace pages of the existing image original with new image data obtained by revising or correcting the image data of some of those pages. There has also been a demand to add new pages. As a conventional technique in a technical field similar to that of the present invention, a technique has been disclosed (see Japanese Patent Laid-Open No. H9-6948) in which a document for replacement is read in, a page number is identified, and image data is replaced or added in page units with respect to the pages of the document designated as replacement targets. Japanese Patent Laid-Open No. H9-6948 also discloses a technique in which a paper describing information about the document for replacement is read in, and a page number or the like of the document for replacement is identified from this information, so that image data is replaced or added in page units with respect to the document targeted for replacement.
Furthermore, a technique has been disclosed (see Japanese Patent Laid-Open No. 2000-148790) in which image data of the image original targeted for replacement and image data of the image original for replacement are each binarized and compared in pixel units, and keywords are detected from the respective character regions and used for retrieval, whereby similar images are retrieved and pages are replaced.
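By way of illustration only, the binarization and pixel-unit comparison mentioned above might be sketched as follows. This is a minimal sketch and not the method of the cited publication: the function names `binarize` and `pixel_match_ratio`, the fixed threshold of 128, and the representation of an image as a flat list of 8-bit grayscale values are all assumptions made for this example.

```python
from typing import Sequence


def binarize(pixels: Sequence[int], threshold: int = 128) -> list[int]:
    """Map 8-bit grayscale values to 1 (dark, e.g. ink) or 0 (light).

    The threshold of 128 is an assumption for this sketch.
    """
    return [1 if value < threshold else 0 for value in pixels]


def pixel_match_ratio(a: Sequence[int], b: Sequence[int]) -> float:
    """Return the fraction of positions at which two binarized images agree."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    if not a:
        raise ValueError("images must not be empty")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


# Hypothetical four-pixel images: the second differs in one pixel.
target = binarize([10, 200, 30, 250])
candidate = binarize([10, 200, 220, 250])
ratio = pixel_match_ratio(target, candidate)
```

Because every pixel of every candidate page is visited, the cost of such a comparison grows with both the image resolution and the number of pages compared.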
Furthermore, as one technique for retrieving similar images, a technique has been disclosed (see Japanese Patent Laid-Open No. 2002-82985) in which a degree of similarity is calculated by comparing a histogram of the image targeted for retrieval with a histogram of the original image.
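As an illustration of histogram-based similarity in general, the following minimal sketch compares two grayscale images by histogram intersection. The choice of histogram intersection, the 16-bin quantization, and the names `gray_histogram` and `histogram_similarity` are assumptions made for this example; the cited publication may calculate the degree of similarity differently.

```python
from typing import Sequence


def gray_histogram(pixels: Sequence[int], bins: int = 16) -> list[float]:
    """Return a normalized histogram of 8-bit grayscale values (0 to 255)."""
    if not pixels:
        raise ValueError("image must not be empty")
    counts = [0] * bins
    for value in pixels:
        counts[value * bins // 256] += 1
    total = len(pixels)
    return [count / total for count in counts]


def histogram_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint ones."""
    if len(a) != len(b):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(x, y) for x, y in zip(a, b))


# Hypothetical 100-pixel page: 90 white pixels and 10 black pixels.
page = [255] * 90 + [0] * 10
# The same page rescanned with 5 additional dark pixels (e.g. dust).
rescanned = [255] * 85 + [0] * 15

same = histogram_similarity(gray_histogram(page), gray_histogram(page))
noisy = histogram_similarity(gray_histogram(page), gray_histogram(rescanned))
```

In this example the identical pair scores 1.0, while the rescanned page with a few extra dark pixels scores about 0.95, which shows how stray dark pixels alone can lower the calculated degree of similarity between scans of the same document.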
However, in the above-mentioned conventional techniques, the page number of the document for replacement must be designated or specified by a user. On the other hand, there is a technique as disclosed in Japanese Patent Laid-Open No. 2000-148790 in which, in the case where the page number is not specified, the image data of the document of the desired page number is replaced by reading in a paper on which information regarding the document for replacement is described. However, time and effort are required to separately read in this paper, on which the page numbers and other information regarding the document for replacement are described. Further still, in the case where the image original of the replacement target includes a cover sheet or the like that has no page number, its page structure differs from that of an image original in which all pages have page numbers, and it is therefore necessary to structure such documents so that different page numbers are designated.
Further still, in image data replacement for image originals based on keyword retrieval, there is a possibility that a document that should not undergo replacement will be incorrectly recognized as a replacement target if its keywords are the same. Furthermore, when replacing image data, it has conventionally been necessary either to convert the replacement target image data and the image data for replacement to the same application data format and compare their degrees of similarity in the same given application, or to temporarily convert the images into binarized images and compare their degrees of similarity. For this reason, in cases where the original to undergo replacement exists not as image data but as a paper original, the original is first read by a scanner and then converted to the same application data format as the replacement target image data, after which the degree of similarity is determined. In this case, there is a risk that dust or the like present on the scanned original or on the scanning platen will adversely affect the histogram, and that originals that are the same document will be determined to be different documents. Moreover, performing binarization and carrying out comparisons for each pixel is onerous and time consuming.