1. Field of the Invention
The present invention relates to an apparatus that processes an image including a character string, and to a method used by that apparatus.
2. Description of the Related Art
A technique is known for reusing data printed on a paper document by scanning the document, generating a document image from the scan, and then performing various processes on the generated document image. For example, a character recognition process may be performed on the document image. However, some information in the document is lost when the document image is generated. For example, font information, font size information, character stuffing information and the like, which act as character layout information in the document, may be lost. If such lost information could be used again, processing of the document image could be made more efficient. For this reason, techniques are known that address the problem by defining in advance information that can be substituted for the lost information. For example, in Japanese Patent Application Laid-Open No. H04-188288, a character spacing (or character pitch) is defined in advance as a character layout rule, and the defined character spacing is compared with the character spacing actually obtained as a result of character recognition, thereby improving recognition accuracy.
However, in the technique described above, judgment can be performed only on the basis of the rule set in advance. In other words, the technique cannot correct a document that was created according to a rule that has not been set. This problem arises because the layout information of the document is unclear. That is, because the layout information is unclear, the related art compensates by storing or holding the layout information in advance. However, it is difficult to store or hold layout information in advance for documents in which various layouts exist.