1. Field of the Invention
The present invention relates to an image processing apparatus, an image processing system, and an image processing method, and in particular to an image processing apparatus, an image processing system, and an image processing method for conducting character recognition on image data.
2. Description of the Related Art
Recently, following a revision of the law, documents that business law and tax law require to be stored, such as various types of financial and tax forms and minutes of board meetings, may be stored as digitized document files as well as paper documents. Accordingly, paper documents are being scanned and digitized more actively than before.
As a result, the searchability of scanned document images has become a problem. For example, to avoid duplicate file names, a file name is often given to a scanned document image based on the date and time when the document was scanned, such as “20090630141527.jpg”. Because a file name applied in this manner is a meaningless symbol to the user, a user who attempts to find a previously scanned document image must search for it by opening image files one by one. It is conceivable to search for the desired document image by using reduced images, so-called thumbnails, which roughly outline each document. However, it is difficult to find the desired document image among reduced images if the database stores a large number of forms in the same format.
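For illustration only, the date-and-time naming described above can be sketched as follows. The function name `timestamp_file_name` and the default extension are assumptions made for this example; the source states only that the name is based on the scan date and time.

```python
from datetime import datetime


def timestamp_file_name(scanned_at: datetime, extension: str = "jpg") -> str:
    """Build a file name from the scan date and time (YYYYMMDDhhmmss)."""
    # Hypothetical helper: the digit layout mirrors the example name in the text.
    return f"{scanned_at.strftime('%Y%m%d%H%M%S')}.{extension}"


# A document scanned on June 30, 2009 at 14:15:27.
print(timestamp_file_name(datetime(2009, 6, 30, 14, 15, 27)))
```

Such a name is effectively unique for each scan but says nothing about the content of the document, which is the searchability problem discussed above.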
Thus, a technology using so-called OCR (Optical Character Recognition) has been proposed in which a character string included in a scanned image is recognized and a unique file name is applied by using the recognized character string. In general, the file name that a user wishes to apply to a scanned document image is in most cases included in the scanned document image itself. For example, the title of a document corresponds to such a file name.
Japanese Laid-open Patent No. 9-134406 discloses a technology in which a plurality of areas where character strings exist are detected from an image, a likelihood of each character string being a title is calculated based on characteristics of each detected area, and the OCR is performed on the area having the highest likelihood of being the title, so that a preferred file name is acquired from the image. By using this technology, the title of the scanned document can be used as its file name. As a result, a user can search for a desired document image file by simply looking at file names. Accordingly, it is possible to greatly reduce the search time spent opening document image files one by one.
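The general idea of choosing the area most likely to be a title can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of the cited publication: the features (character height, vertical position, horizontal centering), their equal weighting, and the names `TextArea`, `title_likelihood`, and `pick_title_area` are all hypothetical, and the `text` field stands in for the result that the OCR would return for the area.

```python
from dataclasses import dataclass


@dataclass
class TextArea:
    text: str        # stand-in for the OCR result of this area
    top: float       # distance from the page top, as a fraction of page height
    height: float    # character height, as a fraction of page height
    center_x: float  # horizontal center, as a fraction of page width


def title_likelihood(area: TextArea) -> float:
    """Hypothetical heuristic: large, high on the page, and centered."""
    size_score = min(area.height / 0.05, 1.0)
    position_score = 1.0 - area.top
    centering_score = 1.0 - 2.0 * abs(area.center_x - 0.5)
    return size_score + position_score + centering_score


def pick_title_area(areas: list[TextArea]) -> TextArea:
    """Return the area with the highest title likelihood."""
    return max(areas, key=title_likelihood)


areas = [
    TextArea("Invoice", top=0.05, height=0.04, center_x=0.5),
    TextArea("Date: 2009/06/30", top=0.10, height=0.015, center_x=0.85),
    TextArea("Total amount", top=0.80, height=0.02, center_x=0.3),
]
print(pick_title_area(areas).text)
```

In this sketch only the selected area would be passed to the OCR. If every scanned form has the same layout, the same title is selected for each form, which leads to duplicate file names.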
However, even if the technology disclosed in Japanese Laid-open Patent No. 9-134406 is utilized, duplicate file names are created when multiple forms of the same format are scanned. As a result, searchability is degraded. Thus, Japanese Laid-open Patent Application No. 2009-27648 discloses a technology that extracts multiple character strings from image data and creates a file name by chaining the multiple character strings. Moreover, Japanese Laid-open Patent Application No. 2008-77454 discloses a technology in which a title is extracted and recognized, and neighboring character strings are further retrieved if the recognition result matches a predetermined title (for example, a frequently appearing word such as “form”). Instead of simply applying “form” as a file name, this technology makes it possible to apply a file name such as “form NO: 12457824N”.
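The behavior described for a predetermined title can be sketched as follows. This is an illustrative sketch only: the function `build_file_name`, the contents of `generic_titles`, and the space separator are assumptions, and the strings are taken to be already recognized by the OCR.

```python
def build_file_name(title, neighbors, generic_titles=("form",), separator=" "):
    """Append neighboring strings only when the title is a predetermined one."""
    cleaned = title.strip()
    # Hypothetical rule: a title in generic_titles is too common to identify
    # a document by itself, so neighboring character strings are chained on.
    if cleaned.lower() in generic_titles and neighbors:
        return separator.join([cleaned, *neighbors])
    return cleaned


print(build_file_name("form", ["NO: 12457824N"]))
print(build_file_name("Minutes of board meeting", ["NO: 12457824N"]))
```

The sketch only builds the name as a character string. Some file systems do not allow characters such as “:” in file names, so a practical implementation would also need to substitute such characters.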
However, the above-described technologies cannot detect a title with high accuracy (accuracy near 100%). In addition, the accuracy of the character recognition is not 100%. As a result, abnormal operations, such as a file name containing unidentifiable garbled characters, can occur fairly often. When an abnormal operation occurs, because the above-described technologies are complicated, most users, who do not understand how they operate, regard them as incomprehensible and therefore useless functions. The above-described function of applying a file name (hereinafter called a “fully automatic file naming function”) thus has the above problems.
Meanwhile, another approach to extracting and recognizing a title to be applied as a file name has been proposed. For example, Japanese Laid-open Patent Application No. 1-150974 discloses a technology in which a title area in which a predetermined marking is detected is scanned, the OCR is performed on the scanned area, and the character recognition result for the scanned area is applied as a file name to the document image file. Japanese Laid-open Patent Application No. 9-274643 also discloses a similar technology. Hereinafter, these file naming functions are called “semi-automatic naming functions”.
Advantageously, the above-described semi-automatic naming functions can improve the detection accuracy of a title. However, there are problems in that the image is tainted by the marking, or a part of a character to be recognized is hidden by the marking, so that the accuracy of the character recognition is degraded.