1) Field of the Invention
The present invention relates to a technology for discriminating a medium (e.g., a document or a ledger sheet) based on image data obtained by reading the medium on which information is indicated, and particularly relates to a technology for recognizing the content of the information indicated on the medium with high accuracy.
2) Description of the Related Art
Document recognition apparatuses, such as optical character reading apparatuses [OCR (Optical Character Recognition/Reader) apparatuses], have been developed in recent years. Such an apparatus performs data medium recognition or character recognition by reading, as image data, a data medium (for example, a document or a ledger sheet) on which information such as characters, codes, numeric characters, pictures, ruled lines, and barcodes is indicated. Various industries make wide use of document recognition apparatuses to improve, for example, business efficiency.
For example, an operator doing teller-window work at a financial institution or the like uses the document recognition apparatus to handle document media (hereinafter referred to simply as documents) efficiently, thereby improving the efficiency of his/her work.
With respect to such document recognition apparatuses, there is a technique for not only handling a large number of documents of the same kind but also automatically handling documents in various formats, in order to carry out document handling more efficiently (refer to Patent Documents 1 and 2 below, for example).
In some cases, to improve the efficiency of document processing jobs, a plurality of document groups of different types must be processed collectively and automatically. For example, as is frequently seen after mergers and closures of financial institutions, document groups in the different formats of different financial institutions may have to be consolidated into one system, or document groups from a plurality of regional offices (branch offices) may have to be processed collectively by a headquarters (main office) organization or the like (centralized processing). In such cases, a plurality of document groups of different types must be processed together.
Meanwhile, in the conventional technology used to date to process a plurality of document groups of different types efficiently and accurately, an identification document is inserted at the front of each document group. Document group information, such as the type of the document group and the number of sheets it contains, is recorded (indicated) on the identification document. Prior to processing the documents in a document group, a medium recognition apparatus first identifies this identification document, recognizes the type and number of sheets of the document group that follows it, and then processes that document group.
Specifically, for example, an identification document 100 as shown in FIG. 48 is placed at the front of each document group, and the document group is then read. That is, a document ID (the numeric characters “1234” in this example) is recorded on (added to) the identification document 100 so that the identification document 100 itself can be recognized, and document group information such as the type of the subsequent document group (“P” in this example) and the number of sheets (“500 sheets” in this example) is also recorded.
Hence, after the identification document 100 and the document group have been read as image data by a scanner apparatus, the document identification apparatus first recognizes the document ID of the identification document 100 at the front and thereby discriminates the identification document 100.
In other words, information showing the correspondence between a document ID and the positions and items of the document group information recorded on the identification document is maintained in advance in a database or the like. Based on this information, the document identification apparatus discriminates which document group information is recorded where on the identification document 100, and then recognizes the content of that document group information.
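The lookup described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the layout table, the field names, the region coordinates, and the `ocr_region` callback are all hypothetical and are not taken from any apparatus described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str      # recorded item, e.g. "group_type" (hypothetical name)
    region: tuple  # (x, y, width, height) of the recorded portion (hypothetical units)


# Hypothetical layout database: document ID -> fields recorded on that document.
LAYOUT_DB = {
    "1234": [
        Field("group_type", (40, 120, 60, 30)),
        Field("sheet_count", (40, 170, 120, 30)),
    ],
}


def read_identification_document(document_id, ocr_region):
    """Look up the layout registered for document_id and read each field.

    ocr_region is an assumed caller-supplied function that returns the
    recognized text for a given region of the scanned image.
    """
    layout = LAYOUT_DB.get(document_id)
    if layout is None:
        raise KeyError(f"no layout registered for document ID {document_id!r}")
    return {field.name: ocr_region(field.region) for field in layout}


# Usage with a stand-in for the character recognizer.
fake_page = {(40, 120, 60, 30): "P", (40, 170, 120, 30): "500"}
group_info = read_identification_document("1234", fake_page.get)
```

The point of the sketch is that nothing on the page can be interpreted until the document ID has been recognized, because the ID is the key into the layout table.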
As a result, the document recognition apparatus can effectively recognize the content of the document group that follows the identification document 100, and recognition processing can be executed effectively for a plurality of document groups of different types.
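As a rough sketch of how a scanned stream might be divided into document groups using the identification documents, consider the following Python fragment. The page representation (dictionaries with the keys `kind`, `group_type`, and `sheet_count`) is an assumption made purely for illustration.

```python
def split_into_groups(pages):
    """Group pages by the identification document that precedes them.

    Assumed page format (hypothetical): an identification page looks like
    {"kind": "id", "group_type": "P", "sheet_count": 2}; any other page
    looks like {"kind": "doc", ...}.
    """
    groups = []
    i = 0
    while i < len(pages):
        header = pages[i]
        if header.get("kind") != "id":
            raise ValueError(f"page {i} is not an identification document")
        count = header["sheet_count"]
        body = pages[i + 1 : i + 1 + count]
        if len(body) != count:
            raise ValueError(f"group {header['group_type']!r} is short of sheets")
        groups.append((header["group_type"], body))
        i += 1 + count  # skip the identification page and its sheets
    return groups


# Usage: two groups of different types read in one pass.
stream = [
    {"kind": "id", "group_type": "P", "sheet_count": 2},
    {"kind": "doc", "name": "a"},
    {"kind": "doc", "name": "b"},
    {"kind": "id", "group_type": "Q", "sheet_count": 1},
    {"kind": "doc", "name": "c"},
]
groups = split_into_groups(stream)
```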
Further, as with the identification document 100, a document ID is recorded on each document in a document group, and when the document recognition apparatus recognizes each document, it first recognizes this document ID and thereby discriminates what information is described where on the document.
As a result, the document recognition apparatus can perform recognition processing effectively for each document.
In the conventional document recognition apparatus described above, the processing for recognizing the document ID of an identification document and the processing for recognizing the document ID of each document constituting a document group are very important.
Therefore, these document IDs should be recognized with high accuracy.
However, a document recognition apparatus cannot necessarily recognize characters at a 100% recognition rate, and the accuracy of character recognition is limited. Consequently, a document ID may be recognized erroneously, characters constituting a document ID may be rejected (that is, a character cannot be recognized as a single character), or, in the worst case, a document ID may not be recognized at all.
When a document ID is not recognized correctly, as in the cases above, the automatic document processing (recognition processing) by the document recognition apparatus must be interrupted for correction processing: either the document whose document ID was not recognized correctly is read again by the scanner apparatus, or an operator inputs the document ID of that document.
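The failure cases and the resulting interruption can be illustrated with a minimal Python sketch. The reject marker `?`, the fixed ID length of four characters, and the function names are assumptions for illustration only. Note that a check of this kind cannot detect an ID that was recognized erroneously but looks well formed.

```python
REJECT = "?"  # hypothetical marker for a character the recognizer could not decide


def classify_document_id(text, expected_length=4):
    """Classify a recognized document ID string (assumed output format)."""
    if not text:
        return "unrecognized"  # the ID was not recognized at all
    if REJECT in text or len(text) != expected_length:
        return "rejected"      # at least one character was rejected
    return "ok"


def first_interruption(recognized_ids):
    """Return the index of the first document needing correction, or None."""
    for index, text in enumerate(recognized_ids):
        if classify_document_id(text) != "ok":
            return index  # automatic processing stops here for correction
    return None


# Usage: the second document stops the batch.
stop_at = first_interruption(["1234", "12?4", "1234"])
```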
If processing is interrupted for such correction processing while the document recognition apparatus is automatically recognizing a plurality of document groups of different types, a great delay in processing results.
Therefore, it is desired that document IDs be recognized with high accuracy so that documents can be discriminated with high accuracy.
Incidentally, one idea for realizing more accurate recognition processing is to increase the resolution of the scanner apparatus that reads a document as image data. However, increasing the resolution of the scanner apparatus can, on the contrary, reduce processing speed or slightly reduce character recognition accuracy. This tendency is especially pronounced in high-speed scanners as compared with medium-speed machines.
[Patent Document 1] International Publication No. WO97/05561
[Patent Document 2] Japanese Patent Laid-Open (Kokai) No. 2003-168075