(1) Field of the Invention
The present invention relates to a program and apparatus for forms processing, and more particularly to a program and apparatus for forms processing, which extract prescribed keywords from a scanned form image.
(2) Description of the Related Art
In related art, there are two approaches for form input operation which converts paper documents into electronic form: one is structured-form input and the other is non-structured-form input.
The structured-form input operation is employed when the types of forms to be entered are known. The layouts of the forms to be entered, such as the positions of keywords, are defined in advance. The type of a scanned form image is identified, and keywords are automatically extracted based on the defined layout corresponding to that form. However, the structured-form input operation has the drawback that it cannot be employed for forms of unknown types. To process such forms, a layout definition must be created manually in advance for each form to be processed, which is costly.
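The structured-form input operation described above can be illustrated with a minimal sketch. The form type names, the cell-coordinate representation of a layout definition, and the function name are all hypothetical and are assumed only for illustration; they do not appear in the related art.

```python
# Hypothetical layout definitions: form type -> {keyword: (row, column) of its data cell}.
LAYOUT_DEFINITIONS = {
    "estimate_a": {"ESTIMATE PRICE": (0, 1), "ESTIMATE PRODUCT": (1, 1)},
}


def extract_keywords(form_type, cells):
    """Read each keyword's data from the position given by the layout definition.

    `cells` stands in for a recognized form image: {(row, column): text}.
    Returns None when no layout definition exists for the form type.
    """
    layout = LAYOUT_DEFINITIONS.get(form_type)
    if layout is None:
        return None  # unknown form type: a definition must first be made manually
    return {keyword: cells.get(position) for keyword, position in layout.items()}


# Assumed cell contents for one scanned form.
scanned = {
    (0, 0): "ESTIMATE PRICE", (0, 1): "¥120,000",
    (1, 0): "ESTIMATE PRODUCT", (1, 1): "PERSONAL COMPUTER",
}

known = extract_keywords("estimate_a", scanned)
unknown = extract_keywords("invoice_x", scanned)
```

In this sketch, extraction succeeds only for the form type that has a definition; the undefined type yields nothing, which corresponds to the drawback noted above.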
On the other hand, the non-structured-form input operation is employed when the types of forms to be entered are unknown. In this method, because layout definitions cannot be made, the input operation must be done manually. Therefore, the manual input cost is very high.
As described above, both of the structured-form input and non-structured-form input operations have drawbacks. In order to streamline the form input operation, a technique of automatically extracting keywords from non-structured forms is demanded.
There is a proposed form processing apparatus which identifies the image of a form, searches for and extracts a readout region based on preset keywords, and recognizes and obtains data within the region (for example, refer to Japanese Patent Laid-open Publication No. 11-238165, paragraphs [0009] to [0012] and FIG. 3).
In addition, to enhance the accuracy in keyword extraction, there is a proposed image processing method which extracts a tentative cell region according to a format such as ruled lines from a document image, recognizes characters within the cell of the image, searches for strings corresponding to specified keywords from the recognition result, and specifies a cell region from the detected strings (for example, refer to Japanese Patent Laid-open Publication No. 2001-312691, paragraphs [0013] to [0018], and FIG. 2).
In related art, in order to automatically extract keywords from non-structured forms, a readout region of a form image is determined through layout recognition, strings are recognized within the determined readout region through character recognition, and strings corresponding to keywords are detected from the recognized strings through word matching. However, it is not easy to perform the layout recognition and the character recognition on non-structured form images which do not have layout definitions, and there is always a possibility of failure. Further, the forms processing in related art performs matching on strings extracted through the layout recognition and the character recognition, which causes a problem that keywords cannot be extracted if the recognition processes are not done accurately.
One example will be described. FIGS. 19A and 19B are views showing a case where keywords cannot be accurately extracted due to failure in layout recognition. FIG. 19A is the image of a form and FIG. 19B shows text blocks recognized by performing the layout recognition on the form image of FIG. 19A.
In the example of these figures, a form image 901 produced by a scanner contains noise 902 caused by dirt or the like on the form. When the layout recognition is performed on this form image 901, “ESTIMATE (PRICE)” and “ESTIMATE (PRODUCT)” are recognized as falling into one block due to the noise 902 therebetween, with the result that a text block 903 including the noise is erroneously extracted. Consequently, “ESTIMATE” is separated from “PRICE”, and “ESTIMATE” is separated from “PRODUCT”. If the character recognition is performed on these recognized text blocks, a text block “ESTIMATE . . . ESTIMATE” 903, a text block “PRICE” 904, a text block “PRODUCT” 905, a text block “¥120,000” 906, and a text block “PERSONAL COMPUTER” 907 are recognized as strings. Therefore, even if the keywords for the matching search include “ESTIMATE PRICE” and “ESTIMATE PRODUCT”, these strings cannot be detected in the character recognition result, and thus the keywords cannot be extracted.
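The failure in this example can be reproduced with a short sketch. The block strings are taken from the text blocks 903 to 907 described above; the dictionary representation, the function name, and the use of simple substring matching as the word matching step are assumptions made only for illustration.

```python
# Strings recognized for text blocks 903-907 of FIG. 19B, keyed by reference numeral.
recognized_blocks = {
    903: "ESTIMATE . . . ESTIMATE",
    904: "PRICE",
    905: "PRODUCT",
    906: "¥120,000",
    907: "PERSONAL COMPUTER",
}

keywords = ["ESTIMATE PRICE", "ESTIMATE PRODUCT"]


def match_keywords(blocks, keywords):
    """Return {keyword: [reference numerals of blocks whose string contains it]}.

    Substring matching is an assumed stand-in for the word matching step.
    """
    return {
        keyword: [ref for ref, text in blocks.items() if keyword in text]
        for keyword in keywords
    }


hits = match_keywords(recognized_blocks, keywords)

# Every individual word was recognized correctly; only the grouping is wrong.
words_present = all(
    any(word in text for text in recognized_blocks.values())
    for keyword in keywords
    for word in keyword.split()
)
```

Here `hits` contains an empty list for both keywords even though `words_present` is true, which shows that the matching fails because of the block boundaries and not because of the recognized characters.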
As described above, when the layout recognition fails, the arrangement of the characters is not correctly recognized even if the characters themselves are correctly recognized, and thus keywords cannot be extracted. The same problem occurs when the layout recognition succeeds but the character recognition fails.
In addition, a keyword is represented by two types of elements: one is an item and the other is data. The forms processing of related art has another drawback that appropriate linking between items and data cannot be performed.
FIGS. 20A and 20B are views showing a case where it is difficult to link an item and data. FIG. 20A shows a case where two items can be linked to one piece of data. FIG. 20B shows a case where two pieces of data can be linked to one item.
Referring to FIG. 20A, the layout recognition process and the character recognition process are performed on a form image 910, and items, “PRICE” 911 and “TOTAL” 915, and data, “¥40,000” 912, “¥42,000” 913, and “¥82,000” 914, are obtained. Based on the positional relationships of their text blocks, an item and data which have almost the same vertical or horizontal coordinate, that is, an item and data which can be regarded as being arranged in the vertical direction or horizontal direction, are linked to each other. In the example of this figure, “¥40,000” 912 and “¥42,000” 913 are linked to “PRICE” 911, which is arranged with them in the vertical direction. However, “¥82,000” 914 can be linked to both “PRICE” 911, which is arranged with it in the vertical direction, and “TOTAL” 915, which is arranged with it in the horizontal direction. It cannot be determined from the positional relationships which item should be linked to “¥82,000” 914.
On the other hand, referring to FIG. 20B, by performing the layout recognition process and the character recognition process on a form image 920, items, “ISSUE DATE” 921 and “ESTIMATE EXPIRY DATE” 923, and data, “DECEMBER 2, 2005” 922 and “DECEMBER 16, 2005” 924, are obtained. Based on the positional relationships between them, “ESTIMATE EXPIRY DATE” 923 and “DECEMBER 16, 2005” 924 can be linked to each other. However, “ISSUE DATE” 921 can be linked to both “DECEMBER 2, 2005” 922, which is arranged with it in the horizontal direction, and “DECEMBER 16, 2005” 924, which is arranged with it in the vertical direction. It cannot be determined from the positional relationships which data should be linked to “ISSUE DATE” 921. Similarly, “DECEMBER 16, 2005” 924 can be linked to both “ISSUE DATE” 921 and “ESTIMATE EXPIRY DATE” 923.
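The ambiguity in FIGS. 20A and 20B can be illustrated with a minimal sketch of position-based linking. The coordinates, the tolerance value, and all names below are hypothetical; the figures give no numeric positions, so the values are chosen only to reproduce the row and column alignments described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextBlock:
    ref: int    # reference numeral used in the figure
    text: str
    x: float    # assumed horizontal position (not given in the figures)
    y: float    # assumed vertical position (not given in the figures)


def aligned(a, b, tol=5.0):
    """True if two blocks share almost the same x (a column) or y (a row)."""
    return abs(a.x - b.x) <= tol or abs(a.y - b.y) <= tol


def candidate_items(items, data, tol=5.0):
    """Map each data block's numeral to the numerals of all items aligned with it."""
    return {d.ref: [i.ref for i in items if aligned(i, d, tol)] for d in data}


# FIG. 20A: data 912-914 lie in the column under PRICE; 914 also shares a row with TOTAL.
items_a = [TextBlock(911, "PRICE", 100, 0), TextBlock(915, "TOTAL", 0, 60)]
data_a = [
    TextBlock(912, "¥40,000", 100, 20),
    TextBlock(913, "¥42,000", 100, 40),
    TextBlock(914, "¥82,000", 100, 60),
]
links_a = candidate_items(items_a, data_a)

# FIG. 20B: 924 lies under ISSUE DATE and beside ESTIMATE EXPIRY DATE.
items_b = [
    TextBlock(921, "ISSUE DATE", 100, 0),
    TextBlock(923, "ESTIMATE EXPIRY DATE", 0, 20),
]
data_b = [
    TextBlock(922, "DECEMBER 2, 2005", 200, 0),
    TextBlock(924, "DECEMBER 16, 2005", 100, 20),
]
links_b = candidate_items(items_b, data_b)

# Data blocks with more than one candidate item cannot be resolved by position alone.
ambiguous = [ref for links in (links_a, links_b)
             for ref, refs in links.items() if len(refs) > 1]
```

Under these assumed positions, the data blocks 914 and 924 each have two candidate items, so alignment alone leaves their links undetermined.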
As described above, there are cases where it is difficult to correctly link extracted keyword items and extracted keyword data. However, there is no method for selecting an appropriate link in related art.