When a business form whose character boxes have been filled in with the required information is read with a scanner (optical character reader) and the characters on the read image are recognized, colors other than that of the written characters are dropped out during reading. A recognized image consisting only of the written characters to be recognized is thereby obtained, which increases the accuracy of character recognition on the read image.
More specifically, business forms are prepared in which the character boxes and required items are printed in a dropout color other than the black used for the written characters, and the characters on the business forms are recognized in the following steps. (Step 1) A filled-out business form to be recognized is read by a scanner against a black background. At this point the color of the light source is matched to the dropout color of the business form so that parts unnecessary for recognition, such as fields, are dropped out. (Step 2) A recognition area is defined by finding the edges of the business form from the boundary between the image of the form and the black background, and the characters within that area are recognized.
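The two steps above can be sketched in code. The following is only a minimal illustration, not the method of any particular scanner: it assumes the scan is available as rows of (R, G, B) pixels, that reading under a light source of the dropout color can be approximated by keeping only that color channel, and that a fixed threshold of 128 separates dark from bright pixels. The names read_with_light, find_form_area and written_pixels are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255

# Assumption: the dropout color is one of the three primary channels.
CHANNEL = {"red": 0, "green": 1, "blue": 2}


def read_with_light(image: List[List[Pixel]], dropout_color: str) -> List[List[int]]:
    """Step 1 (approximation): keep only the channel of the dropout color.

    Printing in the dropout color then looks as bright as blank paper,
    while black writing and the black background stay dark.
    """
    c = CHANNEL[dropout_color]
    return [[px[c] for px in row] for row in image]


def find_form_area(gray: List[List[int]], threshold: int = 128) -> Tuple[int, int, int, int]:
    """Step 2: bounding box (top, left, bottom, right) of the bright paper
    against the black background."""
    rows = [y for y, row in enumerate(gray) if any(v >= threshold for v in row)]
    cols = [x for x in range(len(gray[0])) if any(row[x] >= threshold for row in gray)]
    if not rows or not cols:
        raise ValueError("no form found against the background")
    return rows[0], cols[0], rows[-1], cols[-1]


def written_pixels(gray: List[List[int]], area: Tuple[int, int, int, int],
                   threshold: int = 128) -> List[Tuple[int, int]]:
    """Dark pixels inside the form area, i.e. candidate written strokes."""
    top, left, bottom, right = area
    return [(y, x)
            for y in range(top, bottom + 1)
            for x in range(left, right + 1)
            if gray[y][x] < threshold]


if __name__ == "__main__":
    K, W, R = (0, 0, 0), (255, 255, 255), (255, 0, 0)  # black, white, red
    scan = [
        [K, K, K, K, K, K],
        [K, W, R, R, W, K],
        [K, W, R, K, W, K],  # one black ink pixel inside a red box
        [K, W, R, R, W, K],
        [K, K, K, K, K, K],
    ]
    gray = read_with_light(scan, "red")
    area = find_form_area(gray)
    print(area, written_pixels(gray, area))
```

In this toy scan the red box lines vanish in the red channel, so only the single ink pixel remains inside the detected form area. A real system would also have to handle skew, noise and uneven lighting, which this sketch ignores.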
However, when the recognized image obtained by dropping out the fields on the business form and an image of the recognized results are arranged side by side on a screen for checking, it has been hard to identify which written contents correspond to which items, because the recognized image contains only the written characters. This has made it difficult to check the contents and to correct the recognized results if errors are found.
FIG. 25 is a working screen showing a conventional method of processing a business form. On the left side of the screen is displayed a recognized image 200 in which only the written characters are read with the fields dropped out. On the right is displayed a recognized result 202 in a predetermined format, prepared by recognizing the written characters on the recognized image 200. However, because the recognized image 200 contains only the written characters with the fields dropped out, it is difficult to identify which written contents correspond to which items of the recognized result 202. This also makes it difficult to correct the recognized result if errors are found during checking.
To solve this problem, a conventional method of reading business forms uses a scanner with a special OCR to import a dropout image and a non-dropout image simultaneously in a single reading operation, thereby enabling the non-dropout image and the recognized results to be displayed on a screen so that they can be checked and corrected easily.
(Patent Literature 1)
Japanese Unexamined Patent Application Publication No. Hei 6-68299
The conventional method of importing the dropout and non-dropout images simultaneously, however, has a problem in that it requires a scanner with a special OCR; because a universal scanner cannot be used, the cost is increased. If a universal scanner is used, each business form must be read twice with the reading color changed between passes, which makes the reading operation troublesome and time-consuming.
Furthermore, both the dropout and non-dropout images must be stored so that, when questions arise while checking the recognized results, the images can be reproduced on a screen to confirm whether the results are correct. This leads to a problem in that two pages of image data must be stored per business form, and because the number of business forms read in day-to-day processing is enormous, the image data storage capacity required as a whole increases substantially. The non-dropout image, in particular, is a color image because the dropout parts are printed in color, so that it requires much more storage capacity than the monochrome dropout image.
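The storage burden described above can be illustrated with a rough calculation. The figures below are assumptions chosen only for illustration and do not come from this document: an A4 form read at 200 dpi (about 1654 x 2339 pixels), a monochrome dropout image at 1 bit per pixel, a color non-dropout image at 24 bits per pixel, and no compression. The function name raw_image_bytes is hypothetical.

```python
def raw_image_bytes(width_px: int, height_px: int, bits_per_pixel: int) -> int:
    """Uncompressed image size in bytes, rounded up to a whole byte."""
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8


# Assumed scan geometry: A4 at 200 dpi (illustrative only).
WIDTH_PX, HEIGHT_PX = 1654, 2339

dropout_bytes = raw_image_bytes(WIDTH_PX, HEIGHT_PX, 1)       # monochrome, 1 bit/pixel
non_dropout_bytes = raw_image_bytes(WIDTH_PX, HEIGHT_PX, 24)  # color, 24 bits/pixel
per_form_bytes = dropout_bytes + non_dropout_bytes

if __name__ == "__main__":
    print(f"dropout image:     {dropout_bytes:>12,d} bytes")
    print(f"non-dropout image: {non_dropout_bytes:>12,d} bytes")
    print(f"both per form:     {per_form_bytes:>12,d} bytes")
    # Under these assumptions the color image is about 24 times larger.
    print(f"ratio: {non_dropout_bytes / dropout_bytes:.1f}")
```

Actual sizes depend on resolution, color depth and compression, but the proportion shows why storing the color non-dropout image alongside the dropout image dominates the storage requirement.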