In a system that scans a paper form and performs OCR (Optical Character Recognition) on each item of the form, the location of each field of the form must be accurately acquired beforehand. Therefore, in general, a user displays an image of the form on a screen of a display unit and indicates the location of each field, so as to register field location information of the form in the system beforehand.
However, such operations take time. Japanese Patent No. 3586911 and Japanese Patent No. 3001950 disclose methods of searching for rectangle information by applying image processing to an image of a form. These methods automatically extract each rectangle (each field of the form) based on image data. On the other hand, the inventions disclosed in Japanese Patent No. 3586911 and Japanese Patent No. 3001950 have problems. Since each rectangle is extracted based on the image data, errors occur in which, for example, a letter is misread as a closing line or, conversely, a closing line is misread as a letter in the form. In a case in which a background image or a tint block is embedded in the background of the form, the background image or the tint block may be misread as closing lines. Accordingly, these problems may degrade rectangle extraction accuracy. Moreover, in a case in which a closing line is drawn as a dotted line, it is difficult for image processing to recognize the dots as line segments of a single line. Thus, the image processing may fail to extract a rectangle drawn with the dotted line. Furthermore, since the form is imaged in units of pixels, the imaging process is discrete. A difference may occur between a discrete space created by one discrete parameter and another discrete space. That is, the coordinates of a rectangle obtained under one imaging condition differ from those obtained under another condition (for example, in images received from a scanner, a fax, or the like at different resolutions), and it is difficult to always read the form accurately.
In general, the form is originally created by using application software such as Microsoft® Word and Excel, Adobe® Acrobat, or the like. Information concerning the locations of letters and closing lines is retained as vector information in an electronic file of the form. Closing line information and letter information are clearly distinct in the electronic file. The vector information can be discretized with any discrete parameters (for example, any resolution) without errors, so rectangles can be read without errors regardless of the processing environment in which an image is generated.
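The resolution dependence described above can be illustrated with a minimal sketch. It assumes that field coordinates are held in points at 72 points per inch (a PDF-style convention chosen only for this illustration); the names `to_pixels`, `field_pt`, and `rescaled` are hypothetical. Discretizing the vector coordinates directly at each resolution gives a consistent result, whereas rescaling coordinates that were already rounded at another resolution does not.

```python
from fractions import Fraction

POINTS_PER_INCH = 72  # assumption: PDF-style user units


def to_pixels(rect_pt, dpi):
    """Discretize a rectangle (x0, y0, x1, y1) given in points at `dpi`."""
    scale = Fraction(dpi, POINTS_PER_INCH)
    return tuple(round(v * scale) for v in rect_pt)


# Hypothetical field rectangle taken from the vector information.
field_pt = (Fraction(101), Fraction(50), Fraction(245), Fraction(68))

px_200 = to_pixels(field_pt, 200)  # discretized directly at 200 dpi
px_300 = to_pixels(field_pt, 300)  # discretized directly at 300 dpi

# Rescaling the already-rounded 200 dpi coordinates to 300 dpi
# compounds the rounding error and no longer matches px_300.
rescaled = tuple(round(v * Fraction(300, 200)) for v in px_200)

print(px_200, px_300, rescaled)
```

In this sketch the rescaled coordinates differ from the directly discretized ones by one pixel on three of the four edges, which is the kind of mismatch that arises when field locations registered from an image at one resolution are applied to an image at another.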
Japanese Laid-open Patent Application No. 2005-190439 discloses extracting line segments from the vector information included in an electronic document and indicating an area. The vector information in the electronic document is decomposed into vertical line segments and horizontal line segments, and the area is indicated on a screen by using these line segments.
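A decomposition of this kind might look like the following sketch. It is not taken from the cited application; the instruction tuples (`"line"` with two endpoints, `"rect"` with origin, width, and height) and the function name `decompose` are assumptions made for illustration only.

```python
def decompose(instructions):
    """Split drawing instructions into horizontal and vertical segments.

    Each instruction is a hypothetical tuple:
      ("line", x0, y0, x1, y1)  or  ("rect", x, y, width, height)
    """
    horizontal, vertical = [], []
    for op, a, b, c, d in instructions:
        if op == "rect":
            x, y, w, h = a, b, c, d
            edges = [(x, y, x + w, y), (x, y + h, x + w, y + h),
                     (x, y, x, y + h), (x + w, y, x + w, y + h)]
        else:  # "line"
            edges = [(a, b, c, d)]
        for x0, y0, x1, y1 in edges:
            if y0 == y1:
                horizontal.append((min(x0, x1), y0, max(x0, x1), y1))
            elif x0 == x1:
                vertical.append((x0, min(y0, y1), x1, max(y0, y1)))
            # Diagonal segments are ignored in this sketch.
    return horizontal, vertical


horizontal, vertical = decompose([
    ("rect", 10, 10, 100, 20),  # a field frame
    ("line", 60, 10, 60, 30),   # a divider inside the frame
    ("line", 0, 0, 5, 5),       # a diagonal stroke, skipped
])
```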
In Japanese Laid-open Patent Application No. 2005-190439, areas are indicated or rectangles are extracted by extracting line segments. Only simple line segments and rectangles can be processed. Thus, it is impossible to accurately indicate the areas or extract the rectangles in special cases concerning the vector information. For example, in an actual electronic file, an object seen as a line segment on the screen may actually be described by a rectangle drawing instruction in the vector information, or, conversely, an object seen as a rectangle on the screen may actually be described by line segment drawing instructions. Moreover, in such special cases, an error may occur in the extracted line segment information, and it may be determined that line segments are not connected. These problems arise from the way the vector information is described. In order to realize more accurate area extraction, it is necessary to solve the above-described problems.
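The special cases above can be made concrete with a small sketch. It is only an illustration of the problem under stated assumptions, not the method of any cited document: the helper names `rect_to_segment` and `connected`, the stroke threshold `max_stroke`, and the tolerance `tol` are all hypothetical.

```python
def rect_to_segment(x, y, w, h, max_stroke=1.5):
    """Return the centre line of a thin rectangle, or None if it is a real box.

    A rectangle whose short side is at most `max_stroke` (an assumed
    threshold) is treated as a line segment that was drawn as a rectangle.
    """
    if h <= max_stroke < w:
        cy = y + h / 2
        return (x, cy, x + w, cy)
    if w <= max_stroke < h:
        cx = x + w / 2
        return (cx, y, cx, y + h)
    return None


def connected(p, q, tol=0.1):
    """Treat two endpoints as connected if they lie within `tol` of each other."""
    return abs(p[0] - q[0]) <= tol and abs(p[1] - q[1]) <= tol


# A "line" that the file actually stores as a 0.5-unit-high rectangle.
thin = rect_to_segment(10, 9.75, 100, 0.5)
# A genuine field frame is left alone.
box = rect_to_segment(10, 10, 100, 20)

# Endpoints that look joined on screen but differ slightly in the file.
joined_exact = connected((110, 10.0), (110, 10.02), tol=0)
joined_tolerant = connected((110, 10.0), (110, 10.02))
```

With exact comparison the two endpoints are judged unconnected, while a small tolerance treats them as joined; choosing suitable thresholds is a design decision that this sketch leaves open.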