1. Field of the Invention
The invention relates to an image reading apparatus that can transfer image information obtained by reading an original to a server apparatus through a network, and also relates to the server apparatus, an image processing system, an image processing method, a storage medium in which a computer-readable program is stored, and the program.
2. Related Background Art
In various businesses, paper forms (templates in which various fields to fill in and graphic objects are defined) have conventionally been used for transferring and storing information. Efforts have been made to raise business efficiency by converting such forms into electronic data so that they can be handled by a computer system.
When a paper form is converted into electronic data, a bitmap image is formed by using a scanner. However, if the form is handled as an image as it is, the data size is large and the contents are difficult to reuse. Therefore, an OCR (Optical Character Recognition) system is used to convert the form data into text data so that the information can be handled easily.
In the OCR system, the character recognition ratio is raised by exploiting the fact that the regular pattern of the form is determined in advance. For example, if information indicating which kind of character is written at which position on a page is prepared as a template, the range of candidates considered during character recognition is narrowed, so that the character recognition ratio is remarkably improved.
If a form consists of one page, processing with the template is easy. In practice, however, a form generally consists of a plurality of pages whose writing positions differ. Templates for the plurality of pages are therefore prepared for each form, and several processing methods are used.
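The narrowing effect of a template on the recognition candidates can be illustrated with a minimal sketch. It assumes that a recognizer has already produced a score for each candidate character; the field names, the `TEMPLATE` table, the `recognize` function, and the scores are all hypothetical and are given only for illustration.

```python
import string

# Hypothetical template: for each field, the set of characters allowed there.
TEMPLATE = {
    "postal_code": set(string.digits),
    "surname": set(string.ascii_uppercase),
}


def recognize(scores, allowed=None):
    """Return the best-scoring candidate, optionally restricted to `allowed`.

    `scores` maps candidate characters to recognizer scores (higher is better).
    Returns None if no candidate survives the restriction.
    """
    if allowed is None:
        candidates = scores
    else:
        candidates = {c: s for c, s in scores.items() if c in allowed}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Made-up scores for one ambiguous glyph that resembles both "O" and "0".
glyph_scores = {"O": 0.48, "0": 0.45, "Q": 0.07}

unconstrained = recognize(glyph_scores)                          # no template
constrained = recognize(glyph_scores, TEMPLATE["postal_code"])   # digits only
```

Without the template, the letter "O" wins narrowly; when the field is known to contain only digits, the letter candidates are discarded and "0" is selected.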
Specifically, the following methods are used, among others: (1) a single form is read by a scanner at a time and the form data is converted into text data form by form; (2) the forms are scanned in a batch from a document feeder, the page number of each page is automatically discriminated (generally called form recognition), and the optimum template is selected; and (3) a plurality of forms are read in a batch from the document feeder. In method (3), all of the forms have the same format in many cases (JP-A-2004-005268). In method (1), the template of the corresponding page can be used reliably; however, since the scanner has to be operated for every form, the processing takes time and effort.
In method (2), using the document feeder allows a large quantity of documents to be read in a batch and reduces the burden of the operation. However, because form recognition selects the optimum template from among all of the templates prepared for the pages, the page number is liable to be erroneously recognized, the processing load becomes very heavy, and the processing takes a long time.
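The cost of form recognition can be seen in a minimal sketch in which every scanned page is compared against every prepared template. The representation of a page as a set of layout feature names, the Jaccard similarity measure, and the names `similarity`, `select_template`, and `TEMPLATES` are assumptions made for illustration and do not describe any particular product.

```python
def similarity(page_features, template_features):
    """Jaccard index between two sets of layout features (0.0 to 1.0)."""
    union = page_features | template_features
    if not union:
        return 0.0
    return len(page_features & template_features) / len(union)


def select_template(page_features, templates):
    """Compare the page against every template and return (name, score) of the best."""
    best_name, best_score = None, -1.0
    for name, features in templates.items():
        score = similarity(page_features, features)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Hypothetical templates for a two-page form.
TEMPLATES = {
    "page_1": {"title", "name_box", "date_box"},
    "page_2": {"title", "address_box", "signature_box"},
}

# Features detected on one scanned page (the date box was missed).
scanned = {"title", "name_box"}
name, score = select_template(scanned, TEMPLATES)
```

The loop runs once per prepared template for each page, so the work grows with the number of templates, and pages whose templates share many features are the ones most likely to be confused.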
If an original is prepared not on paper but as a PDF file or an application file, and the creator of the form original prints it himself, it may be impossible to determine whether the form original was printed in a simplex printing mode or a duplex printing mode, since this depends on the creator's environment. In such a case, in method (3), even forms of the same format cannot simply be subjected to the OCR process; preparation such as removal of blank pages is necessary. Further, method (3) has the problem that, in the batch reading operation, if a certain form is split partway through a batch and the OCR results are to be collected form by form, the user has to wait until the next batch reading process and its OCR process are finished.
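The blank-page removal mentioned above as a preparation step can be sketched as follows. The sketch assumes that each page is given as rows of 8-bit grayscale values (0 for black, 255 for white); the function names and both threshold values are hypothetical and would have to be tuned for an actual scanner.

```python
def is_blank(page, dark_threshold=128, max_dark_ratio=0.005):
    """Treat a page as blank if at most `max_dark_ratio` of its pixels are dark.

    `page` is a list of rows of grayscale values (0 = black, 255 = white).
    """
    total = sum(len(row) for row in page)
    if total == 0:
        return True
    dark = sum(1 for row in page for px in row if px < dark_threshold)
    return dark / total <= max_dark_ratio


def remove_blank_pages(pages):
    """Return only the pages that carry content, preserving their order."""
    return [page for page in pages if not is_blank(page)]


# Synthetic 100 x 100 pages: one blank, one with a single black line of text.
blank_page = [[255] * 100 for _ in range(100)]
written_page = [row[:] for row in blank_page]
written_page[50] = [0] * 100  # 100 dark pixels out of 10000

# A simplex original scanned in duplex mode yields alternating blank backs.
scanned_pages = [written_page, blank_page, written_page, blank_page]
kept = remove_blank_pages(scanned_pages)
```

Here the two blank reverse sides are dropped and the two written pages remain, so the page sequence again matches the templates of the form.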