In fields such as banking where paper forms are handled, form management systems built around a data center are used. In such systems, an image scanner apparatus reads a paper form to create an electronic document, and the resulting form image data is preserved as evidence information. However, the storage capacity needed to preserve form image data has grown, and methods for addressing this problem have been considered.
To reduce the amount of form image data, the data has been compressed using a still-image compression technology such as the JPEG (Joint Photographic Experts Group) system. In recent years, however, the storage capacity required by form management systems has increased considerably, and the compression ratio obtained by simply applying a still-image compression technology is no longer a sufficient countermeasure.
As an example, the following form image processing technology is known as a technology for reducing the amount of data saved for a form image. In this technology, an image of a filled-out form is first obtained. Next, an XOR operation is performed pixel by pixel between the filled-out form image and an image of the blank form, thereby extracting a difference form image, that is, a form image containing only the written portion. The extracted difference form image is then compressed and saved. To restore the original filled-out form image, the saved compressed difference form image is first read and decompressed to obtain the difference form image as it was before compression. Next, an XOR operation is performed pixel by pixel between the difference form image and the blank form image, thereby restoring the original filled-out form image. In other words, this technology processes binary images: a form image containing only the written portion, extracted by an XOR operation between the image data of a blank form and the image data of a filled-out form, is compressed and saved, which reduces the size of the saved data. This technology will hereinafter be referred to as “the first technology”.
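The extraction and restoration steps of the first technology can be illustrated with a minimal sketch. It assumes that a binary image is held as a list of rows of 0/1 pixels and uses zlib as a stand-in for the compression step, which the description above does not specify. The function names and the 4x6 sample form are hypothetical.

```python
import zlib

def xor_images(a, b):
    """Pixel-wise XOR of two equally sized binary images (lists of rows of 0/1)."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same dimensions")
    return [[pa ^ pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]

def pack(image):
    """Serialize a binary image one byte per pixel and compress it.

    zlib is an assumption made for illustration only.
    """
    raw = bytes(p for row in image for p in row)
    return zlib.compress(raw)

def unpack(blob, width):
    """Decompress and rebuild the rows of a binary image of the given width."""
    raw = zlib.decompress(blob)
    return [list(raw[i:i + width]) for i in range(0, len(raw), width)]

# Hypothetical blank form: a ruled frame with an empty interior.
blank = [
    [1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1],
]

# The same form after two pixels have been written inside the frame.
filled = [row[:] for row in blank]
filled[1][2] = 1
filled[2][3] = 1

difference = xor_images(filled, blank)           # only the written portion remains
saved = pack(difference)                         # data that would be stored
restored = xor_images(unpack(saved, 6), blank)   # XOR with the blank form again
```

Because XOR is its own inverse, applying it a second time with the same blank form image recovers the filled-out form exactly, provided that the compression step is lossless.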
Meanwhile, the following form image creating apparatus is known as a technology for creating an image of a blank form. In this apparatus, a plurality of filled-out form images in the same format are first obtained. Next, the filled-out form images are aligned with one another. After the alignment, an AND image of the filled-out form images is created. Subsequently, regions surrounding connected components of black runs (runs of consecutive black pixels) are extracted from the created AND image and from the aligned filled-out form images so as to create an image of a blank form. In addition, in this technology, a rectangular region extracted from the AND image is associated with a rectangular region extracted from a filled-out form image, and an unnecessary image region is identified and deleted on the basis of the size of the rectangular region, the number of black pixels, or the number of times the rectangular region is extracted, thereby improving the accuracy with which the blank form image is created. This technology will hereinafter be referred to as “the second technology”.
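The AND-image step of the second technology can be sketched as follows. The sketch assumes that the filled-out form images are already aligned and are held as lists of rows of 0/1 pixels. It omits the alignment itself and the subsequent rectangular-region extraction and filtering. The function name and the sample images are hypothetical.

```python
from functools import reduce

def and_images(images):
    """Pixel-wise AND across aligned binary images of identical size."""
    if not images:
        raise ValueError("at least one image is required")

    def and_two(a, b):
        return [[pa & pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]

    return reduce(and_two, images)

# Hypothetical preprinted frame shared by every form of this format.
frame = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
]

def fill(pixels):
    """Return a copy of the frame with the given (row, col) pixels written in."""
    image = [row[:] for row in frame]
    for r, c in pixels:
        image[r][c] = 1
    return image

# Three filled-out forms whose written portions differ from one another.
filled_forms = [fill([(1, 1)]), fill([(1, 2), (2, 2)]), fill([(2, 3)])]

# Pixels that are black in every form survive; differing writing drops out.
blank_estimate = and_images(filled_forms)
```

Written pixels that happen to coincide in every input image would survive the AND operation, which is one reason the second technology goes on to identify and delete unnecessary regions.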
A form identifying system is also known that determines whether the format of a form image read by, for example, an image scanner is the same as that of a form image that is already registered. In making this determination, the system removes noise caused by, for example, handwriting or the imprinting of a seal before identification so that the format of the read form image can be recognized. The operation of this system is divided into two modes, a registration mode and an identification mode. In the registration mode, form image data to be registered is first read, and the ruled line characteristics used for identification are extracted. Next, some of the extracted ruled line characteristics, e.g., those whose ruled line length is shorter than a threshold, are removed as noise. In addition, to handle a form to which a section for the imprinting of a seal or another item has been added by hand (an item-added form), such an item-added form is also read, and the designated item-added portion is registered as an allowable difference amount together with the ruled line characteristics. In the identification mode, form image data to be identified is first read, and the ruled line characteristics are extracted and noise is removed by a method similar to that used in registration. The ruled line characteristics are then matched against all registered form formats, and a format whose difference in ruled line characteristic amount is smaller than the allowable difference amount is output as the corresponding form. When the difference for every registered form format is larger than the allowable difference amount, an output indicating a matching failure is made. This technology will hereinafter be referred to as “the third technology”.
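The two modes of the third technology can be sketched as follows. Several points are assumptions made for illustration and are not stated in the description above: each ruled line is a tuple (orientation, x, y, length), the difference is the number of ruled lines present in only one of the two sets, and the qualifying format with the smallest difference is returned. All names and values are hypothetical.

```python
def remove_noise(lines, min_length):
    """Drop ruled lines shorter than min_length, treating them as noise."""
    return {line for line in lines if line[3] >= min_length}

def register(registry, name, lines, min_length, allowable_difference):
    """Registration mode: store denoised features with an allowable difference."""
    registry[name] = (remove_noise(lines, min_length), allowable_difference)

def identify(registry, lines, min_length):
    """Identification mode: return the matching format name, or None on failure."""
    features = remove_noise(lines, min_length)
    best_name, best_diff = None, None
    for name, (registered, allowable) in registry.items():
        # Assumed difference measure: ruled lines found in only one of the sets.
        diff = len(features ^ registered)
        if diff < allowable and (best_diff is None or diff < best_diff):
            best_name, best_diff = name, diff
    return best_name

# Hypothetical registered formats.
deposit = {("h", 0, 0, 100), ("h", 0, 50, 100), ("v", 0, 0, 50), ("v", 100, 0, 50)}
transfer = {("h", 0, 0, 200), ("h", 0, 80, 200), ("v", 0, 0, 80),
            ("v", 200, 0, 80), ("v", 100, 0, 80)}

registry = {}
register(registry, "deposit", deposit, min_length=10, allowable_difference=3)
register(registry, "transfer", transfer, min_length=10, allowable_difference=3)

# A scanned deposit form with a short stroke (noise) and one hand-added ruled line.
scanned = deposit | {("h", 20, 20, 4), ("v", 60, 10, 30)}
matched = identify(registry, scanned, min_length=10)

# A form whose ruled lines resemble no registered format.
unknown = {("h", 0, 0, 300), ("v", 0, 0, 300)}
unmatched = identify(registry, unknown, min_length=10)
```

Comparing exact coordinates is a simplification. Ruled lines extracted from real scans would need a positional tolerance before two lines could be treated as the same.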
Patent document 1: Japanese Laid-open Patent Publication No. 2000-152009
Patent document 2: Japanese Laid-open Patent Publication No. 10-40312
Patent document 3: Japanese Laid-open Patent Publication No. 2006-201965