In recent years, the computerization of information has advanced. As part of this computerization, paper documents are scanned by a scanner or the like and converted into image data, and the image data is stored in a storage medium. Meanwhile, a document matching technique is known for judging whether or not scanned image data matches image data stored in the storage medium. The document matching technique is used in image data processing in various ways.
For example, if image data obtained by the computerization is stored without modification, the amount of data becomes enormous, and a large storage capacity is therefore required. In view of this, Patent Literature 1, for example, proposes a technique that utilizes the document matching technique to reduce the amount of data to be stored for a plurality of image data having a common format such as ruled lines.
Specifically, in the technique disclosed in Patent Literature 1, (i) a common part (common format) shared by all pages of the documents and (ii) independent parts inherent in each of the documents are extracted from image data (image information) inputted by scanning a plurality of documents, and the common part and the independent parts are then stored separately. In this case, only one set of data is stored for the common part of the image data, and separate data is stored for each of the independent parts. This makes it possible to reduce the amount of image data to be stored, regardless of the format.
More specifically, image data of all pages is extracted from binarized image data, and a logical AND operation is carried out on the image data of all pages so that a common part shared by all the pages is extracted. Further, image data of each page is extracted from the binarized image data, and a logical EXCLUSIVE OR operation is carried out on the image data thus extracted and the common part so that the independent parts inherent in each page are extracted. The common part and the independent parts thus obtained are encoded and stored.
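The AND and EXCLUSIVE OR steps described above can be illustrated with a minimal sketch. It is not the implementation of Patent Literature 1; it assumes that each binarized page is a list of rows of 0/1 pixels (1 = black), that all pages have the same size, and that the pages are already perfectly aligned. The names `extract_common` and `extract_independent` are hypothetical and chosen only for this illustration.

```python
from typing import List

# Assumption: a binarized page is a list of rows, each pixel 0 (white) or 1 (black).
Page = List[List[int]]


def extract_common(pages: List[Page]) -> Page:
    """Pixel-wise AND over all pages: a pixel survives only if it is black on every page."""
    common = [row[:] for row in pages[0]]
    for page in pages[1:]:
        for y, row in enumerate(page):
            for x, pixel in enumerate(row):
                common[y][x] &= pixel
    return common


def extract_independent(page: Page, common: Page) -> Page:
    """Pixel-wise EXCLUSIVE OR of one page with the common part: what is unique to that page."""
    return [
        [pixel ^ shared for pixel, shared in zip(page_row, common_row)]
        for page_row, common_row in zip(page, common)
    ]


# Two tiny pages sharing a "ruled line" in the top row, each with its own entry.
page_a = [
    [1, 1, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
page_b = [
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
]

common = extract_common([page_a, page_b])
independent_a = extract_independent(page_a, common)
independent_b = extract_independent(page_b, common)
```

In this toy example the common part contains only the shared top row, and each independent part contains only the single pixel unique to its page, so the ruled line is stored once rather than once per page.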
Citation List
Patent Literature 1
Japanese Patent Application Publication, Tokukaihei, No. 1991-135163 A (Publication Date: Jun. 10, 1991)
Patent Literature 2
WO 2006/092957 (Publication Date: Sep. 8, 2006)
Non Patent Literature 1
“Document Image Retrieval and Removal of Perspective Distortion Based on Voting for Cross-Ratios,” Tomohiro NAKAI, Koichi KISE, and Masakazu IWAMURA, Meeting on Image Recognition and Understanding (MIRU2005) (Information Processing Society of Japan, Computer Vision and Image Media), pp. 538-545
Note that, in a case like Patent Literature 1 where a common part and independent parts are found by a logical AND operation and a logical EXCLUSIVE OR operation, respectively, from a plurality of image data (written image information) inputted by scanning a plurality of documents, the plurality of image data must be precisely positioned relative to one another. However, Patent Literature 1 does not disclose any arrangement by which such precise positioning can be realized. Therefore, with the technique disclosed in Patent Literature 1, extracting the common part and the independent parts appears difficult in practice. This poses a serious problem especially when one wishes to extract a document image of a common part from a plurality of document images that have the common part and have different written text.
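The sensitivity to positioning can be seen in a minimal sketch, given here only as an illustration under stated assumptions: pages are lists of rows of 0/1 pixels, and the misalignment is modeled as a pure vertical shift of one pixel (real scans may also be rotated or scaled). The helper names `and_pages` and `shift_down` are hypothetical.

```python
from typing import List

# Assumption: a binarized page is a list of rows, each pixel 0 (white) or 1 (black).
Page = List[List[int]]


def and_pages(a: Page, b: Page) -> Page:
    """Pixel-wise AND of two equally sized pages."""
    return [[p & q for p, q in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def shift_down(page: Page, n: int) -> Page:
    """Model a scan offset: move the content down by n rows, padding with white."""
    width = len(page[0])
    blank = [[0] * width for _ in range(n)]
    return (blank + [row[:] for row in page])[: len(page)]


# A form whose only content is a one-pixel-thick ruled line in the second row.
form = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

aligned = and_pages(form, form)                    # both scans perfectly positioned
misaligned = and_pages(form, shift_down(form, 1))  # second scan offset by one row
```

With perfect alignment the AND reproduces the ruled line, whereas a one-row offset makes the AND empty, so the common part is lost entirely and the whole line would instead appear in each page's independent part.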