The present disclosure relates to a device which extracts information from image data and a method of controlling such a device.
The format (form) of a document may be registered in advance, and information may then be extracted from image data that includes the registered format. For example, information is extracted from image data obtained by scanning a document (original document). Before the information is extracted, it may be determined whether or not the image data obtained by the scanning matches the registered format. The following is a known example of a technology for extracting information based on a format.
Specifically, a system is known that analyzes, with reference to stored format data, the layout of a document in which a plurality of cells are arranged according to a certain rule. The stored format data specifies the type of information present in each cell and the adjacency relationship between the cells. The system obtains image data of the document, extracts a plurality of cells from the image data, and determines the adjacency relationship between the extracted cells. The system then compares the adjacency relationship between the cells in the document with the adjacency relationship specified by the format data, and identifies, from among the cells in the document, the cells that correspond to those specified by the format data. In this way, the arrangement of information in the document is identified, and the information contained in the document is recognized according to the identified arrangement.
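The comparison of adjacency relationships described above can be illustrated with a minimal sketch. It is not the implementation of the known system. It assumes, purely for illustration, that each cell records only a right neighbor and a lower neighbor, that cells form a simple grid whose bounding boxes are already available, and that the top-left cells of the format and of the document correspond. The names `FORMAT`, `adjacency_from_boxes` and `match_cells` are hypothetical.

```python
from collections import deque

# Hypothetical format data: the information type of each cell and its
# neighbors to the right and below (None when there is no neighbor).
FORMAT = {
    "f0": {"type": "name_label", "right": "f1", "below": "f2"},
    "f1": {"type": "name_value", "right": None, "below": "f3"},
    "f2": {"type": "phone_label", "right": "f3", "below": None},
    "f3": {"type": "phone_value", "right": None, "below": None},
}


def adjacency_from_boxes(boxes, tol=5):
    """Derive right/below neighbors from bounding boxes (x0, y0, x1, y1).

    Assumption: a neighbor shares a corner with the cell within `tol` pixels.
    """
    adj = {}
    for cid, (x0, y0, x1, y1) in boxes.items():
        right = below = None
        for oid, (ox0, oy0, _ox1, _oy1) in boxes.items():
            if oid == cid:
                continue
            if abs(ox0 - x1) <= tol and abs(oy0 - y0) <= tol:
                right = oid
            if abs(oy0 - y1) <= tol and abs(ox0 - x0) <= tol:
                below = oid
        adj[cid] = {"right": right, "below": below}
    return adj


def match_cells(format_data, doc_adj, format_root, doc_root):
    """Walk both adjacency graphs in lockstep from the two root cells.

    Returns {document_cell: format_cell}, or None when the adjacency
    relationships of the document and the format disagree.
    """
    mapping = {doc_root: format_root}
    queue = deque([(format_root, doc_root)])
    while queue:
        f_cell, d_cell = queue.popleft()
        for direction in ("right", "below"):
            f_next = format_data[f_cell][direction]
            d_next = doc_adj[d_cell][direction]
            if (f_next is None) != (d_next is None):
                return None  # one side has a neighbor, the other does not
            if f_next is None:
                continue
            if d_next in mapping:
                if mapping[d_next] != f_next:
                    return None  # same document cell mapped inconsistently
                continue
            mapping[d_next] = f_next
            queue.append((f_next, d_next))
    # Every format cell must have been reached for the document to match.
    return mapping if len(mapping) == len(format_data) else None


# Example: a 2 x 2 grid of cells extracted from scanned image data.
boxes = {
    "c0": (0, 0, 100, 30),
    "c1": (100, 0, 300, 30),
    "c2": (0, 30, 100, 60),
    "c3": (100, 30, 300, 60),
}
mapping = match_cells(FORMAT, adjacency_from_boxes(boxes), "f0", "c0")
cell_types = {d: FORMAT[f]["type"] for d, f in mapping.items()}
```

Once each document cell is mapped to a format cell, the information type of the cell is known, and the content of that cell can be passed to a recognizer suited to that type. A return value of `None` corresponds to the case where the document does not agree with the registered format.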
Information is written on a document such as a form sheet. For example, a name, an address and a telephone number are written. It is convenient if desired information can be automatically extracted from the image data of a document and converted into data, because an operator then does not need to key in the information manually while looking at the document. A document may also include an answer column. The answer column is an entry column in which an answerer indicates a selection. For example, the answer column includes check boxes or symbols to be circled. The person filling in the document checks the corresponding box or circles the corresponding symbol. When the information selected in the answer column (what type of symbol is entered and in which position the symbol is entered) can be automatically extracted, it is not necessary to check the selected answer on each sheet of the document.