A conventional technique includes the steps of: obtaining image data by reading a paper document with a scanner; generating text data of the characters in the image data by performing a character recognition process on the image data; and generating an image file in which the image data and the text data are associated with each other.
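The three steps above can be illustrated with a minimal sketch. It is not taken from any cited document: the names `ImageFile`, `build_image_file`, `contains_keyword`, and `stub_recognize` are hypothetical, and the character recognition process is represented by a caller-supplied function rather than a real recognition engine.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ImageFile:
    """Hypothetical container associating scanned image data with its text data."""
    image_data: bytes  # image data obtained from the scanner
    text_data: str     # text data produced by character recognition


def build_image_file(image_data: bytes,
                     recognize: Callable[[bytes], str]) -> ImageFile:
    """Run character recognition on the image data and associate the result with it."""
    # The recognition engine is an assumption: any function mapping image bytes to text.
    text_data = recognize(image_data)
    return ImageFile(image_data=image_data, text_data=text_data)


def contains_keyword(image_file: ImageFile, keyword: str) -> bool:
    """Case-insensitive keyword search over the associated text data."""
    return keyword.lower() in image_file.text_data.lower()


def stub_recognize(image_data: bytes) -> str:
    """Stand-in for a real character recognition process (illustration only)."""
    return "Invoice No. 42"


scanned = b"\x00\x01\x02"  # placeholder bytes standing in for scanner output
image_file = build_image_file(scanned, stub_recognize)
```

Because the text data travels with the image data, a search such as `contains_keyword(image_file, "invoice")` can be answered without touching the image data itself.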
For example, Patent Literature 1 discloses a technique including the steps of: obtaining PDF image data by reading a paper medium with a scanner; generating text data by performing a character recognition process on the PDF image data; detecting a margin area of the PDF image data and the color of the margin area; and embedding, in the margin area of the PDF image data, the text data in the same color as the margin area. According to this technique, a search process or the like can be performed with use of the text data without deteriorating image quality. That is, because the text data embedded in the margin area has the same color as the margin area, the text data is not visible to a user, and the image quality therefore does not deteriorate. Further, information on a document can be extracted from the text data embedded in the margin area by performing, for example, a keyword search.

Patent Literature 1
Japanese Patent Application Publication, Tokukai, No. 2004-280514 A (Publication Date: Oct. 7, 2004)

Patent Literature 2
Japanese Patent Application Publication, Tokukaihei, No. 7-192086 A (Publication Date: Jul. 28, 1995)