1. Field of the Invention
The present invention relates to a technique for converting a paper document into data that can be searched electronically.
2. Description of the Related Art
In recent years, scanners and mass storage devices such as hard disks have made it easy to digitize documents that were previously kept on paper and to store them as electronic documents. In particular, it is common not only to scan a paper document and convert it to image data, but also to read the character information written in it by a character recognition technique and to store that information as additional information of the image. For an electronic document stored in this way, a user can input a character string contained in the original document as a search keyword and thereby retrieve a desired document at high speed from a large group of stored documents.
Furthermore, a technique has been proposed in which, when a user uses a search keyword to search an electronic document with which such character information is associated, the portion of the document image in which the search keyword appears is highlighted so that the user can identify it (for example, Japanese Patent Laid-Open No. 2000-322417). Because the character portion corresponding to the search keyword is displayed in a highlighted state, even when the same keyword appears at a plurality of locations in the document, the user can efficiently identify those locations by switching page images.
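As a minimal sketch of this kind of highlighting, assume that character recognition yields, for each page, a list of recognized characters together with their bounding boxes on the page image. The names OcrChar and find_highlight_boxes are hypothetical and are not taken from the cited publication; the sketch only illustrates how occurrences of a keyword could be mapped to rectangles to be highlighted.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (left, top, right, bottom) in image pixels; this layout is an assumption.
Box = Tuple[int, int, int, int]


@dataclass
class OcrChar:
    """One recognized character and its position on the page image (hypothetical)."""
    char: str  # assumed to hold exactly one character
    box: Box


def find_highlight_boxes(page: List[OcrChar], keyword: str) -> List[Box]:
    """Return one enclosing rectangle per occurrence of keyword in the recognized text."""
    if not keyword:
        return []
    # Because each OcrChar holds one character, string indices match list indices.
    text = "".join(c.char for c in page)
    boxes: List[Box] = []
    start = text.find(keyword)
    while start != -1:
        hit = page[start:start + len(keyword)]
        boxes.append((
            min(c.box[0] for c in hit),
            min(c.box[1] for c in hit),
            max(c.box[2] for c in hit),
            max(c.box[3] for c in hit),
        ))
        # Continue one character later so that overlapping occurrences are also found.
        start = text.find(keyword, start + 1)
    return boxes
```

A viewer could draw each returned rectangle over the page image. This simplified sketch does not handle a keyword that wraps across lines, which would require one rectangle per line.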
There is also a technique in which the results of character recognition processing are embedded in an image file as transparent text (character codes for which a transparent color is specified as the drawing color) and the file is stored in PDF (Portable Document Format). When a PDF file created in this way is displayed, the transparent text is drawn over the character images in the document image. Thus, when a keyword search is made, the transparent text is searched. The user, however, cannot see the transparent text itself, so it appears as if the image were being searched. In this manner, an image that can be searched with a search keyword can be drawn based on a file in a format described in a page description language capable of drawing both images and characters.
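The layering described above can be sketched as follows. The sketch uses SVG rather than PDF only because SVG is plain text and can be assembled with the Python standard library. The function name searchable_svg_page and the shape of its words argument are hypothetical, and whether a particular viewer actually searches fully transparent text is an assumption that depends on the viewer.

```python
from xml.sax.saxutils import escape, quoteattr


def searchable_svg_page(image_href, width, height, words):
    """Build an SVG page: the scanned image, with recognized text drawn over it invisibly.

    words is a list of (text, x, y, font_size) tuples, where (x, y) is the
    baseline start position in pixels (a hypothetical layout for illustration).
    """
    parts = [
        '<svg xmlns="http://www.w3.org/2000/svg" '
        'xmlns:xlink="http://www.w3.org/1999/xlink" '
        f'width="{width}" height="{height}">',
        # The scanned page image is what the user actually sees.
        f'<image xlink:href={quoteattr(image_href)} '
        f'width="{width}" height="{height}"/>',
    ]
    for text, x, y, size in words:
        # fill-opacity="0" makes the glyphs fully transparent, while the
        # character codes remain in the file as text.
        parts.append(
            f'<text x="{x}" y="{y}" font-size="{size}" '
            f'fill-opacity="0">{escape(text)}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)
```

For example, searchable_svg_page("page1.png", 1240, 1754, [("invoice", 120, 300, 32)]) returns markup in which the word "invoice" is present as text at the stated position but is not visible over the image.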
To draw characters in an electronic document using a page description language such as PDF or SVG, character shape information for each character, that is, font data, is required. However, because font data is generally large, it is common practice, in order to keep the electronic document small, not to store the font data in the electronic document but only to specify the kind of font in it. An application that draws the document can then do so using a font installed on the personal computer.
On the other hand, there are cases in which it is desirable to store font data in an electronic document. For example, when an electronic document created with a document creation application is opened on another personal computer on which the font data used in the document is not installed, the document cannot be reproduced exactly. Conversely, if the font data itself is stored in the electronic document, the document can be reproduced exactly even on a personal computer or in an application in which the specified font data is not installed.
Furthermore, depending on the use, it may be desirable, or even essential, that the font data used for drawing characters be stored in the electronic document. For example, in the case of a file intended for long-term storage, the fonts installed by default may have changed after a long period has elapsed, owing to changes in the OS. It is therefore conceivable that a storage form in which the font data is held in the document will be required.
In addition, some formats make it an essential condition that font data be stored in the electronic document. For example, in the XPS (XML Paper Specification) format, when text data is stored, the font data must be stored together with it.
When font data is stored in an electronic document, however, the size of the electronic document itself increases. An increased file size is a problem in that transmitting the electronic document over a network takes longer and storing it requires more storage capacity.
For an electronic document in a file format in which characters are drawn using font data stored in the electronic document itself, it is therefore desirable to prevent an increase in file size. This is particularly so when a scanned image, text data resulting from character recognition processing, and font data for drawing the text are stored together in the electronic document. When font data has to be stored in the electronic document because of restrictions of the format or of the system, the increase in file size is likely to be problematic.
Furthermore, the way in which search results are highlighted differs depending on the characteristics of the viewer used to display the document. That is, depending on how the viewer highlights search results, a character image on the document image may become hard to see.
Under these circumstances, the following is required of processing that converts a paper document into an electronically searchable electronic document. That is, it is desired to ensure visibility when search results are highlighted while minimizing the size of the electronic document, even if the font data to be used is held in the electronic document.