In recent years, as scanners and large-capacity storage devices such as hard disks have become widespread, documents that were kept on paper are increasingly scanned and saved as digital documents. In this case, the image data obtained by scanning a paper document undergoes character recognition processing to read the text information described in that document, and the text information is saved in association with the image. The user can then search the digital documents associated with text information using search keywords. To find a desired document quickly among large numbers of saved documents in this way, it is important to support keyword search even for scanned images.
For example, Japanese Patent Laid-Open No. 2000-322417 describes the following technique. When the user searches for a digital document associated with text information using search keywords, the parts of the document image where the search keywords appear are highlighted so that the user can identify them. Because the text parts corresponding to the search keywords are highlighted, even when the same keyword appears in several places in the document, the user can efficiently locate each occurrence while switching between page images.
Another available technique embeds the results of character recognition processing in an image file as transparent text (character codes for which a transparent color is designated as the rendering color) and saves the image file in PDF (Portable Document Format). When a PDF file generated in this way is displayed, the transparent text is rendered over the character images in the document image. Therefore, when a keyword search is conducted, the transparent text is found, but because the user cannot see the transparent text itself, it appears as if the matching location in the image had been found. In this manner, a searchable image can be rendered from a file in a format described with a page description language that can render both images and text.
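As a minimal sketch of the layering described above, the following Python function builds a PDF page content stream that first paints a scanned image and then places recognized text over it so the text is searchable but not visible. It assumes PDF's text rendering mode 3 (`3 Tr`, neither fill nor stroke) as the mechanism for hiding the text, which is a swapped-in technique rather than the "transparent color" approach named in the text. The function names, the resource names `Im0` and `F1`, and the `(text, x, y, size)` word tuples are hypothetical; a complete PDF would also need page resources declaring the image XObject and the font.

```python
def escape_pdf_string(s: str) -> str:
    # Backslash-escape characters that are special inside a PDF literal string.
    return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def searchable_page_stream(image_name, page_w, page_h, words, font="F1"):
    """Return a content stream: scanned image first, invisible OCR text on top.

    words: hypothetical OCR output as (text, x, y, size) tuples in PDF user space.
    """
    ops = [
        "q",                                  # save graphics state
        f"{page_w} 0 0 {page_h} 0 0 cm",      # scale the unit-square image to the page
        f"/{image_name} Do",                  # paint the scanned page image
        "Q",                                  # restore graphics state
        "BT",                                 # begin text object
        "3 Tr",                               # rendering mode 3: text is not painted
    ]
    for text, x, y, size in words:
        ops.append(f"/{font} {size} Tf")      # select font and size
        ops.append(f"1 0 0 1 {x} {y} Tm")     # position over the character image
        ops.append(f"({escape_pdf_string(text)}) Tj")  # searchable, invisible text
    ops.append("ET")                          # end text object
    return "\n".join(ops)


stream = searchable_page_stream("Im0", 612, 792, [("Invoice", 72, 700, 12)])
print(stream)
```

The image is painted before the text object so that, even if a viewer ignored the rendering mode, the text would sit on top of the character images it describes.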
Rendering text in a digital document described with a page description language such as PDF or SVG requires shape information for each character, i.e., the glyphs in font data. However, since font data are normally large, it is common practice not to store font data in a digital document and to designate only the font types in the document. In this way, an application can render the text data using fonts installed on the personal computer.
However, it is often desirable to store font data in a digital document. For example, when a digital document generated by a document generation application is opened on another personal computer on which the font data used in that document are not installed, the document cannot be reproduced accurately. Conversely, if the font data themselves are stored in the digital document, the document can be reproduced accurately even by a personal computer or application on which the designated fonts are not installed.
In some cases, depending on the intended use, storing the font data used to render characters in a digital document is a mandatory requirement. For example, for files that must be archived over a long term, the fonts installed by default may change as the OS changes over the years. Hence, it is desirable to store the font data in the document as a mandatory part of the format.
Some formats make storing font data in a digital document mandatory. For example, the XPS (XML Paper Specification) format requires font data to be saved together with any text data.
However, storing font data in a digital document increases the size of the document itself. A larger file takes longer to transmit over a network and requires more storage capacity.
Thus, for a digital document in a file format that renders characters using font data stored in the document itself, it is desirable to keep the file size from increasing. This is especially true when a scanned image, the text data obtained as a character recognition result, and the font data used to render that text are stored together in one digital document. When font data must be stored in a digital document because of format or system restrictions, the resulting increase in file size often becomes a problem.