1. Field of the Invention
The present invention relates to a technique for determining metadata from print data.
2. Description of the Related Art
Equipped with large-capacity storage devices, recent digital multi-function peripherals serve as image storage servers that allow input images to be stored and reused, in addition to providing copy, print, facsimile, and scan functions.
Conventionally, images are stored in specific mailboxes or directories, making it possible to reuse a desired image by specifying an appropriate mailbox or directory and identifying the image by its file name.
However, the storage devices mounted on digital multi-function peripherals continue to grow in capacity, and with them the number of images that can be stored. As that number grows, identifying stored images by specifying a mailbox or directory is approaching its practical limits.
Methods for identifying a desired image among the images stored in an image storage server also include one that stores text data as metadata together with the stored images and uses the metadata for searches.
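As a minimal sketch of such a metadata-based search, the following Python example builds a word index over text metadata and looks up images by keyword. The image IDs, metadata strings, and the function names build_index and search are hypothetical and chosen only for illustration; they do not describe any particular image storage server.

```python
from collections import defaultdict


def build_index(metadata_by_image):
    """Map each lowercase word to the set of image IDs whose metadata contains it."""
    index = defaultdict(set)
    for image_id, text in metadata_by_image.items():
        for word in text.lower().split():
            index[word].add(image_id)
    return index


def search(index, query):
    """Return the IDs of images whose metadata contains every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical stored images and their text metadata.
stored = {
    "img001": "Quarterly sales report",
    "img002": "Sales meeting agenda",
    "img003": "Office floor plan",
}
index = build_index(stored)
sales_hits = search(index, "sales")
report_hits = search(index, "sales report")
```

With this kind of index, an image can be found by words in its metadata without knowing the mailbox or directory in which it was stored.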
The metadata can be extracted as character information contained in an input image by performing character recognition on the input image (see, for example, Japanese Patent Laid-Open Nos. 2004-215067 and 08-147446).
If print data is provided in the form of PDL (page-description language) data from a PC or the like, the PDL data is rasterized into a raster image and character recognition is performed on the raster image to extract character string information.
There is also a method which obtains metadata by extracting character information (character codes) contained in PDL data without the need for character recognition (see, for example, Japanese Patent Laid-Open No. 08-147446).
However, because the recognition rate is less than 100%, character recognition has an accuracy problem: recognition errors can occur. In addition, character recognition is computationally expensive, which creates a performance problem.
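To give a rough sense of how a per-character recognition rate below 100% affects whole character strings, the following sketch computes the probability that every character of a string is recognized correctly. It assumes, purely for illustration, that recognition errors on individual characters are independent; the 99% rate and the 20-character length are hypothetical values, not figures from the cited references.

```python
def string_accuracy(char_rate, length):
    """Probability that all `length` characters are recognized correctly,
    assuming independent per-character errors (an illustrative assumption)."""
    return char_rate ** length


# Hypothetical values: 99% per-character rate, 20-character string.
p = string_accuracy(0.99, 20)
```

Under these assumptions, roughly one in five 20-character strings would contain at least one recognition error, which would degrade metadata used for exact-match searches.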
On the other hand, the method of obtaining metadata by extracting character string information from PDL data has the following problems.
(1) PDL data may contain character data that treats each character as independent of the others, in which case it is difficult to handle the character data as a continuous character string.
(2) When character images are hidden behind other drawing objects, character information that does not appear in the final raster image may be extracted.
(3) Characters may be drawn as graphics or illustrations in PDL data, in which case character information that appears in the final raster image cannot be extracted.
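The three problems above can be illustrated with a toy model. The sketch below uses a hypothetical, highly simplified display list (the Glyph, Rect, and Path classes and both functions are assumptions made for this example and do not correspond to any real PDL or to the method of the present invention). Each glyph is treated as a single point when testing whether a later opaque rectangle covers it.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """A single character placed at a position, carrying a character code."""
    char: str
    x: float
    y: float


@dataclass
class Rect:
    """An opaque filled rectangle that hides anything drawn before it."""
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass
class Path:
    """A character drawn as a graphic outline; no character code is available."""
    description: str


def naive_extract(display_list):
    """Collect the character codes of all glyphs in drawing order."""
    return [obj.char for obj in display_list if isinstance(obj, Glyph)]


def visible_glyphs(display_list):
    """Return the glyphs not covered by any rectangle drawn after them."""
    visible = []
    for i, obj in enumerate(display_list):
        if not isinstance(obj, Glyph):
            continue
        covered = any(
            isinstance(later, Rect)
            and later.x0 <= obj.x <= later.x1
            and later.y0 <= obj.y <= later.y1
            for later in display_list[i + 1:]
        )
        if not covered:
            visible.append(obj)
    return visible


# Hypothetical page: "A" and "B" placed separately, "X" later hidden by a
# rectangle, and the letter "C" drawn only as an outline.
page = [
    Glyph("A", 10, 100),
    Glyph("B", 18, 100),
    Glyph("X", 10, 50),
    Rect(0, 40, 100, 60),
    Path("outline of letter C"),
]
naive = naive_extract(page)
shown = [g.char for g in visible_glyphs(page)]
```

In this toy page, the naive extraction yields "A" and "B" as separate items rather than the string "AB" (problem 1), includes the hidden "X" (problem 2), and misses the "C" that a viewer would see in the raster image (problem 3).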