1. Field of the Invention
The present invention relates to an image processing apparatus, and in particular to an image processing apparatus which divides image data into drawing elements (objects), extracts metadata, most of which is character information, from the divided drawing elements, and stores the character information in association with the corresponding drawing elements.
2. Description of the Related Art
In recent years, it has become possible to equip a digital multi function peripheral (hereinafter abbreviated as MFP) with a large capacity HDD. Conventionally, the MFP has a copy function, a PDL print function, a FAX function, a scanned image transmitting function, and so on. With a large capacity HDD mounted, the MFP can provide a so-called BOX function in addition to these functions. The BOX function stores and accumulates, on an HDD in the MFP, image data obtained by scanning an original document and image data rendered for printing. In this way, a user can use the MFP as if it were an image filing apparatus.
The user can perform various operations on the image data stored by the BOX function, such as transmitting it, printing it, or combining it with another image for output. For the convenience of the user, it must be possible to search efficiently for the desired image data. Therefore, a technique has been developed which attaches character information extracted from image data to that image data as a search index, and uses the index to improve searchability.
The above related art is aimed mainly at scanned image data. In the process flow of the related art, an area considered to be a character block is first cut out as a character area from image data obtained by scanning an original document. OCR (Optical Character Recognition) processing is then performed on the image data in the character area obtained in this way. Character codes obtained by this OCR processing (character recognition processing) are stored as character information for the search along with the original image.
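The scanned-image flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of any actual MFP: the names `StoredImage`, `index_scanned_image`, and `search` are hypothetical, the character areas are assumed to have been cut out already, and the character recognition step is passed in as a caller-supplied function rather than tied to a particular OCR engine.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical types for illustration only.
Box = Tuple[int, int, int, int]  # (left, top, right, bottom) of a character area


@dataclass
class StoredImage:
    image_id: str
    search_text: List[str] = field(default_factory=list)


def index_scanned_image(
    image_id: str,
    character_areas: List[Box],
    recognize: Callable[[Box], str],
) -> StoredImage:
    """Run character recognition on each character area and keep the result as search text."""
    stored = StoredImage(image_id)
    for area in character_areas:
        text = recognize(area).strip()
        if text:  # areas that yield no characters add nothing to the index
            stored.search_text.append(text)
    return stored


def search(box: List[StoredImage], keyword: str) -> List[str]:
    """Return the ids of stored images whose search text contains the keyword."""
    keyword = keyword.lower()
    return [
        s.image_id
        for s in box
        if any(keyword in t.lower() for t in s.search_text)
    ]
```

In this sketch the stored character information is only as good as the recognition result for each area; an area for which `recognize` returns nothing contributes no search index.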
There is also related art in which a document created by an application on a computer is stored in a filing apparatus. Japanese Patent Laid-Open No. H08-147446 (1996) discloses a technique which converts (renders) a document into PDL data via a printer driver, extracts character codes from a character object in the PDL data, and stores the character codes as character information along with the rendered image data.
However, some types of applications do not output character objects as character codes. Specifically, in some applications, data drawn as characters by the user is not converted into a character object when it is converted into PDL data. In others, character codes can be extracted, but the resulting character object is difficult to use as a search index. In these cases, an effective search index cannot be provided for the image data merely by extracting character codes from character objects in the PDL data.
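The PDL-based approach can be illustrated with a small sketch. It assumes a heavily simplified model in which PDL data has already been parsed into a list of drawing objects; `CharacterObject`, `OtherObject`, `FiledDocument`, and `file_document` are hypothetical names introduced only for this example and do not correspond to any real PDL or to the cited publication's implementation.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class CharacterObject:
    codes: str   # character codes carried by the drawing command
    font: str


@dataclass
class OtherObject:
    kind: str    # any non-character drawing object, e.g. "image"


DrawingObject = Union[CharacterObject, OtherObject]


@dataclass
class FiledDocument:
    rendered_image: bytes
    character_info: List[str]


def file_document(objects: List[DrawingObject], rendered_image: bytes) -> FiledDocument:
    """Store the rendered image together with the character codes of its character objects."""
    codes = [o.codes for o in objects if isinstance(o, CharacterObject)]
    return FiledDocument(rendered_image, codes)
```

Only objects that arrive as character objects contribute character information in this sketch; every other drawing object is stored as part of the rendered image alone.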
Such cases will be illustrated below.
A character in a large font may be represented and handled by the application, driver, or PDL as Path data, that is, as a set of line segments.
When a printer does not have the font specified for a character, or when a character is heavily decorated, for example with gradation, the character may be converted into image data in the PDL data.
A character code may not identify visual data by itself, because the visual data varies depending on the type of font.
When an original character string is divided into single characters, each issued as a separate drawing command, each character is obtained as separate character data, so that meaningful character data cannot be obtained.
Furthermore, decorated characters and large characters, which are prone to the problems described above, often carry important meaning, so that losing their character information prevents effective metadata from being provided.
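The cases listed above can be made concrete with a short sketch. It assumes a hypothetical, simplified representation of drawing commands as `(kind, codes)` pairs; the function names and the sample pages are invented for illustration. It shows only why extracting character codes from character objects fails in these cases, not how to remedy it.

```python
from typing import List, Optional, Tuple

# Hypothetical simplified drawing commands: (kind, character codes or None).
Command = Tuple[str, Optional[str]]


def character_info(commands: List[Command]) -> List[str]:
    """Collect character codes from character objects only, one entry per command."""
    return [codes for kind, codes in commands if kind == "text" and codes]


def matches(index: List[str], keyword: str) -> bool:
    """A keyword matches if it occurs inside any single index entry."""
    return any(keyword in entry for entry in index)


# A large heading emitted as Path data and a decorated title emitted as image data:
# neither carries character codes, so only the body text is indexed.
page_a: List[Command] = [("path", None), ("image", None), ("text", "body text")]

# The word "SALE" emitted as four single-character drawing commands:
# every character is indexed separately, so the word itself cannot be found.
page_b: List[Command] = [("text", "S"), ("text", "A"), ("text", "L"), ("text", "E")]
```

With these sample pages, `character_info(page_a)` contains only the body text, and `matches(character_info(page_b), "SALE")` is false even though every character of the word was extracted.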