1. Field of the Invention
The present invention relates to an apparatus and a method for extracting circumscribed rectangles of one or more characters in a transplantable electronic document, and more particularly relates to an apparatus and a method for extracting circumscribed rectangles of one or more characters in a transplantable electronic document in a case where one or more fonts need to be replaced.
2. Description of the Related Art
Transplantable electronic documents such as PDF (Portable Document Format) files and PS (PostScript) files are widely used in everyday office work. However, it is still difficult to extract specified information from such documents. For example, Adobe™ Acrobat™ Reader can extract circumscribed rectangles of characters in a PDF file, but the extraction results are sometimes unsatisfactory. The reason is that if a font used in the PDF file is unavailable (i.e., its character shape measurement information is lacking), circumscribed rectangles of characters in that font cannot be extracted.
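To illustrate why character shape measurement information is needed, the following is a minimal sketch of how a circumscribed rectangle might be derived from per-glyph measurements. The function name `char_rect`, the `glyph_bboxes` table, and the 1000-units-per-em convention are assumptions made for illustration only; they are not taken from any particular reader or from the documents discussed here.

```python
from typing import Dict, Optional, Tuple

# (left, bottom, right, top) in page coordinates
Rect = Tuple[float, float, float, float]


def char_rect(
    ch: str,
    origin_x: float,
    baseline_y: float,
    font_size: float,
    glyph_bboxes: Dict[str, Rect],
    units_per_em: float = 1000.0,  # assumed glyph-space convention
) -> Optional[Rect]:
    """Return the circumscribed rectangle of one character, or None
    when the font provides no measurement for that character."""
    bbox = glyph_bboxes.get(ch)
    if bbox is None:
        # Without shape measurements the rectangle cannot be computed.
        return None
    scale = font_size / units_per_em
    llx, lly, urx, ury = bbox
    return (
        origin_x + llx * scale,
        baseline_y + lly * scale,
        origin_x + urx * scale,
        baseline_y + ury * scale,
    )
```

When the table has no entry for a character, the function returns `None`, which corresponds to the failure case described above.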
Conventional font replacement methods focus mainly on obtaining grids of characters, and therefore look for visually similar fonts. However, such methods are not suitable for extracting circumscribed rectangles of characters, because that extraction requires finding fonts that are similar in terms of character shape measurement, not in terms of visual appearance.
U.S. Pat. No. 6,801,673 B2 discloses a method for extracting words in a PDF file. In this patent, words are extracted either by finding a word separator (i.e., a space) in text segments or by determining the distance between two neighboring text segments. In the latter case, if the distance is greater than a predetermined threshold value, the two neighboring text segments are separated into two words. In this patent, the input is a PDF file, and the output is a collection of words included in the PDF file.
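The two word-boundary criteria described above (a space inside a text segment, or a gap between neighboring segments that exceeds a threshold) can be sketched as follows. This is an illustrative simplification, not the implementation of the cited patent; the `Segment` layout, the function name `split_words`, and the threshold value used in the example are hypothetical.

```python
from typing import List, Optional, Tuple

# (text, left x-coordinate, right x-coordinate) of one text segment
Segment = Tuple[str, float, float]


def split_words(segments: List[Segment], gap_threshold: float) -> List[str]:
    """Group text segments, given in reading order, into words."""
    words: List[str] = []
    current = ""
    prev_right: Optional[float] = None

    for text, left, right in segments:
        # Criterion 2: a gap wider than the threshold ends the current word.
        if prev_right is not None and left - prev_right > gap_threshold:
            if current:
                words.append(current)
            current = ""

        # Criterion 1: a space inside a segment is a word separator.
        for i, part in enumerate(text.split(" ")):
            if i > 0:
                if current:
                    words.append(current)
                current = ""
            current += part

        prev_right = right

    if current:
        words.append(current)
    return words


segments = [("Hel", 0.0, 15.0), ("lo wor", 15.5, 45.0),
            ("ld", 45.2, 55.0), ("again", 70.0, 95.0)]
print(split_words(segments, gap_threshold=3.0))
```

Note that the gap test depends on knowing where each segment begins and ends, so it presupposes that usable character measurements are available.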
U.S. Pat. No. 5,859,648 discloses a font replacement method for computers. This method is mainly directed to finding fonts visually similar to the fonts to be replaced so as to obtain grids of characters. In this patent, a similar font is found and selected in a font table, and then the overall width of the font is adjusted so that the appearance of characters in this font does not visibly change. The selection of the similar font is based on scores of visual similarity, without considering similarity in terms of character shape measurement. However, extraction of circumscribed rectangles of characters requires finding similar fonts in terms of character shape measurement, not in terms of visual appearance. Therefore, the font replacement method in this patent is not effective for extraction of circumscribed rectangles of characters.
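The distinction between visual similarity and similarity in character shape measurement can be made concrete with a small sketch. It assumes, purely for illustration, that each font is described by a hypothetical table of per-character advance widths and that candidates are ranked by the mean absolute width difference over shared characters. Neither the tables nor the scoring rule is taken from the cited patents or stated in the text above.

```python
from typing import Dict, Optional


def metric_distance(
    target_widths: Dict[str, float],
    candidate_widths: Dict[str, float],
) -> Optional[float]:
    """Mean absolute width difference over characters present in both
    tables, or None if the tables share no characters."""
    shared = target_widths.keys() & candidate_widths.keys()
    if not shared:
        return None
    total = sum(abs(target_widths[c] - candidate_widths[c]) for c in shared)
    return total / len(shared)


def closest_font_by_metrics(
    target_widths: Dict[str, float],
    candidates: Dict[str, Dict[str, float]],
) -> Optional[str]:
    """Return the name of the candidate whose measurements are closest
    to the target, ignoring how the glyphs look."""
    best_name: Optional[str] = None
    best_score: Optional[float] = None
    for name, widths in candidates.items():
        score = metric_distance(target_widths, widths)
        if score is None:
            continue  # no comparable characters
        if best_score is None or score < best_score:
            best_name, best_score = name, score
    return best_name
```

Under this scoring, a candidate that looks quite different from the original font can still be preferred if its measurements are close, which is the property that matters for circumscribed rectangles.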