1. Field of the Invention
The present invention relates to optical character recognition systems and, more particularly, to methods and apparatuses for identifying character font type in an optical character recognition system.
2. Description of the Related Art
Character recognition systems have become widely available in recent years. A conventional character recognition system first obtains a computerized image of a document, such as by scanning in a paper copy of a document, receiving a document image that has been transmitted by facsimile, or obtaining a document image from a server via a local network. Once the document image is obtained, the portions of the document image corresponding to text areas of the document are then analyzed so as to recognize individual characters in those text areas and form a computer readable file containing character codes (e.g., ASCII character codes) corresponding to the recognized characters. Such a file can then be manipulated in word processing, data compression, or other information processing programs.
Conventional character recognition systems are advantageous because they eliminate the need to retype or otherwise reenter text data of the document. However, if an attempt is made to reproduce (such as by printing out) the document based on the computer readable file of character codes, important visual information present in the original document can be lost if font type is not identified as well. For example, FIG. 1 shows a representative document 1 containing various font types, including block 2 containing sans serif proportionally spaced characters, blocks 7 containing serif proportionally spaced characters, and block 5 containing serif fixed pitch characters. A conventional character recognition system would not recognize the differences among these various font types, but instead would reproduce the entire document in a single font, such as shown in FIG. 2.
The problem is especially relevant where the recognition-processed document is printed or where, for example, to conserve storage space, only the recognition-processed document is retained and the original scanned document is discarded.
It is therefore an object of the present invention to provide an improved character recognition system that incorporates font type identification techniques.
In one aspect, the present invention uses an image of a character to determine whether the character has a serif font or a sans serif font. A left border is obtained for the image of the character, a determination is made as to whether a gap exists between the character's left edge and its left border, and the font type of the character is identified as serif if a gap exists and sans serif if no gap exists.
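By way of illustration only, the gap test of this aspect might be sketched as follows. The summary does not state how the left border is obtained, so the sketch assumes that the left border is the vertical line through the leftmost ink column of the character image, that the left edge is the leftmost ink pixel in each row, and that the character has a straight vertical left stroke (such as "l"). The bitmap representation, the function names, and the min_gap parameter are hypothetical and are not taken from the specification.

```python
from typing import List

# Hypothetical bitmap representation: one string per row,
# '#' for an ink pixel and '.' for background.
Bitmap = List[str]


def left_edge_profile(glyph: Bitmap) -> List[int]:
    """Column of the leftmost ink pixel in each row that contains ink."""
    return [row.index("#") for row in glyph if "#" in row]


def has_left_gap(glyph: Bitmap, min_gap: int = 1) -> bool:
    """Assumed gap test, not the specification's definition.

    The left border is taken to be the leftmost ink column of the whole
    glyph. A gap is reported if any row's left edge lies at least
    min_gap pixels to the right of that border.
    """
    profile = left_edge_profile(glyph)
    if not profile:
        return False  # blank image: no evidence of a gap
    border = min(profile)
    return any(col - border >= min_gap for col in profile)


def classify_serif(glyph: Bitmap) -> str:
    """Serif if a gap exists, sans serif otherwise."""
    return "serif" if has_left_gap(glyph) else "sans serif"


# A serif "l": the serifs protrude past the stem at top and bottom.
SERIF_L = [
    "###..",
    ".##..",
    ".##..",
    ".##..",
    "####.",
]

# A sans serif "l": the stem meets the left border in every row.
SANS_L = [
    "##",
    "##",
    "##",
    "##",
]
```

Under these assumptions, classify_serif(SERIF_L) reports serif because the stem rows are set back from the border defined by the serifs, while classify_serif(SANS_L) reports sans serif. A character with a curved left side would also show a gap under this simplified test, which is one reason to restrict the test to suitable characters.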
In another aspect, the present invention determines a font type for a group of characters from image data that include images of the characters. Key characters are located in the image data, where each key character matches a character in a pre-defined character set. A left border is then obtained for each key character. A determination is made whether a gap exists between the left edge and the left border of each key character. The font type for characters in the image data then is identified based on the gap determination.
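One possible way to combine the per-character gap determinations of this aspect is sketched below. The summary says only that the font type is identified based on the gap determination; the majority vote shown here, the contents of KEY_CHARACTERS, and the function name are assumptions made for illustration. The gap flag for each character is taken as already computed.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical pre-defined character set; the actual set is not given
# in this summary. Characters with a vertical left stroke are assumed.
KEY_CHARACTERS = frozenset("bhklIL")


def identify_font_type(recognized: Iterable[Tuple[str, bool]]) -> str:
    """Identify the font type for a group of characters.

    recognized holds (character code, gap detected) pairs. Only key
    characters vote; a simple majority decides (an assumed rule).
    """
    votes = Counter(
        gap for char, gap in recognized if char in KEY_CHARACTERS
    )
    if not votes:
        return "unknown"  # no key characters were located
    return "serif" if votes[True] > votes[False] else "sans serif"
```

For example, identify_font_type([("h", True), ("o", False), ("l", True), ("k", False)]) ignores the "o", counts two gap votes against one, and returns "serif".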
In still another aspect, the present invention determines a font type for a group of characters from image data that include images of the characters. Key characters are located in the image data, where each key character matches a character in a pre-defined character set. The image data are partitioned into image segments. For each image segment a determination is made whether the image segment is fixed pitch or variable pitch. A left border is obtained for each key character in the image segments designated as variable pitch. A determination is made whether a gap exists between the left edge and the left border of each key character in the image segments designated as variable pitch. Finally, the font type for characters in the image data is identified based on the gap determination and the fixed pitch determinations.
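The per-segment flow of this aspect can be illustrated with the following sketch. The summary does not specify how pitch is measured or how the two determinations are combined, so the sketch assumes that pitch is the distance between the horizontal centers of adjacent characters, that a segment is fixed pitch when those distances vary by no more than a tolerance, and that serif detection by majority vote is applied only to variable pitch segments. All names, labels, and the tolerance value are hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence


def is_fixed_pitch(centers: Sequence[float], tolerance: float = 0.1) -> bool:
    """Assumed pitch test for one image segment.

    centers are the horizontal centers of the characters, left to
    right. The segment is treated as fixed pitch if the spread of the
    center-to-center distances is small relative to their mean.
    """
    pitches = [b - a for a, b in zip(centers, centers[1:])]
    if len(pitches) < 2:
        return False  # too little evidence; treat as variable pitch
    return pstdev(pitches) <= tolerance * mean(pitches)


def label_segment(centers: Sequence[float],
                  key_gap_flags: Sequence[bool]) -> str:
    """Label one segment from its pitch and its key-character gaps.

    key_gap_flags holds one gap determination per key character in the
    segment; it is consulted only for variable pitch segments.
    """
    if is_fixed_pitch(centers):
        return "fixed pitch"
    if not key_gap_flags:
        return "proportional, serif undetermined"
    serif = 2 * sum(key_gap_flags) > len(key_gap_flags)
    return "proportional serif" if serif else "proportional sans serif"
```

With evenly spaced centers such as [0, 10, 20, 30], label_segment returns "fixed pitch" without consulting the gap flags. With uneven centers such as [0, 6, 18, 23] and gap flags [True, True, False], it returns "proportional serif".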
This brief summary has been provided so that the nature of the invention may be understood quickly. A more complete understanding of the invention can be obtained by reference to the following detailed description of the preferred embodiment thereof in connection with the attached drawings.