The present disclosure relates to a character recognition device, a character recognition method, and a recording medium for recognizing a character in an image by optical character recognition and acquiring a character code.
In recent years, image data generated by scanning an original document or the like has been converted into electronic data in which characters can be retrieved or edited, by performing character recognition on the image data by optical character recognition (OCR). In such conversion, in order to create electronic data in the same format as that of the original document, the font (typeface) of each character used in the original document has to be specified. A font is either non-italic or italic, and in some cases the shape of a non-italic character in one font is similar to the shape of an italic character in another font. Therefore, techniques for determining whether or not the font of a character in image data is italic have been proposed.
In one known technology, whether or not a character in image data is italic is determined based on a plurality of conditions, such as the distribution state of the outline pixels of the character, the position of the center of gravity of those outline pixels, and whether or not an adjacent character is italic. In the known technology, a marginal distribution (a histogram) of a character string in the image data is also created. If the marginal distribution in which the spaces between characters appear clearly is the one taken in the direction perpendicular to the character string, the characters are determined to be non-italic; if it is the one taken in a direction oblique to the character string, the characters are determined to be italic.
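For illustration only, the marginal-distribution test of the known technology can be sketched as follows. This is a minimal sketch, not the known technology's actual implementation. It assumes a binarized image containing a single horizontal character string, and it assumes that a space between characters "appears clearly" when the histogram has more empty bins between its first and last inked bins. The function names (`sheared_profile`, `gap_count`, `is_italic`, `toy_string`) and the default slant of 0.2 columns per row (roughly 11 degrees) are hypothetical.

```python
from typing import List, Sequence

# A binarized image: rows listed top to bottom, 1 = ink pixel, 0 = background.
Image = Sequence[Sequence[int]]


def sheared_profile(img: Image, shear: float) -> List[int]:
    """Marginal distribution (histogram) of ink pixels, projected along lines
    that lean right by `shear` columns per row going up.

    shear == 0.0 projects perpendicular to a horizontal character string;
    shear > 0.0 projects along an oblique, italic-like direction.
    """
    height = len(img)
    width = len(img[0]) if height else 0
    # Largest horizontal shift any row can receive; pad the histogram by it
    # on both sides so every projected index stays in range.
    max_off = round(abs(shear) * max(height - 1, 0))
    profile = [0] * (width + 2 * max_off)
    for y, row in enumerate(img):
        dy = height - 1 - y  # rows above the bottom row of the string
        offset = round(shear * dy)
        for x, value in enumerate(row):
            if value:
                # Undo the lean so a stroke with this slant falls into one bin.
                profile[x - offset + max_off] += 1
    return profile


def gap_count(profile: Sequence[int]) -> int:
    """Number of empty bins between the first and last inked bins
    (assumed measure of how clearly inter-character spaces appear)."""
    inked = [i for i, count in enumerate(profile) if count]
    if not inked:
        return 0
    return sum(1 for count in profile[inked[0]:inked[-1] + 1] if count == 0)


def is_italic(img: Image, slant: float = 0.2) -> bool:
    """Hypothetical decision rule: judge the string italic when the oblique
    projection separates characters more clearly than the perpendicular one."""
    perpendicular = gap_count(sheared_profile(img, 0.0))
    oblique = gap_count(sheared_profile(img, slant))
    return oblique > perpendicular


def toy_string(width: int, height: int, cols: Sequence[int], lean: int) -> List[List[int]]:
    """Toy 'characters': one-pixel strokes starting at `cols` on the bottom row
    and leaning right by `lean` columns per row going up."""
    img = [[0] * width for _ in range(height)]
    for y in range(height):
        dy = height - 1 - y
        for c in cols:
            img[y][c + lean * dy] = 1
    return img


upright = toy_string(10, 5, [1, 4], lean=0)
italic = toy_string(10, 5, [1, 4], lean=1)
# A 45-degree slant (1.0) is exaggerated, but keeps the toy example small.
print(is_italic(upright, slant=1.0))  # False
print(is_italic(italic, slant=1.0))   # True
```

In the upright toy string, the perpendicular histogram shows two empty bins between the strokes while the oblique one shows none; for the leaning string the situation is reversed, which is the contrast the known technology relies on.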