Financial institutions use character recognition apparatuses that recognize characters on forms in order to automate the entry of characters printed or handwritten on the forms. To enhance the accuracy of character recognition, these apparatuses recognize character strings on the basis of form definition information, which is prepared in advance to define the subheadings printed on a form and where and in what order the data corresponding to those subheadings is written.
A form has pre-printed characters and characters written by a user. The form definition defines whether a character recognition target item is handwritten or printed. When recognizing characters on a form, a character recognition apparatus checks the form definition, and uses a handwritten character recognition engine if the character recognition target item is defined to be handwritten or uses a printed character recognition engine if the character recognition target item is defined to be printed, to thereby recognize the characters.
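The engine selection described above can be sketched as a simple lookup keyed on the form definition. This is a minimal illustration only: the `ItemDefinition` type, the `writing_type` field, and the two stand-in engine functions are hypothetical names introduced here, not part of any described apparatus.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ItemDefinition:
    """Hypothetical form-definition entry for one recognition target item."""
    name: str
    writing_type: str  # assumed to be "handwritten" or "printed"


def recognize_handwritten(image: str) -> str:
    # Stand-in for a handwritten character recognition engine.
    return f"handwritten:{image}"


def recognize_printed(image: str) -> str:
    # Stand-in for a printed character recognition engine.
    return f"printed:{image}"


# Map each writing type named in the form definition to its engine.
ENGINES: Dict[str, Callable[[str], str]] = {
    "handwritten": recognize_handwritten,
    "printed": recognize_printed,
}


def recognize_item(definition: ItemDefinition, image: str) -> str:
    """Invoke the single engine that the form definition selects."""
    try:
        engine = ENGINES[definition.writing_type]
    except KeyError:
        raise ValueError(f"unknown writing type: {definition.writing_type!r}")
    return engine(image)
```

Because exactly one engine runs per item, the cost of recognition stays at one engine call, but only as long as every item has been classified in the form definition beforehand.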
As described above, character recognition using a form definition defines in advance whether each character recognition target item is handwritten or printed, so that recognition is performed appropriately for handwriting or printing and its accuracy is enhanced. However, setting the form definition requires the target forms to be acquired in advance, and only a limited number of forms can be collected. The character recognition apparatuses are therefore able to handle only those limited forms. In addition, setting the form definition takes many man-hours because each character recognition target item needs to be specified as handwritten or printed.
To eliminate this problem, there has been proposed a method of extracting a character string from image data of a document, calculating the center position of each character in the height direction, discriminating on the basis of the regularity of the center positions whether the character string is printed or handwritten, and then recognizing the characters on the basis of the discrimination result (see, for example, Japanese Laid-open Patent Publication No. 2000-181993).
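One way to picture the center-position discrimination is the following sketch. It is not the method of the cited publication: the choice of standard deviation as the regularity measure, the normalization by mean character height, and the `threshold` value are all assumptions made for illustration, and each character is reduced to a hypothetical `(top, bottom)` pair of pixel rows.

```python
from statistics import pstdev
from typing import List, Tuple

# Assumed representation: (top, bottom) pixel rows of one character box.
Box = Tuple[float, float]


def vertical_centers(boxes: List[Box]) -> List[float]:
    """Center position of each character in the height direction."""
    return [(top + bottom) / 2 for top, bottom in boxes]


def is_printed(boxes: List[Box], threshold: float = 0.05) -> bool:
    """Treat the string as printed when the centers are nearly aligned.

    Regularity is measured as the standard deviation of the centers
    divided by the mean character height (an assumed measure).
    """
    if len(boxes) < 2:
        raise ValueError("need at least two characters to judge regularity")
    mean_height = sum(bottom - top for top, bottom in boxes) / len(boxes)
    if mean_height <= 0:
        raise ValueError("character boxes must have positive height")
    return pstdev(vertical_centers(boxes)) / mean_height <= threshold
```

For example, boxes such as `[(10, 30), (10, 30), (11, 31)]` have nearly identical centers and are judged printed, whereas `[(10, 30), (5, 27), (14, 33)]` scatter vertically and are judged handwritten.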
The method disclosed in Japanese Laid-open Patent Publication No. 2000-181993 has a drawback in that the regularity of the center positions may vary if a character string includes characters with a voiced sound mark or contracted sound characters. To overcome this problem, there is known a method of extracting from a character string the characters other than those with a voiced sound mark and contracted sound characters, and discriminating, on the basis of the regularity of the center positions of the extracted characters, whether the character string is handwritten or printed (see, for example, Japanese Laid-open Patent Publication No. 2000-331122).
In addition, there has also been known a method of clipping characters, calculating a plurality of feature values regarding the characters, and determining on the basis of the obtained feature values whether the characters are handwritten or printed (see, for example, Japanese Laid-open Patent Publication No. 2006-92345). The features may include density uniformity, variation in pixel value, linearity of character strokes, heights of characters, uniformity of width, and uniformity of line widths of characters.
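A feature-based decision of this kind might be sketched as follows. The cited publication does not specify how the feature values are combined, so the coefficient of variation as the uniformity measure, the majority vote, and the `threshold` value are assumptions chosen for illustration; the feature names in the example are likewise hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Spread of a measurement relative to its mean (0 means uniform)."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean of feature values must be non-zero")
    return pstdev(values) / m


def classify(features: Dict[str, Sequence[float]], threshold: float = 0.1) -> str:
    """Majority vote: a feature votes 'printed' when it is nearly uniform.

    `features` maps a feature name (for example character heights, widths,
    or line widths) to the per-character measurements of that feature.
    """
    if not features:
        raise ValueError("at least one feature is required")
    uniform_votes = sum(
        1 for values in features.values()
        if coefficient_of_variation(values) <= threshold
    )
    return "printed" if uniform_votes * 2 > len(features) else "handwritten"
```

With heights `[20, 20, 21]`, widths `[12, 12, 12]`, and line widths `[2, 2, 2]`, all three features are uniform and the result is `"printed"`; with heights `[20, 26, 17]` and widths `[12, 18, 9]`, only the line width remains uniform and the result is `"handwritten"`.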
The existing character recognition methods determine only whether the character string of a character recognition target item is handwritten or printed. Even character recognition methods using a form definition may not be able to process character recognition target items that include both handwritten and printed characters. For such items, these methods invoke both the printed and the handwritten character recognition engines to recognize the characters, which takes a long processing time.