1. Field of the Invention
The present invention relates to a character recognition apparatus and a character recognition method that perform character recognition on a character string written in a plurality of character entry boxes printed on a sheet of paper such as a business form.
2. Description of the Related Art
One example of a sheet of paper processed by a character recognition apparatus is a business form on which a plurality of character entry boxes are printed in advance. Characters written on such a form do not necessarily stay within their character entry boxes. Part of a character written in one box may protrude from it, and the protruding portion may touch part of another character in an adjacent box. In such a case, if the character recognition apparatus extracts the character from each character entry box and then performs character recognition on it, a correct recognition result often cannot be obtained.
When a written character string contains “00”, it often also contains a ligature that protrudes from a character entry box (e.g., a horizontal stroke drawn across both 0s when they are written consecutively). A technique that detects such a ligature from its positional relationship with the character entry box and separates it from the box has begun to be used. However, this technique does not work well when characters are in contact with each other in a complicated manner. In addition, if the shape or size of the character entry boxes changes, the method or algorithm that determines where to cut must be substantially rewritten to match.
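The box-position approach to ligature separation can be illustrated with a minimal sketch. It is not the method of any particular product or publication; it assumes a binary image (1 = ink) in which the printed box borders have already been removed, boxes given as hypothetical half-open column spans with at least one gap column between neighbors, and a hypothetical rule that treats any row whose ink runs unbroken from one box, across the gap, into the next box as a ligature to be cut at the gap.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background
Box = Tuple[int, int]    # hypothetical box span: (x_start, x_end), half-open


def cut_ligatures(img: Image, boxes: List[Box]) -> Tuple[Image, List[Tuple[int, int, int]]]:
    """Erase ink in the gap between adjacent boxes where it forms a ligature.

    Assumed rule (illustrative only): on a given row, if ink is continuous
    from the last column of box A through the gap to the first column of
    box B, that row segment is treated as a ligature and the gap columns
    are cleared. Returns the cut image and a list of (row, gap_start, gap_end).
    """
    out = [row[:] for row in img]  # work on a copy; detection reads the original
    cuts = []
    for (_, a_end), (b_start, _) in zip(boxes, boxes[1:]):
        for y, row in enumerate(img):
            # Unbroken ink from box A's edge, across the gap, into box B's edge.
            if all(row[x] for x in range(a_end - 1, b_start + 1)):
                for x in range(a_end, b_start):
                    out[y][x] = 0  # separate the stroke at the box boundary
                cuts.append((y, a_end, b_start))
    return out, cuts
```

Because the rule is expressed directly in box coordinates, a change in box shape or size requires changing the rule itself, and a stroke that touches a neighboring character in some other way (e.g., diagonally) is simply not detected.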
Various techniques exist for extracting characters from a character string written in a plurality of character entry boxes. For example, Jpn. Pat. Appln. KOKAI Publication No. 2000-113101 (FIG. 1 and the like) discloses a technique that obtains the longitudinal and transverse projections of a character string, compares them with thresholds to determine whether a continuous line runs between characters, and performs character extraction in accordance with the result of that determination.
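The general idea of comparing projections with thresholds can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of the cited publication: the input is a binary image (1 = ink), the two projections are read here as per-row and per-column ink counts, and the function names and both thresholds (`row_ratio` and the "no thicker than the line" column test) are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def projections(img: Image) -> Tuple[List[int], List[int]]:
    """Return (per-row ink counts, per-column ink counts)."""
    horizontal = [sum(row) for row in img]      # ink count in each row
    vertical = [sum(col) for col in zip(*img)]  # ink count in each column
    return horizontal, vertical


def find_connecting_line(img: Image, row_ratio: float = 0.8) -> List[int]:
    """Return the rows judged to hold a continuous line between characters.

    Hypothetical thresholds: a row is a line candidate if its ink covers at
    least row_ratio of the image width. The candidate is accepted only if some
    inked column has ink solely on candidate rows, i.e. the line crosses a gap
    between characters where no other stroke is present.
    """
    if not img or not img[0]:
        return []
    horizontal, vertical = projections(img)
    width = len(img[0])
    candidates = [y for y, n in enumerate(horizontal) if n >= row_ratio * width]
    if not candidates:
        return []
    candidate_set = set(candidates)
    for x, n in enumerate(vertical):
        # Column threshold: ink no thicker than the candidate line itself.
        if 0 < n <= len(candidates):
            if all(img[y][x] == 0 or y in candidate_set for y in range(len(img))):
                return candidates
    return []
```

For two "0"-like shapes joined by a stroke along their top row, only the top row passes the row threshold, and the gap columns between the shapes contain ink on that row alone, so the top row is reported. A test of this kind assumes a straight line aligned with the image axes, which is one reason a projection-based check has difficulty with characters that touch in a more complicated way.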
The technique of the above reference is directed to detecting a correction line. Consequently, it cannot extract characters accurately when characters are in contact with each other in a complicated manner.
Under these circumstances, there is a need for a technique that accurately extracts characters from a character string written in a plurality of character entry boxes printed on a sheet of paper.