The present invention relates to the generation of data for use in optical character recognition. More particularly, the invention concerns the acquisition of geometrical character feature data and topological information, specifically character bending points, for structural analysis and character classification in an optical character recognition system.
In optical character recognition, selected character data representing character features of interest are employed in a classification procedure that attempts to classify, and thus recognize, characters based on the character features provided as input. Among the various character features proposed for optical character recognition, character "bending points" have recently received substantial attention. A bending point represents a topological curvature feature having attributes such as position and acuteness of curvature. High character recognition rates have been achieved when geometrical information including character bending points is used for structural analysis and character classification in an optical character recognition system. For example, it has been reported (H. Takahashi, "A Neural Net OCR Using Geometrical And Zonal-Pattern Features" (October, 1991)) that bending point features can be used to produce superior recognition rates in neural network optical character recognition systems employing back propagation methods.
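By way of illustration only, the notion of a bending point having a position and an acuteness of curvature can be sketched as follows. The sketch assumes that a character stroke is already available as an ordered list of (x, y) points, and it measures acuteness as the interior angle at each vertex. The names `BendingPoint`, `interior_angle`, `bending_points` and the threshold `max_angle_deg` are hypothetical and are not taken from the cited references; this is not the extraction method of any of them.

```python
import math
from dataclasses import dataclass


@dataclass
class BendingPoint:
    x: float
    y: float
    angle_deg: float  # interior angle at the vertex; smaller means a sharper bend


def interior_angle(prev, cur, nxt):
    """Interior angle in degrees at `cur`, between the segments to `prev` and `nxt`.

    Assumes consecutive points are distinct (no zero-length segments).
    """
    ax, ay = prev[0] - cur[0], prev[1] - cur[1]
    bx, by = nxt[0] - cur[0], nxt[1] - cur[1]
    dot = ax * bx + ay * by
    cross = ax * by - ay * bx
    # atan2 of |cross| and dot gives an angle in [0, 180] degrees.
    return math.degrees(math.atan2(abs(cross), dot))


def bending_points(polyline, max_angle_deg=150.0):
    """Return the interior vertices whose angle is at most `max_angle_deg`.

    The threshold is an assumed, illustrative way of discarding nearly
    straight vertices as insignificant.
    """
    points = []
    for prev, cur, nxt in zip(polyline, polyline[1:], polyline[2:]):
        angle = interior_angle(prev, cur, nxt)
        if angle <= max_angle_deg:
            points.append(BendingPoint(cur[0], cur[1], angle))
    return points


# An "L"-shaped stroke: only the corner should be reported.
stroke = [(0, 2), (0, 1), (0, 0), (1, 0), (2, 0)]
for bp in bending_points(stroke):
    print(bp)
```

In this sketch the collinear vertices of the stroke have an interior angle of 180 degrees and are excluded, while the corner is retained together with its position and angle.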
Historically, the extraction of bending point information from input character data has been problematic. Characters may have multiple bending points, and decisions must be made regarding the significance of each bending point feature so that insignificant features are excluded and relevant features are preserved. Complex algorithms have been proposed to identify appropriate extraction points. For example, I. Sekita et al., "Feature Extraction of Handwritten Japanese Characters by Spline Functions of Relaxation Matching", Pattern Recognition, Vol. 21, No. 1, pp. 9-17 (1988), discloses a time-consuming spline approximation method. This method reportedly requires five times the CPU time of prior methods, a cost said to be justified by improved character recognition rates.
No proposals have been made to date for a bending point extraction method that provides good recognition rates without undue processing time. Accordingly, given the high recognition rates obtainable with properly selected bending point data, there remains an evident, unsatisfied need for a fast yet accurate bending point extraction method that overcomes the recognized deficiencies of existing procedures.