The invention relates to methods and apparatus for recognizing unconnected characters, such as alphanumeric characters and the like, and more particularly to methods and apparatus for extracting features that statistical decision trees can use efficiently to recognize electronically scanned characters, and especially to such methods and apparatus capable of producing features that are generally size invariant and rotation invariant.
In the past, there have been two commonly used approaches to recognition of disconnected characters, one being a "structural" approach and the other a "statistical" approach. In the structural approach, a character is skeletonized by means of a medial axis transformation, well known to those skilled in the art, and the parts of the character are then identified through a spatial analysis of the skeleton. A wide variety of techniques for identifying the parts of the character are known to those skilled in the art, including analysis by means of procedural rule bases, analysis by means of one-dimensional and two-dimensional grammars, and structural decision tree analysis. The medial axis transformation is computationally very expensive. Moreover, the processing required by the structural methodology tends to identify meaningless "false" or "noise" structures in the scanned character. Because such noise structures may be very numerous for a single character, complex recognition algorithms are necessary to avoid misclassification.
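The skeletonization step can be made concrete with a small sketch. The Python below is a minimal illustration, not the method of any cited work, and it assumes one particular discrete variant of the medial axis transformation: a binary bitmap (lists of 0/1, with 1 for dark pixels), a chessboard (8-neighbour) distance transform, and an axis made of the dark pixels whose distance value is not exceeded by any neighbour, i.e. the centres of maximal chessboard disks. The function names are hypothetical, and the subsequent spatial analysis of the skeleton is not shown.

```python
from typing import List, Set, Tuple

Bitmap = List[List[int]]  # assumed encoding: 1 = dark pixel, 0 = background


def chessboard_distance(img: Bitmap) -> List[List[int]]:
    """Two-pass chessboard distance from each dark pixel to the nearest
    background pixel; pixels outside the image count as background."""
    h, w = len(img), len(img[0])
    big = h + w  # larger than any distance that can occur
    d = [[big if img[r][c] else 0 for c in range(w)] for r in range(h)]

    def at(r: int, c: int) -> int:
        return d[r][c] if 0 <= r < h and 0 <= c < w else 0

    # Forward pass: propagate from the neighbours above and to the left.
    for r in range(h):
        for c in range(w):
            if d[r][c]:
                d[r][c] = min(d[r][c], 1 + min(at(r - 1, c - 1), at(r - 1, c),
                                               at(r - 1, c + 1), at(r, c - 1)))
    # Backward pass: propagate from the neighbours below and to the right.
    for r in reversed(range(h)):
        for c in reversed(range(w)):
            if d[r][c]:
                d[r][c] = min(d[r][c], 1 + min(at(r + 1, c + 1), at(r + 1, c),
                                               at(r + 1, c - 1), at(r, c + 1)))
    return d


def medial_axis(img: Bitmap) -> Set[Tuple[int, int, int]]:
    """Return (row, col, radius) for each dark pixel whose distance value
    is not exceeded by any of its 8-neighbours."""
    d = chessboard_distance(img)
    h, w = len(d), len(d[0])
    axis = set()
    for r in range(h):
        for c in range(w):
            if d[r][c] == 0:
                continue  # background pixel
            neighbours = (d[rr][cc]
                          for rr in range(r - 1, r + 2)
                          for cc in range(c - 1, c + 2)
                          if 0 <= rr < h and 0 <= cc < w)
            if all(v <= d[r][c] for v in neighbours):
                axis.add((r, c, d[r][c]))
    return axis
```

For a solid 3 x 5 block of dark pixels inside a 5 x 7 image, this sketch returns the three central pixels of the block's middle row, each with radius 2. Attaching a single stray dark pixel to the block's right edge adds an extra axis point of radius 1, a small instance of the spurious structure described above.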
The statistical approach to character recognition involves extracting "features" from the pixel data obtained by scanning the character and feeding the extracted features into a statistical decision tree, which compares them to a preselected set of features for various predefined character classes and eventually recognizes or rejects the character. Prior feature extraction techniques have been confined mostly to mass sampling within a rectangular grid, generation of two-dimensional moments, Fourier transforms of certain boundary properties, the aspect ratio of the character, the thinness ratio (the perimeter length relative to the number of dark pixels in the character), and the like. Prior statistical techniques for operating on extracted features generally require that the size and orientation of the characters be known. Other adequate statistical feature extraction techniques have been devised, but they generally require large amounts of computer processing time and memory capacity.
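Three of the prior features listed above (aspect ratio, thinness ratio, and grid mass sampling) are simple enough to sketch. The Python below is illustrative only and is not the method of any cited work; the 0/1 bitmap encoding, the edge-count definition of perimeter length, the 3 x 3 grid default, and all function names are assumptions. The demo at the end uses solid rectangles to show how such features depend on size and orientation.

```python
from typing import List, Tuple

Bitmap = List[List[int]]  # assumed encoding: 1 = dark pixel, 0 = background


def solid(h: int, w: int) -> Bitmap:
    """Hypothetical helper: an h x w block of dark pixels."""
    return [[1] * w for _ in range(h)]


def bounding_box(img: Bitmap) -> Tuple[int, int, int, int]:
    """Return (top, bottom, left, right) of the dark pixels, inclusive."""
    rows = [r for r, row in enumerate(img) if any(row)]
    cols = [c for c in range(len(img[0])) if any(row[c] for row in img)]
    return rows[0], rows[-1], cols[0], cols[-1]


def aspect_ratio(img: Bitmap) -> float:
    """Height divided by width of the character's bounding box."""
    top, bottom, left, right = bounding_box(img)
    return (bottom - top + 1) / (right - left + 1)


def thinness_ratio(img: Bitmap) -> float:
    """Perimeter length divided by the number of dark pixels. Perimeter
    length is assumed here to be the number of pixel edges separating a
    dark pixel from a background or out-of-image pixel."""
    h, w = len(img), len(img[0])
    dark = perimeter = 0
    for r in range(h):
        for c in range(w):
            if not img[r][c]:
                continue
            dark += 1
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if not (0 <= rr < h and 0 <= cc < w) or not img[rr][cc]:
                    perimeter += 1
    return perimeter / dark


def grid_mass(img: Bitmap, g: int = 3) -> List[List[float]]:
    """Fraction of the dark pixels falling in each cell of a g x g grid
    laid over the bounding box (mass sampling within a rectangular grid)."""
    top, bottom, left, right = bounding_box(img)
    h, w = bottom - top + 1, right - left + 1
    counts = [[0] * g for _ in range(g)]
    total = 0
    for r in range(top, bottom + 1):
        for c in range(left, right + 1):
            if img[r][c]:
                counts[(r - top) * g // h][(c - left) * g // w] += 1
                total += 1
    return [[n / total for n in row] for row in counts]


if __name__ == "__main__":
    # Aspect ratio inverts under a 90-degree rotation of the block.
    print(aspect_ratio(solid(2, 4)), aspect_ratio(solid(4, 2)))  # 0.5 2.0
    # This thinness ratio halves when the block is drawn twice as large.
    print(thinness_ratio(solid(2, 2)), thinness_ratio(solid(4, 4)))  # 2.0 1.0
```

In this sketch the aspect ratio is unaffected by uniform scaling but changes with rotation, while perimeter over dark-pixel count shrinks as the character grows, so a decision tree consuming these values would need characters of known size and orientation.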
Thus, there is an unmet need in the character recognition art for an improved character feature extraction method and apparatus that produces an efficient set of size invariant, rotation invariant features that can be processed by state-of-the-art decision trees, such as those described in "ISOETRP-An Interactive Clustering Algorithm with New Objectives", by C. Y. Suen and Q. R. Wang, Pattern Recognition, Vol. 17, No. 2, pp. 211-219, 1984, incorporated herein by reference, so that statistical character recognition can be accomplished rapidly and efficiently by one or more state-of-the-art microprocessors, such as the Motorola MC68020 microprocessor.