1. Field
Although not so limited in utility or scope, implementations of the present invention are particularly well suited for incorporation in automated mail processing systems, where they facilitate the resolution of address and other character-containing images captured from mail pieces moving on a transport system. Alternative implementations may be applied more broadly in other applications requiring the resolution of unknown input alphanumeric characters.
2. Brief Description of Related Art
Character recognition techniques are typified by “feature extraction” in which features of an unknown input character are selected for extraction and comparison to a standardized set of features associated with various “ideal” representations of known characters. A degree of analogy between the selected features from the unknown character and the standardized set of features corresponding to each of one or more known characters is then determined. The known character corresponding to the features having the greatest degree of analogy to the extracted features of the input character is then selected as the output character.
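By way of illustration only, the general feature-comparison scheme described above can be sketched as follows. The sketch is not drawn from any particular prior-art system: the three-element feature vectors, the template values, and the use of cosine similarity as the “degree of analogy” are all assumptions chosen for brevity, and the names `TEMPLATES`, `degree_of_analogy`, and `recognize` are hypothetical.

```python
import math

# Hypothetical "ideal" feature vectors for a few known characters.
# The values are illustrative assumptions, not data from any real system.
TEMPLATES = {
    "I": (1.0, 0.0, 0.0),
    "L": (1.0, 1.0, 0.0),
    "X": (0.0, 0.0, 2.0),
}


def degree_of_analogy(a, b):
    """Cosine similarity, used here as one possible measure of analogy."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty feature vector matches nothing
    return dot / (norm_a * norm_b)


def recognize(features, templates=TEMPLATES):
    """Return the known character whose template best matches the features."""
    return max(templates, key=lambda ch: degree_of_analogy(features, templates[ch]))


if __name__ == "__main__":
    # Features extracted from an unknown input character (assumed values).
    print(recognize((0.9, 0.1, 0.0)))
    print(recognize((1.0, 0.8, 0.1)))
```

Any other similarity or distance measure could be substituted for `degree_of_analogy` without changing the overall structure, which simply selects the known character with the greatest degree of analogy.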
Recognition techniques differ in the features by which characters are characterized and in the methods by which features of an unknown input character are selected for extraction and comparison with standardized data corresponding to known characters. Techniques also differ as to how extracted information is represented and compared to standardized features. Several techniques, for example, involve representing character strokes by vectors having a magnitude (e.g., length in pixels) and a direction, the vectors being directionally classified with respect to a particular scan direction. In at least one such technique, vector representations of a character's strokes are classified according to type (e.g., angle). The number of representative vectors of each classification for a particular input character is then ascertained. Each standardized character having the same number of classified strokes as the vector representation of the input character is retrieved from memory. If more than one standardized character is retrieved for comparison to the input-character representation, then steps are initiated to ascertain which of the retrieved standardized characters possesses the highest degree of analogy with the profile of the unknown input character.
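The stroke-classification technique just described might be sketched, under stated assumptions, as follows. The four angle classes, the 45-degree bin boundaries, a horizontal scan direction with the y axis pointing upward, and the stored stroke profiles are all illustrative assumptions; the names `classify_stroke`, `stroke_profile`, `retrieve_candidates`, and `STANDARD_PROFILES` are hypothetical.

```python
import math
from collections import Counter


def classify_stroke(dx, dy):
    """Classify a stroke vector by its angle relative to a horizontal scan direction."""
    # Strokes are treated as undirected, so angles are folded into [0, 180).
    angle = math.degrees(math.atan2(dy, dx)) % 180.0
    if angle < 22.5 or angle >= 157.5:
        return "horizontal"
    if angle < 67.5:
        return "rising"
    if angle < 112.5:
        return "vertical"
    return "falling"


def stroke_profile(strokes):
    """Count the strokes of each classification for one character."""
    return Counter(classify_stroke(dx, dy) for dx, dy in strokes)


# Hypothetical stored profiles of standardized characters (assumed values).
STANDARD_PROFILES = {
    "L": Counter(vertical=1, horizontal=1),
    "T": Counter(vertical=1, horizontal=1),
    "V": Counter(rising=1, falling=1),
    "H": Counter(vertical=2, horizontal=1),
}


def retrieve_candidates(strokes, profiles=STANDARD_PROFILES):
    """Return every standardized character with the same per-class stroke counts."""
    target = stroke_profile(strokes)
    return sorted(ch for ch, profile in profiles.items() if profile == target)


if __name__ == "__main__":
    # One vertical and one horizontal stroke: two candidates are retrieved,
    # so a further comparison step would be needed to choose between them.
    print(retrieve_candidates([(0, 10), (8, 0)]))
    # One falling and one rising diagonal stroke: a single candidate.
    print(retrieve_candidates([(5, -5), (5, 5)]))
```

The first call illustrates the case in which more than one standardized character is retrieved, which is where the further disambiguation steps mentioned above would begin.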
Some previous methods are also exemplified by the extraction of skeletal features. For instance, character strokes of a particular width in an input character are represented by a “stick-figure-like” representation in which the strokes are of substantially reduced width. One difficulty with such techniques is their susceptibility to the signal-damaging effects of noise because the thinner the representation of a feature, the closer the representation is in size to the magnitude of the noise. Noise effects are discussed further in the summary section below.
In addition, the reliance of current character recognition techniques on a limited number of character feature types (e.g., strokes) limits the bases upon which one character can be distinguished from another. It will be appreciated that the more limited the feature types by which one character can be distinguished from another, the higher the likelihood of character misidentification.
Accordingly, there exists a need for a system of classifying character features and extracting the relevant features from an unknown input character that is resistant to the effects of noise and that increases the character feature types by which characters can be distinguished.