The invention relates to a method of recognizing sequences of numerals.
Machine-based recognition of numeral sequences is of particular significance, and has long been used for automatically recognizing postal codes in letter-sorting systems.
High success rates are achieved in the recognition of individual numerals, as reported, for example, in R.A. Wilkinson, J. Geist, S. Janet et al.: The First Census Optical Character Recognition System Conference, National Institute of Standards and Technology, U.S. Department of Commerce, Gaithersburg, USA, 1992. For recognizing sequences of numerals, therefore, it is standard practice to segment the images obtained from optical scanning of numeral sequences into partial regions for individual numerals, thus reducing the recognition problem to the recognition of individual numerals; see Y. Saifullah, M.T. Manry: Classification-based segmentation of ZIP codes, IEEE Transactions on Systems, Man and Cybernetics, Vol. 23, No. 5, pp. 1437-1443, 1993; R. Fenrich, S. Krishnamoorthy: Segmenting Diverse Quality Handwritten Digit Strings in Near Real Time, 4th Advanced Technology Conference, USPS, Vol. 1, pp. 523-537, 1990; and M. Shridhar, A. Badreldin: Recognition of isolated and simply connected handwritten numerals, Pattern Recognition, Vol. 19, pp. 1-12, 1986. Such segmenting is particularly difficult and costly for handwritten and/or connected numeral sequences.
In methods of recognizing handwritten words, it is already known to avoid segmenting and instead to effect the recognition process with the aid of an HMM (Hidden Markov Model) identifier, using a sequence of feature vectors that each represent properties of a narrow section of the written image, as described in T. Caesar, J.M. Goger, A. Kaltenmeier, E. Mandler: Recognition of Handwritten Word Images by Statistical Methods, 3rd Int. Workshop on Frontiers in Handwriting Recognition, Buffalo, pp. 409-416, 1993.
An especially significant aspect of the application of such an HMM identifier is the extraction of the features that make up the feature vectors.
It is the object of the present invention to provide a method that can be used advantageously with an HMM identifier to recognize numeral sequences.
A method of recognizing sequences of numerals is provided. A screen image of the numeral sequence is generated. A linear representation of the numeral sequence is derived from the screen image. An upper line, a lower line and a center line are estimated as writing lines for the linear representation of the numeral sequence. The linear representation is subdivided into horizontally overlapping frames, which are in turn subdivided into a plurality of regions that each encompass one of the writing lines and overlap one another vertically. Within each frame, features of the line segments extending in the regions are determined separately for each region and form the components of a feature vector for that frame. The feature vectors of horizontally consecutive frames are supplied as a vector sequence to an HMM identifier.
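By way of illustration only, the subdivision into frames and regions described above may be sketched as follows. The sketch is not the claimed feature extraction: it assumes that the linear representation is available as a binary grid in which 1 marks a point on a line, that the three writing lines are given as row indices, and that the feature determined for each region is simply the number of line points falling within it. The names `frame_starts`, `regions_around` and `feature_vectors`, and the parameters `frame_width`, `step` and `half_height`, are hypothetical and chosen for this example.

```python
def frame_starts(width, frame_width, step):
    """Left edges of the frames; step < frame_width makes neighbours overlap."""
    if not 0 < step < frame_width:
        raise ValueError("step must be positive and smaller than frame_width")
    last = max(width - frame_width, 0)
    starts = list(range(0, last + 1, step))
    if starts[-1] != last:
        starts.append(last)  # extra frame so the right edge is covered
    return starts


def regions_around(lines, half_height, height):
    """One vertical band (top, bottom) per writing line, clipped to the image."""
    return [(max(y - half_height, 0), min(y + half_height + 1, height))
            for y in lines]


def feature_vectors(image, lines, frame_width=4, step=2, half_height=None):
    """Return one feature vector per frame, with one component per region.

    image: list of rows of 0/1 values (assumed form of the linear representation)
    lines: (upper, center, lower) row indices of the estimated writing lines
    """
    height, width = len(image), len(image[0])
    upper, center, lower = lines
    if half_height is None:
        # Assumption: each band reaches to the neighbouring writing line,
        # so that adjacent bands overlap vertically.
        half_height = max(center - upper, lower - center)
    bands = regions_around(lines, half_height, height)
    vectors = []
    for x0 in frame_starts(width, frame_width, step):
        x1 = min(x0 + frame_width, width)
        # Placeholder feature: number of line points in the region of this frame.
        vectors.append([
            sum(image[y][x] for y in range(y0, y1) for x in range(x0, x1))
            for y0, y1 in bands
        ])
    return vectors


# Toy linear representation: a vertical stroke with a short horizontal bar.
ROWS = [
    "00000000",
    "00111100",
    "00100000",
    "00100000",
    "00100000",
    "00100000",
    "00000000",
]
IMAGE = [[int(c) for c in row] for row in ROWS]

# Vector sequence in left-to-right frame order, as it would be
# supplied to an HMM identifier (not shown here).
vectors = feature_vectors(IMAGE, lines=(1, 3, 5))
```

In this sketch each frame yields a three-component vector, one component per writing line. Because the frames overlap horizontally and the regions overlap vertically, a line segment lying near a boundary contributes to more than one frame or region.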
The invention offers an efficient method of recognizing handwritten and/or connected numeral sequences, particularly handwritten postal codes. The extraction of the features for the feature vectors advantageously takes into account the peculiarities of numeral sequences.