The present invention relates to a method for character recognition. The term “character” is, in this compound, neutral with regard to number, i.e. it refers both to separate characters, such as letters and numerals, and to compositions of several characters, such as words. Both generally used characters and imaginary characters are, of course, included.
There is a plurality of known methods for character recognition, especially for recognition of handwritten characters, which requires especially good interpretation of the character. Several of the known methods are based on the detection of each stroke of the pen when a handwritten character is being formed. Geometric characteristics, such as directions, inclinations and angles of each stroke or part of a stroke, are determined and compared to corresponding data for stored, known characters. The written character is taken to be the stored character whose geometric characteristics best correspond to those of the written character. The geometric characteristics are related to an xy-coordinate system, which covers the writing surface used. Such known methods are disclosed in, for instance, U.S. Pat. Nos. 5,481,625 and 5,710,916. A problem with such methods is that they are sensitive to rotation. For example, if one writes diagonally over the writing surface, the method has difficulty in correctly determining what characters are being written.
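The rotation sensitivity described above can be illustrated with a minimal sketch. It is not taken from the cited patents: the function names, the three-point "L" template and the use of segment directions as the only geometric characteristic are assumptions made for illustration. The sketch measures stroke directions in the fixed xy-coordinate system and shows that the same shape, written at 45 degrees, no longer matches its own template.

```python
import math

def segment_directions(points):
    """Direction (radians) of each segment of a sampled stroke, in the fixed xy-frame."""
    return [math.atan2(y1 - y0, x1 - x0)
            for (x0, y0), (x1, y1) in zip(points, points[1:])]

def angular_difference(a, b):
    """Smallest absolute difference between two angles, in radians."""
    return abs((a - b + math.pi) % (2 * math.pi) - math.pi)

def direction_distance(stroke, template):
    """Mean angular difference between corresponding segment directions."""
    da, db = segment_directions(stroke), segment_directions(template)
    return sum(angular_difference(a, b) for a, b in zip(da, db)) / len(da)

def rotate(points, angle):
    """Rotate points about the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

# Hypothetical template: an "L" sampled at three points (down stroke, then right stroke).
template_L = [(0.0, 2.0), (0.0, 0.0), (1.0, 0.0)]

upright = direction_distance(template_L, template_L)
tilted = direction_distance(rotate(template_L, math.radians(45)), template_L)
```

Here `upright` is zero, whereas `tilted` equals the rotation angle (pi/4 radians), so a purely coordinate-bound comparison penalises the rotated character although its shape is unchanged.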
U.S. Pat. No. 5,537,489 discloses a method for preprocessing the characters by normalising them. The written character is sampled, and each sample is represented as a pair of coordinates. Instead of solely comparing the characters in the coordinate plane, the method determines the transformation which best fits the written character to a model character. Rotation and certain types of deformation, which the above-mentioned methods cannot handle, are thus indirectly taken into account as well. The transformation is used to normalise the written character. In particular, the character is normalised by being translated so that its central point is at the origin of coordinates, where the central point of the model character is also located, after which the character is scaled and rotated in such a manner that it corresponds to the model character in the best possible way.
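The normalisation just described (translation to a common central point, followed by scaling and rotation towards the model character) can be sketched as a least-squares fit. This is an illustrative reconstruction under stated assumptions, not the procedure of the cited patent: samples are represented as complex numbers x + iy, the written and model characters are assumed to have the same number of samples in corresponding order, and the names `centre`, `normalise_to_model` and `residual` are hypothetical.

```python
import cmath

def centre(points):
    """Translate the samples so that their centroid lies at the origin."""
    c = sum(points) / len(points)
    return [p - c for p in points]

def normalise_to_model(written, model):
    """Fit `written` to `model` by translation, scaling and rotation.

    Returns the normalised samples and the complex factor a, where abs(a)
    is the applied scale and cmath.phase(a) the applied rotation.
    """
    z, w = centre(written), centre(model)
    # a minimises sum |a*z_k - w_k|^2 over all complex numbers a.
    a = sum(zk.conjugate() * wk for zk, wk in zip(z, w)) / sum(abs(zk) ** 2 for zk in z)
    return [a * zk for zk in z], a

def residual(aligned, model):
    """Sum of squared distances between the normalised samples and the centred model."""
    return sum(abs(p - q) ** 2 for p, q in zip(aligned, centre(model)))

# Hypothetical model character and a written copy that is rotated by 30 degrees,
# doubled in size and shifted on the writing surface.
model = [0j, 1 + 0j, 1 + 1j, 2 + 1j]
written = [cmath.rect(2.0, cmath.pi / 6) * p + (3 - 1j) for p in model]

aligned, a = normalise_to_model(written, model)
err = residual(aligned, model)
```

In this example the fit recovers a scale of 0.5 and a rotation of minus 30 degrees, and the residual vanishes up to rounding. The sketch also makes the cost visible: one such fit is needed for every candidate model character.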
A disadvantage of this method is that the normalisation requires computing power and that in any case the choice of model characters has to take place by determining what model character the written character resembles the most.
Another method, which certainly can handle rotations, is disclosed in U.S. Pat. No. 5,768,420. In this known method, curve recognition is described by means of a ratio that is named “ratio of tangents”. A curve, for instance a portion of a character, is mapped by selecting a sequence of pairs of points along the curve, where the tangents in the two points of each pair intersect at a certain angle. The ratio between the distances from the intersection point to the respective points of the pair is calculated and makes up an identification of the curve. This method is in principle not sensitive to translation, scaling and rotation. However, it is limited in many respects. Above all, it cannot handle curve shapes in which there are no two points whose tangents intersect at the determined angle. It is common that at least portions of a character comprise such indeterminable curve shapes for a selected intersection angle. This reduces the reliability of the method.
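The ratio described above can be sketched for a single pair of points. The sketch is an assumption-laden illustration rather than the method of the cited patent: the function names are hypothetical, the tangent directions are supplied by the caller, and the selection of pairs whose tangents meet at a prescribed angle is omitted.

```python
import math

def tangent_ratio(p, tp, q, tq):
    """Ratio |X - p| / |X - q|, where X is the intersection of the tangent
    through p (direction tp) and the tangent through q (direction tq).
    Returns None when the tangents are parallel and no intersection exists.
    """
    det = tp[0] * tq[1] - tp[1] * tq[0]
    if abs(det) < 1e-12:
        return None
    # Solve p + s*tp = q + u*tq for s by Cramer's rule.
    dx, dy = q[0] - p[0], q[1] - p[1]
    s = (dx * tq[1] - dy * tq[0]) / det
    x = (p[0] + s * tp[0], p[1] + s * tp[1])
    return math.dist(x, p) / math.dist(x, q)

def transform(v, angle, scale=1.0, shift=(0.0, 0.0)):
    """Rotate, scale and translate a point (or only rotate a direction)."""
    c, s = math.cos(angle), math.sin(angle)
    return (scale * (c * v[0] - s * v[1]) + shift[0],
            scale * (s * v[0] + c * v[1]) + shift[1])

# Two points on the parabola y = x^2 with their tangent directions (1, 2x).
p, tp = (0.0, 0.0), (1.0, 0.0)
q, tq = (1.0, 1.0), (1.0, 2.0)
r = tangent_ratio(p, tp, q, tq)

# The same pair after rotation, scaling and translation of the curve.
r_moved = tangent_ratio(transform(p, 0.7, 3.0, (5.0, -2.0)), transform(tp, 0.7),
                        transform(q, 0.7, 3.0, (5.0, -2.0)), transform(tq, 0.7))

# Two points on a straight stroke: the tangents never intersect.
r_line = tangent_ratio((0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 0.0))
```

Here `r` equals 1/sqrt(5) and `r_moved` agrees with it up to rounding, which reflects the insensitivity to translation, scaling and rotation. `r_line` is `None`, which corresponds to the indeterminable curve shapes mentioned above.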
An object of the invention is to provide a method for character recognition, which does not have the above-mentioned disadvantages, and which to a larger extent accepts individual styles of handwritten characters and unusual fonts of typewritten characters, and is easy to implement with limited computing power.
The object is achieved by a character recognition method according to the invention comprising the steps of: detecting a union of characters, preprocessing the union of characters, comparing the preprocessed union of characters with one or more template symbols, and applying a decision rule in order to either reject a template symbol or decide that the template symbol is included in the union of characters, the step of preprocessing the union of characters comprising the steps of: representing the union of characters as one or more curves, and parameterising the curve or curves, characterised in that the step of preprocessing the union of characters further comprises the step of forming, regarding various classes of transformation, one or more shapes for the curve or curves, and that the step of comparing comprises the steps of: forming one or more geometric proximity measures, determining for every shape the values of the geometric proximity measures between the shape and correspondingly determined shapes for the template symbols, and that the step of applying a decision rule comprises the step of: selecting one or more template symbols in consideration of the values.
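The sequence of steps recited above can be sketched as follows. The passage does not specify how the shapes or the geometric proximity measures are constructed, so the sketch substitutes a simple stand-in for one class of transformations (translation, scaling and rotation): the curve is parameterised by arc length, centred and scaled to unit norm, and compared by a rotation-insensitive measure. All function names, the templates and the threshold value are hypothetical.

```python
import math

def resample(points, n=16):
    """Parameterise a polyline by arc length; return n equally spaced samples (complex)."""
    z = [complex(x, y) for x, y in points]
    cum = [0.0]
    for a, b in zip(z, z[1:]):
        cum.append(cum[-1] + abs(b - a))
    out, j = [], 0
    for i in range(n):
        t = cum[-1] * i / (n - 1)
        while j < len(z) - 2 and cum[j + 1] < t:
            j += 1
        seg = cum[j + 1] - cum[j]
        u = (t - cum[j]) / seg if seg > 0 else 0.0
        out.append(z[j] + u * (z[j + 1] - z[j]))
    return out

def shape(points, n=16):
    """Stand-in for a 'shape': centred, unit-norm samples (translation and scale removed)."""
    z = resample(points, n)
    c = sum(z) / n
    z = [p - c for p in z]
    norm = math.sqrt(sum(abs(p) ** 2 for p in z))
    return [p / norm for p in z]

def proximity(s, t):
    """Close to 0 when s and t differ only by a rotation; larger otherwise."""
    return 1.0 - abs(sum(a.conjugate() * b for a, b in zip(s, t)))

def recognise(points, templates, threshold=0.1):
    """Decision rule: closest template if its value is under the threshold, else None."""
    s = shape(points)
    value, name = min((proximity(s, shape(t)), name) for name, t in templates.items())
    return (name, value) if value < threshold else None

def similarity(points, angle, scale, shift):
    """Rotate, scale and translate a list of (x, y) points."""
    c, s = math.cos(angle), math.sin(angle)
    return [(scale * (c * x - s * y) + shift[0], scale * (s * x + c * y) + shift[1])
            for x, y in points]

# Hypothetical template symbols given as polylines.
templates = {
    "L": [(0.0, 2.0), (0.0, 0.0), (1.0, 0.0)],
    "V": [(0.0, 2.0), (1.0, 0.0), (2.0, 2.0)],
    "I": [(0.0, 0.0), (0.0, 2.0)],
}
written = similarity(templates["L"], math.radians(30), 3.0, (4.0, -1.0))
result = recognise(written, templates)
```

In this example `result` names the template "L" with a proximity value that is zero up to rounding, although the written curve is rotated, enlarged and displaced. The stand-in still depends on the direction in which the curve is traversed, which a fuller treatment would have to address.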
According to the invention, the term “template symbol” means, as defined in the claim, everything from a portion of a separate character, the portion being, for instance, an arc or a partial stroke and the character being a letter or a numeral, to compound words or other complex characters. In a similar way, the term “union of characters” means everything from a separate character to compositions of several characters. The extension of the mentioned terms will be evident from the following description of embodiments.