Optical Character Recognition (OCR) systems are known in the art. They convert an image of printed text into machine-readable code by using a character recognition process. In an OCR system, the images of what could be characters are isolated and a character recognition process is used to identify each character.
A character recognition process, such as the one shown in FIG. 1, generally comprises:
(a) A feature extraction process 102 that extracts a feature vector from the character input image 101.
(b) A classification process 103 that compares the feature vector with models 104 and assigns the feature vector to a class of a given set of classes, which is the output 105.
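By way of illustration only, the two steps (a) and (b) can be sketched as follows. The sketch assumes a binary character image given as a list of rows, a simple zoning feature (fraction of ink pixels per zone) and a nearest-model classifier; the names extract_features, classify, LEFT_BAR and TOP_BAR are hypothetical and are not taken from FIG. 1, and real systems typically use richer features and classifiers.

```python
from math import sqrt


def extract_features(image, zones=2):
    """Feature extraction: fraction of ink pixels in each of zones x zones cells."""
    height, width = len(image), len(image[0])
    zone_h, zone_w = height // zones, width // zones
    features = []
    for zr in range(zones):
        for zc in range(zones):
            ink = sum(
                image[r][c]
                for r in range(zr * zone_h, (zr + 1) * zone_h)
                for c in range(zc * zone_w, (zc + 1) * zone_w)
            )
            features.append(ink / (zone_h * zone_w))
    return features


def classify(features, models):
    """Classification: return the class whose model vector is nearest (Euclidean)."""
    def distance(a, b):
        return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    return min(models, key=lambda label: distance(features, models[label]))


# Toy 4x4 prototypes (assumed for illustration): a vertical and a horizontal stroke.
LEFT_BAR = [[1, 0, 0, 0] for _ in range(4)]
TOP_BAR = [[1, 1, 1, 1]] + [[0, 0, 0, 0] for _ in range(3)]
models = {"|": extract_features(LEFT_BAR), "-": extract_features(TOP_BAR)}

# A vertical stroke with one spurious noise pixel is still assigned to "|".
noisy = [[1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 1], [1, 0, 0, 0]]
predicted = classify(extract_features(noisy), models)
```

In this sketch the models are simply the feature vectors of one prototype per class; the single noise pixel shifts only one zone density, so the nearest model is unchanged.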
In state-of-the-art OCR systems, the classification process must output not just a single class but also alternative classes, each with a confidence level. The OCR system then comprises a contextual decision system that uses this information, together with linguistic or typographic contextual information, to output the best recognized text.
The calculated set of features describes the shapes of the characters to be recognized. The features should be discriminant, insensitive to character deformation and added noise, and should yield reliable confidence levels.
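As a rough sketch of how alternative classes and confidence levels might feed a contextual decision, the example below ranks classes by a distance-based score and then picks the word that best combines those scores with a small lexicon. The scoring formula, the LEXICON_BONUS weight and the names rank_classes and decide_word are assumptions made for illustration, not a description of any particular OCR system.

```python
from itertools import product
from math import exp, sqrt

LEXICON_BONUS = 10.0  # assumed weight given to words found in the lexicon


def rank_classes(features, models):
    """Return (label, confidence) pairs, best first; confidences sum to 1."""
    def distance(a, b):
        return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    scores = {label: exp(-distance(features, vec)) for label, vec in models.items()}
    total = sum(scores.values())
    ranked = [(label, score / total) for label, score in scores.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


def decide_word(candidates_per_char, lexicon):
    """Contextual decision: maximise the confidence product, favouring lexicon words."""
    best_word, best_score = "", -1.0
    for combo in product(*candidates_per_char):
        word = "".join(label for label, _ in combo)
        score = 1.0
        for _, confidence in combo:
            score *= confidence
        if word in lexicon:
            score *= LEXICON_BONUS
        if score > best_score:
            best_word, best_score = word, score
    return best_word


# Hand-written classifier output for five character images (illustrative values).
candidates = [
    [("c", 0.9)],
    [("1", 0.50), ("l", 0.45)],
    [("0", 0.50), ("o", 0.48)],
    [("c", 0.9)],
    [("k", 0.9)],
]
without_context = decide_word(candidates, set())    # "c10ck"
with_context = decide_word(candidates, {"clock"})   # "clock"
```

Taking the top class of each character alone gives "c10ck"; the lexicon is what lets the second-best alternatives "l" and "o" win, which is why the classifier needs to expose them together with their confidence levels.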
Alternatively, some character recognition processes are based on template matching, but such processes can only recognize text written in a limited number of fonts. The confidence levels they give are, however, normally more reliable than those given by feature-based character recognition systems.
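For comparison, a minimal sketch of template matching is given below. It assumes binary images of the same size as the stored templates and uses the fraction of agreeing pixels as both the matching score and the confidence level; the names match_score and recognize and the 3x3 templates are hypothetical. Because each template is a bitmap of a character in one specific font, the set of recognizable fonts is limited to those for which templates are stored.

```python
def match_score(image, template):
    """Fraction of pixels at which image and template agree (1.0 means identical)."""
    total = len(image) * len(image[0])
    agree = sum(
        pixel == ref
        for image_row, template_row in zip(image, template)
        for pixel, ref in zip(image_row, template_row)
    )
    return agree / total


def recognize(image, templates):
    """Return the best-matching label and its score, used here as the confidence."""
    scores = {label: match_score(image, tpl) for label, tpl in templates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy 3x3 templates for a single assumed font.
templates = {
    "T": [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}

# A "T" with one corrupted pixel in the bottom-right corner.
observed = [[1, 1, 1], [0, 1, 0], [0, 1, 1]]
label, confidence = recognize(observed, templates)
```

Here the observed image agrees with the "T" template on 8 of 9 pixels and with the "L" template on 4 of 9, so the score separates the two classes clearly and has a direct interpretation as a degree of match.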