1. Technical Field
The invention relates to a system for recognizing italicized text in an optical character recognition (OCR) system. OCR systems rely on pattern recognition devices (classifiers) for character recognition.
2. Description of the Prior Art
Optical character recognition (OCR) is the process of transforming written or printed text into digital information. Pattern recognition classifiers are used in sorting scanned characters into a number of output classes. A typical prior art classifier is trained over a plurality of output classes using a set of training samples. The training samples are processed, data relating to features of interest are extracted, and training parameters are derived from this feature data. During operation, the system receives an input image associated with one of a plurality of classes. The relationship of the image to each class is analyzed via a classification technique based upon the training parameters. From this analysis, the system produces an output class and an associated confidence value.
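By way of illustration only, the train-then-classify flow described above can be sketched as follows. The sketch assumes a nearest-mean classifier over quadrant ink densities; the function names (`extract_features`, `train`, `classify`), the choice of features, and the exponential confidence score are hypothetical and are not taken from any particular prior art system. Images are assumed to be binary grids of at least 2 x 2 pixels.

```python
import math
from collections import defaultdict


def extract_features(image):
    """Hypothetical feature extractor: fraction of ink pixels in each quadrant.

    `image` is a list of rows, each a list of 0/1 pixels (at least 2 x 2).
    """
    rows, cols = len(image), len(image[0])
    half_r, half_c = rows // 2, cols // 2
    features = []
    for r0, r1 in ((0, half_r), (half_r, rows)):
        for c0, c1 in ((0, half_c), (half_c, cols)):
            cells = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            features.append(sum(cells) / len(cells))
    return features


def train(samples):
    """Derive training parameters: one mean feature vector per output class.

    `samples` is a list of (image, label) pairs.
    """
    grouped = defaultdict(list)
    for image, label in samples:
        grouped[label].append(extract_features(image))
    return {
        label: [sum(column) / len(column) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def classify(image, params):
    """Return (output class, confidence) for an input image.

    Each class is scored by exp(-distance to its mean); the confidence is the
    winning score divided by the sum of all scores (an assumed convention).
    """
    feats = extract_features(image)
    scores = {
        label: math.exp(-math.dist(feats, mean))
        for label, mean in params.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best] / sum(scores.values())
```

In this sketch, `train` would be run once over the labeled training samples, and `classify` would then be called for each scanned character, yielding the selected class together with a confidence value between 0 and 1.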
An abnormal posture, such as the slant of italicized characters, can distort the features used in image recognition. While characters of abnormal posture may be treated as classes separate from the standard posture of the same character for the purposes of classification, this greatly increases the number of necessary output classes and retards system performance. Accordingly, it would be useful to develop less resource-intensive methods of detecting text of abnormal posture.
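As a minimal illustration of how posture can distort a feature, the following sketch slants a vertical stroke and compares its column projection profile before and after. The shear model (each row shifted right in proportion to its height above the bottom row) and the function names `shear_right` and `column_profile` are assumptions made for this example only; actual italic typefaces and actual OCR features may differ.

```python
def shear_right(image, slope=1):
    """Slant a binary image: shift each row right by `slope` pixels per row
    of height above the bottom row, padding with background (0) pixels."""
    rows = len(image)
    extra = slope * (rows - 1)  # widest shift, applied to the top row
    sheared = []
    for r, row in enumerate(image):
        shift = slope * (rows - 1 - r)
        sheared.append([0] * shift + list(row) + [0] * (extra - shift))
    return sheared


def column_profile(image):
    """Count of ink pixels in each column (a simple projection feature)."""
    return [sum(column) for column in zip(*image)]


upright = [[0, 1, 0] for _ in range(4)]  # a 4-pixel vertical stroke
print(column_profile(upright))               # [0, 4, 0]
print(column_profile(shear_right(upright)))  # [0, 1, 1, 1, 1, 0]
```

The upright stroke concentrates all of its ink in one column, whereas the slanted stroke spreads the same ink across four columns, so a classifier trained on the upright profile would see a markedly different feature vector.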