1. Field of the Invention
This invention relates to the field of document image processing systems, and more particularly relates to optical character recognition (OCR) systems. The invention further relates to a method and system that group similar character patterns when performing the character recognition process.
2. Discussion of the Background
Conventional OCR systems extract features from each character pattern. A pattern matching process is then performed using the extracted features and a recognition dictionary containing reference information. Differences in character size, differences in the fonts used, and noise included in the original image often degrade the performance of OCR systems.
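The feature extraction and dictionary matching described above can be illustrated with a minimal sketch. It is not taken from any particular OCR system: the bitmap format, the quadrant ink-density feature, the squared-distance match, and the names `extract_features`, `recognize`, and `DICTIONARY` are all assumptions chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical bitmap format: a list of equal-length rows,
# '#' for an ink pixel and '.' for background.
Bitmap = List[str]
Features = Tuple[float, ...]


def extract_features(bitmap: Bitmap) -> Features:
    """Return the fraction of ink pixels in each of the four quadrants.

    This feature is an illustrative assumption. Because it is a ratio,
    it is roughly independent of the size of the character.
    """
    h, w = len(bitmap), len(bitmap[0])
    feats = []
    for r0, r1 in ((0, h // 2), (h // 2, h)):
        for c0, c1 in ((0, w // 2), (w // 2, w)):
            ink = sum(
                bitmap[r][c] == "#"
                for r in range(r0, r1)
                for c in range(c0, c1)
            )
            area = max(1, (r1 - r0) * (c1 - c0))
            feats.append(ink / area)
    return tuple(feats)


def recognize(bitmap: Bitmap, dictionary: Dict[str, Features]) -> str:
    """Return the dictionary label whose reference features are nearest."""
    feats = extract_features(bitmap)

    def squared_distance(ref: Features) -> float:
        return sum((a - b) ** 2 for a, b in zip(feats, ref))

    return min(dictionary, key=lambda label: squared_distance(dictionary[label]))


# A toy recognition dictionary built from two clean reference patterns.
REF_L = ["#...", "#...", "#...", "####"]
REF_T = ["####", ".#..", ".#..", ".#.."]
DICTIONARY = {"L": extract_features(REF_L), "T": extract_features(REF_T)}

# An "L" with one stray noise pixel is still matched to the "L" entry.
NOISY_L = ["#...", "#...", "#..#", "####"]
print(recognize(NOISY_L, DICTIONARY))  # prints "L"
```

In this sketch every character pattern goes through the full extract-and-match path, even when the same character has already been seen elsewhere in the image.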
A typical image on which character recognition is performed includes alphabetic and numeric patterns that occur many times. The same character should be represented by the same bit-mapped image, but in reality there are some differences attributable to noise, including quantization error or sampling (scanning) error. Further, the recognition process performed on each character pattern is complicated and time consuming, as demonstrated by known character recognition systems such as the one described in the publication "On the Recognition of Printed Characters of Any Font and Size," by Simon Kahan et al., IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-9, No. 2, March 1987, pp. 274-288, which is incorporated herein by reference.
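The effect of scanning noise on repeated characters can be made concrete with a small example. The sketch below is an illustrative assumption rather than a method described in this background: it counts the pixels that differ between two same-size bitmaps of the same character (a Hamming distance). The bitmaps and the name `hamming_distance` are hypothetical.

```python
from typing import List

Bitmap = List[str]  # hypothetical format: '#' is ink, '.' is background


def hamming_distance(a: Bitmap, b: Bitmap) -> int:
    """Count the pixel positions at which two same-size bitmaps differ."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("bitmaps must have the same dimensions")
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


# Two occurrences of the letter "E" taken from the same hypothetical page.
CLEAN_E = ["####", "#...", "###.", "#...", "####"]
SCANNED_E = ["####", "#...", "##..", "#..#", "####"]

print(CLEAN_E == SCANNED_E)                  # prints "False"
print(hamming_distance(CLEAN_E, SCANNED_E))  # prints "2"
```

The two bitmaps are not identical, so an exact comparison treats them as different patterns, yet only 2 of their 20 pixels differ.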