The recognition of printed and handwritten characters and words is an important research field with many applications: in post offices, for identifying postal codes from the addresses on envelopes and sorting the mail; in banks, for check processing; in libraries, for computerizing the storage of books and texts; and as reading devices for blind people. Although many methodologies and systems have been developed for optical character recognition (OCR), OCR remains a challenging area. In particular, a good OCR system takes on average about 2–3 seconds to recognize a handwritten character within a handwritten word. An extreme case is the OCR system by Loral, which is based on a very expensive parallel multiprocessor system of 1024 Intel-386 microprocessors, where each 386 CPU processes only one character at a time. There are also many OCR methods based on neural networks, such as the AT&T Bell Labs OCR chip and the multiple neural networks OCR approach. Other OCR methods are based on human-like recognition. One of them uses a fuzzy-graph-based OCR approach with adaptive learning capabilities, which reduces the character dimensions to speed up the recognition process. It scans the text page, detects a character, extracts and recognizes it, produces the appropriate ASCII code, and sends it to the host computer, with a simulated average test time of a few milliseconds.
Image Processing and Pattern Recognition (IPPR) are two older research fields with many significant contributions. The recognition and extraction of objects from images is a small sub-field of IPPR. There are many successful methods based on neural nets or graphs that recognize different kinds of objects (faces, cars, chairs, tables, buildings, etc.) under very noisy conditions.
Recently, attention has been focused on the document processing field due to multimedia applications. Although document processing is an interesting research field, it introduces many difficult problems associated with the recognition of text characters from images. For instance, there are cases where a document can be considered either as text or as an image, such as images generated from text characters. Another example is the artistic lettering in very old and valuable books, where the starting letter of each paragraph looks like a complex image. In some cases the text is handwritten, and the problem becomes more difficult. Several methods have been developed for document processing. Most of these methods deal with the segmentation of a page and the separation of text from images. One prior art method is a “top-down” approach that produces good results provided that the examined page can be separated into blocks. Another prior art method is an algorithmic “bottom-up” process that performs well on several categories of pages with good spacing features and non-overlapping blocks. Yet another prior art method is also a “bottom-up” process, with very good performance especially on long, uniform text strings. Still another prior art method separates images from text (typed or handwritten) while maintaining their relationships.