1. Field of the Invention
The present invention relates to optical character recognition (OCR), and more specifically to an automatic Arabic text image optical character recognition method that recognizes characters in optical images of Arabic text without requiring word, sub-word, or character segmentation.
2. Description of the Related Art
Arabic Text Recognition (ATR) has not been researched as thoroughly as the recognition of Latin, Japanese, or Chinese text. The lag in research on Arabic text recognition compared with other languages (e.g., Latin or Chinese) may be attributed to: a lack of adequate support in terms of journals, books, etc.; a lack of interaction between researchers in this field; a lack of general supporting utilities, such as Arabic text databases, dictionaries, programming tools, and supporting staff; and the late start of Arabic text recognition research (first publication in 1975, compared with the 1940s in the case of Latin character recognition). Moreover, researchers may have shied away from investigating Arabic text due to the special characteristics of the Arabic language.
Researchers have used Hidden Markov Models (HMMs) for both speech and text recognition because HMMs offer several advantages. When utilizing HMMs, there is no need to segment the Arabic text, HMMs are resistant to noise, they can tolerate variations in writing, and HMM tools are freely available.
Some researchers use HMMs for handwritten word recognition, while others use HMMs for text recognition. HMMs are also well known to have been used for off-line Arabic handwritten digit recognition and for character recognition. Additionally, it has been demonstrated that techniques based on extracting different types of features from each digit as a whole, rather than on the sliding window principle used by the majority of researchers using HMMs, must be preceded by a segmentation step, which is error-prone. Techniques that use the sliding window principle and extract different types of features are also well known. Such techniques have been used for online handwritten Persian characters and for handwritten Farsi (Arabic) word recognition.
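By way of illustration only, the sliding window principle mentioned above can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from any of the cited techniques: the function name sliding_window_features, the binary-image representation, the right-to-left scan order, and the two features (ink density and vertical ink centre) are all hypothetical choices made for the example. The point it shows is that a text-line image is converted into a sequence of feature vectors, suitable as an HMM observation sequence, without first segmenting the line into words or characters.

```python
from typing import List

# Assumed representation: a text-line image as rows of 0/1 pixels, 1 = ink.
Image = List[List[int]]


def sliding_window_features(image: Image, width: int = 3, step: int = 1) -> List[List[float]]:
    """Slide a vertical strip along a text-line image and return one
    feature vector per window position (hypothetical features)."""
    if not image or not image[0]:
        return []
    height, n_cols = len(image), len(image[0])
    features = []
    # Assumption: scan from the rightmost column, since Arabic is written
    # right to left. `right` is the exclusive right edge of the window.
    for right in range(n_cols, width - 1, -step):
        left = right - width
        window = [row[left:right] for row in image]
        ink = sum(sum(r) for r in window)
        # Feature 1: fraction of ink pixels in the window.
        density = ink / (height * width)
        # Feature 2: vertical centre of the ink, normalized to [0, 1]
        # (0 = top row, 1 = bottom row); 0.5 for an empty window.
        if ink:
            weighted = sum(y * sum(r) for y, r in enumerate(window))
            centre = weighted / (ink * max(height - 1, 1))
        else:
            centre = 0.5
        features.append([density, centre])
    return features
```

Each returned vector would correspond to one observation in the sequence presented to an HMM. In practice the window width, the step (overlap), and the feature set would be chosen to suit the script and the image resolution.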
Examples of existing work include a database for Arabic handwritten text recognition, a database for Arabic handwritten checks, preprocessing methods, segmentation of Arabic text, utilization of different types of features, and use of multiple classifiers.
Thus, an automatic Arabic text image optical character recognition method solving the aforementioned problems is desired.