Arabic script is written from right to left, in unicase, and in a cursive style. The Arabic script includes 28 basic letters, and several additional special letters. Recognition of handwritten cursive script, such as the Arabic script, may be difficult.
Scale invariant feature transform (SIFT) is an algorithm to detect and describe local features in images, as described in U.S. Pat. No. 6,711,293 entitled “METHOD AND APPARATUS FOR IDENTIFYING SCALE INVARIANT FEATURES IN AN IMAGE AND USE OF SAME FOR LOCATING AN OBJECT IN AN IMAGE”. The Speeded-Up Robust Features (SURF) descriptor is a modified version of SIFT in which Haar wavelet responses are computed efficiently using integral images as an approximation to the gradient magnitude and orientation, see Bay, H., Ess, A., Tuytelaars, T., and Gool, L., 2008, “Speeded-Up Robust Features,” Computer Vision and Image Understanding 110 (3), 346-59. A center symmetric local binary pattern (CS-LBP) descriptor has been used to replace the gradient information with the response of the local binary pattern (LBP) operator in a computationally efficient manner, as described in Heikkilä, M., Pietikäinen, M., and Schmid, C., 2009, “Description of Interest Regions with Local Binary Patterns,” Pattern Recognition, 42 (3), 425-36. In the same manner, the center symmetric local ternary pattern (CS-LTP) and weighted orthogonal symmetric local ternary pattern (WOS-LTP) descriptors both use the response of the extended LBP operator named Local Ternary Pattern (LTP), as described in Gupta, R., Patil, H., and Mittal, A., 2010, “Robust Order-Based Methods for Feature Description,” in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 334-41, New York, USA, and Huang, M., Mu, Z., Zeng, H., and Huang, S., 2015, “Local Image Region Description Using Orthogonal Symmetric Local Ternary Pattern,” Pattern Recognition Letters 54 (March), 56-62. It is worth noting that the LBP and its extension LTP are closely related to the gradient, as these operators essentially evaluate pixel intensity differences. Rather than computing the gradient magnitude and orientation values, they use the sign of the differences.
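The relationship between CS-LBP and intensity differences can be illustrated with a minimal sketch. It assumes a 3x3 neighborhood and a hypothetical `threshold` parameter, and it is not taken from the cited references. Each of the four center-symmetric pixel pairs contributes one bit, set when the intensity difference exceeds the threshold, giving a code between 0 and 15.

```python
def cs_lbp_code(patch, threshold=0.0):
    """Return the 4-bit CS-LBP code for the centre of a 3x3 grayscale patch.

    `threshold` is an illustrative parameter; its default is an assumption.
    """
    # Eight neighbours in circular order, starting at the right-hand pixel.
    ring = [patch[1][2], patch[0][2], patch[0][1], patch[0][0],
            patch[1][0], patch[2][0], patch[2][1], patch[2][2]]
    code = 0
    for i in range(4):
        # Compare each neighbour with the one diametrically opposite to it;
        # only whether the difference exceeds the threshold is kept.
        if ring[i] - ring[i + 4] > threshold:
            code |= 1 << i
    return code


def cs_lbp_histogram(image, threshold=0.0):
    """Histogram of CS-LBP codes over all interior pixels of a 2-D list."""
    hist = [0] * 16
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            patch = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            hist[cs_lbp_code(patch, threshold)] += 1
    return hist
```

Because only four comparisons are made per pixel, the histogram has 16 bins instead of the 256 bins of a full 8-neighbor LBP code.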
To cope with the large dimensionality of the SIFT descriptor vector, several approaches have been proposed. One of the earliest is principal component analysis SIFT (PCA-SIFT), which achieved the discrimination power of SIFT with descriptors of 20 to 36 elements by applying PCA to the gradient magnitudes, as described in Yan, K., and Sukthankar, R., 2004, “PCA-SIFT: A More Distinctive Representation for Local Image Descriptors,” In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004), 2:506-13, Washington, D.C., USA. The SURF algorithm produces a descriptor of 64 elements by computing 4 bins in each of the 16 regions, instead of the 8 bins used in SIFT.
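The descriptor lengths follow directly from the region and bin counts. The sketch below uses a hypothetical helper, `concatenate_histograms`, with placeholder zero-valued histograms to show only the bookkeeping: 16 regions of 8 bins yield 128 elements, and 16 regions of 4 values yield 64.

```python
def concatenate_histograms(region_histograms):
    """Flatten per-region histograms into a single descriptor vector."""
    return [value for hist in region_histograms for value in hist]


# 16 regions with 8 orientation bins each, as in SIFT (placeholder values).
sift_like = concatenate_histograms([[0.0] * 8 for _ in range(16)])

# 16 regions with 4 values each, as in SURF (placeholder values).
surf_like = concatenate_histograms([[0.0] * 4 for _ in range(16)])

print(len(sift_like), len(surf_like))
```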
Computing SIFT descriptors for overlapped cells is equivalent to the dense sampling strategy frequently applied in computing Bag of Features (BoF) representations, as described in Nowak, E., Jurie, F., and Triggs, B., 2006, “Sampling Strategies for Bag-of-Features Image Classification,” In Computer Vision—ECCV 2006, Springer, Berlin, Heidelberg. Because multi-scale descriptors provide scale invariance, their extraction in dense sampling is described in Bosch, A., Zisserman, A., and Munoz, X., 2007, “Image Classification Using Random Forests and Ferns,” in IEEE 11th International Conference on Computer Vision, 1-8, Rio de Janeiro, Brazil, IEEE; Chatfield, K., Lempitsky, V., Vedaldi, A., and Zisserman, A., 2011, “The Devil Is In The Details: An Evaluation of Recent Feature Encoding Methods,” in The 22nd British Machine Vision Conference, 1-12, Dundee; Rusiñol, M., Aldavert, D., Toledo, R., and Lladós, J., 2011, “Browsing Heterogeneous Document Collections by a Segmentation-Free Word Spotting Method,” in 11th International Conference on Document Analysis and Recognition (ICDAR 2011), 63-67; and Aldavert, D., Rusiñol, M., Toledo, R., and Lladós, J., 2015, “A Study of Bag-of-Visual-Words Representations for Handwritten Keyword Spotting,” International Journal on Document Analysis and Recognition (IJDAR) 18 (3), Springer Berlin Heidelberg: 223-34.
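Dense sampling with overlapped cells can be sketched as a regular grid whose stride is smaller than the cell size, repeated for several cell sizes to obtain multi-scale descriptors. The function names, cell sizes, and stride below are illustrative assumptions and are not drawn from the cited works.

```python
def dense_grid(width, height, cell, stride):
    """Top-left corners of square cells sampled on a regular grid.

    Cells overlap whenever `stride` is smaller than `cell`.
    """
    return [(x, y)
            for y in range(0, height - cell + 1, stride)
            for x in range(0, width - cell + 1, stride)]


def multi_scale_grid(width, height, cells, stride):
    """Map each cell size (scale) to its dense sampling positions."""
    return {cell: dense_grid(width, height, cell, stride) for cell in cells}


# Example: an 8x4 image sampled with 4-pixel cells every 2 pixels.
print(dense_grid(8, 4, cell=4, stride=2))
```

A descriptor would then be computed for each cell at each scale; that step is omitted here.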
Multi-stream Hidden Markov Models (HMMs) have been utilized to develop offline handwriting recognition systems, as described in Kessentini, Y., Paquet, T., and Ben Hamadou, A., 2010, “Off-Line Handwritten Word Recognition Using Multi-Stream Hidden Markov Models,” Pattern Recognition Letters 31 (1), Elsevier B. V., 60-70; Ahmad, I., Fink, G., and Mahmoud, S., 2014, “Improvements in Sub-Character HMM Model Based Arabic Text Recognition,” in 14th International Conference on Frontiers in Handwriting Recognition, 537-42, Crete, Greece; and Jayech, K., Mahjoub, M., and Ben Amara, N., 2016, “Synchronous Multi-Stream Hidden Markov Model for Offline Arabic Handwriting Recognition Without Explicit Segmentation,” Neurocomputing 214 (November): 958-71. However, it is noteworthy that these systems assume that each window observation comes from independent feature streams, where each stream produces features for the entire window. The features of each stream are modeled independently in the HMMs.
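The independence assumption can be illustrated as follows: if each stream is scored by its own emission model, the log-likelihood of a window observation in a given state is the sum of the per-stream log-likelihoods. The sketch below assumes diagonal Gaussian emissions and optional stream weights. Both are illustrative choices and are not claimed to match the cited systems.

```python
import math


def diag_gaussian_loglik(x, mean, var):
    """Log-density of x under a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def multi_stream_loglik(streams, models, weights=None):
    """Combine independently modeled streams for one HMM state.

    `streams` holds one feature vector per stream; `models` holds the
    matching {"mean": ..., "var": ...} parameters. `weights` are optional
    stream exponents (hypothetical; all ones if omitted).
    """
    if weights is None:
        weights = [1.0] * len(streams)
    return sum(w * diag_gaussian_loglik(x, m["mean"], m["var"])
               for w, x, m in zip(weights, streams, models))
```

With unit weights this equals the log-likelihood of the concatenated feature vector under a single diagonal Gaussian. The streams therefore differ from plain concatenation only when they are weighted or given separate mixture structures.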
The Bernoulli HMM (BHMM)-based handwritten text recognition system was first described in Giménez, A., and Juan, A., 2009, “Bernoulli HMMs at Subword Level for Handwritten Word Recognition,” In Pattern Recognition and Image Analysis, 497-504, Springer Berlin Heidelberg. The state emission probability is modeled by a single multivariate Bernoulli probability function. The text images are scaled to a height of 30 pixels while maintaining the aspect ratio and then converted to binary images using Otsu's thresholding method, described in Otsu, N., 1979, “A Threshold Selection Method from Gray-Level Histograms,” IEEE Transactions on Systems, Man, and Cybernetics 9 (1): 62-66. The columns of the binary images are taken as the observations. The system was evaluated on isolated English words extracted from the IAM database, as described in Marti, U.-V., and Horst Bunke, 2002, “The IAM-Database: An English Sentence Database for Offline Handwriting Recognition,” International Journal on Document Analysis and Recognition 5 (1): 39-46. A character recognition error rate of 44.00% was reported using 10-state BHMMs. For the sake of comparison, the same database was used to evaluate a traditional HMM-based handwriting recognition system with single multivariate Gaussian probability densities and real-valued observations. A character recognition error rate of 64.20% was reported using 8-state HMMs. The single multivariate Bernoulli function may be replaced by a multivariate Bernoulli mixture. This improvement reduced the error rate on the above dataset from 44.00% to 30.90% when 64 mixture components per state were used. The system was also evaluated on a more challenging dataset comprising English text lines extracted from the IAM database, as described in Giménez et al. 2009, “Embedded Bernoulli Mixture HMMs for Continuous Handwritten Text Recognition,” In Computer Analysis of Images and Patterns, 197-204.
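The emission model described above can be sketched for a single binary column. Under a multivariate Bernoulli model, each pixel is treated as an independent coin flip with its own probability of being ink, and a mixture combines several such prototypes. The function names and the clipping constant `eps` are assumptions made for illustration, and binarization is assumed to have been done already.

```python
import math


def columns_as_observations(binary_image):
    """Turn a binary image (list of rows) into a list of column vectors."""
    return [list(col) for col in zip(*binary_image)]


def bernoulli_loglik(column, probs, eps=1e-6):
    """Log-probability of a binary column under a multivariate Bernoulli."""
    total = 0.0
    for bit, p in zip(column, probs):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite (assumed guard)
        total += math.log(p) if bit else math.log(1.0 - p)
    return total


def bernoulli_mixture_loglik(column, priors, prototypes):
    """Log-probability under a mixture of multivariate Bernoullis."""
    terms = [math.log(w) + bernoulli_loglik(column, p)
             for w, p in zip(priors, prototypes)]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))
```

A state with 64 mixture components would hold 64 prototype vectors and 64 priors. With a single component of prior 1, the mixture reduces to the single Bernoulli case.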
The best recognition error rate of 42.10% was achieved using 6-state models with 64 mixture components per state. To capture contextual information in the observations, the sliding window technique was proposed in Giménez, A., Khoury, I., and Juan, A., 2010, “Windowed Bernoulli Mixture HMMs for Arabic Handwritten Word Recognition,” In 2010 12th International Conference on Frontiers in Handwriting Recognition, 533-38, IEEE. A narrow sliding window a few columns wide is passed over the text line with a stride of one pixel. The columns under the window are concatenated and taken as a single observation. The impact of the sliding window technique was assessed on Arabic handwritten text using the IFN/ENIT database (Institut für Nachrichtentechnik, Technical University Braunschweig, and École Nationale d'Ingénieurs de Tunis) described in Pechwitz, Mario, Maddouri, S., Märgner, V., Ellouze, N., and Amiri, H., 2002, “IFN/ENIT—Database of Handwritten Arabic Words,” in Colloque International Francophone Sur l'Écrit et Le Document, 129-136, Fribourg, Switzerland. A character recognition error rate of 12.30% was achieved with a sliding window of 9 pixels. To reduce the effect of image distortion, a sliding window repositioning technique was described in Alkhoury, I., Giménez, A., and Juan, A., 2012, “Arabic Handwriting Recognition Using Bernoulli HMMs,” In Guide to OCR for Arabic Scripts, 255-72, London: Springer London. The sliding window is translated such that the window center is aligned with the center of mass of the text portion overlaid by the window. The observation is constructed from the columns overlaid by the translated window.
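The windowing and repositioning steps can be sketched as follows, under several assumptions that are not stated in the text: the window width is odd, positions beyond the image edges are treated as blank columns, and only vertical repositioning is shown. Shifting the content under a fixed window is used here as an equivalent of translating the window itself.

```python
def window_columns(columns, center, width):
    """Columns under a window centred at `center` (odd `width` assumed).

    Positions outside the image are filled with blank columns (assumption).
    """
    half = width // 2
    blank = [0] * len(columns[0])
    return [columns[i] if 0 <= i < len(columns) else blank
            for i in range(center - half, center + half + 1)]


def reposition_vertically(window):
    """Shift window content so its centre of mass sits on the middle row."""
    height = len(window[0])
    ink_rows = [r for col in window for r, bit in enumerate(col) if bit]
    if not ink_rows:
        return window  # nothing to align in an empty window
    shift = round((height - 1) / 2 - sum(ink_rows) / len(ink_rows))
    shifted = []
    for col in window:
        new_col = [0] * height
        for r, bit in enumerate(col):
            if bit and 0 <= r + shift < height:
                new_col[r + shift] = 1
        shifted.append(new_col)
    return shifted


def sliding_observations(columns, width, reposition=False):
    """One concatenated observation per column position (stride of one)."""
    observations = []
    for center in range(len(columns)):
        window = window_columns(columns, center, width)
        if reposition:
            window = reposition_vertically(window)
        observations.append([bit for col in window for bit in col])
    return observations
```

For an image of height H and a window of width W, each observation has H x W binary elements, so a 9-column window over 30-pixel-high text would give 270-dimensional observations.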
To assess the impact of the sliding window repositioning technique in reducing vertical image distortion, it was applied to the traditional Gaussian-based HMM recognition system, as described in Doetsch, P., Hamdani, M., Ney, H., Gimenez, A., Andres-Ferrer, J., and Juan, A., 2012, “Comparison of Bernoulli and Gaussian HMMs Using a Vertical Repositioning Technique for Off-Line Handwriting Recognition,” in 2012 International Conference on Frontiers in Handwriting Recognition, 3-7, Bari, Italy, IEEE. The system was compared with a Long Short-Term Memory (LSTM) system, which tolerates vertical image distortion well. Experiments carried out on the Arabic IFN/ENIT and French RIMES datasets showed that window translation improves the recognition accuracies of both the HMM- and LSTM-based systems. The RIMES dataset is described in Augustin, E., Brodin, J., Carré, M., Geoffrois, E., Grosicki, E., and Prêteux, F., 2006, “RIMES Evaluation Campaign for Handwritten Mail Processing,” in Workshop on Frontiers in Handwriting Recognition, 1-5, La Baule, France.
To improve both the accuracy and the processing speed of Arabic handwritten text recognition, a system for automated Arabic handwriting recognition was developed.
The foregoing “Background” description is for the purpose of generally presenting the context of the disclosure. Work of the inventor, to the extent it is described in this background section, as well as aspects of the description which may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present invention.