The present invention, in some embodiments thereof, relates to image processing and, more particularly, but not exclusively, to image content recognition.
Optical character recognition (OCR) generally involves translating images of text into an encoding representing the actual text characters. OCR techniques for text based on a Latin script alphabet are widely available and provide very high success rates. Handwritten text generally presents different challenges for recognition than typewritten text.
Known in the art are handwriting recognition techniques that are based on Recurrent Neural Networks (RNNs) and their extensions such as Long Short-Term Memory (LSTM) networks, Hidden Markov Models (HMMs), and combinations thereof [S. A. Azeem and H. Ahmed. Effective technique for the recognition of offline Arabic handwritten words using hidden Markov models. International Journal on Document Analysis and Recognition (IJDAR), 16(4):399-412, 2013; T. Bluche, H. Ney, and C. Kermorvant. A comparison of sequence-trained deep neural networks and recurrent neural networks optical modeling for handwriting recognition. In Statistical Language and Speech Processing, pages 199-210. Springer, 2014; P. Doetsch, M. Kozielski, and H. Ney. Fast and robust training of recurrent neural networks for offline handwriting recognition. In Frontiers in Handwriting Recognition (ICFHR), 2014 14th International Conference on, pages 279-284. IEEE, 2014; H. El Abed and V. Margner. ICDAR 2009-Arabic handwriting recognition competition. International Journal on Document Analysis and Recognition (IJDAR), 14(1):3-13, 2011; F. Menasri, J. Louradour, A. Bianne-Bernard, and C. Kermorvant. The A2iA French handwriting recognition system at the Rimes-ICDAR2011 competition. In Proceedings of SPIE, volume 8297, 2012; and F. Stahlberg and S. Vogel. The QCRI recognition system for handwritten Arabic. In Image Analysis and Processing ICIAP 2015, pages 276-286. Springer, 2015].
Another method, published by Almazán et al. [J. Almazan, A. Gordo, A. Fornes, and E. Valveny. Word spotting and recognition with embedded attributes. IEEE Transactions on Pattern Analysis & Machine Intelligence, (12):2552-2566, 2014], encodes an input word image as Fisher Vectors (FV), which can be viewed as an aggregation of the gradients of a Gaussian Mixture Model (GMM) over low-level descriptors. The method then trains a set of linear Support Vector Machine (SVM) classifiers, one for each binary attribute in a set of word properties. Canonical Correlation Analysis (CCA) is used to link the vector of predicted attributes and the binary attributes vector generated from the actual word.
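By way of a non-limiting illustration, a binary attributes vector of the kind generated from the actual word can be sketched as follows. The attribute definition used here (whether a given letter occurs in a given region of the word, at one or two levels of subdivision), the lowercase Latin alphabet, and the name `binary_attributes` are assumptions made for this sketch only and are not taken from the cited publication.

```python
import string

# Assumed alphabet for illustration only.
ALPHABET = string.ascii_lowercase


def binary_attributes(word, levels=(1, 2)):
    """Return a list of 0/1 attributes for a word.

    For each level L the word is split into L equal regions, and each
    attribute records whether a given letter occurs in a given region.
    This is a simplified, hypothetical attribute set.
    """
    word = word.lower()
    n = len(word)
    attrs = []
    for level in levels:
        for region in range(level):
            lo, hi = region / level, (region + 1) / level
            present = set()
            for i, ch in enumerate(word):
                # Assign each character to the region containing its centre.
                centre = (i + 0.5) / n
                if lo <= centre < hi:
                    present.add(ch)
            attrs.extend(1 if c in present else 0 for c in ALPHABET)
    return attrs
```

In such a scheme, one linear classifier may be trained per entry of the returned vector, so that the classifier outputs for a word image form a predicted attributes vector of the same length.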
An additional method, published by Jaderberg et al. [M. Jaderberg, K. Simonyan, A. Vedaldi, and A. Zisserman. Synthetic data and artificial neural networks for natural scene text recognition. arXiv preprint arXiv:1406.2227, 2014], uses convolutional neural networks (CNNs) trained on synthetic data for scene text recognition.
Shi et al., arXiv preprint arXiv:1507.05717, disclose a neural network that integrates feature extraction, sequence modeling and transcription into a unified framework. The network consists of convolutional layers, recurrent layers, and a transcription layer.
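By way of a non-limiting illustration, the role of a transcription layer can be sketched as a best-path decoding step, assuming the layer follows a Connectionist Temporal Classification (CTC) style scheme in which label 0 is a blank symbol. The function name `greedy_ctc_decode`, the score layout (one list of per-label scores for each frame), and the label-to-character mapping are assumptions made for this sketch only.

```python
BLANK = 0  # assumed index of the blank label


def greedy_ctc_decode(frame_scores, alphabet):
    """Convert per-frame label scores into a character string.

    frame_scores: list of frames, each a list of scores over
                  [blank, alphabet[0], alphabet[1], ...].
    alphabet:     string whose k-th character corresponds to label k + 1.
    """
    # Pick the highest-scoring label in every frame (the best path).
    best_path = [max(range(len(s)), key=s.__getitem__) for s in frame_scores]

    decoded = []
    prev = None
    for idx in best_path:
        # Collapse consecutive repeats, then drop blanks.
        if idx != prev and idx != BLANK:
            decoded.append(alphabet[idx - 1])
        prev = idx
    return "".join(decoded)
```

In this sketch a blank frame between two identical labels is what allows a doubled letter to be emitted, since consecutive repeats without an intervening blank are collapsed into a single character.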