Field of the Invention
This invention relates to a method for processing document images, and in particular, it relates to processing of document images for character/word recognition using artificial neural networks.
Description of Related Art
Artificial neural networks are widely used in the computer vision field to analyze images, including images of documents that contain text. One goal of document image analysis is to extract the text content, a task referred to as optical character recognition (OCR). Current computer vision research in the area of document image analysis focuses on neural network architectures and their optimization techniques, while using the raw pixel values of the images as input. The input image is often binary, and thus the pixel values carry relatively little information.
Current OCR models based on LSTM (Long Short-Term Memory) networks, where image pixels are directly input into the network, are very sensitive to pixel positions in the image columns, and often perform poorly for even slight variations in font (e.g., training the network with images containing only a normal font and testing it with images containing a bold version of the same font). This makes it hard to provide a general OCR model that works well on unseen fonts.
LSTM, a type of recurrent neural network, has been used in various fields. For example, Alex Graves and Jürgen Schmidhuber, “Framewise Phoneme Classification with Bidirectional LSTM and Other Neural Network Architectures”, Neural Networks 18.5 (2005): 602-610 (“Graves et al.”), describes an LSTM network and a related learning algorithm.
Zhixin Shi, Srirangaraj Setlur and Venu Govindaraju, “Text Extraction from Gray Scale Historical Document Images Using Adaptive Local Connectivity Map”, Proceedings of Document Analysis and Recognition, 2005, describes a method for text extraction from historical document images using an adaptive local connectivity map (ALCM). In this method, the gray scale image is converted into an adaptive local connectivity map, and a thresholding algorithm is applied to the ALCM to reveal the text line patterns in terms of connected components.
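For illustration only, the three stages described above (connectivity map, thresholding, connected components) may be sketched as follows. This is a simplified sketch and not the cited authors' implementation. The horizontal window sum over inverted intensities, the fixed half_width and level parameters, and all function names are assumptions made for this example.

```python
def local_connectivity_map(gray, half_width):
    """Sum inverted intensities over a horizontal window around each pixel.

    gray is a list of rows of 0..255 values (0 = dark ink). The window
    spans half_width pixels on each side and is clipped at the borders.
    This window definition is an assumption made for illustration.
    """
    width = len(gray[0])
    result = []
    for row in gray:
        # Prefix sums of the inverted row give each window sum in O(1).
        prefix = [0]
        for value in row:
            prefix.append(prefix[-1] + (255 - value))
        out_row = []
        for x in range(width):
            lo = max(0, x - half_width)
            hi = min(width, x + half_width + 1)
            out_row.append(prefix[hi] - prefix[lo])
        result.append(out_row)
    return result


def threshold(conn_map, level):
    """Binarize the map: 1 where the window sum reaches level, else 0."""
    return [[1 if v >= level else 0 for v in row] for row in conn_map]


def connected_components(binary):
    """Label 4-connected foreground regions with an iterative flood fill."""
    height, width = len(binary), len(binary[0])
    labels = [[0] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if binary[y][x] and not labels[y][x]:
                count += 1
                labels[y][x] = count
                stack = [(y, x)]
                while stack:
                    cy, cx = stack.pop()
                    neighbors = ((cy - 1, cx), (cy + 1, cx),
                                 (cy, cx - 1), (cy, cx + 1))
                    for ny, nx in neighbors:
                        if (0 <= ny < height and 0 <= nx < width
                                and binary[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = count
                            stack.append((ny, nx))
    return count, labels


# Two rows of isolated dark pixels separated by a blank row.
gray = [
    [0, 255, 0, 255, 0],
    [255, 255, 255, 255, 255],
    [0, 255, 0, 255, 0],
]
conn_map = local_connectivity_map(gray, half_width=1)
binary = threshold(conn_map, level=255)
count, labels = connected_components(binary)
```

In this toy input, the horizontal summation bridges the gaps between isolated dark pixels on the same row, so each row of marks becomes a single connected component after thresholding. This is the sense in which such a map can reveal text line patterns.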