Methods for assessing the quality of an image of a compound document are typically used to predict Optical Character Recognition (OCR) accuracy. Document image quality assessment has been researched extensively ever since scanned document images became available. Recently, however, as mobile devices such as smartphones and compact digital cameras have become increasingly popular, interest in quality assessment methods for document images captured with these devices has increased.
For example, a growing number of employees who are traveling on business take pictures of important documents with their smartphone or tablet cameras and send them to their company for processing. In this scenario, it is critical that the pictures sent by the employees have a high enough quality for subsequent processing such as OCR, document information extraction and classification, manual examination, etc. Therefore, an accurate document image quality assessment method is needed and should be performed on the mobile device.
The known methods generally include two steps. First, features which represent the degradation of document images are extracted. Second, the extracted features are linked to the OCR accuracy. The first step may be performed using image sharpness based methods, character based methods, hybrid methods or feature-learning based methods, while the second step may be performed using either learning based methods or empirical methods.
J. Kumar, F. Chen, and D. Doermann, “Sharpness Estimation for Document and Scene Images”, Proc. ICPR, pp. 3292-3295, 2012 describe a sharpness based method that calculates the change in grayscale values, i.e. the disparity, that is observed at an edge of a character in a document image. While this method obtains good results and is fast to calculate, several parameters need to be set in order to obtain the best results.
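To make the idea of measuring the grayscale change at character edges concrete, the following is a minimal sketch and not the method of Kumar et al. It assumes a grayscale image given as a list of rows of integer values, looks only at horizontal neighbours, and uses a hypothetical `edge_threshold` parameter to decide which pixel pairs count as an edge. That threshold is an example of the kind of parameter that has to be tuned in sharpness based methods.

```python
def edge_sharpness(image, edge_threshold=30):
    """Mean absolute grayscale step between horizontal neighbours that form an edge.

    `image` is a list of rows of grayscale values (0-255).
    `edge_threshold` is an illustrative, hand-picked parameter.
    """
    steps = []
    for row in image:
        for left, right in zip(row, row[1:]):
            diff = abs(right - left)
            if diff >= edge_threshold:  # keep only pixel pairs that look like an edge
                steps.append(diff)
    return sum(steps) / len(steps) if steps else 0.0


# Made-up example rows: a crisp stroke versus the same stroke smeared by blur.
sharp = [[255, 255, 0, 0, 255, 255]] * 3
blurry = [[255, 200, 128, 60, 128, 200]] * 3

sharp_score = edge_sharpness(sharp)
blurry_score = edge_sharpness(blurry)
```

In this toy input the crisp stroke changes by the full grayscale range in one pixel, whereas the blurred stroke spreads the same transition over several pixels, so its score is lower.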
Other sharpness based methods are described more generally with respect to images, but have also been applied to document images. Examples are R. Ferzli and L. Karam, “A no-reference objective image sharpness metric based on the notion of just noticeable blur (jnb)”, IEEE Tran. on Image Processing, 18, pp. 717-728, 2009; X. Zhu and P. Milanfar, “Automatic parameter selection for denoising algorithms using a no-reference measure of image content”, IEEE Transactions on Image Processing, 19(12), pp. 3116-3132, 2010; N. Narvekar and L. Karam, “A no-reference image blur metric based on the cumulative probability of blur detection (cpbd)”, IEEE Tran. on Image Processing, 20(9), pp. 2678-2683, 2011; and R. Hassen, Z. Wang, and M. Salama, “Image sharpness assessment based on local phase coherence”, Image Processing, IEEE Transactions on 22(7), pp. 2798-2810, 2013. One limitation of these methods is that the criteria they use for quality assessment are very slow to calculate. Furthermore, these methods do not take the characteristics of document images into account, and therefore may not be valid when applied to document images.
L. R. Blando, J. Kanai, T. A. Nartker, and J. Gonzalez, “Prediction of OCR accuracy,” tech. rep., 1995; M. Cannon, J. Hochberg, and P. Kelly, “Quality assessment and restoration of typewritten document images,” International Journal on Document Analysis and Recognition 2(2-3), pp. 80-89, 1999; and A. Souza, M. Cheriet, S. Naoi, and C. Y. Suen, “Automatic filter selection using image quality assessment,” Proceedings of ICDAR 1, pp. 508-512, 2003 describe character based methods which have been specifically designed for scanned document images, but may also be applied to camera document images. These methods rely on calculating measurements that represent characteristics in which poor OCR is expected, such as fat, i.e. thick, stroke characters which tend to have many touching characters; and/or broken characters which are usually fragmented into small pieces. However, these methods operate on a binarized image with the assumption that the captured colour or grayscale image has been properly binarized, which might not be the case in real situations.
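As an illustration of a character based measurement, the sketch below estimates how fragmented the characters are by counting small connected components in an already binarized image. It is not taken from any of the cited works. The function names, the 4-connectivity and the `min_area` threshold are assumptions chosen for brevity. Like the methods above, it is only meaningful if the binarization itself was done properly.

```python
def component_areas(binary):
    """Areas of the 4-connected foreground (value 1) components of a binarized image."""
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    areas = []
    for y in range(height):
        for x in range(width):
            if binary[y][x] != 1 or seen[y][x]:
                continue
            # Flood-fill one component with an explicit stack.
            stack, area = [(y, x)], 0
            seen[y][x] = True
            while stack:
                cy, cx = stack.pop()
                area += 1
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and binary[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            areas.append(area)
    return areas


def broken_fraction(binary, min_area=3):
    """Fraction of components smaller than `min_area` (an illustrative threshold)."""
    areas = component_areas(binary)
    if not areas:
        return 0.0
    return sum(1 for a in areas if a < min_area) / len(areas)


# Made-up binarized patch: one solid blob and three isolated specks.
patch = [
    [1, 1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0, 1],
    [0, 0, 0, 1, 0, 0],
]
fraction = broken_fraction(patch)
```

A high fraction of very small components suggests broken characters. A comparable measurement for touching characters could look at unusually large or wide components instead.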
N. Nayef and J. Ogier, “Metric-based no-reference quality assessment of heterogeneous document images”, Proc. SPIE 9402, Document Recognition and Retrieval XXII, 94020L, Feb. 8, 2015; and X. Peng, H. Cao, K. Subramanian, R. Prasad, and P. Natarajan. “Automated image quality assessment for camera-captured OCR.” Proc. ICIP, pp. 2621-2624, 2011 describe hybrid methods which combine image sharpness based methods and character based methods. First, the sharpness of the image is calculated, and after that character-based quality metrics are estimated. Finally, these two measurements are combined to represent the image quality. While these hybrid methods are well suited for predicting OCR accuracy of camera document images, they also suffer from the same disadvantages as the image sharpness based methods and character based methods.
P. Ye and D. Doermann, “Learning features for predicting OCR accuracy,” in 21st International Conference on Pattern Recognition (ICPR), pp. 3204-3207, 2012; and L. Kang, P. Ye, Y. Li, and D. Doermann, “A deep learning approach to document image quality assessment,” in Image Processing (ICIP), 2014, IEEE International Conference on, pp. 2570-2574 describe feature-learning based methods. While these methods are promising, they take a long time to set up as the systems need to be trained by processing numerous images.
After the necessary features have been extracted, the image quality assessment measurement needs to be linked to the extracted features. This can be done using empirical methods that calculate the weighted sum of the extracted features and show that this measurement is correlated with OCR accuracy. In particular, the weighting factor for each extracted feature, i.e. a feature that expresses a deterioration level, may be estimated experimentally using the least squares method. A disadvantage of these methods is that the weighting factors are tuneable parameters that need to be estimated through experiments. As such, these methods may take a long time to set up.
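The empirical approach can be sketched as an ordinary least squares fit of the weighting factors, followed by a weighted sum. The code below is an assumption-laden illustration: the two features and the accuracy values are made up, the function names are hypothetical, and the fit solves the normal equations directly with no intercept term, which is only adequate for a handful of well-conditioned features.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.

    Assumes `a` is square and non-singular (a zero pivot raises ZeroDivisionError).
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_weights(features, accuracies):
    """Least squares weights w minimising sum((w . f - accuracy)^2)."""
    n = len(features[0])
    xtx = [[sum(f[i] * f[j] for f in features) for j in range(n)] for i in range(n)]
    xty = [sum(f[i] * y for f, y in zip(features, accuracies)) for i in range(n)]
    return solve_linear(xtx, xty)


def quality_score(feature_vector, weights):
    """Weighted sum of the extracted features."""
    return sum(w * f for w, f in zip(weights, feature_vector))


# Made-up training data: [sharpness, 1 - broken_fraction] per image, with OCR accuracy.
features = [[1.0, 1.0], [0.5, 1.0], [1.0, 0.5], [0.2, 0.4]]
accuracies = [1.0, 0.7, 0.8, 0.28]

weights = fit_weights(features, accuracies)
score = quality_score([0.8, 0.9], weights)
```

The toy accuracies were constructed as 0.6 times the first feature plus 0.4 times the second, so the fit recovers approximately those weights. With real data the experiments needed to collect such pairs are what makes the set-up slow.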
Alternatively, learning based quality assessment prediction methods do not assume that the quality assessment prediction is linearly correlated with the extracted normalized features. Rather, a more complicated mapping function is built to link the multi-dimensional extracted features to the quality assessment measurement or OCR accuracy. As with the feature-learning based methods, systems using these methods take a long time to set up, as they need to be trained by processing numerous images.
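As a rough illustration of a learned, non-linear mapping, the sketch below uses k-nearest-neighbour regression. This technique is chosen here only because it is short; the text above does not name a particular learner. The training pairs are made up, and `knn_predict` and `k` are hypothetical names. The prediction is simply the mean OCR accuracy of the training images whose feature vectors lie closest to the query.

```python
import math


def knn_predict(train_features, train_accuracies, query, k=2):
    """Predict OCR accuracy as the mean accuracy of the k nearest training images."""
    ranked = sorted(
        zip(train_features, train_accuracies),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance in feature space
    )
    nearest = ranked[:k]
    return sum(acc for _, acc in nearest) / len(nearest)


# Made-up training set: normalized feature vectors with measured OCR accuracy.
train_features = [[0.9, 0.9], [0.8, 0.95], [0.2, 0.3], [0.1, 0.2]]
train_accuracies = [0.98, 0.95, 0.40, 0.30]

good_prediction = knn_predict(train_features, train_accuracies, [0.85, 0.9])
poor_prediction = knn_predict(train_features, train_accuracies, [0.15, 0.25])
```

Because the prediction depends entirely on stored examples, its quality grows with the number of labelled images, which reflects the long set-up time noted above.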