Document scanners are widely used to capture text and transform it into electronic form for further processing. As camera resolution has risen in recent years, text capture with digital cameras has become an alternative. Digital cameras are portable and offer face-up, non-contact, near-instantaneous image acquisition, but they suffer from image quality problems resulting from the wide range of conditions in which they may operate. One of the most severe problems is that cameras capture documents from arbitrary viewpoints, which introduces perspective distortion into the captured images. Perspective distortion is distracting to human readers and makes image-analysis operations, such as optical character recognition (OCR), layout analysis and compression, slower and less reliable.
Thus, it is desirable to automatically correct a perspective-distorted image to produce an upright view of its text regions.
Although the geometry of rectification is fairly mature (see, for example, the methods proposed by R. M. Haralick in “Monocular vision using inverse perspective projection geometry: analytic relations,” Proceedings of the IEEE Computer Vision and Pattern Recognition Conference 1989; 370-378), few rectification techniques have been reported in the literature for perspective-distorted document images captured with digital cameras. In the article “Recognizing text in real scenes,” International Journal of Document Analysis and Recognition 4 (4) (2002) 243-257, by P. Clark and M. Mirmehdi, the quadrilaterals formed by the borders between the background and the plane where the text lies are used to obtain an upright view of perspective-distorted text. After the quadrilaterals are extracted using a perceptual grouping method, a bilinear interpolation operation is applied to construct the corrected document image. Because the algorithm depends heavily on the extraction of quadrilaterals, a high-contrast document border (HDB) must be present in the captured document image for correct rectification.
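The quadrilateral-based idea can be illustrated with a minimal sketch. It assumes the four corners of the document quadrilateral are already known, and it is not the implementation of the cited article, whose details are not given here. The function names `bilinear_sample` and `rectify_quad` are hypothetical, the image is a plain list of rows of grey levels, and the corner blend used to map output pixels back into the quadrilateral is a simple approximation rather than a true projective transform.

```python
def bilinear_sample(img, x, y):
    """Grey level at real-valued (x, y), blended from the 4 surrounding pixels."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)  # clamp the coordinates to the image
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def rectify_quad(img, quad, out_w, out_h):
    """Resample the region inside `quad` onto an out_w x out_h rectangle.

    quad = (top_left, top_right, bottom_right, bottom_left), each an (x, y)
    pair in source-image coordinates (assumed to be given by a border detector).
    """
    (x00, y00), (x10, y10), (x11, y11), (x01, y01) = quad
    out = []
    for j in range(out_h):
        v = j / (out_h - 1) if out_h > 1 else 0.0
        row = []
        for i in range(out_w):
            u = i / (out_w - 1) if out_w > 1 else 0.0
            # Blend the four corners to find the source position (assumption:
            # bilinear corner blend, an approximation of the perspective map).
            sx = ((1 - u) * (1 - v) * x00 + u * (1 - v) * x10
                  + u * v * x11 + (1 - u) * v * x01)
            sy = ((1 - u) * (1 - v) * y00 + u * (1 - v) * y10
                  + u * v * y11 + (1 - u) * v * y01)
            row.append(bilinear_sample(img, sx, sy))
        out.append(row)
    return out
```

The sketch makes the dependence on the border visible: every output pixel is derived from the four corner positions, so if the document border cannot be found, there is nothing to drive the resampling.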
Instead of using document borders, which do not always exist in a real scene, M. Pilu has proposed a rectification approach based on the extraction of illusory clues in the article “Extraction of illusory linear clues in perspectively skewed documents,” Proceedings of the IEEE Computer Vision and Pattern Recognition Conference 2001; 363-368. To extract the horizontal clues, each character or group of characters is first transformed into a blob, and a pairwise saliency measure is computed for pairs of neighboring blobs, indicating how likely they are to belong to the same text line. After that, a network based on perceptual organization principles is traversed over the text, and horizontal clues are calculated as the salient linear groups of blobs. Although it works well for extracting horizontal clues, the method cannot extract enough vertical information.
In the article “Perspective estimation for document images,” Proceedings of the SPIE Conference on Document Recognition and Retrieval IX 2002; 244-254 by C. R. Dance, a distorted document image is rectified using two principal vanishing points, which are estimated from the parallel lines extracted from the text lines and the vertical paragraph margins (VPM). The main drawback of this approach is that it works only on fully aligned text, as it relies heavily on the existence of VPM features. In addition, the means of extracting the parallel lines is not clarified.
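The geometric fact underlying vanishing-point methods is that lines which are parallel on the page meet at a single point in the perspective image. The following minimal sketch, which is not taken from the cited article, computes that point for two image lines using homogeneous coordinates. The helper names `cross`, `line_through` and `vanishing_point` are hypothetical, and each line is assumed to be given by two points already extracted from the image.

```python
def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def line_through(p, q):
    """Homogeneous line (a, b, c), with a*x + b*y + c = 0, through points p and q."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))


def vanishing_point(line_a, line_b, eps=1e-12):
    """Intersection of two homogeneous lines, or None if they stay parallel."""
    x, y, w = cross(line_a, line_b)
    if abs(w) < eps:
        return None  # lines are parallel in the image: the point is at infinity
    return (x / w, y / w)


# Two text-line baselines that converge to the right of the page.
upper = line_through((0.0, 1.0), (5.0, 0.5))
lower = line_through((0.0, -1.0), (5.0, -0.5))
horizontal_vp = vanishing_point(upper, lower)
```

In practice more than two lines would be available, and a robust fit over all of them would replace the single intersection. The sketch only shows why aligned features such as text lines and paragraph margins are needed to supply the lines in each of the two directions.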
In the article “Rectifying perspective views of text in 3D scenes using vanishing points,” Pattern Recognition 36 (2003) 2673-2686 by P. Clark and M. Mirmehdi, two vanishing points are estimated based on paragraph formatting (PF) information. More specifically, the horizontal vanishing point is calculated using a novel extension of the 2D projection profile, and the vertical vanishing point is calculated from PF information, such as the VPM or, when paragraphs are not fully aligned, the variation in text line spacing. However, this rectification method requires well-formatted paragraphs.
Several products that can rectify perspective-distorted document images have now been brought to market, for example, the Casio EX-Z55 and Wintone Huishi. However, both are based on HDB extraction, and their results are unreliable when sufficient border information is lacking.