Identification of text regions in scanned papers (e.g. scanned by a flatbed scanner of a copier) is significantly easier (e.g. due to upright orientation, large size and slow speed) than detection of text regions in images of real world scenes (also called “natural images”) captured in real time by a handheld device (such as a smartphone) having a built-in digital camera. FIG. 1A illustrates a newspaper 100 in the real world in India. A user 110 (see FIG. 1B) may use a camera-equipped mobile device (such as a cellular phone) 108 to capture an image 107 of newspaper 100. Captured image 107 may be displayed on a screen 106 of mobile device 108. Such an image 107 (FIG. 1C), if processed directly by prior art techniques used in document processing, may result in a failure to classify one or more regions 103, 105 as text (see FIG. 1A), e.g. due to variations in lighting, color, tilt, focus, etc. Specifically, document processing techniques that are used successfully on scanned documents (during Optical Character Recognition, also called OCR) generate too many false positives and/or false negatives to be practical for use on real world images.
Hence, detection of text regions in a real world image is performed using different techniques. For additional information on prior art techniques used to identify text regions in natural images, see the following articles, which are incorporated by reference herein in their entirety as background:
    (a) H. Li et al., “Automatic text detection and tracking in digital video,” IEEE Transactions on Image Processing, vol. 9, no. 1, pp. 147-156, 2000;
    (b) X. Chen and A. Yuille, “Detecting and reading text in natural scenes,” IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'04), 2004, pp. 1-8;
    (c) S. W. Lee et al., “A new methodology for gray-scale character segmentation and recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 10, pp. 1045-1050, October 1996;
    (d) B. Epshtein et al., “Detecting text in natural scenes with stroke width transform,” Computer Vision and Pattern Recognition (CVPR), 2010, pp. 2963-2970; and
    (e) A. Jain and B. Yu, “Automatic text location in images and video frames,” Pattern Recognition, vol. 31, no. 12, pp. 2055-2076, 1998.
When a natural image 107 (FIG. 1B) is processed to form blocks 103A, 103B, 103C and 103D of text regions (FIG. 1C), some prior art methods of the type described above operate under the assumption that there is no skew (also called orientation, slant or tilt) in text relative to the camera that is used to generate the image, e.g. that text lines are oriented horizontally (or vertically, depending on the language) relative to boundaries of the image. Some prior art methods fail when skew becomes significant (e.g. greater than 5 degrees), e.g. due to errors in classifying blocks as being text or non-text (prior to OCR, which is performed in a limited manner, only on blocks classified as text). The specific amount of skew at which a prior art method begins to fail noticeably depends on the method and on the number of errors that are acceptable. Hence, some classifiers may correctly classify a block as text or non-text when skew is small (e.g. within ±5 degrees) relative to the camera, as illustrated in FIG. 1C. But when skew becomes large (e.g. 30 degrees), as illustrated in FIG. 1D, several prior art classifiers fail to classify the block correctly. So, there is a need to detect and correct skew in a natural image or video frame, prior to classification of regions therein, as described below.
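As a purely illustrative aid (not part of any prior art method cited above), the following minimal Python sketch shows one geometric reason why a no-skew assumption may break down: an upright rectangular block drawn around a skewed text line grows taller and contains proportionally less text as skew increases. The function names `axis_aligned_box` and `fill_ratio`, and the 200 by 20 pixel text line, are hypothetical and chosen only for this example.

```python
import math


def axis_aligned_box(width, height, skew_degrees):
    """Return (box_width, box_height) of the upright rectangle that encloses
    a width x height text line rotated by skew_degrees."""
    theta = math.radians(skew_degrees)
    c, s = abs(math.cos(theta)), abs(math.sin(theta))
    return width * c + height * s, width * s + height * c


def fill_ratio(width, height, skew_degrees):
    """Fraction of the upright enclosing block that is covered by the
    (rotated) text line; 1.0 means the block fits the line exactly."""
    box_w, box_h = axis_aligned_box(width, height, skew_degrees)
    return (width * height) / (box_w * box_h)


if __name__ == "__main__":
    # Hypothetical text line: 200 pixels long, 20 pixels tall.
    for skew in (0, 5, 30):
        box_w, box_h = axis_aligned_box(200, 20, skew)
        ratio = fill_ratio(200, 20, skew)
        print(f"skew={skew:>2} deg  block={box_w:.1f} x {box_h:.1f}  "
              f"fill ratio={ratio:.2f}")
```

Under these assumptions, the fill ratio is 1.0 at zero skew and drops as skew grows, so features computed over an upright block (e.g. a projection along rows) may mix text with increasing amounts of background. This is only one plausible contributor to the classification errors noted above.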