Identification of text regions in papers that are optically scanned (e.g. by a flatbed scanner of a photocopier) is significantly easier (e.g. due to upright orientation, large size and slow speed) than detecting regions that may contain text in real world scenes captured in images (also called “natural images”) or in video frames in real time by a handheld device (such as a smartphone) having a built-in digital camera. Specifically, optical character recognition (OCR) methods of the prior art originate in the field of document processing, wherein the document image contains regions that are black or white or several shades of gray.
Document processing techniques, although successfully used on scanned documents created by optical scanners, generate too many false positives and/or negatives to be practical when used on natural images. For example, FIG. 1A illustrates a bill board 100 in a real world scene in India. A user 110 may use a camera-equipped mobile device 108 (such as a cellular phone) to capture an image 107 (also called “natural image” or “real world image”) of the bill board 100. Camera captured image 107 may be displayed on a screen 106 of mobile device 108. Such an image 107 (FIG. 1A) is normally captured in three colors, such as Red (FIG. 1B), Green (FIG. 1C) and Blue (FIG. 1D), and converted into a grayscale image (FIG. 1E).
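The conversion of a three-color image into a grayscale image can be sketched as follows. This is a minimal illustration only, not a conversion attributed to any prior art technique: the ITU-R BT.601 luma weights are an assumption, because the text above does not state how the grayscale image of FIG. 1E is formed, and the name `to_grayscale` is hypothetical.

```python
# Assumption: BT.601 luma weights; the source does not specify the conversion.
LUMA_WEIGHTS = (0.299, 0.587, 0.114)


def to_grayscale(rgb_image):
    """Convert rows of (R, G, B) tuples (0-255) into rows of gray levels (0-255)."""
    wr, wg, wb = LUMA_WEIGHTS
    return [
        [round(wr * r + wg * g + wb * b) for (r, g, b) in row]
        for row in rgb_image
    ]


# A tiny 2x2 image: white, black, pure red, pure blue.
image = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), (0, 0, 255)],
]
gray = to_grayscale(image)
```

Under these assumed weights, pure red and pure blue map to gray levels 76 and 29 respectively, so color differences that are obvious to a viewer are compressed into a single intensity axis.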
When a grayscale image (FIG. 1E) is processed in the normal manner, prior art image processing techniques may fail to recognize one or more words. This is because many prior art techniques generate too many false positives and/or negatives to be practical when used on images containing natural features (such as branches of a tree) mixed with text (e.g. in various colors), for example on bill boards, traffic signs, store fronts and vehicle license plates, due to variations in lighting, color, tilt, focus, font, etc.
For information on techniques used in the prior art to identify text regions in color images, see the following articles, incorporated by reference herein in their entirety as background:
    (a) STROUTHOPOULOS, et al. “Text extraction in complex color documents”, Pattern Recognition 35 (2002) 1743-1758;
    (b) CHEN, et al. “Detecting and reading text in natural scenes”, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'04), 2004, pages 1-8;
    (c) JAIN, et al. “Automatic text location in images and video frames”, Pattern Recognition, 1998, pp. 2055-2076, Vol. 31, No. 12; and
    (d) EPSHTEIN, et al. “Detecting text in natural scenes with stroke width transform”, Computer Vision and Pattern Recognition (CVPR) 2010, pages 2963-2970 (as downloaded from “http://research.microsoft.com/pubs/149305/1509.pdf”).
Image processing techniques of the type described in such prior art articles appear to have been developed under an assumption that text regions have adequate contrast relative to background regions in a grayscale image. Accordingly, use of such techniques on images that do not conform to such an assumption can result in false positives and/or negatives, which render such techniques impractical. Hence, there is a need to improve contrast of a natural image or video frame, for use in text extraction and recognition applications as described below.
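The contrast assumption can be illustrated with a small worked example. This is a sketch under stated assumptions, not a measurement taken from any of the cited articles: the BT.601 luma weights, the two color pairs and the helper names `luma` and `gray_contrast` are all hypothetical choices made for illustration.

```python
# Assumption: BT.601 luma weights; the cited articles do not fix a conversion.
def luma(rgb):
    """Gray level (0-255) of an (R, G, B) color under the assumed weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def gray_contrast(text_rgb, background_rgb):
    """Absolute gray-level difference between a text color and a background color."""
    return abs(luma(text_rgb) - luma(background_rgb))


# Scanned-document case: black text on white paper.
black_on_white = gray_contrast((0, 0, 0), (255, 255, 255))

# Natural-image case (hypothetical colors): red text on a green background.
red_on_green = gray_contrast((200, 0, 0), (0, 102, 0))
```

Under these assumed weights, black text on white paper differs from its background by about 255 gray levels, while the red-on-green pair differs by less than one gray level even though the two colors are clearly distinct. A technique that relies on grayscale contrast would have little to work with in the second case.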