Identification of text regions in papers that are optically scanned (e.g. by a flatbed scanner of a photocopier) is significantly easier (e.g. due to upright orientation, large size and slow speed) than detection of regions that may contain text in scenes of the real world, which may be captured in images (also called “natural images”) or in video frames in real time by a handheld device (such as a smartphone) having a built-in digital camera. Specifically, optical character recognition (OCR) methods of the prior art originate in the field of document processing, wherein the document image contains a series of lines of text (e.g. 30 lines of text) of an optically scanned page in a document. Several such methods are described in an article by L. Jagannathan and C. V. Jawahar entitled “Perspective Correction Methods for Camera-Based Document Analysis”, Proceedings of First International Workshop on Camera Based Document Analysis and Recognition, August 2005, Seoul, Korea, pp. 148-154, which is incorporated by reference herein as background.
Techniques of the type described above typically generate too many false positives and/or false negatives to be practical when used on natural images, in which natural features (such as branches of a tree) are mixed with text in various fonts, e.g. on traffic signs, store fronts, and vehicle license plates, and which are subject to variations in lighting, color, tilt, focus, font, etc. FIG. 1 illustrates a bill board 100 in a real world scene in India. A user 110 (see FIG. 1) may use a camera-equipped mobile device (such as a cellular phone) 108 to capture an image 107 (also called “natural image” or “real world image”) of the bill board 100. Camera captured image 107 may be displayed on a screen 106 of the mobile device 108. Such an image 107 (FIG. 1), if processed directly using prior art image processing techniques of the type described in the previous paragraph, may result in failure to recognize one or more words in a text region 103 (FIG. 1).
Specifically, use of prior art techniques causes problems in OCR processing of a photograph of a scene wherein the bill board 100 is at a higher elevation than user 110, which causes perspective distortion in image 107, unless the perspective distortion is corrected, e.g. as described in the above-identified article by L. Jagannathan and C. V. Jawahar.
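By way of illustration only, perspective distortion of a planar surface such as bill board 100 is commonly modeled as a planar homography, which may be estimated from four point correspondences. The following is a minimal sketch of that general technique and is not taken from the above-identified article. It assumes that the four corners of the distorted region are already known, and all function names and coordinates are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations update both sides together.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def estimate_homography(src: Sequence[Point],
                        dst: Sequence[Point]) -> List[List[float]]:
    """Return a 3x3 homography H, with H[2][2] fixed to 1, mapping src[i] to dst[i]."""
    if len(src) != 4 or len(dst) != 4:
        raise ValueError("exactly four correspondences are required")
    a: List[List[float]] = []
    b: List[float] = []
    for (x, y), (u, v) in zip(src, dst):
        # From u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), and likewise for v.
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear_system(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h: List[List[float]], p: Point) -> Point:
    """Map point p through H, dividing by the homogeneous coordinate."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return (u, v)


if __name__ == "__main__":
    # Hypothetical corners of a sign seen from below (top edge appears narrower),
    # listed as top-left, top-right, bottom-right, bottom-left.
    distorted = [(120.0, 80.0), (280.0, 80.0), (340.0, 300.0), (60.0, 300.0)]
    upright = [(0.0, 0.0), (400.0, 0.0), (400.0, 200.0), (0.0, 200.0)]
    H = estimate_homography(distorted, upright)
    for corner in distorted:
        print(corner, "->", apply_homography(H, corner))
```

In such a sketch, each pixel of the corrected image would be obtained by mapping its location through the inverse of the estimated homography and sampling the captured image. Locating the four corners in a natural image is itself the difficult step, and it is not addressed here.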
However, prior art methods of the type described above can be too slow and inaccurate for use on real world images with natural features and text (which may or may not be enclosed within a rectangular boundary). Hence, there is a need to perform perspective correction on a natural image or video frame, as described below.