Modern imaging devices such as cameras, camcorders, scanners, and mobile phones are often equipped with image sensors for capturing images. Such imaging devices are commonly used to capture images that include text. For example, users of mobile phones often take pictures of objects bearing text, such as books, menus, advertisements, street signs, billboard signs, and news articles. Text information is then obtained from the captured images by text detection and recognition methods. The text information thus obtained may be used to identify the images for storage or retrieval.
In recognizing text information, it is generally necessary to first detect a potential text region and then determine whether the potential text region contains text. If the potential text region contains text, the text is recognized by a text recognition method (e.g., OCR). On the other hand, if the potential text region does not contain text, the potential text region is discarded.
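By way of illustration only, the detect, verify, and recognize flow described above may be sketched as follows. The names `Region`, `recognize_text_regions`, `contains_text`, and `recognize` are hypothetical and are introduced solely for this sketch; the verification and recognition steps are passed in as callables because no particular detection or OCR method is assumed.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Region:
    """A potential text region, given as a bounding box in pixels (hypothetical type)."""
    x: int
    y: int
    width: int
    height: int


def recognize_text_regions(
    candidates: Iterable[Region],
    contains_text: Callable[[Region], bool],
    recognize: Callable[[Region], str],
) -> List[str]:
    """Apply recognition only to candidate regions judged to contain text.

    `contains_text` stands in for whatever verification step is used, and
    `recognize` stands in for a text recognition method such as OCR.
    """
    results: List[str] = []
    for region in candidates:
        if contains_text(region):
            # The region is treated as a text region, so it is recognized.
            results.append(recognize(region))
        # Otherwise the potential text region is discarded.
    return results
```

In this sketch, a candidate that `contains_text` wrongly accepts is still passed to `recognize`, so the cost of recognition is incurred for every accepted region regardless of whether it actually contains text.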
Unfortunately, conventional text detection methods often erroneously identify potential text regions that do not contain text as text regions containing valid text. Such erroneous detections increase particularly when images include complex non-text backgrounds or patterns. Even in such cases, text recognition methods are generally applied to the falsely detected text regions, thereby producing unrecognizable results while consuming computing resources.
Therefore, there is a need to reduce erroneous detections of text regions to facilitate more accurate text recognition and save computing resources.