Because text in an image often carries significant information, detecting and recognizing scene text is regarded as important in a variety of computer vision applications, such as image and video retrieval, multi-language translation, and automotive assistance.
A scene text detection algorithm, that is, an algorithm for detecting text (or characters) in an image, may be broadly classified as either a sliding window method or a connected component analysis method, depending on the scheme used to extract text candidates.
The sliding window method detects text in a scene image by shifting a window, at multiple scales, over all locations of the image. Because the input image is searched exhaustively, this technique has the advantage of a high recall rate, that is, a high proportion of the true text regions are detected. On the other hand, exhaustively scanning the window requires a large amount of computation, and the large number of resulting text candidates may produce many false positives. Accordingly, the method is unsuitable for real-time applications. The sliding window method was introduced in an article entitled “Detecting and reading text in natural scenes” by X. Chen and A. L. Yuille, in Proc. CVPR 2004, pages 366-373, 2004.
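The exhaustive scan described above can be sketched as follows. This is only an illustrative enumeration of window positions, not the method of the cited article; the function name `sliding_windows` and the parameters `base`, `scales`, and `stride_frac` are assumptions introduced for this example.

```python
from typing import Iterator, Tuple


def sliding_windows(
    img_w: int,
    img_h: int,
    base: Tuple[int, int] = (32, 32),       # hypothetical base window size
    scales: Tuple[float, ...] = (1.0, 2.0),  # hypothetical scale factors
    stride_frac: float = 0.5,                # step as a fraction of window size
) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (x, y, w, h) for every window position at every scale."""
    for s in scales:
        w = int(round(base[0] * s))
        h = int(round(base[1] * s))
        if w > img_w or h > img_h:
            continue  # window does not fit in the image at this scale
        step_x = max(1, int(w * stride_frac))
        step_y = max(1, int(h * stride_frac))
        for y in range(0, img_h - h + 1, step_y):
            for x in range(0, img_w - w + 1, step_x):
                yield (x, y, w, h)


# Each yielded window would be passed to a text/non-text classifier,
# so the number of windows is a rough proxy for the computational cost.
num_candidates = sum(1 for _ in sliding_windows(640, 480))
```

The number of windows grows with the image area, the number of scales, and the inverse of the stride, which is why every candidate region is examined but the overall cost becomes high.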
Because the sliding window method requires a large amount of computation, the connected component analysis method has recently come into more frequent use. This method extracts text candidates from an input image as sets of pixels that share similar text characteristics, and then refines the text candidates to suppress non-text candidates. The stroke width transform (SWT) and maximally stable extremal regions (MSER) are representative techniques of the connected component analysis method. These methods provide state-of-the-art performance in scene text detection. The connected component analysis method was introduced in an article entitled “Detecting text in natural scenes with stroke width transform” by B. Epshtein, E. Ofek, and Y. Wexler, in Proc. CVPR 2010, pages 2963-2970, 2010.
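The two stages described above, candidate extraction and refinement, can be sketched as follows. This is a simplified illustration and is neither SWT nor MSER: pixels are grouped by 4-connectivity on a binary mask, and the refinement uses hypothetical constraints (`min_area` and `max_aspect` are assumed values chosen only for this example).

```python
from collections import deque
from typing import List, Tuple

Pixel = Tuple[int, int]


def connected_components(mask: List[List[int]]) -> List[List[Pixel]]:
    """Group nonzero pixels of a binary mask into 4-connected components."""
    h = len(mask)
    w = len(mask[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    comps: List[List[Pixel]] = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(x, y)])
            pixels: List[Pixel] = []
            while queue:  # breadth-first flood fill
                cx, cy = queue.popleft()
                pixels.append((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            comps.append(pixels)
    return comps


def refine(comps: List[List[Pixel]], min_area: int = 3,
           max_aspect: float = 5.0) -> List[List[Pixel]]:
    """Keep components that satisfy simple (hypothetical) geometric constraints."""
    kept = []
    for pixels in comps:
        xs = [x for x, _ in pixels]
        ys = [y for _, y in pixels]
        bw = max(xs) - min(xs) + 1  # bounding-box width
        bh = max(ys) - min(ys) + 1  # bounding-box height
        aspect = max(bw, bh) / min(bw, bh)
        if len(pixels) >= min_area and aspect <= max_aspect:
            kept.append(pixels)
    return kept


mask = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]
candidates = connected_components(mask)
text_candidates = refine(candidates)
```

In this toy mask the isolated pixel and the long thin bar are both rejected by the constraints. If such a component were actually part of a character, a fixed rule of this kind would discard a true text candidate.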
However, the general constraints used to refine text candidates in the connected component analysis method have the drawback that they can be evaluated only in a limited manner when detecting some true texts, and consequently yield low recall rates.
Accordingly, there is a need for a text detection technique that achieves a high recall rate while maintaining optimal performance when detecting text in images.