Because texts in an image often carry significant information, detecting and recognizing scene texts is considered important in a variety of computer vision applications, such as image and video retrieval, multi-language translation, and automotive assistance.
A scene text detection algorithm detects a text, i.e., a character, in an image. Depending on the scheme used to extract text candidates, such algorithms may be broadly divided into the sliding window method and the connected component analysis method.
The sliding window method detects texts in a scene image by shifting a window at multiple scales over all locations of the image. Because it searches the inputted image exhaustively, this technique has the advantage of a high recall rate, i.e., the proportion of actual text regions that are detected. On the other hand, the exhaustive scan requires a large amount of calculation, and the great number of text candidates may produce many false positive results. Accordingly, the method is ill-suited to real-time applications. The sliding window method is described, for example, in X. Chen and A. L. Yuille, “Detecting and reading text in natural scenes,” Proc. CVPR 2004, pp. 366-373, 2004.
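The cost of the exhaustive scan described above can be illustrated with a minimal sketch. It only enumerates window positions and does not classify them. The function names, the square window shape, the base size, the scale set, and the stride ratio are all assumptions chosen for illustration and are not taken from the cited article.

```python
from typing import Iterator, Tuple

Window = Tuple[int, int, int, int]  # (x, y, width, height)


def sliding_windows(image_w: int, image_h: int,
                    base_size: int = 24,
                    scales: Tuple[float, ...] = (1.0, 1.5, 2.0),
                    stride_ratio: float = 0.5) -> Iterator[Window]:
    """Yield every square window position at every scale.

    base_size, scales, and stride_ratio are hypothetical parameters.
    """
    for scale in scales:
        size = int(round(base_size * scale))
        if size > image_w or size > image_h:
            continue  # this scale does not fit inside the image
        stride = max(1, int(size * stride_ratio))
        for y in range(0, image_h - size + 1, stride):
            for x in range(0, image_w - size + 1, stride):
                yield (x, y, size, size)


def count_windows(image_w: int, image_h: int, **kwargs) -> int:
    """Count the candidate windows a classifier would have to evaluate."""
    return sum(1 for _ in sliding_windows(image_w, image_h, **kwargs))
```

Under these assumed settings, every yielded window would have to be passed to a text/non-text classifier, so the number of evaluations grows with the image area and with the number of scales. This is the source of both the high recall rate and the heavy calculation noted above.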
Because the sliding window method requires so much calculation, the connected component analysis method has recently come into more frequent use. It extracts text candidates from an inputted image as sets of pixels that share similar text characteristics, and then refines the text candidates to suppress non-text candidates. The stroke width transform (SWT) and the maximally stable extremal regions (MSER) are representative techniques of the connected component analysis method. These techniques provide state-of-the-art performance in the detection of scene texts. The connected component analysis method is described, for example, in B. Epshtein, E. Ofek, and Y. Wexler, “Detecting text in natural scenes with stroke width transform,” Proc. CVPR 2010, pp. 2963-2970, 2010.
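The extract-then-refine flow described above may be sketched as follows. This is a simplified illustration, not SWT or MSER: it substitutes a fixed intensity threshold for the pixel-similarity criterion, groups dark pixels into 4-connected components, and then applies two geometric constraints. The threshold, the minimum area, and the maximum aspect ratio are hypothetical values.

```python
from collections import deque
from typing import List, Tuple

Pixel = Tuple[int, int]          # (x, y)
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def connected_components(gray: List[List[int]],
                         threshold: int = 128) -> List[List[Pixel]]:
    """Group dark pixels (value < threshold) into 4-connected components."""
    h = len(gray)
    w = len(gray[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    components: List[List[Pixel]] = []
    for y0 in range(h):
        for x0 in range(w):
            if seen[y0][x0] or gray[y0][x0] >= threshold:
                continue
            # Breadth-first flood fill starting from an unvisited dark pixel.
            seen[y0][x0] = True
            queue = deque([(x0, y0)])
            pixels: List[Pixel] = []
            while queue:
                x, y = queue.popleft()
                pixels.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < w and 0 <= ny < h
                            and not seen[ny][nx] and gray[ny][nx] < threshold):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            components.append(pixels)
    return components


def refine(components: List[List[Pixel]],
           min_area: int = 3, max_aspect: float = 5.0) -> List[Box]:
    """Keep bounding boxes of components that pass the assumed constraints."""
    kept: List[Box] = []
    for pixels in components:
        xs = [p[0] for p in pixels]
        ys = [p[1] for p in pixels]
        bw = max(xs) - min(xs) + 1
        bh = max(ys) - min(ys) + 1
        aspect = max(bw, bh) / min(bw, bh)
        if len(pixels) >= min_area and aspect <= max_aspect:
            kept.append((min(xs), min(ys), max(xs), max(ys)))
    return kept
```

Because each pixel is visited once, the extraction step is far cheaper than an exhaustive window scan. The refinement step, however, discards any component that fails a constraint, including true texts that happen to fall outside the assumed limits.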
However, the general constraints used to refine text candidates under the connected component analysis method are evaluated restrictively, so that several true texts are rejected and the recall rate is consequently low.
Accordingly, the inventor of the present invention proposed, in U.S. patent application Ser. No. 15/014,441, a technology for detecting a text in one image with a high recall rate while showing optimal performance.
However, if more than a certain percentage of the text candidates detected in one image are classified as weak texts, it is difficult to determine whether the classified weak texts have characteristics similar to those of strong texts, and text tracking using hysteresis therefore becomes difficult in that frame.
Therefore, the inventor of the present invention has devised a technology that can classify one or more weak texts in a specific image as one or more strong texts by referring to information on at least one text classified as a strong text in another image related to the specific image.
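The idea of reclassifying weak texts by reference to a strong text from a related image may be sketched as follows. This is only an illustrative sketch under stated assumptions and does not represent the claimed method: the two score thresholds, the use of candidate height as the compared characteristic, the tolerance, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    score: float   # hypothetical classifier confidence in [0, 1]
    height: float  # hypothetical characteristic used for comparison
    label: str = "unclassified"


def classify(cands: List[Candidate],
             high: float = 0.8, low: float = 0.4) -> None:
    """Label each candidate using two assumed thresholds."""
    for c in cands:
        if c.score >= high:
            c.label = "strong"
        elif c.score >= low:
            c.label = "weak"
        else:
            c.label = "non-text"


def promote_weak(cands: List[Candidate],
                 reference_strong: List[Candidate],
                 tol: float = 0.2) -> None:
    """Relabel weak candidates that resemble a strong text from another image."""
    for c in cands:
        if c.label != "weak":
            continue
        # Assumed similarity test: relative height difference within tol.
        if any(abs(c.height - s.height) <= tol * s.height
               for s in reference_strong):
            c.label = "strong"
```

For example, if a previous frame contains a strong text of height 20, a weak candidate of height 21 in the current frame would be relabeled as strong under these assumptions, whereas a weak candidate of height 40 would remain weak.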