Detecting and recognizing text in images is considered important in a variety of computer vision applications, such as image and video retrieval, multi-language translation, and automotive assistance, because in many cases text in images provides significant information.
FIG. 1 is a block diagram illustrating a process of detecting and recognizing a text included in an image.
Referring to FIG. 1, the text in the image is first detected at a step of S110. A text detection algorithm is an algorithm for detecting a text (or a character) in an image. Such algorithms may be largely divided into sliding window-based methods and connected component analysis-based methods, depending on how text candidates are extracted.
If individual characters included in the image are detected through the text detection at a step of S120, the individual characters are normalized at a step of S130. The detected text may be included in a bounding box area that minimizes an extra margin(s). Because the sizes of the bounding boxes of the individual characters differ, the sizes must be made equal (e.g., 32×32 pixels or 48×48 pixels); this is called normalization. After the normalization, the characters may be recognized at a step of S140.
There are a variety of conventional normalization methods, for example, a stretching (scaling) method, a replicating method, a constant method, a reflecting method, and a wrapping method.
FIGS. 2A to 2E are drawings illustrating a variety of conventional normalization methods.
FIG. 2A shows a stretching (scaling) method which rescales a bounding box to the target size. This method has a drawback in that it may distort the shape of a character depending on the ratio of its width to its height.
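The distortion caused by stretching can be made concrete with a minimal sketch. The function names `stretch` and `scale_factors`, the nearest-neighbour sampling, and the 4×20 pixel example glyph are assumptions chosen for illustration only; the source does not specify how the scaling is implemented.

```python
def stretch(glyph, size=32):
    """Nearest-neighbour resize of a 2-D list of pixels to size x size.

    Hypothetical helper: the interpolation scheme is an assumption.
    """
    h, w = len(glyph), len(glyph[0])
    return [[glyph[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def scale_factors(width, height, size=32):
    """Horizontal and vertical scale factors applied by stretching."""
    return size / width, size / height


# A narrow character such as '1', assumed here to be 4 px wide and 20 px tall.
sx, sy = scale_factors(4, 20)
# sx is much larger than sy, so the character is widened far more than it
# is heightened, which is the shape distortion described for FIG. 2A.
```

The more the width-to-height ratio of the bounding box departs from 1, the more the two scale factors differ and the more the character shape is distorted.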
FIG. 2B illustrates a replicating method which copies boundary values and inserts the copied boundary values into a bounding box. This method has a disadvantage in that it includes too much background information in the bounding box. In particular, if there is any noise in the boundary values as shown in FIG. 2B, the noise may be emphasized in the bounding box.
FIG. 2C shows a constant method which inserts a constant value into a bounding box. A problem of this method is that a constant value irrelevant to both the character and the background is put into the bounding box.
FIG. 2D shows a reflecting method which inserts a mirror image of a character into a bounding box. This method also includes too much background in the bounding box.
FIG. 2E shows a wrapping method which inserts an image of a character into a bounding box as a repeating tile pattern. This method also includes too much background in the bounding box.
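The four padding-style methods of FIGS. 2B to 2E differ only in how a pixel outside the original character image is filled. The following is a minimal sketch under stated assumptions: the names `source_index` and `pad` are hypothetical, the character is centred in the output, the character image is assumed to be no larger than the target size, and the reflecting mode is assumed to mirror about the edge with the edge pixel repeated. None of these details are given in the source.

```python
def source_index(i, n, mode):
    """Map index i (possibly outside 0..n-1) to a source index.

    Returns None when the position should take the constant fill value.
    """
    if 0 <= i < n:
        return i
    if mode == "replicate":      # FIG. 2B: copy the nearest boundary value
        return min(max(i, 0), n - 1)
    if mode == "constant":       # FIG. 2C: use a fixed value
        return None
    if mode == "reflect":        # FIG. 2D: mirror image (edge pixel repeated)
        period = 2 * n
        j = i % period
        return j if j < n else period - 1 - j
    if mode == "wrap":           # FIG. 2E: repeat like a tile pattern
        return i % n
    raise ValueError(f"unknown mode: {mode}")


def pad(glyph, size, mode, fill=0):
    """Centre a 2-D list of pixels in a size x size box, filling by mode."""
    h, w = len(glyph), len(glyph[0])
    top, left = (size - h) // 2, (size - w) // 2
    out = []
    for r in range(size):
        sr = source_index(r - top, h, mode)
        row = []
        for c in range(size):
            sc = source_index(c - left, w, mode)
            row.append(fill if sr is None or sc is None else glyph[sr][sc])
        out.append(row)
    return out
```

For a one-row image `[[1, 2, 3]]` padded to 7×7, the replicating mode repeats the values 1 and 3 outward, while the reflecting and wrapping modes repeat the whole row, which illustrates how background pixels at the boundary spread into the normalized box.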
As such, each of the conventional normalization methods has many limitations. Therefore, a method for performing normalization by adding a margin(s) around a bounding box that includes a detected character has been suggested.
FIGS. 3A to 3C are drawings explaining limitations of conventional technologies that perform normalization by adding margins around bounding boxes.
The rectangular areas on the upper sides of FIGS. 3A and 3B show the bounding boxes that include characters detected from an image, and those on the lower sides show the bounding boxes after normalization has been performed.
Referring to FIG. 3A, if normalization is performed on the detected characters without any additional operations, characters with narrow widths such as ‘1’ or ‘i’ become almost single-colored. In this case, the recognition rates of these characters may be lowered.
FIG. 3B illustrates a case of performing normalization by adding at least one margin around the bounding boxes. This case is slightly better than that in FIG. 3A, but the character recognition rates are still lowered because the characters with narrow widths remain almost single-colored.
This problem may also appear when a character in a narrow-type font is recognized. In the example of the narrow-type font in FIG. 3C, the aforementioned problem may occur with regard to characters including not only ‘I’ but also ‘O’. The conventional technology has a problem in that only the absolute widths of characters are considered, without any consideration of the influence of character fonts.
As such, all the conventional normalization technologies have limitations, which led the applicant to the invention of a new normalization method. Specifically, the applicant invented a technology that may increase character recognition rates by allowing even a specific character with a narrow width to be recognized as it is. The technology adds at least one margin around a bounding box that includes the specific character, the margin being determined by referring to information on at least one other character related to the specific character detected in an image, and then performs normalization.
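The idea of deriving a margin from other characters may be illustrated with a minimal sketch. This is only one possible reading and not the applicant's actual procedure: this passage does not state which information on the other characters is used or how the margin is computed. The sketch assumes, purely for illustration, that the related characters are the other characters of the same word, that the information used is the median of their bounding box widths, and that boxes are `(x, y, width, height)` tuples. The names `margin_from_neighbors` and `expand` are hypothetical.

```python
from statistics import median


def margin_from_neighbors(box, neighbor_boxes):
    """Return (left, right) margins in pixels for a narrow character box.

    Assumption: the box is widened to the median width of the related
    characters; the source does not specify this rule.
    """
    if not neighbor_boxes:
        return 0, 0
    ref_width = median(b[2] for b in neighbor_boxes)
    extra = max(0, round(ref_width - box[2]))
    left = extra // 2
    return left, extra - left


def expand(box, margins):
    """Apply horizontal margins to a box (no clipping to image bounds)."""
    x, y, w, h = box
    left, right = margins
    return (x - left, y, w + left + right, h)


# Hypothetical example: a 4 px wide '1' among characters about 15 px wide.
narrow = (100, 10, 4, 20)
others = [(80, 10, 14, 20), (110, 10, 16, 20), (130, 10, 15, 20)]
widened = expand(narrow, margin_from_neighbors(narrow, others))
```

Because the reference width comes from the neighboring characters rather than from a fixed absolute value, the margin in this sketch adapts to the font in use, which is the kind of dependence on other characters that the passage describes. The expanded box would then be normalized to the fixed size.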