Imaging devices such as cameras, camcorders, scanners, mobile phones and other user equipment may utilize an image sensor configured to capture digital media, for example, digital images and/or video.
The captured digital media may contain text information. The text information may be captured from within a document or from within a natural scene. For example, users of mobile phones often take pictures and/or video of scenes that contain text information, such as street signs and billboards.
In processing the digital media, it may be desirable to localize the areas of the digital media that contain the text information, such that those areas are processed differently than the natural scene areas of the digital media. For example, conventional image-processing algorithms, such as denoising, sharpening and super-resolution, may result in undesired artifacts within the text information, and it may therefore be desirable to apply different image-processing techniques to the text information. Further, it may be desirable to localize the areas of the digital media that contain the text information such that the imaging device may optically zoom in on the text information to increase the clarity thereof.
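By way of illustration only, the following minimal sketch shows one way that localized text areas might be excluded from a conventional processing step. It assumes a single-channel image and an already available boolean text mask; the function names `box_blur` and `process_outside_text`, and the choice of a simple mean filter as the denoising step, are hypothetical and are not taken from the present description.

```python
import numpy as np


def box_blur(image: np.ndarray, radius: int = 1) -> np.ndarray:
    """Mean filter over a (2*radius+1) x (2*radius+1) window with replicated edges."""
    padded = np.pad(image.astype(np.float64), radius, mode="edge")
    size = 2 * radius + 1
    height, width = image.shape
    out = np.zeros((height, width), dtype=np.float64)
    # Sum every shifted copy of the padded image, then divide by the window area.
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + height, dx:dx + width]
    return out / (size * size)


def process_outside_text(image: np.ndarray, text_mask: np.ndarray,
                         radius: int = 1) -> np.ndarray:
    """Smooth only the pixels outside text_mask; text pixels pass through unchanged.

    text_mask is an assumed input (True where text was localized); how it is
    obtained is outside the scope of this sketch.
    """
    smoothed = box_blur(image, radius)
    return np.where(text_mask, image.astype(np.float64), smoothed)
```

In this sketch the text areas are simply left untouched; a text-specific processing step could be substituted for the pass-through branch where one is available.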
Localizing areas that contain text information within a natural scene may be difficult because a natural scene exhibits a wide range of imaging conditions, such as noise and blur, and may not be as formally structured as a scanned document. For example, text information in natural scenes may appear in arbitrary poses, colors, sizes and shapes. Therefore, it may be difficult to determine the areas in a digital image of a natural scene that contain the text information.
The difficulty in detecting text information in natural scenes may be exacerbated when the text information is relatively small in size (e.g., 3-5 pixels in height). Conventional methods to detect the text information contained within digital media may be based on either local features or global features of the digital media. However, these conventional methods may not be well suited to detecting small text because they typically extract relatively large visual features, such as the stroke width of the letters. These large visual features may not be evident in small text whose font size is 3-5 pixels in height.