A document image may contain a variety of content, for example, text, line art and pictures. Exemplary line art may comprise graphics, figures and other types of line art, and pictorial content may comprise either a continuous-tone picture or a half-tone picture. Additionally, a document image may comprise local and page color background regions, tabulated data and text characters of various sizes, colors and fonts.
The content of a digital image may have considerable impact on the compression of the digital image, both in terms of compression efficiency and compression artifacts. Pictorial regions in an image may not be efficiently compressed using compression algorithms designed for the compression of text. Similarly, text images may not be efficiently compressed using compression algorithms that are designed and optimized for pictorial content. Not only may compression efficiency be affected when a compression algorithm designed for one type of image content is used on a different type of image content, but the decoded image may exhibit visible compression artifacts.
Additionally, image enhancement algorithms designed to sharpen text, if applied to pictorial image content, may produce visually annoying artifacts in some areas of the pictorial content. In particular, pictorial regions containing strong edges may be affected. While smoothing operations may enhance a natural image, the smoothing of text regions is seldom desirable.
Copiers, scanners and other imaging devices may use text segmentation when performing content-specific processing and compression on document images and other digital images. Exemplary content-specific processing may comprise differential filtering and color enhancement. Exemplary content-specific compression may comprise layered compression schemes, where the contents of a document image are segmented into a high-resolution foreground layer and a lower-resolution background layer.
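The layered decomposition mentioned above may be illustrated by the following minimal sketch. It is not a method described in this document: the function name `split_layers`, the `factor` parameter, the use of `None` to mark non-text foreground pixels and the block-averaging of the background are all assumptions made for illustration, and the text mask is assumed to be supplied by some earlier text segmentation step.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]    # grayscale rows, values 0..255 (assumed representation)
Mask = List[List[bool]]    # True where a pixel was labeled as text


def split_layers(
    image: Image, text_mask: Mask, factor: int = 2
) -> Tuple[List[List[Optional[int]]], Image]:
    """Hypothetical helper: split an image into a full-resolution
    foreground layer and a background layer downsampled by `factor`."""
    height, width = len(image), len(image[0])

    # Foreground: keep text pixels at full resolution; None means "no text here".
    foreground = [
        [image[y][x] if text_mask[y][x] else None for x in range(width)]
        for y in range(height)
    ]

    # Fallback value for blocks made up entirely of text pixels:
    # the mean of all non-text pixels, or white if there are none.
    non_text = [
        image[y][x]
        for y in range(height)
        for x in range(width)
        if not text_mask[y][x]
    ]
    fallback = sum(non_text) // len(non_text) if non_text else 255

    # Background: average the non-text pixels of each factor x factor block.
    background: Image = []
    for by in range(0, height, factor):
        row = []
        for bx in range(0, width, factor):
            block = [
                image[y][x]
                for y in range(by, min(by + factor, height))
                for x in range(bx, min(bx + factor, width))
                if not text_mask[y][x]
            ]
            row.append(sum(block) // len(block) if block else fallback)
        background.append(row)

    return foreground, background
```

In such a sketch, each layer could then be passed to a compression algorithm suited to its content, for example a bi-level or lossless coder for the foreground and a lossy continuous-tone coder for the background.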
Detection of text in digital images may be used so that content-type-specific image enhancement methods may be applied to the appropriate regions in a digital image. The detection of regions of a particular content type in a digital image may improve compression efficiency, reduce compression artifacts, and improve image quality when used in conjunction with a compression algorithm or image enhancement algorithm designed for the particular type of content. Additionally, text detection may be performed prior to optical character recognition (OCR) and other image analysis tasks.
Robust text detection techniques that are able to reject non-text content while retaining actual text components may be desirable.