The invention relates to text that is contained in transmitted pages. More particularly, the invention relates to a method and apparatus for segmenting a scanned page into text and non-text areas.
Text or pictorial images are often replicated or transmitted by a variety of techniques, such as photocopying, facsimile transmission, and scanning of images into a memory device. The process of replication or transmission often tends to degrade the resulting image due to a variety of factors. Degraded images are characterized by indistinct or shifted edges, blended or otherwise connected characters, and distorted shapes.
A reproduced or transmitted image that is degraded in quality may be unusable in certain applications. For example, if the reproduced or transmitted image is to be used in conjunction with a character recognition apparatus, the indistinct edges and/or connected characters may preclude accurate or successful recognition of characters in the image. Also, if the degraded image is printed or otherwise rendered visible, the image may be more difficult to read and less visually distinct.
There are several approaches to improving image quality. One known resolution enhancement algorithm uses template matching, which attempts to match a line, curve pattern, or linear pattern and then to find the best way to reconstruct it within the available printing resolution.
Other methods for text enhancement come from the area of Optical Character Recognition (OCR). The main purpose of these methods is to isolate the characters within a block of text from one another. Such methods are closely related to morphological filters, which repeatedly perform thickening and thinning, and opening and closing, to obtain the desired character shape.
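By way of illustration only, the opening and closing operations mentioned above can be sketched as follows. The 3x3 square structuring element, the treatment of pixels outside the image as background, and the function names `erode`, `dilate`, `opening`, and `closing` are assumptions made for this sketch and are not taken from any cited reference.

```python
# Binary images are lists of rows holding 0 (background) or 1 (foreground).
# Assumption: a fixed 3x3 square structuring element; pixels outside the
# image are treated as background.

_OFFSETS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def erode(img):
    """Keep a pixel only if its whole 3x3 neighborhood is foreground."""
    h, w = len(img), len(img[0])
    return [[int(all(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                     for dy, dx in _OFFSETS))
             for x in range(w)] for y in range(h)]


def dilate(img):
    """Set a pixel if any pixel in its 3x3 neighborhood is foreground."""
    h, w = len(img), len(img[0])
    return [[int(any(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                     for dy, dx in _OFFSETS))
             for x in range(w)] for y in range(h)]


def opening(img):
    """Erosion followed by dilation: removes specks smaller than the element."""
    return dilate(erode(img))


def closing(img):
    """Dilation followed by erosion: bridges gaps smaller than the element."""
    return erode(dilate(img))
```

In this sketch, closing joins two stroke fragments separated by a one-pixel gap, while opening removes an isolated pixel next to a solid 3x3 block. Because the border is treated as background, foreground pixels touching the image edge are eroded away.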
J. Shiau, Z. Fan, and R. J. Clark, Detection And Rendering Of Text In Tinted Areas, U.S. Pat. No. 5,852,678 (Dec. 22, 1998) and related European Patent Application No. EP 0810774, Detection And Rendering Of Text In Halftone Tinted Areas, (Dec. 12, 1997) disclose a method and apparatus that improves digital reproduction of a compound document image containing half-tone tint regions and text and/or graphics embedded within the half-tone tint regions. The method entails determining a local average pixel value for each pixel in the image, and then discriminating text/graphics pixels from half-tone tint pixels, and classifying them accordingly, based on the local average pixel values. Discrimination can be effected by calculating a range of local averages within a neighborhood surrounding each pixel; by calculating edge gradients based on the local average pixel values; or by approximating second derivatives of the local average pixel values. Text/graphics pixels are rendered using a rendering method appropriate for that type of pixel; half-tone tint pixels are rendered using a rendering method appropriate for that type of pixel.
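As a rough illustration of the first discrimination option described above (a range of local averages within a neighborhood), the following sketch may be helpful. It is not the implementation of the cited patent. The window radius, the threshold of 40 gray levels, the synthetic test page, and the names `local_average`, `local_range`, and `text_mask` are all assumptions chosen for the example.

```python
def _window(img, y, x, r):
    """Values in the (2r+1) x (2r+1) neighborhood of (y, x), clipped at borders."""
    h, w = len(img), len(img[0])
    return [img[j][i]
            for j in range(max(0, y - r), min(h, y + r + 1))
            for i in range(max(0, x - r), min(w, x + r + 1))]


def local_average(img, r=1):
    """Local mean per pixel; this smooths a fine half-tone dot pattern."""
    return [[sum(win) / len(win)
             for win in (_window(img, y, x, r) for x in range(len(img[0])))]
            for y in range(len(img))]


def local_range(avg, r=1):
    """Max minus min of the local averages around each pixel."""
    return [[max(win) - min(win)
             for win in (_window(avg, y, x, r) for x in range(len(avg[0])))]
            for y in range(len(avg))]


def text_mask(img, threshold=40.0, r=1):
    """True where the range of local averages exceeds the assumed threshold."""
    return [[v > threshold for v in row] for row in local_range(local_average(img, r), r)]


# Synthetic page (assumed data): a light checkerboard tint of gray levels
# 200 and 255, with a dark two-pixel-wide vertical stroke in columns 4 and 5.
page = [[200 if (x + y) % 2 == 0 else 255 for x in range(10)] for y in range(8)]
for row in page:
    row[4] = row[5] = 0

mask = text_mask(page)
```

On this synthetic page the tint alone produces only a small range of local averages, so the columns far from the stroke stay unmarked, while the stroke and its immediate surroundings are marked. A real classifier would need thresholds tuned to the scanner and half-tone screen.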
L. L. Barski and R. S. Gaborski, Preprocessing Of Dot-Matrix/Ink-Jet Printed Text For Optical Character Recognition, U.S. Pat. No. 5,212,741 (May 18, 1993) disclose a method and apparatus for processing image data of dot-matrix/ink-jet printed text to perform OCR of such image data. In the method and apparatus, the image data are examined to detect whether dot-matrix/ink-jet printed text is present. Any detected dot-matrix/ink-jet produced text is then pre-processed, in a first step, by determining the image characteristic thereof by forming a histogram of pixel density values in the image data. A 2-D spatial averaging operation, as a second pre-processing step, smooths the dots of the characters into strokes and reduces the dynamic range of the image data. The resultant spatially averaged image data are then contrast stretched in a third pre-processing step to darken dark regions of the image data and lighten light regions of the image data. Edge enhancement is then applied to the contrast stretched image data in a fourth pre-processing step to bring out higher frequency line details. The edge enhanced image data are then binarized and applied to a dot-matrix/ink-jet neural network classifier for recognizing characters in the binarized image data from a predetermined set of symbols prior to OCR.
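A simplified sketch of part of such a pre-processing chain is given below, covering only the spatial averaging, contrast stretching, and binarization steps. The histogram analysis, the edge enhancement step, and the neural network classifier are omitted. The 3x3 box filter, the threshold of 128, the gray-level convention (0 is ink, 255 is paper), and the function names are assumptions for illustration and do not reproduce the cited patent.

```python
def box_blur(img):
    """2-D spatial averaging with a 3x3 window, clipped at the borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(window) / len(window)
    return out


def contrast_stretch(img, lo=0.0, hi=255.0):
    """Linearly map the darkest value to lo and the lightest value to hi."""
    flat = [v for row in img for v in row]
    vmin, vmax = min(flat), max(flat)
    if vmax == vmin:  # flat image: nothing to stretch
        return [list(row) for row in img]
    scale = (hi - lo) / (vmax - vmin)
    return [[lo + (v - vmin) * scale for v in row] for row in img]


def binarize(img, threshold=128):
    """1 marks ink (value below the threshold), 0 marks paper."""
    return [[1 if v < threshold else 0 for v in row] for row in img]


def preprocess(img):
    """Averaging, then contrast stretching, then binarization."""
    return binarize(contrast_stretch(box_blur(img)))


# Assumed test pattern: a horizontal dot-matrix stroke made of separate dots.
dots = [[255] * 9 for _ in range(5)]
for x in (1, 3, 5, 7):
    dots[2][x] = 0

raw = binarize(dots)        # dots remain separated
joined = preprocess(dots)   # dots merge into a continuous stroke
```

In this example, binarizing the raw pattern leaves the dots disconnected, whereas averaging followed by contrast stretching merges them into a continuous stroke. The stroke also becomes thicker, which is one reason a subsequent edge enhancement step is useful in practice.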
The prior art teaches global techniques aimed at intelligent binarization, OCR, and document image analysis. It neither teaches nor suggests local techniques aimed at text and graphic outlines as opposed to the entire text and graphics region.
It would be advantageous to provide a technique that detects text outline and line art in a color document image.
It would also be advantageous to provide a technique that provides good color reproduction of document images that contain text.
It would also be advantageous to provide a text detection technique that is simple and less computationally intensive, i.e., that requires no complex feature vectors, no transforms, no color clustering, and no cross-correlation, and thereby is suitable for high resolution scans.
It would also be advantageous to provide a text detection technique that is local, i.e., that does not require the scanning of an entire document before processing, and that is thereby fast. It would be desirable for processing to begin as the document is being scanned, so that part of a character can be processed without needing the entire character. In such an approach, neither the text character nor the entire word would be recognized.
It would also be advantageous to provide a text detection technique that uses adaptive thresholds on text stroke width.
It would also be advantageous to provide a text detection technique that provides important information, such as stroke width and background estimate, that may be used for a subsequent text enhancement procedure.
It would also be advantageous to provide a text detection technique that handles text on a light half-tone background.
It would also be advantageous to provide a text detection technique that handles very thin text blurred by a device, such as by a scanner.
It would also be advantageous to provide a text detection technique in which a high local contrast requirement could reduce errors in detection so that they are not easily perceivable after enhancement.