The present invention relates generally to computerized image processing and more specifically to techniques for rapidly characterizing document images.
The rapid determination of image and graphic regions within document images is important in a number of contexts, such as rendering scanned images on paper. For example, halftone regions must be treated differently from text. Halftones are rendered best when the regions are descreened (binarized) for output, while text and line graphics are rendered best by a binarization of the input image that leaves edges sharp. Treating one type of region as if it were the other can produce disastrous results.
Similarly, in reconstructing documents, halftone regions must neither be converted to connected components (expensive in time and memory) nor be sent to an OCR system for "recognition." If there are no halftones, then the page can be analyzed without concern for the problems presented by halftones. If there are halftones, more careful segmentation methods can be used to avoid expensive and inappropriate operations on them. Further, if the halftone regions are identified, segmentation and OCR processing can proceed on the remaining parts of the image (text/line-graphics).
Direct, rapid location of text line regions and rules is also useful for the segmentation process, where the layout of the image (in terms of regions of text, line graphics, rules, and halftone) is determined. Location of rules and text columns also helps in the subsequent process of building a logical description of the image.
Rapid characterization is also important for forms analysis. Determining the location of lines has been shown to be effective for form classification. For form interpretation, it can be useful to locate finely textured regions, which can be digital data or registration marks.
Thresholded reduction, described in copending patent application Ser. No. 449,627 filed Dec. 8, 1989, titled "IMAGE REDUCTION/ENLARGEMENT TECHNIQUE," the disclosure of which is hereby incorporated by reference, uses a rank order filter (threshold convolution) followed by subsampling, where the kernel of the convolution is a square of 1's of the same size as the tiles used for subsampling. The cases where the tiles were 2×2, 3×3, and 4×4 are described. Optimizations for the 2×2 case include the combination of logical operations and reduction to eliminate rank-order computation on pixels that were not subsampled, the use of lookup tables, and special hardware implementations.
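As a minimal illustration of the thresholded reduction described above, the following Python sketch computes the rank-order count directly, with per-pixel arithmetic. The function name `threshold_reduce` and the list-of-rows image representation are assumptions made for this illustration and are not taken from the referenced application; none of the optimizations mentioned above are shown.

```python
def threshold_reduce(image, tile=2, threshold=1):
    """Reduce a binary image by a factor of `tile` in each direction.

    image: list of rows, each a list of 0/1 integers (hypothetical layout).
    An output pixel is 1 when at least `threshold` of the tile*tile input
    pixels in the corresponding tile are 1.
    """
    rows = len(image) // tile
    cols = len(image[0]) // tile if image else 0
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            # Convolution with a tile-sized square of 1's, evaluated only
            # at the subsampled positions: count the ON pixels in the tile.
            count = sum(
                image[r * tile + dr][c * tile + dc]
                for dr in range(tile)
                for dc in range(tile)
            )
            # Rank-order (threshold) test on the count.
            out_row.append(1 if count >= threshold else 0)
        out.append(out_row)
    return out
```

With a 2×2 tile, a threshold of 1 behaves like an OR over each tile and a threshold of 4 like an AND; the intermediate values 2 and 3 select tiles that are at least half or at least three-quarters ON.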
A threshold convolution over a large window (or convolution kernel) is, in general, very expensive on a large image. The convolution requires arithmetic on each pixel; the threshold requires a test. 2×2 reductions (tile size 2×2) were used for ease of implementation and because an efficient implementation could be found that spanned the full range of threshold values (in that case, the relatively small range from 1 to 4). The intermediate threshold values of 2 and 3 were about twice as expensive to compute as the values 1 and 4, but this is not a serious computational burden.
The special features of the 2×2 threshold reduction are that row operations can be done by logic, 32 pixels at a time, and column operations can be carried out with lookup tables, at, for example, 16 bits per lookup. Arithmetic on individual pixels is not required. There remains, however, the question of how to do threshold convolution cheaply, with logic and not arithmetic, over a large window.
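For the 2×2 case, the logic-only approach can be sketched as follows. This is one possible arrangement under stated assumptions, not the implementation of the referenced application: each row is packed into a Python integer with pixel column x stored at bit x, the hypothetical function `reduce_row_pair` combines two adjacent rows with bitwise operations, and the hypothetical table `EVEN_BITS` subsamples 8 bits per lookup (rather than the 16 mentioned above) to keep the table short.

```python
# Lookup table (assumed 8 bits wide for brevity): maps a byte to the
# 4 bits found at its even positions 0, 2, 4 and 6.
EVEN_BITS = [
    sum(((byte >> (2 * k)) & 1) << k for k in range(4))
    for byte in range(256)
]


def reduce_row_pair(top, bot, width, threshold):
    """Threshold-reduce two packed rows into one row of width // 2 pixels.

    top, bot: rows packed as integers, pixel column x at bit x.
    After the shifts below, bit 2*j of a, b, c, d holds the four pixels
    of tile j, so every tile in the row pair is tested at once.
    """
    a, b = top, top >> 1
    c, d = bot, bot >> 1
    if threshold == 1:            # at least one pixel ON
        m = a | b | c | d
    elif threshold == 4:          # all four pixels ON
        m = a & b & c & d
    elif threshold == 2:          # at least two pixels ON
        m = (a & b) | (c & d) | ((a | b) & (c | d))
    elif threshold == 3:          # at least three pixels ON
        m = ((a | b) & c & d) | (a & b & (c | d))
    else:
        raise ValueError("threshold must be in the range 1 to 4")

    # Subsample: keep only the even bit positions, one byte per lookup.
    out = 0
    for i in range((width + 7) // 8):
        out |= EVEN_BITS[(m >> (8 * i)) & 0xFF] << (4 * i)
    return out & ((1 << (width // 2)) - 1)
```

In this sketch the expressions for thresholds 2 and 3 use roughly twice as many logical operations as those for thresholds 1 and 4, and no per-pixel arithmetic or counting is performed.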