This invention relates generally to a method and apparatus for automatic image segmentation using template matching filters, and more particularly to a method and apparatus for segmenting regions of differing texture or structure within a stored binary image using a template matching filter that is designed to pass at least one texture while removing one or more other textures.
The following related applications are hereby incorporated by reference for their teachings:
U.S. patent application Ser. No. 08/004,479 by Shiau (published at EP-A2 0 521 662 on Jan. 7, 1993), now U.S. Pat. No. 5,293,430;
“Method for Design and Implementation of an Image Resolution Enhancement System That Employs Statistically Generated Look-Up Tables,” Loce et al., Ser. No. 08/169,485, filed Dec. 17, 1993, now U.S. Pat. No. 5,696,845;
“Non-Integer Image Resolution Conversion Using Statistically Generated Look-Up Tables,” Loce et al., Ser. No. 08/170,082, filed Dec. 17, 1993, now U.S. Pat. No. 5,387,985;
“Method for Statistical Generation of Density Preserving Templates for Print Enhancement,” Loce et al., Ser. No. 08/169,565, filed Dec. 17, 1993, now U.S. Pat. No. 5,359,423;
“Automated Template Design for Print Enhancement,” Eschbach, Ser. No. 08/169,483, filed Dec. 17, 1993, now U.S. Pat. No. 5,724,455; and
“Image Resolution Conversion Method that Employs Statistically Generated Multiple Morphological Filters,” Loce et al., Ser. No. 08/169,487, filed Dec. 17, 1993, now U.S. Pat. No. 5,579,445.
U.S. Pat. No. 4,194,221 to Stoffel, U.S. Pat. No. 4,811,115 to Lin et al., and U.S. Pat. No. 5,131,049 to Bloomberg et al. are hereby specifically incorporated by reference for their teachings regarding image segmentation.
The present invention is a novel approach to separating text, halftones, or other image structures in composite images using template-based filtering methods. A key application of the present invention is the segmentation of text regions from halftone regions. In the reproduction of an original document from video image data created, for example, by electronic raster input scanning from an original document, one is faced with the limited resolution capabilities of the reproducing system and the fact that output devices remain predominantly binary. This is particularly evident when attempting to reproduce halftones, lines and continuous tone images. Of course, an image data processing system may be tailored so as to offset the limited resolution capabilities of the reproducing apparatus used, but this is difficult due to the divergent processing needs of the different image types that may be encountered. In this respect, it should be understood that the image content of the original document may consist entirely of high frequency halftones, low frequency halftones, continuous tones, text or line copy, or a combination, in some unknown degree, of some or all of the above. Optimizing the image processing system for one image type in an effort to offset the limitations in the resolution capability of the reproducing apparatus used may not be possible, requiring a compromise choice that may not produce acceptable results. Thus, for example, where one optimizes the system for low frequency halftones, it is often at the expense of degraded reproduction of high frequency halftones, or of text or line copy, and vice versa. Beyond the issue of accurate reproduction, segmentation of different image types is key to the successful application of recognition algorithms (e.g., character recognition and glyph recognition) and efficient application of image compression techniques.
As one example of the problems encountered, reproduction of halftoned images with screening tends to introduce moire, caused by the interaction of the original screen frequency and applied screen frequency. Although the use of high frequency line screens can reduce the problem, the artifact can still occur in some images. In a networked environment particularly, it is desirable that the image processing device (e.g., raster input scanner) detect the halftone, and low-pass filter the document image into a continuous tone for subsequent halftone reproduction by printers in the network in accordance with their particular capabilities.
Heretofore, a number of applications, patents and publications have disclosed techniques for segmentation of digital image data, the relevant portions of which may be briefly summarized as follows:
U.S. patent application Ser. No. 08/044,479 to Shiau addresses a particular problem noted in the use of an auto correlation function: the false characterization of a portion of the image as a halftone when in fact it would be preferable for the image to be processed as a line image. Examples of this defect are noted particularly in the processing of Japanese Kanji characters and small Roman letters. In these examples, the auto correlation function may detect the image as halftones and process accordingly, instead of applying a common threshold through the character image. The described computations of auto correlation are one dimensional in nature, and this problem of false detection will occur whenever a fine pattern that is periodic in the scan line or fast scan direction is detected. In the same vein, shadow areas and highlight areas are often not detected as halftones, and are then processed with the application of a uniform threshold.
U.S. Pat. No. 4,194,221 to Stoffel, issued Mar. 18, 1980, discloses the problem of image segmentation. The problem was addressed by applying a discrimination function instructing the image processing system as to the type of image data present and particularly, an auto correlation function to the stream of pixel data, to determine the existence of halftone image data. Stoffel describes a method of processing automatically a stream of image pixels representing unknown combinations of high and low frequency halftones, continuous tones, and/or lines to provide binary level output pixels representative of the image. The described function is applied to the stream of image pixels and, for the portions of the stream that contained high frequency halftone image data, notes a large number of closely spaced peaks in the resultant signal. The correlator circuits described in Stoffel's embodiment, however, are very expensive, as they must provide a digital multiplication function. Accordingly, as a practical matter, Stoffel requires as a first step, reduction of the amount of data handled, by initially thresholding image data against a single threshold value, to reduce the image to a high contrast black or white image. However, depending on the selection of the threshold as compared to the intensity of the image, significant amounts of information may be lost in the thresholding process. For example, if the threshold level is set to distinguish in the middle of the intensity range, but the image has significant variations through the darker gray levels, the thresholded result does not indicate the variations. This results in an undesirable loss of image information. While it may be possible to vary the threshold value adaptively from original to original and from image area to image area, such algorithms tend to be complicated and work well only for a restricted class of images such as line images.
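The loss of information caused by a single mid-range threshold may be illustrated with a minimal sketch. The 0 to 255 intensity scale, the threshold of 128, and the sample pixel values are assumptions chosen for illustration only and are not taken from Stoffel.

```python
def threshold(pixels, level):
    """Reduce gray-level pixels to binary: 1 where intensity >= level, else 0."""
    return [1 if p >= level else 0 for p in pixels]

# Hypothetical scan line whose variation lies entirely in the darker gray levels
# (0 = black, 255 = white is an assumed convention).
dark_region = [10, 90, 25, 110, 15, 100, 20, 95]

# Threshold set in the middle of the assumed intensity range.
binary = threshold(dark_region, 128)
print(binary)  # every pixel falls below the threshold, so the variation is lost
```

In this sketch the input swings between roughly 10 and 110, yet the thresholded result is uniform, which is the loss the passage above describes.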
U.S. Pat. No. 4,811,115 to Lin et al., issued Mar. 7, 1989, teaches an auto correlation function that is calculated for the stream of halftone image data at selected time delays that are predicted to be indicative of the image frequency characteristics, without prior thresholding. The arithmetic function used in that auto correlation system is an approximation of the auto correlation function that employs logical functions and addition, rather than the multiplication function used in U.S. Pat. No. 4,194,221 to Stoffel. Valleys in the resulting auto correlated function are detected to determine whether high frequency halftone image data is present.
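A multiplication-free correlation measure of this general kind may be sketched as follows. The exact arithmetic function used by Lin et al. is not reproduced here; the sketch assumes, purely for illustration, a sum of absolute differences between the pixel stream and a delayed copy of itself, which takes low values (valleys) at delays matching the period of the pattern. The function names and the sample scan line are hypothetical.

```python
def abs_diff_correlation(pixels, lag):
    """Multiplication-free stand-in for auto correlation at one lag:
    sums |p[i] - p[i + lag]| over the overlapping samples."""
    return sum(abs(a - b) for a, b in zip(pixels, pixels[lag:]))

def valley_lags(pixels, max_lag):
    """Return the lags (1..max_lag) whose value is a strict local minimum."""
    values = [abs_diff_correlation(pixels, k) for k in range(1, max_lag + 1)]
    valleys = []
    for i in range(1, len(values) - 1):
        if values[i] < values[i - 1] and values[i] < values[i + 1]:
            valleys.append(i + 1)  # values[i] corresponds to lag i + 1
    return valleys

# Hypothetical halftone-like scan line with a period of 4 pixels.
halftone = [200, 120, 40, 120] * 8
print(valley_lags(halftone, 10))  # valleys at multiples of the period
```

Only subtraction, comparison and addition are used, so no multiplier is required; the valleys fall at delays of 4 and 8 pixels for this sample.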
U.S. Pat. No. 5,065,437 to Bloomberg, issued Nov. 12, 1991, discloses a method for separating finely textured and solid regions in a binary image. Initially an operation is carried out on the image to thicken text and lines and to solidify textured regions. The image is then subjected to a second set of operations that eliminates ON pixels that are near OFF pixels, thereby thinning out and eliminating the previously thickened text and lines, but leaving the previously solidified textured regions.
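The two-stage idea of thickening and then thinning may be sketched as follows. The specific operations and structuring elements used by Bloomberg are not given above; the sketch assumes a square dilation followed by a larger square erosion as stand-ins, and the test image is hypothetical.

```python
def dilate(img, r):
    """Turn a pixel ON if any pixel within a (2r+1)x(2r+1) square is ON."""
    h, w = len(img), len(img[0])
    return [[int(any(img[y][x]
                     for y in range(max(0, i - r), min(h, i + r + 1))
                     for x in range(max(0, j - r), min(w, j + r + 1))))
             for j in range(w)] for i in range(h)]

def erode(img, r):
    """Keep a pixel ON only if every pixel within the square is ON
    (pixels outside the image count as OFF)."""
    h, w = len(img), len(img[0])

    def all_on(i, j):
        for y in range(i - r, i + r + 1):
            for x in range(j - r, j + r + 1):
                if not (0 <= y < h and 0 <= x < w) or not img[y][x]:
                    return False
        return True

    return [[int(all_on(i, j)) for j in range(w)] for i in range(h)]

# Hypothetical 16x24 binary image: a thin stroke (text-like) and a
# checkerboard patch (finely textured region).
H, W = 16, 24
image = [[0] * W for _ in range(H)]
for i in range(2, 14):
    image[i][2] = 1
for i in range(4, 12):
    for j in range(10, 20):
        image[i][j] = 1 if (i + j) % 2 == 0 else 0

# Step 1 thickens the stroke and solidifies the texture; step 2 removes ON
# pixels near OFF pixels, which eliminates the thickened stroke.
result = erode(dilate(image, 1), 2)
for row in result:
    print("".join("#" if p else "." for p in row))
```

In this sketch the stroke is only three pixels wide after thickening and so does not survive the five-pixel erosion, while the interior of the solidified texture does.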
U.S. Pat. No. 5,131,049 to Bloomberg, issued Jul. 14, 1992, discloses a method for creating a mask for separating halftone regions in a binary image from other regions. The method includes constructing a seed image, constructing a clipping mask, and filling the seed while clipping to the mask.
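Filling a seed while clipping to a mask may be sketched as an iterated dilation that is ANDed with the mask at every step. How Bloomberg constructs the seed and the clipping mask is not described above, so both images in the sketch are hypothetical, as are the function name and the choice of 8-connected growth.

```python
def fill_clipped(seed, mask):
    """Grow the seed by 8-connected dilation, clipping to the mask after
    every step, until no pixel changes."""
    h, w = len(mask), len(mask[0])
    current = [[seed[i][j] & mask[i][j] for j in range(w)] for i in range(h)]
    changed = True
    while changed:
        changed = False
        grown = [row[:] for row in current]
        for i in range(h):
            for j in range(w):
                if current[i][j] or not mask[i][j]:
                    continue  # already ON, or not permitted by the mask
                neighbours = (current[y][x]
                              for y in range(max(0, i - 1), min(h, i + 2))
                              for x in range(max(0, j - 1), min(w, j + 2)))
                if any(neighbours):
                    grown[i][j] = 1
                    changed = True
        current = grown
    return current

# Hypothetical mask with two separate regions; the seed marks only the left one.
mask = [[1, 1, 1, 0, 0, 1, 1],
        [1, 1, 1, 0, 0, 1, 1],
        [0, 0, 0, 0, 0, 0, 0]]
seed = [[1, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0]]

filled = fill_clipped(seed, mask)
for row in filled:
    print(row)
```

Only the mask region that contains a seed pixel is filled; the unseeded region on the right remains OFF.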
U.S. Pat. No. 5,341,226 to Shiau, issued Aug. 23, 1994, discloses a method and apparatus for processing color document images to determine the presence of particular image types in order to designate areas for optimal image processing thereof. A multi-separation image defined in terms of color density for each separation is converted to a luminance-chrominance definition, where one component of the image represents image intensity. An image segmentation process operates on the image intensity signal, the results of which are used to determine processing of the multi-separation image.
UK-A-2,153,619, published August 1985, teaches a similar determination of the type of image data. However, in that case, a threshold is applied to the image data at a certain level, and subsequent to thresholding the number of transitions from light to dark within a small area is counted. The system operates on the presumption that data with a low number of transitions after thresholding is probably a high frequency halftone or continuous tone image. The thresholding step in this method has the same undesirable effect as described for Stoffel.
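The transition-counting step may be sketched in one dimension as follows. The threshold level, the light/dark convention, and the sample pixel values are assumptions for illustration; the reference counts transitions within a small area, which is simplified here to a short scan-line segment.

```python
def count_transitions(pixels, level):
    """Threshold a scan-line segment, then count light-to-dark transitions."""
    # 1 = light, 0 = dark (assumed convention)
    binary = [1 if p >= level else 0 for p in pixels]
    return sum(1 for a, b in zip(binary, binary[1:]) if a == 1 and b == 0)

# Hypothetical segments: sharp line copy versus a smooth continuous tone ramp.
line_copy = [250, 250, 30, 30, 250, 250, 30, 30]
contone = [140, 150, 160, 170, 180, 175, 165, 150]

print(count_transitions(line_copy, 128))  # several transitions
print(count_transitions(contone, 128))    # no transitions
```

Because the count is taken after thresholding, any gray-level variation that stays on one side of the threshold contributes nothing to it.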
Robert P. Loce et al., in Facilitation of Optimal Binary Morphological Filter Design via Structuring Element Libraries and Design Constraints, Optical Engineering, Vol. 31, No. 5, May 1992, pp. 1008-1025, incorporated herein by reference, describe three approaches to reducing the computational burden associated with digital morphological filter design. Although the resulting filter is suboptimal, imposition of the constraints in a suitable manner results in little loss of performance in return for design tractability.
Mathematical Morphology in Image Processing, pp. 43-90 (Edward R. Dougherty ed., Marcel Dekker 1992), hereby incorporated by reference, describes efficient design strategies for the optimal binary digital morphological filter. A suboptimal design methodology is investigated for binary filters in order to facilitate a computationally manageable design process.
Robert P. Loce et al., in Optimal Morphological Restoration: The Morphological Filter Mean-Absolute-Error Theorem, Journal of Visual Communications and Image Representation, (Academic Press), Vol. 3, No. 4, December 1992, pp. 412-432, hereby incorporated by reference, teach expressions for the mean-absolute restoration error of general morphological filters formed from erosion bases in terms of mean-absolute errors of single-erosion filters. In the binary setting, the expansion is a union of erosions, while in the gray-scale setting the expansion is a maximum of erosions. Expressing the mean-absolute-error theorem in a recursive form leads to a unified methodology for the design of optimal (suboptimal) morphological restoration filters. Applications to binary-image, gray-scale signal, and order-statistic restoration on images are included.
Edward R. Dougherty et al., in Optimal mean-absolute-error hit-or-miss filters: morphological representation and estimation of the binary conditional expectation, Optical Engineering, Vol. 32, No. 4, April 1993, pp. 815-827, incorporated herein by reference, disclose the use of a hit-or-miss operator as a building block for optimal binary restoration filters. Filter design methodologies are given for general-, maximum-, and minimum-noise environments and for iterative filters.
Robert P. Loce, in Morphological Filter Mean-Absolute-Error Representation Theorems and Their Application to Optimal Morphological Filter Design, Center for Imaging Science, Rochester Institute of Technology, (Ph.D. Thesis), May 1993, incorporated herein by reference, discloses design methodologies for optimal mean-absolute-error (MAE) morphological based filters.
In accordance with the present invention, there is provided a method performed in a digital processor for processing a document image to determine image types present therein, the steps comprising:
receiving, from an image source, a document image having a plurality of pixels therein, each pixel represented by a density signal, and storing at least a portion thereof representing a region of the document image in a data buffer;
retrieving, from the data buffer, the density signals for the document image; and
determining, using template matching filters, image types present in the region of the document image.
In accordance with another aspect of the present invention, there is provided an apparatus for processing binary image pixels in an image represented by a plurality of rasters of pixels, to preferentially pass regions having a first structure therethrough so as to produce an output image primarily comprised of regions exhibiting the first structure, including:
an image memory for storing the binary image signals;
a window buffer for storing a plurality of image signals from a plurality of rasters, said image signals representing pixels centered about a target pixel;
a template filter to generate an output image signal as a function of the image signals stored in the window buffer, wherein the output signal is equivalent to the image signal for regions of the binary image where the target pixel represents the first structure, and where the output signal is zero for regions of the binary image where the target pixel represents another structure; and
an output memory for storing the output signal for each of a plurality of target pixels, wherein the signals stored in each location of said output memory are generated by said template filter as a function of the image signals within a window whose contents are determined as a function of the corresponding target pixel location.
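The flow of signals through the window buffer, template filter and output memory may be sketched as follows. The 3x3 window size, the don't-care notation, and the four hand-written templates are hypothetical and serve only to show the mechanism; they are not the templates of the invention, which would be designed from training data.

```python
def matches(window, template):
    """A template matches when every cared-for position agrees (None = don't care)."""
    return all(t is None or t == p for p, t in zip(window, template))

def template_filter(image, templates):
    """Pass the target pixel when its window matches any template; otherwise output 0."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]              # output memory
    for i in range(h):
        for j in range(w):
            # Window buffer: 3x3 neighbourhood centred on the target pixel,
            # in raster order; positions outside the image are read as 0.
            window = [image[y][x] if 0 <= y < h and 0 <= x < w else 0
                      for y in (i - 1, i, i + 1) for x in (j - 1, j, j + 1)]
            if any(matches(window, t) for t in templates):
                out[i][j] = image[i][j]
    return out

N = None
# Hypothetical templates: target pixel ON with an ON pixel directly above,
# below, left, or right (a crude proxy for stroke-like structure).
TEMPLATES = [
    [N, 1, N, N, 1, N, N, N, N],
    [N, N, N, N, 1, N, N, 1, N],
    [N, N, N, 1, 1, N, N, N, N],
    [N, N, N, N, 1, 1, N, N, N],
]

# Hypothetical input: a vertical stroke (column 1) beside a checkerboard texture.
image = [[0, 1, 0, 0, 1, 0, 1],
         [0, 1, 0, 0, 0, 1, 0],
         [0, 1, 0, 0, 1, 0, 1],
         [0, 1, 0, 0, 0, 1, 0],
         [0, 1, 0, 0, 1, 0, 1]]

output = template_filter(image, TEMPLATES)
for row in output:
    print(row)
```

In this sketch the stroke is passed unchanged, while the checkerboard pixels, which have only diagonal neighbours, are set to zero.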
In accordance with yet another aspect of the present invention, there is provided an apparatus for processing binary image pixels in an image represented by a plurality of rasters of pixels, to identify regions exhibiting a particular structure therein, comprising:
an image source for producing a document image having a plurality of pixels therein, each pixel represented by a density signal;
memory for storing at least a portion of the density signals representing a region of the document image in a data buffer; and
a segmentation circuit employing template-matching filters to identify the presence of the particular structure in the region of the image stored in said memory.
One aspect of the invention is based on the discovery that templates may be employed to recognize one binary structure within one or more textures. More specifically, template-based filters may be used to recognize regions of an image that contain text and line art. This discovery further avoids problems that arise in techniques that attempt to cover a broad range of document types, as the present invention further enables the “customization” of the template-based filters used therein in response to training documents that are representative of documents commonly encountered by the image processing system. This aspect is further based on the discovery of techniques that generate statistical representations of the patterns found in text and halftone regions of documents as further described, for example, by Eschbach in U.S. application Ser. No. 08/169,483 and Loce et al. in U.S. application Ser. No. 08/169,485.
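Customization from training documents may be sketched, under simplifying assumptions, as a count of the window patterns observed in a text sample and in a halftone sample. The design methods of the referenced applications are not reproduced here; the 3x3 window, the simple frequency comparison, the function names and the two small training images are all hypothetical.

```python
from collections import Counter

def window_patterns(image):
    """Count 3x3 patterns centred on ON pixels (outside the image counts as OFF)."""
    h, w = len(image), len(image[0])
    counts = Counter()
    for i in range(h):
        for j in range(w):
            if image[i][j]:
                pattern = tuple(image[y][x] if 0 <= y < h and 0 <= x < w else 0
                                for y in (i - 1, i, i + 1)
                                for x in (j - 1, j, j + 1))
                counts[pattern] += 1
    return counts

def train_templates(text_image, halftone_image):
    """Keep patterns seen more often in the text sample than in the halftone sample."""
    text = window_patterns(text_image)
    halftone = window_patterns(halftone_image)
    return {p for p, n in text.items() if n > halftone[p]}

# Hypothetical training samples: a vertical stroke and a checkerboard texture.
text_sample = [[0, 1, 0] for _ in range(5)]
halftone_sample = [[1 if (i + j) % 2 == 0 else 0 for j in range(5)] for i in range(5)]

templates = train_templates(text_sample, halftone_sample)
print(len(templates))
```

A filter built from such a set would pass a target pixel only when its window matches one of the retained patterns, so the templates reflect whatever structures dominate the training documents supplied.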
The technique described herein is advantageous because it is inexpensive compared to other approaches and is flexible, in that it can be adapted to any of a number of input document types exhibiting a wide range of possible patterns. As a result of the invention, a low-cost image segmentation system may be accomplished.