The present invention relates generally to a system for processing document images to identify image types therein, and more particularly to an improved method of automatically segmenting a document image by classifying each type of imagery with some probability.
U.S. Pat. No. 4,194,221 to Stoffel, U.S. Pat. No. 4,811,115 to Lin et al. and U.S. patent application Ser. No. 08/004,479 by Shiau et al. now U.S. Pat. No. 5,293,430 (published at EP-A2 0 521 662 on Jan. 7, 1993) are herein specifically incorporated by reference for their teachings regarding image segmentation.
In the reproduction of copies of an original document from video image data created, for example, by electronic raster input scanning from an original document, one is faced with the limited resolution capabilities of the reproducing system and the fact that output devices are mostly binary or require compression to binary for storage efficiency. This is particularly evident when attempting to reproduce halftones, lines and continuous tone images. Of course, an image data processing system may be tailored so as to offset the limited resolution capabilities of the reproducing apparatus used, but this is difficult due to the divergent processing needs required by the different types of image which may be encountered. In this respect, it should be understood that the image content of the original document may consist entirely of multiple image types, including high frequency halftones, low frequency halftones, continuous tones, line copy, error diffused images, etc. or a combination, in some unknown degree, of some or all of the above or additional image types. In the face of these possibilities, optimizing the image processing system for one image type in an effort to offset the limitations in the resolution and the depth capability of the reproducing apparatus used (e.g. a device resolution of K pixels per unit length by L pixels per unit length (K×L) and each pixel defined at a depth b representing one of b optical densities), may not be possible, requiring a compromise choice which may not produce acceptable results. Thus, for example, where one optimizes the system for low frequency halftones, it is often at the expense of degraded reproduction of high frequency halftones, or of line copy, and vice versa.
Automatic segmentation serves as a tool to identify different image types or imagery, and identify the correct processing of such image types. In U.S. Pat. No. 4,194,221 to Stoffel, the problem of image segmentation was addressed by applying a function instructing the image processing system as to the type of image data present and particularly, an auto correlation function to the stream of pixel data, to determine the existence of halftone image data. Such a function is expressed as:
$$A(n) = \sum_{t=0}^{t=\mathrm{Last}} \left[ p(t) \times p(t+n) \right] \qquad (1)$$
where
n=the bit or pixel number;
p=the pixel voltage value; and
t=the pixel position in the data stream.
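Equation (1) can be illustrated with a minimal sketch. The function name, the sample scan line and the handling of the end of the stream (terms whose shifted index would run past the last pixel are simply dropped) are assumptions made for illustration and are not taken from Stoffel.

```python
def autocorrelation(pixels, n):
    """Evaluate A(n) = sum over t of p(t) * p(t + n) for one shift n."""
    # Assumption: stop where t + n would run past the end of the stream.
    last = max(len(pixels) - n, 0)
    return sum(pixels[t] * pixels[t + n] for t in range(last))


# Hypothetical scan line: a dot pattern that repeats every 4 pixels.
scan_line = [1, 0, 0, 0] * 8

# A(n) for shifts 0 through 8.
profile = [autocorrelation(scan_line, n) for n in range(9)]
print(profile)
```

For this periodic input the profile is non-zero only at shifts that are multiples of the period, which is the kind of regularly spaced peak structure the text associates with halftone data.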
Stoffel describes a method of processing automatically a stream of image pixels representing unknown combinations of high and low frequency halftones, continuous tones, and/or lines to provide binary level output pixels representative of the image. The described function is applied to the stream of image pixels and, for the portions of the stream that contained high frequency halftone image data, notes a large number of closely spaced peaks in the resultant signal.
In U.S. Pat. No. 4,811,115 to Lin et al., the auto correlation function is calculated for the stream of halftone image data at selected time delays which are predicted to be indicative of the image frequency characteristics, without prior thresholding. The arithmetic function used in that auto correlation system is an approximation of the auto correlation function using logical functions and addition, rather than the multiplication function used in U.S. Pat. No. 4,194,221 to Stoffel. Valleys in the resulting auto correlated function are detected to determine whether high frequency halftone image data is present.
U.S. patent application Ser. No. 08/004,479 by Shiau et al. now U.S. Pat. No. 5,293,430 is directed to a particular problem noted in the use of the auto correlation function: the false characterization of a portion of the image as a halftone, when in fact it would be preferable for the image to be processed as a line image. Examples of this defect are noted particularly in the processing of Japanese Kanji characters and small Roman letters. In these examples, the auto correlation function may detect the image as halftones and process accordingly, instead of applying a common threshold through the character image. The described computations of auto correlation are one dimensional in nature, and this problem of false detection will occur whenever a fine pattern that is periodic in the scan line or fast scan direction is detected. In the same vein, shadow areas and highlight areas are often not detected as halftones, and are then processed with the application of a uniform threshold.
Great Britain Patent Publication No. 2,153,619A provides a similar determination of the type of image data. However in that case, a threshold is applied to the image data at a certain level, and subsequent to thresholding the number of transitions from light to dark within a small area is counted. The system operates on the presumption that data with a low number of transitions after thresholding is probably a high frequency halftone or continuous tone image. The thresholding step in this method has the same undesirable effect as described for Stoffel.
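The threshold-then-count approach can be sketched in one dimension as follows. The function name, the threshold value and the convention that larger pixel values are lighter are assumptions for illustration; the cited publication counts transitions within a small area, which this sketch reduces to a single row of pixels.

```python
def count_light_to_dark(pixels, threshold):
    """Threshold a row of pixels, then count light-to-dark transitions."""
    # Assumption: values at or above the threshold are treated as light (1).
    binary = [1 if p >= threshold else 0 for p in pixels]
    # A light-to-dark transition is a light pixel followed by a dark one.
    return sum(1 for a, b in zip(binary, binary[1:]) if a == 1 and b == 0)


# Hypothetical rows: one alternating pattern, one smooth ramp.
alternating = count_light_to_dark([200, 50, 200, 50, 200], threshold=128)
ramp = count_light_to_dark([120, 125, 130, 135], threshold=128)
```

Because the count is taken after thresholding, any detail that lies entirely above or below the threshold is lost before it can be counted.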
Of background interest in this area are U.S. Pat. No. 4,556,918 to Yamazaki et al. showing an arrangement assuming a periodicity of an area of halftone dots which are thresholded against an average value derived from the area to produce a density related video signal; U.S. Pat. No. 4,251,837 to Janeway, III which shows the use of a three decision mode selection for determining threshold selection based on gradient constants for each pixel; U.S. Pat. No. 4,578,714 to Sugiura et al. which shows random data added to the output signal to eliminate pseudo-outlines; U.S. Pat. No. 4,559,563 to Joiner, Jr. which suggests an adaptive prediction for compressing data based on a predictor which worked best for a previous pixel block; U.S. Pat. No. 3,294,896 to Young, Jr. which teaches the usefulness of thresholding in producing an image from a binary digital transmission system; and U.S. Pat. No. 4,068,266 to Liao which discloses a method for carrying out resolution conversion with minimum statistical error.
Also of background interest in this area are U.S. Pat. No. 4,509,195 to Nadler which describes a method for binarization of a pattern wherein two concentric rings around a pixel are evaluated to determine contrast values, and the contrast values are then used to determine whether the pixel and the surrounding areas have a light or dark quality, and U.S. Pat. No. 4,547,811 to Ochi et al. which teaches a method of processing gray level values, depending on the density level of blocks of pixels, and their difference from a minimum or maximum value. The blocks can then be processed by a halftone processing matrix depending on the difference value. U.S. Pat. No. 4,730,221 to Roetling discloses a screening technique where values of gray over an image are evaluated to determine a minimum and maximum level, in order to determine constant levels of gray. U.S. Pat. No. 4,736,253 to Shida discloses a method of producing a halftone dot by selectively comparing image signals with highlight and shadow reference values, for determination of the binarization process.
Although significant work has been done in the automatic image segmentation area, with efforts, particularly characterized by U.S. patent application Ser. No. 08/004,479 by Shiau et al. now U.S. Pat. No. 5,293,430, to reduce the incorrect characterization of one image type as another, the problem continues to present difficulty. While image types can be characterized with a fair degree of particularity, the image content also has a tendency to impact the image. For example, even using the improved methods of Shiau, some Kanji characters continue to be identified as halftones. Image quality issues are presented when the determination seemingly dithers between two image types. While this occasionally may be an accurate representation of the document intent, more commonly, it is not. Such artifacts, however, present significant problems for the ultimate user of the document.
U.S. patent application Ser. No. 08/076,651 by Williams discloses a method of reducing the occurrence of incorrect image type characterizations by additionally providing a morphological filtering operation, in conjunction with an image segmentation arrangement, which initially provides a noise removal filter operating on the image classification signal, to remove noise within an area of the image classification signal, and subsequently provides a hole filling filter, which bridges small gaps in the image classification results. After consideration of document images as such, it has been noted that image classification could be considered a binary process similar to a color separation, i.e., detection is either ON or OFF. Using that model, for a single image type, classification defects can be characterized as either noise, defined as occasional image type detection in areas of predominantly non detection, or holes, defined as occasional non detection in areas of predominantly detection. Morphological filtering methods and apparatus, as described for example in U.S. Pat. No. 5,202,933 to Bloomberg, are used to coalesce areas which are primarily one state or the other. The method described by Williams aims to ensure that detection and non-detection areas will remain contiguous areas, uncluttered by occasional false detections.
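The noise removal and hole filling stages can be sketched on a one dimensional ON/OFF classification signal. This is a minimal illustration that assumes the two stages behave like a morphological opening followed by a closing with a three pixel window; the function names, the window size and the border handling are hypothetical and are not taken from Williams or Bloomberg.

```python
def erode(signal, radius):
    """Keep a pixel ON only if every pixel in its window is ON."""
    n = len(signal)
    return [int(all(signal[max(0, i - radius):min(n, i + radius + 1)]))
            for i in range(n)]


def dilate(signal, radius):
    """Turn a pixel ON if any pixel in its window is ON."""
    n = len(signal)
    return [int(any(signal[max(0, i - radius):min(n, i + radius + 1)]))
            for i in range(n)]


def remove_noise(signal, radius=1):
    """Opening: erase isolated detections in mostly non-detected areas."""
    return dilate(erode(signal, radius), radius)


def fill_holes(signal, radius=1):
    """Closing: bridge small gaps in mostly detected areas."""
    return erode(dilate(signal, radius), radius)


# Hypothetical classification signal: stray detection at index 2, hole at index 9.
detections = [0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0]
cleaned = fill_holes(remove_noise(detections))
print(cleaned)
```

In this example the isolated detection is removed and the single pixel gap is bridged, leaving one contiguous detected region.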
Additionally, U.S. patent application Ser. No. 08/076,072 by Robinson discloses a method of avoiding the misclassification of segments of images by an image segmentation arrangement by having users with a priori knowledge (such as an operator of a printing machine) provide as input to the segmentation arrangement the possible image types occurring in an original input document. This a priori knowledge (i.e. a set of image types that have been pre-determined by a user) is used to automatically eliminate classification types available to the image segmentation arrangement, thereby eliminating possible misclassifications by the segmentation arrangement. The embodiment described by Robinson redirects incorrectly classified image types to the set of image types provided by the user.
FIG. 5 shows the basic automatic segmentation system originally conceived by Stoffel and improved by Lin, Shiau, Williams and Robinson as described hereinbefore. The basic system shown in FIG. 5 is made up of three modules. Input information stored in data buffer 10 is simultaneously directed to an image property classifying section 20, the first module, and an image processing section 30, the second module. The image property classifying section 20, made up of any number of sub-modules (e.g. auto correlator 21 and discriminator 22), determines whether a block of image pixels (picture elements) stored in data buffer 10 is of one type of imagery or another (e.g. halftone, line/text and continuous tone). In parallel with the image property classifying section 20, the image processing section 30, made up of any number of sub-processing sections (e.g. high frequency halftone processor 31, low frequency halftone processor 32, line/text processor 33 and continuous tone processor 34), performs image processing operations on the same block of image pixels as section 20. Each image sub-processing section performs image processing operations that are adapted to improve the image quality of a distinct class of imagery. The third module, control section 40, uses the information derived by the image classifying section 20 to control the image processing section 30.
The decision as to what class of imagery a block of image pixels belongs is typically binary in nature (e.g. either a yes or a no). For example, in a conventional image segmentation scheme, image property classifying section 20 classifies each pixel as one of three classes of imagery (e.g. high frequency halftone, low frequency halftone and continuous tone), and depending on the classification, each pixel is processed according to the properties of that class of imagery (e.g. either low pass filtered and rescreened if it is a high frequency halftone, thresholded with a random threshold if it is a low frequency halftone or edge enhanced and screened if it is a continuous tone). Also, assuming that the decision as to which of the three classes of imagery a pixel belongs is based on a single image property, the peak count of the input image data, the resulting image classification decision of the peak count image property is made by thresholding the peak count into three classes of imagery as shown in FIG. 6. Consequently, control section 40 decides the type of image processing a block of image pixels requires depending on the decision made by classification section 20 that selected between the three possible classes of imagery using thresholds 1 and 2. Thus the output of classification section 20 is quantized to one of three possibilities, and control section 40 selects output from one of three image sub-processing sections.
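The quantized decision described above can be sketched as follows. The threshold values, the processed pixel values and the order in which the three classes are laid out along the peak count axis are assumptions chosen for illustration; the text does not state how FIG. 6 assigns the ranges.

```python
def classify_peak_count(peak_count, threshold_1, threshold_2):
    """Quantize a peak count into exactly one of three classes of imagery."""
    # Assumption: the order of the classes along the axis is illustrative only.
    if peak_count < threshold_1:
        return "continuous tone"
    if peak_count < threshold_2:
        return "low frequency halftone"
    return "high frequency halftone"


def select_output(peak_count, processed_outputs, threshold_1, threshold_2):
    """Act as the switch: pass through only the output of the chosen class."""
    chosen = classify_peak_count(peak_count, threshold_1, threshold_2)
    return processed_outputs[chosen]


# Hypothetical thresholds and per-class processed values for one pixel.
THRESHOLD_1, THRESHOLD_2 = 10, 30
outputs = {
    "continuous tone": 120,
    "low frequency halftone": 90,
    "high frequency halftone": 60,
}

below = select_output(29, outputs, THRESHOLD_1, THRESHOLD_2)
above = select_output(30, outputs, THRESHOLD_1, THRESHOLD_2)
```

In this sketch a change of a single count across a threshold replaces the selected output entirely, with no intermediate result.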
Image classification decisions using thresholds are usually artificially abrupt since an image can change from one class of imagery to another slowly and gradually. This abrupt decision making, which produces a forced choice among several distinct alternative choices, is a primary reason for the formation of visible artifacts in the resulting output image. Most transition points or thresholds (e.g., THRESHOLD 1 or THRESHOLD 2 in FIG. 6, and A or B in FIG. 3) are selected so that an image can be classified as one class of imagery with a high degree of certainty; however, those classes of imagery that cannot be classified with such certainty have multiple transition points or a transition zone. Using only one point (e.g. a threshold) to define a transition zone results in the formation of visible artifacts in the resulting output image. Although it is possible to make the transition zone narrower so that there is less chance that an image falls into the zone, there exists a limit on how narrow the zone can be made.
In general, the prior art describes control section 40 essentially as a switch, as shown in FIG. 7, since the image processing steps performed for each class of imagery are different depending on the classification given to each block of input image pixels. The switch or multiplexer allows data residing at the output of image processor 30, I31, I32, I33 and I34 to be directed to output buffer 50 depending on the decisions made by imagery classifying section 20, which are output on lines 23 and 24. This type of binary decision making is rigid and results in image segmentation decisions that do not fail gracefully and consequently form visible artifacts in the output image stored in buffer 50. There exists therefore a need for an improved image processing system that fails more gracefully when incorrect segmentation decisions are made. By failing more gracefully, the system minimizes the formation and therefore the visibility of artifacts in the resulting output image.
All references cited in this specification, and their references, are incorporated by reference herein where appropriate for appropriate teachings of additional or alternative details, features, and/or technical background.
In accordance with the invention there is provided a digital image processing system for automatically segmenting a set of input image signals into a combination of predetermined classes of imagery, the set of input image signals forming part of a video image generated by an image input terminal. The system includes a classification circuit for receiving the set of input image signals and for classifying the set of input image signals as a ratio of the predetermined classes of imagery. In combination, a plurality of image processing circuits receive the set of input image signals, each of which is adapted to process a unique class of imagery selected from the predetermined classes of imagery to generate a set of output image signals for that predetermined class of imagery. Finally, a mixing circuit combines each of the sets of output image signals determined by the plurality of image processing circuits in accordance with the ratio determined by the classification circuit to form a single set of output image signals.
In accordance with another aspect of the invention there is provided a method for automatically segmenting a set of input image signals into a combination of predetermined classes of imagery, the set of input image signals forming part of a video image generated by an image input terminal. The method includes the steps of classifying the set of input image signals as a ratio of the predetermined classes of imagery, processing the set of input image signals with a plurality of image processing circuits uniquely adapted for processing a single predetermined class of imagery to provide a plurality of sets of output image signals corresponding to each predetermined class of imagery, and combining the plurality of sets of output image signals in accordance with the ratio determined by the classifying step to form a single set of output image signals.
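The combining step can be illustrated with a minimal sketch. It assumes that combining "in accordance with the ratio" can be modeled as a per-pixel weighted average of the per-class outputs; the function name, class labels and sample values are hypothetical, and the summary above does not specify how the mixing circuit is actually realized.

```python
def mix_outputs(processed, ratio):
    """Blend per-class output signals into one set using the class ratio."""
    # Assumption: the ratio is normalized so the weights sum to one.
    total = sum(ratio.values())
    length = len(next(iter(processed.values())))
    return [
        sum(ratio[name] * processed[name][i] for name in ratio) / total
        for i in range(length)
    ]


# Hypothetical outputs of two class-specific processors for the same two pixels.
processed = {"halftone": [100, 200], "continuous tone": [200, 100]}

# Hypothetical classification result: 25% halftone, 75% continuous tone.
ratio = {"halftone": 0.25, "continuous tone": 0.75}

mixed = mix_outputs(processed, ratio)
print(mixed)
```

Under this assumption, a small change in the ratio produces a small change in the mixed output, rather than a switch from one processor's output to another.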