In the digital reproduction of documents (color, or black and white), an image is conveniently represented as a bitmap or combination of bitmaps, which may be described as an electronic image with discrete signals (hereinafter, pixels) defined by position and density. In such a system, density is described as one level in a number of possible states or levels set or determined by the system. When more than two levels of density are used in the description of the image, the levels are often termed "gray", indicating that they vary between a maximum and minimum, and without reference to their actual color. Most printing systems have the ability to reproduce an image with a small number of levels, most commonly two, although other numbers are possible. Common input devices, including document scanners, digital cameras and computer imagery generators, however, are capable of describing an image with a substantially larger number of gray levels, with 256 levels a commonly selected number, although larger and smaller numbers of levels are possible. It is required that an image initially described at a large set of levels also be describable at a smaller set of levels, in a manner which captures the intent of the user.
In printing documents, the desired density over an area is sometimes achieved by "thresholding". Thresholding is a process of labeling pixels in an image by comparing their intensity values to a specified value called the "threshold". In "multithresholding", a finite set of thresholds is specified, and each pixel is labeled according to the highest threshold that is lower than its intensity value. Document image segmentation is a process of dividing a page image into regions that need distinctly different, but regionally uniform, further processing if recognition, compression or high quality printing is desired. For example, document recognition may need separation of components that are logically different, e.g., separation of text, graphics and pictures from the background, and high quality printing may need separation of components that were rendered differently, e.g., high frequency and low frequency halftone areas, contone areas and text areas from the background. These separations are often complementary, i.e., document contents that logically differ are often rendered differently. Therefore, segmentation done at the pixel level can be used for both printing and recognition tasks.
In one example, most optical character recognition (OCR) systems are fine tuned to recognize an input binary region as the character it most closely matches. Hence, it is necessary to provide to the OCR system only those regions that are "characters". Most OCR systems assume a two layered (sometimes, bimodal) document image model, usually black text on a uniform white background. Quite often, in the case of images of modern documents, particularly those using color, this simple model is violated, and OCR systems either fail to recognize all the text on a page, or mis-recognize non-character patterns as characters.
A large class of documents can be usefully segmented based on the intensity values of the pixels, because image contents are rendered at different levels of intensity. A different intensity or color is chosen to esthetically differentiate certain document contents from their surroundings. In a typical document, the number of intensity levels used for rendering different document components is usually small, and at each intensity level a significant component of the document is rendered. FIG. 1 provides an example of a document with multiple intensity level renderings. The example document image D of FIG. 1, which might be derived by scanning at an input scanner, has three dominant intensity levels. A first level contains the white background and the large white text, the second level is the light tinted background and the light text, and the third level contains the black text. The document image of FIG. 1 contains information in three intensity levels. It has text in two levels, i.e., in white and black, and the background is gray. In addition to these significant levels, a close observation of a real document would show variations in intensities within the levels due to normal, imperfect reproduction and scanning processes. A segmentation method should not be sensitive to such insignificant intensity variations.
In the absence of a priori knowledge of the rendered intensity levels, the main problem of segmentation is to find the significant intensity levels in an image at which different significant document content components are rendered. Once those intensity levels are known, one can segment the document into useful regions simply by labeling each pixel based only on its intensity value.
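By way of illustration only, the labeling step described above may be sketched as follows, assuming the significant intensity levels have already been separated by a known set of thresholds. The function name label_pixels, the list-of-rows image representation and the example threshold values 64 and 192 are hypothetical and are not taken from the disclosure.

```python
from bisect import bisect_left


def label_pixels(image, thresholds):
    """Label each pixel with the number of thresholds strictly below its intensity.

    A label of 0 means no threshold is lower than the pixel value; a label of k
    means the k-th smallest threshold is the highest one lower than the value.
    """
    ordered = sorted(thresholds)
    # bisect_left returns how many entries of `ordered` are < value.
    return [[bisect_left(ordered, value) for value in row] for row in image]


# Hypothetical 2x3 image with dark, mid and light pixels (0..255 scale).
example_image = [
    [250, 128, 10],
    [245, 130, 5],
]
example_labels = label_pixels(example_image, thresholds=[64, 192])
print(example_labels)
```

Because each label depends only on the intensity of the pixel, the labeling is a single pass over the image once the thresholds are fixed.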
The document presented shows that a global thresholding process will render information-carrying regions of the document unusable. Accordingly, multiple thresholds applied to each region are desirable. Two basic assumptions are made that describe the document and simplify the problem: 1) the number of intensity levels used to differentiate the document content components is finite and small, and 2) at each chosen intensity level a significant area of the document is rendered uniformly. In other words, we constrain the problem by assuming that a typical document image includes a few significant intensity levels, and that each intensity level of the image is relatively smooth, without sharp intensity variations.
A document image satisfying these assumptions will have well defined modes in its intensity histogram, as shown in FIG. 2. The histogram has peaks corresponding to each intensity level A, B and C from the document image of FIG. 1 and approximately zero values elsewhere (for simplicity, FIG. 1, and accordingly FIG. 2, have been highly simplified and idealized). For such images, it is trivial to find the thresholds. Any threshold selected at a zero of the histogram between two non-zero peaks would be an optimum threshold, and if there are n levels, the set of n-1 thresholds will optimally segment the image into its components. FIG. 2 shows the histogram of the original image with three significant intensity levels, together with thresholds placed at zero points between successive well defined modes, illustrated by TCB and TBA. FIGS. 3A, 3B, and 3C show the resulting thresholded document segments of the image.
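For the idealized histogram just described, the trivial threshold selection may be sketched as follows. This is an illustrative sketch only: the function names histogram and zero_gap_thresholds, the choice of the middle of each empty run as the threshold, and the sample pixel values are assumptions, and the sketch relies on the histogram being exactly zero between modes.

```python
def histogram(pixels, levels=256):
    """Count how many pixels fall at each integer intensity level."""
    counts = [0] * levels
    for value in pixels:
        counts[value] += 1
    return counts


def zero_gap_thresholds(counts):
    """Place one threshold in each run of empty bins between two occupied bins.

    Any level inside the empty run would serve equally well in the ideal case;
    the middle of the run is chosen here purely for definiteness.
    """
    thresholds = []
    last_occupied = None
    for level, count in enumerate(counts):
        if count == 0:
            continue
        if last_occupied is not None and level - last_occupied > 1:
            thresholds.append((last_occupied + level) // 2)
        last_occupied = level
    return thresholds


# Hypothetical idealized image: three intensity levels and nothing in between.
ideal_pixels = [10] * 5 + [128] * 3 + [250] * 4
ideal_thresholds = zero_gap_thresholds(histogram(ideal_pixels))
print(ideal_thresholds)
```

With n occupied modes separated by empty runs, the sketch returns n-1 thresholds, matching the count stated above.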
However, that is not the case for real images. In a non-ideal image, the optimum threshold points between significant modes cannot be detected using the trivial process described in the previous paragraph. For example, depending on system noise, there may be no zero points in the histogram. Furthermore, an automatic algorithm needs to determine the number of significant levels or prominent modes in the image in order to compute the optimum thresholds. If the number of modes or significant levels in a document image is known, one could use brute force statistical discriminant analysis to search for the given number of thresholds. If the number of levels used is wrong, the computed thresholds will fragment the image into unusable chunks. For an image of a bimodal document, a discriminant analysis based method has been proposed. This approach has been evaluated to be one of the best for the binary case, as shown in Sahoo et al., "A Survey of Thresholding Techniques", Computer Vision, Graphics, and Image Processing, Vol. 41, pages 233-260 (1988) and Reddi et al., "An Optimal Multiple Threshold Scheme for Image Segmentation", IEEE Transactions on Systems, Man, and Cybernetics, SMC-14(4), pages 661-665 (1984). The method is easily generalizable to multiple thresholding, if the number of levels is known a priori, as shown in Otsu, "A Threshold Selection Method from Gray-Level Histograms", IEEE Transactions on Systems, Man, and Cybernetics, SMC-9(1), pages 62-66 (1979). This reference details the concept of "goodness". The goodness measure in the proposed thresholding method corresponds to the statistical separability of the intensity distributions of the image on either side of the threshold. If there are two distributions that occur with probabilities p1 and p2, and their means are a distance "d" apart, then the measure of separability (goodness) used is: ##EQU1## This value is 1 when the two distributions are perfectly separated, and it is zero if the distributions are not separated.
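The separability measure may be illustrated numerically. The equation itself is not reproduced in this text, so the following sketch assumes it is the criterion of the cited Otsu reference, namely the between-class variance p1 * p2 * d * d divided by the total variance of the image intensities, which ranges from 0 to 1. The function names separability and best_threshold and the sample histogram are hypothetical.

```python
def separability(counts, threshold):
    """Assumed Otsu-style goodness of splitting a histogram at `threshold`.

    Class 1 holds levels <= threshold, class 2 holds levels > threshold.
    Returns (p1 * p2 * d**2) / total_variance, or 0.0 if a class is empty.
    """
    total = sum(counts)
    n1 = sum(counts[: threshold + 1])
    n2 = total - n1
    if n1 == 0 or n2 == 0:
        return 0.0
    mean1 = sum(l * c for l, c in enumerate(counts) if l <= threshold) / n1
    mean2 = sum(l * c for l, c in enumerate(counts) if l > threshold) / n2
    mean_all = sum(l * c for l, c in enumerate(counts)) / total
    total_variance = sum(c * (l - mean_all) ** 2 for l, c in enumerate(counts)) / total
    p1, p2 = n1 / total, n2 / total
    d = mean2 - mean1
    return p1 * p2 * d * d / total_variance


def best_threshold(counts):
    """Brute-force search for the single threshold with the highest goodness."""
    return max(range(len(counts) - 1), key=lambda t: separability(counts, t))


# Hypothetical noisy bimodal histogram over 7 intensity levels.
noisy_counts = [1, 8, 1, 0, 1, 8, 1]
chosen = best_threshold(noisy_counts)
print(chosen, separability(noisy_counts, chosen))
```

The search evaluates every candidate threshold, which is practical for a single threshold over 256 levels; searching several thresholds jointly in the same brute-force manner grows combinatorially with the number of thresholds.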
There are several other separability measures described in the statistics literature. The above measure is computable from the first order statistics and therefore it is efficient to compute. Note also, "Binarization and Multi-Thresholding of Document Images Using Connectivity", by Lawrence O'Gorman, University of Nevada, Las Vegas Annual Symposium of Document Analysis & Information Retrieval, (April 1994).
References disclosed herein are incorporated by reference for their teachings.