1. Field of the Invention
The present invention relates to an image region dividing apparatus for dividing and classifying image regions of an input image including typed characters, handwritten characters, a picture, a graphics image, and the like (to be referred to as a mixed image hereinafter) in units of image kinds.
2. Description of the Related Art
In general, an image information processing system has a processing function of classifying an input mixed image in units of image kinds and converting the classified images into digital data. When a mixed image is to be stored in a memory (storage medium) as digital data, this processing function aims at minimizing the total data amount while maintaining high quality of information by adopting, for each image kind, a method capable of compressing its data with maximum efficiency.
The processing function often includes a function of classifying a binary gradation image such as a character, line image, or the like, and a continuous gradation image such as a picture, and selecting a suitable binarization processing method to maintain these images in a high-quality state.
Various processing methods for dividing and classifying image regions in a mixed image in units of image kinds have been proposed.
In most of these proposals, a feature amount determined for each image kind is extracted, and the image kind is determined using an evaluation function or a discrimination function defined by the feature amount. In discrimination of the image kind, the generation frequency of black pixels or edges in a predetermined block region, the histogram of luminance level, the spatial frequency distribution, the frequency distribution of directions of line segments, or the like is extracted as the feature amount.
As a feature amount similar to the above-mentioned feature amounts, some methods use the frequency distribution of density gradients of an input image, and such a method is described in, e.g., Jpn. Pat. Appln. KOKOKU Publication No. 4-18350. In this classification processing method, density gradients are calculated in units of pixels in the horizontal and vertical directions of a digital input image, and directions calculated based on the calculated horizontal and vertical density gradient values are counted in a small divided region, thereby obtaining a frequency distribution of density gradients. A variance of the frequency is calculated from the frequency distribution, and the variance is compared with a predetermined threshold value to discriminate and determine whether or not the region of interest is a character region.
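The direction-distribution feature described above can be illustrated by the following minimal sketch. It is not the implementation of the cited publication. The number of direction bins (8), the forward-difference gradient, the skipping of flat pixels, the normalization of the counts, and all function names are assumptions made only for illustration.

```python
import math

def direction_histogram(region, bins=8):
    """Count quantized density-gradient directions inside one small region.

    `region` is a list of rows of luminance values. Flat pixels (zero
    gradient) are skipped, which is an assumption of this sketch.
    """
    h, w = len(region), len(region[0])
    hist = [0] * bins
    for y in range(h - 1):
        for x in range(w - 1):
            gx = region[y][x + 1] - region[y][x]  # horizontal density gradient
            gy = region[y + 1][x] - region[y][x]  # vertical density gradient
            if gx == 0 and gy == 0:
                continue  # no direction is defined for a flat pixel
            angle = math.atan2(gy, gx) % (2 * math.pi)
            # Round to the nearest bin centre so that 0, 90, 180 and 270
            # degrees fall on bins 0, 2, 4 and 6 when bins == 8.
            hist[int(round(angle / (2 * math.pi) * bins)) % bins] += 1
    return hist

def frequency_variance(hist):
    """Variance of the normalized direction frequencies (0 when uniform)."""
    total = sum(hist)
    if total == 0:
        return 0.0
    mean = 1 / len(hist)
    return sum((c / total - mean) ** 2 for c in hist) / len(hist)

def is_character_region(region, threshold):
    """Compare the variance with a predetermined threshold value."""
    return frequency_variance(direction_histogram(region)) > threshold
```

In this sketch, a region of vertical strokes concentrates its counts in the two horizontal-gradient bins and yields a large variance, whereas a region whose directions are spread evenly yields a variance near zero.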
The distribution of directions calculated based on the density gradients reflects well the directivity distribution of edge portions of an image. In particular, since a typed character image includes many edge components in the vertical and horizontal directions, its distribution of directions differs greatly from those of other kinds of images, and the distribution of directions is therefore an effective feature amount for determining whether or not the image to be discriminated is a typed character. Furthermore, since the variance of the distribution is used as an evaluation criterion for making a decision based on this feature amount, a bias of the directivity of edges can be observed. In addition, since the load of calculating the variance is relatively light, this feature amount is practical to use.
However, discrimination that applies a threshold value only to the variance of the direction distribution of the density gradients has the following problem. When, for example, an image has a narrow luminance (density) level range, i.e., a low contrast, when the ratio of edge portions of a character to the small region to be discriminated is small, or when the line width of a character itself is small, the variance becomes small even if the image to be discriminated is a typed character image, and it becomes difficult to achieve clear discrimination. In such a case, since the frequency of the density gradients of the background increases in the direction distribution, the density gradients of the character portion do not stand out relative to those of the background. Since the direction distribution of the density gradients of the background normally has no direction dependence, the direction dependence of the direction distribution of edges of the character portion is buried in the distribution of the background.
Furthermore, when not only a typed character image but also various kinds of images (a handwritten character, a picture, a graphics image, and a background) are to be sorted and classified, they cannot be discriminated from each other by observing only the variance of the direction distribution of the density gradients.
U.S. patent application Ser. No. 08/123,533 proposed by the present applicant to solve such a problem describes a mixed image region dividing apparatus, which approximates the shape of a generation frequency distribution of local feature patterns each consisting of a combination of luminances of a plurality of pixels adjacent to a pixel of interest in each of image regions divided in units of image kinds, and identifies/discriminates the image regions using a neural network on the basis of the approximated distribution shape as a feature amount. The mixed image region dividing apparatus will be described below.
The mixed image region dividing apparatus is mainly constituted by a same-kind image region extraction unit for dividing an input digital image (mixed image) in units of rectangular same-kind image regions while the image kinds are unknown, and an image kind discrimination unit for discriminating and determining the image kind of each of the divided same-kind image partial regions.
The same-kind image region extraction unit comprises an image input unit for receiving a mixed image and converting the input image into a digital image, and a region dividing unit for dividing the digital image into rectangular same-kind image regions.
The image kind discrimination unit comprises a local feature pattern detection unit for detecting a local feature pattern consisting of, e.g., a luminance of a pixel of interest and luminances of a predetermined number of (N) pixels adjacent to the pixel of interest in a predetermined small block region in each of the extracted same-kind image regions, a vector quantization unit for vector-quantizing the local feature pattern on an N-dimensional space, a histogram generation unit for counting the generation frequency of quantized representative vectors to calculate a histogram, an image kind identification unit for receiving the calculated quantized vector histogram to identify its distribution shape, and outputting a required image type, and an image kind determination unit for determining an image kind by systematically discriminating the identification result obtained in each of the same-kind image regions.
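The chain from local feature pattern detection to histogram generation can be sketched as follows. This is a minimal illustration under stated assumptions, not the arrangement of the cited application: the pattern is taken here as the pixel of interest together with its right, lower, and lower-right neighbors, the representative vectors (codebook) are supplied by the caller, the nearest vector is chosen by squared Euclidean distance, and all names are hypothetical. The identification of the histogram shape by the neural network is not shown.

```python
def local_patterns(region):
    """Yield one luminance pattern per pixel of interest.

    Each pattern is (pixel, right, below, lower-right); this choice of
    adjacent pixels is an assumption of the sketch.
    """
    h, w = len(region), len(region[0])
    for y in range(h - 1):
        for x in range(w - 1):
            yield (region[y][x], region[y][x + 1],
                   region[y + 1][x], region[y + 1][x + 1])

def quantize(pattern, codebook):
    """Return the index of the nearest representative vector."""
    def sq_dist(i):
        return sum((a - b) ** 2 for a, b in zip(pattern, codebook[i]))
    return min(range(len(codebook)), key=sq_dist)

def pattern_histogram(region, codebook):
    """Count how often each representative vector is generated."""
    hist = [0] * len(codebook)
    for pattern in local_patterns(region):
        hist[quantize(pattern, codebook)] += 1
    return hist
```

With a hypothetical codebook holding a flat dark vector, a flat bright vector, and a vertical-edge vector, a region containing one vertical boundary produces counts only in the flat dark and vertical-edge entries, and the shape of this histogram is what would be passed on for identification.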
However, in the above-mentioned mixed image region dividing apparatus, normalization processing for removing a bias of a histogram depending on an image makes an originally continuous histogram discontinuous, and the discontinuous histogram may cause a discrimination error. In the arrangement of this dividing apparatus, as the number of input dimensions to the neural network increases, the time required for learning to be executed in advance becomes relatively long, and the hardware scale for performing actual processing undesirably increases.
As a conventional image dividing processing method, a method of dividing the entire document image into connected components, and setting a region as a set of connected components by integrating the connected components using a given method is known. For example, as described in Jpn. Pat. Appln. KOKAI Publication No. 61-296481, a technique for reducing an input binary document image in scale, and detecting a region by integrating adjacent black pixels is known.
In this technique, in order to prevent omission of end portions of divided image regions, an edge portion such as an end point of a character must be faithfully reflected upon reduction of an image. However, in this technique, an input binary document image is divided into small regions, and a black pixel is assigned to each small region in which the number of black pixels is equal to or larger than a predetermined threshold value. For this reason, if the threshold value is larger than "0", a small region cannot be detected when the region is located at the end of a pixel region, and some regions are omitted. On the other hand, when the threshold value is set to "0", many noise components are undesirably detected, and the image cannot be divided normally.
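The trade-off described above can be reproduced with the following minimal sketch of a block-wise reduction. It is not the implementation of the cited publication. The function name, the block size, and the pixel encoding (1 for black, 0 for white) are assumptions. The sketch also uses a strict comparison so that a threshold value of 0 means "at least one black pixel"; this is an interpretation chosen to match the behavior described for a threshold value of "0".

```python
def reduce_binary(image, block, threshold):
    """Reduce a binary image block by block.

    A reduced pixel is black (1) when the number of black pixels in the
    corresponding block exceeds `threshold` (strict comparison, assumed).
    """
    h, w = len(image), len(image[0])
    reduced = []
    for by in range(0, h, block):
        row = []
        for bx in range(0, w, block):
            count = sum(image[y][x]
                        for y in range(by, min(by + block, h))
                        for x in range(bx, min(bx + block, w)))
            row.append(1 if count > threshold else 0)
        reduced.append(row)
    return reduced

# A stroke whose end reaches into the upper-right block, plus one
# isolated noise pixel in the lower-right block.
page = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]
with_noise = reduce_binary(page, block=2, threshold=0)
end_omitted = reduce_binary(page, block=2, threshold=1)
```

In this example the threshold of 0 keeps the end of the stroke but also keeps the noise pixel, while the threshold of 1 removes the noise pixel and drops the end of the stroke with it.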
In a black-white reversed document image, the entire document is undesirably extracted as a large region.