1. Field of the Invention
The present invention relates to an image processing apparatus and method which examine the characteristics of multi-level image data, and a medium therefor.
2. Description of the Related Art
A technique of expressing an original image by its constituent elements has recently been developed. More specifically, the attributes and format of an original image, e.g., “image”, “graphic pattern”, “character”, “chapter”, “section”, “paragraph”, “title”, and “caption”, are defined. An image area separation technique has been realized, which is designed to output information about the attributes and format so defined, and to perform display and retrieval of the original image on the basis of that information. These techniques have become popular in the form of data exchanged through networks represented by the Internet, with the development and widespread use of international communication networks, and in the form of SGML in the U.S.A.
In addition, a technique of transmitting and storing image data after properly encoding the image data by switching encoding schemes in units of attributes has been studied as disclosed in Matsuki et al., “Structured Color Facsimile for Composite Color Document”, THE JOURNAL OF THE INSTITUTE OF IMAGE ELECTRONICS ENGINEERS OF JAPAN, Vol. 24, No. 1, pp. 26-33.
For example, Japanese Patent Laid-Open No. 8-30725 discloses a technique of inputting a binary original image, and determining information about the attributes and format of the image. In this technique, one pixel with a low resolution is extracted from a predetermined pixel area in image data obtained from an input original image, and the information about the attributes and format of the image is determined on the basis of the spread of pixels with low resolutions. The determined information about the attributes and format of an original image can be extracted, or an area having information about desired attributes and format can be extracted.
According to an image area separation technique, when an original image is input with a scanner or the like, noise is caused in the background of the image data obtained from the original image if, for example, the original image has density irregularity, a shadow of the reverse side is cast, or the background density of the original image is high. This noise degrades the precision of image area separation processing. In addition, when image data including such noise in the background is output at a printer or the like, the image quality of the output image is degraded. For this reason, an image processing apparatus which removes such noise from the background of image data is available.
In such an image processing apparatus designed to remove noise from the background of image data, for example, a background density is determined on the basis of the average density of an original image, and control is performed not to output image data having a density equal to or lower than the determined density, thereby removing noise from the background of the image data. Alternatively, correction such as gamma correction is performed for the input/output densities of an original image to remove noise from the background of image data.
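The average-density approach described above can be sketched as follows. This is only an illustrative sketch, not the implementation of any particular apparatus: the function name `remove_background_by_average`, the `scale` parameter, the use of the plain mean as the background density, and the convention that 0 is white and 255 is full density are all assumptions made for the example.

```python
def remove_background_by_average(densities, scale=1.0):
    """Suppress pixels at or below a background density derived from the mean.

    densities: list of ints in 0..255, where 0 is white (assumed convention).
    scale: hypothetical tuning factor applied to the mean density.
    """
    if not densities:
        return []
    # Assumption: the background density is simply the (scaled) average density.
    background = scale * sum(densities) / len(densities)
    # Pixels with a density equal to or lower than the background are not output.
    return [0 if d <= background else d for d in densities]


# Mostly light background (about 10), two dark strokes, one faint stroke (30).
page = [10, 12, 11, 200, 210, 30, 9, 10]
print(remove_background_by_average(page))  # [0, 0, 0, 200, 210, 0, 0, 0]
```

In this toy input the faint stroke with density 30 lies below the mean of 61.5, so it is erased together with the background noise.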
If, however, the above conventional image processing apparatus uses the method of removing noise from the background on the basis of the average density of an original image, the image quality of an output image obtained from an original image including low-density characters or a continuous-tone image portion is degraded. This is because the densities of these portions are lower than the determined background density, so that control is performed not to output the low-density characters or the low-density parts of the continuous-tone image portion.
According to the method of removing noise from the background of image data by performing correction such as gamma correction, when the background density of an original image is close to that of a white portion, output of the background density of the image data is suppressed. As a result, noise is removed from the background. If, however, the background density is high, output of the background density of the image data is emphasized. As a result, the image is output with noise caused in the background being amplified.
As described above, in these methods, the problems associated with the precision of the image area separation technique and degradation in image quality of output images are still left unsolved in the above cases.
According to image area separation processing executed in the above conventional image processing apparatus, image data obtained by reading an original image is binarized, and image area separation processing is performed for the resultant binary image data. For this reason, a “graphic pattern” which can be easily binarized cannot be properly separated from a “photograph” which cannot be easily binarized. In addition, when an original image partly having color characters or the like for emphasis is to be recognized, image area separation is performed without recognizing the color. For this reason, even if the separated characters are recognized by OCR or the like, since the color information of the characters is not recognized, a desired OCR result cannot be obtained.
Furthermore, even image data from which noise caused in the background is removed cannot be efficiently encoded to be transmitted or stored if the background density varies.
The present invention has been made in consideration of the above situation, and has as its object to properly perform quantization in accordance with the characteristics of a target image.
In order to achieve the above object, according to the present invention, there is provided an image processing apparatus comprising input means for inputting multi-level image data representing an image, extracting means for extracting binary image data from the multi-level image data, dividing means for dividing the image into a plurality of blocks based on the binary image data, and quantizing means for quantizing the multi-level image data in the blocks, wherein the number of levels of the quantized multi-level image data is determined in units of the blocks.
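The idea of choosing the number of quantization levels block by block can be illustrated with a small sketch. The text above does not specify how the blocks are obtained from the binary image data or which quantizer is used, so everything here is an assumption for illustration: the function name `quantize_block`, the uniform quantizer between the block's minimum and maximum, and the hand-picked level counts for the two example blocks.

```python
def quantize_block(pixels, levels):
    """Uniformly quantize one block's multi-level pixels to `levels` values.

    Representative values are evenly spaced between the block's min and max
    (an assumed scheme; the source does not prescribe a particular quantizer).
    """
    if levels < 1:
        raise ValueError("levels must be at least 1")
    lo, hi = min(pixels), max(pixels)
    if lo == hi:
        return list(pixels)  # flat block: nothing to quantize
    if levels == 1:
        mean = int(sum(pixels) / len(pixels) + 0.5)
        return [mean] * len(pixels)
    step = (hi - lo) / (levels - 1)
    out = []
    for p in pixels:
        index = int((p - lo) / step + 0.5)      # nearest representative level
        out.append(int(lo + index * step + 0.5))
    return out


# Hypothetical blocks: a text-like block needs 2 levels, a photo-like block 4.
blocks = {
    "text": ([0, 10, 120, 250, 255], 2),
    "photo": ([0, 64, 128, 192, 255], 4),
}
for name, (pixels, levels) in blocks.items():
    print(name, quantize_block(pixels, levels))
```

Each block keeps only as many levels as it was assigned, which is the per-block behavior the paragraph describes.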
It is another object of the present invention to properly determine the number of quantization levels for quantizing image data by properly removing noise from the background of the image data.
In order to achieve the above object, according to the present invention, there is provided an image processing apparatus comprising forming means for forming a frequency distribution of densities of image data, determining means for determining a density area exhibiting frequency values not less than a predetermined threshold and including a maximum frequency value in the distribution formed by the forming means, and first deciding means for deciding the number of quantization levels on the basis of the number of maximum values included in the distribution other than the density area determined by the determining means.
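The three steps named above (forming the distribution, determining the density area around the maximum, and deciding the number of levels) can be sketched as below. This is a minimal sketch under stated assumptions, not the claimed implementation: all function names are hypothetical, the density area is grown outward from the most frequent bin while the frequency stays at or above the threshold, a "maximum value" is taken to be a strict local maximum of the raw histogram (no smoothing), and the final rule "peaks outside the area plus one for the background" is an assumed mapping.

```python
def density_histogram(pixels, bins=256):
    """Form the frequency distribution of densities (one bin per density)."""
    hist = [0] * bins
    for p in pixels:
        hist[p] += 1
    return hist


def background_area(hist, threshold):
    """Return (lo, hi): the contiguous range around the most frequent density
    whose frequencies are all not less than `threshold`."""
    peak = max(range(len(hist)), key=hist.__getitem__)
    lo = hi = peak
    while lo > 0 and hist[lo - 1] >= threshold:
        lo -= 1
    while hi < len(hist) - 1 and hist[hi + 1] >= threshold:
        hi += 1
    return lo, hi


def count_peaks_outside(hist, lo, hi):
    """Count local maxima outside [lo, hi]; a flat plateau counts once."""
    n, peaks, i = len(hist), 0, 0
    while i < n:
        j = i
        while j + 1 < n and hist[j + 1] == hist[i]:
            j += 1  # extend over a plateau of equal frequencies
        left = hist[i - 1] if i > 0 else 0
        right = hist[j + 1] if j + 1 < n else 0
        outside = j < lo or i > hi
        if outside and hist[i] > left and hist[i] > right:
            peaks += 1
        i = j + 1
    return peaks


def decide_quantization_levels(pixels, threshold):
    hist = density_histogram(pixels)
    lo, hi = background_area(hist, threshold)
    # Assumed rule: one level for the background plus one per remaining peak.
    return count_peaks_outside(hist, lo, hi) + 1


# Background near density 20, a mid-tone cluster at 120, dark text at 200.
pixels = [19] * 25 + [20] * 50 + [21] * 30 + [120] * 8 + [200] * 15
print(decide_quantization_levels(pixels, threshold=10))  # 3
```

Raw histograms from scanned images are usually noisy, so a practical version would likely smooth the distribution before counting maxima; that step is left out here to keep the sketch short.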
It is still another object of the present invention to properly extract color characters.
Other features and advantages of the present invention will be apparent from the following description taken in conjunction with the accompanying drawings, in which like reference characters designate the same or similar parts throughout the figures thereof.