1. Field of the Invention
The present invention relates to an image determination apparatus for determining whether inputted image data is document image data or non-document image data, an image search apparatus having the image determination apparatus, and a recording medium on which an image search program is recorded.
2. Description of the Related Art
Image forming apparatuses such as copy machines, facsimile machines, printers, and multifunctional peripherals combining these functions can store image data, such as an inputted document image, in a large-capacity storage device, and can read out and reprint, at any time, registered image data that has been inputted once.
The reprinting function is convenient. However, as the amount of registered data increases, it becomes difficult to find the data to be outputted. Accordingly, image search technology for finding desired image data among a plurality of pieces of image data becomes important.
In order to search for image data, the similarity between the registered image data and the inputted image data needs to be calculated by comparing the registered image data with the inputted image data. Here, the registered image data generally contains a mixture of document image data and non-document image data (pictures, figures, illustrations, and the like).
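The comparison described above can be illustrated with a minimal sketch. It assumes that each image has already been reduced to a numeric feature vector and that cosine similarity is used as the similarity measure; neither assumption comes from the cited techniques, and the names `cosine_similarity` and `search` are hypothetical. The sketch only shows that the inputted data is scored against every registered entry.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_feature, registered):
    """Score the query against every registered feature vector.

    `registered` maps a hypothetical image name to its feature vector.
    Returns (name, similarity) pairs, most similar first.
    """
    scored = [
        (name, cosine_similarity(query_feature, feature))
        for name, feature in registered.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


# Illustrative data only: three registered images with 2-element features.
registered = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
ranking = search([1.0, 0.0], registered)
```

Because every registered entry is visited once per query, the work in this sketch grows in proportion to the number of registered images.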
According to a known technique, for example, the image search apparatus disclosed in Chinese Patent Application No. 200510005334.9 calculates a feature for both types of image data, that is, the document image data and the non-document image data, by using the same algorithm, and has no processing step for discriminating between the document image data and the non-document image data.
Meanwhile, a region segmentation technique is one technique that can be applied to discriminating between types of image data. The region segmentation technique is performed in advance as preprocessing in order to, for example, segment a piece of image data into a plurality of regions including a text region, a photograph region, a halftone region, a page background region, and the like, and to perform processing appropriate to each region (filtering, halftoning, and the like).
Such region segmentation techniques are disclosed in U.S. Pat. No. 5,465,304A, U.S. Pat. No. 7,085,420B2, Chinese Patent Application No. 200510063768.4, and the like.
In particular, a region attribute determination apparatus disclosed in Japanese Unexamined Patent Publication JP-A 4-309191 tracks the connections between black runs and, when a connection terminates, detects the circumscribing bounding box of the resulting group of black pixels. The region attribute determination apparatus then generates a histogram of the frequency of occurrence of the heights or the widths of the circumscribing bounding boxes. The region attribute determination apparatus determines a region in which the frequency of occurrence of small circumscribing bounding boxes is greater than or equal to a threshold value to be the photograph region. For the remaining regions, based on the standard deviation of the histogram, it determines a region whose histogram shows relatively uniform frequencies of occurrence to be the text region, and a region whose histogram shows relatively uneven frequencies of occurrence to be the figure region.
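The determination flow described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of JP-A 4-309191: a flood fill over 8-connected black pixels stands in for the run-based connection tracking, the histogram is built from box heights only, and the function names and all three threshold values (`small_size`, `small_count_threshold`, `uniform_std_threshold`) are hypothetical.

```python
from collections import Counter, deque
from statistics import pstdev


def bounding_boxes(image):
    """Return (top, left, bottom, right) for each 8-connected group of black (1) pixels.

    Assumption: a flood fill replaces the run-based tracking of the cited apparatus.
    """
    rows = len(image)
    cols = len(image[0]) if image else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def classify_region(boxes, small_size=2, small_count_threshold=20,
                    uniform_std_threshold=1.0):
    """Classify a region from its bounding boxes; all thresholds are hypothetical."""
    if not boxes:
        return "empty"
    # Histogram: box height -> frequency of occurrence.
    histogram = Counter(bottom - top + 1 for top, _, bottom, _ in boxes)
    # Many small boxes suggest the dots of a photograph region.
    small_count = sum(n for h, n in histogram.items() if h <= small_size)
    if small_count >= small_count_threshold:
        return "photograph"
    # Low spread of the frequencies = relatively uniform histogram = text.
    spread = pstdev(list(histogram.values()))
    return "text" if spread <= uniform_std_threshold else "figure"


# Illustrative 3x4 binary image containing two groups of black pixels.
image = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
]
label = classify_region(bounding_boxes(image))
```

In this sketch the standard deviation is taken over the frequencies of the occupied histogram bins only; whether empty bins should also be counted is a design choice that the description above leaves open.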
According to the known technique, since the same algorithm is used to calculate the feature for both types of image data, that is, the document image data and the non-document image data, search accuracy cannot be high enough. Moreover, in the search process, the inputted image data needs to be compared with all of the registered image data. Therefore, the greater the amount of registered image data, the more time the search process requires.
In addition, the known technique cannot discriminate between the document image data and the non-document image data with sufficiently high determination accuracy.