... section 9, an image formation controlling section 25, an appearance data generating section 21, a classification information generating section 22, a data file save processing section 23 (saving section), a user identifying section 24, an appearance information converting section 29, a searching section 26, and a search result processing section 28. Moreover, the image processing section 9 includes (i) a copy processing section 10 having a region separation processing section 31, (ii) a scan processing section 12 having a region separation processing section 33, and (iii) a print processing section 13 having a PDL analyzing section 35.
The user interface 4 includes a processing user information inputting section 19 and a search condition inputting section 11. The search condition inputting section 11 includes an appearance information inputting section 15 and a searching user information inputting section 17.
The following explains the above-described sections.
The image inputting section 2 reads a document by means of, for example, a CCD, and then outputs image data (analog data) of the document to the copy processing section 10 or the scan processing section 12 in the image processing section 9. A document read by the image inputting section 2 at one time is hereinafter referred to as a processed document (a plurality of documents read consecutively by the image inputting section 2 are hereinafter referred to ...
In the field of image processing apparatuses, such as a copy machine, a printer, a scanner, a fax machine, and a multifunction printer, a technique has recently been proposed in which a document (processed document) that has been processed once is stored as a data file so that it can be reused later.
FIG. 11 is a flow chart showing a procedure of a conventional method for searching for a processed document (see Japanese Unexamined Patent Publication 237282/1997 (Tokukaihei 9-237282, published on Sep. 9, 1997)). As shown in FIG. 11, a user first prepares an image of a model document (Step 202). That is, from a model document image menu on a user interface screen, the user selects the following (i) through (iii) so that they are similar to the document the user wishes to find: (i) the type of the document, (ii) the pattern of the body of the document, and (iii) the layout of a title, a graphic, and the body. In the model document image menu, the type of the document has the following particulars: newspaper, character, business letter, journal/magazine, catalogue/pamphlet, and handwriting. As the pattern of the body, three column settings are available for the document body, i.e., 1 column to 3 columns. The layout of the title, the graphic, and the body has the following particulars: (i) a layout in which the title is above the body and the graphic, (ii) a layout in which the title is above the body, and (iii) a layout in which the title is above the graphic.
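The menu choices described above can be pictured as a small data structure. The following Python fragment is only an illustrative sketch: the names DocumentType, Layout, and ModelDocumentSelection are hypothetical and do not appear in the cited publication, which specifies only the menu items themselves.

```python
from dataclasses import dataclass
from enum import Enum


class DocumentType(Enum):
    """Document types offered by the model document image menu."""
    NEWSPAPER = "newspaper"
    CHARACTER = "character"
    BUSINESS_LETTER = "business letter"
    JOURNAL_MAGAZINE = "journal/magazine"
    CATALOGUE_PAMPHLET = "catalogue/pamphlet"
    HANDWRITING = "handwriting"


class Layout(Enum):
    """Layouts of the title, the graphic, and the body offered by the menu."""
    TITLE_ABOVE_BODY_AND_GRAPHIC = 1
    TITLE_ABOVE_BODY = 2
    TITLE_ABOVE_GRAPHIC = 3


@dataclass(frozen=True)
class ModelDocumentSelection:
    """One complete set of menu choices (hypothetical container)."""
    doc_type: DocumentType
    columns: int  # pattern of the body: 1, 2, or 3 columns
    layout: Layout

    def __post_init__(self) -> None:
        # The menu offers only 1 to 3 columns.
        if self.columns not in (1, 2, 3):
            raise ValueError("the menu offers 1 to 3 columns only")


selection = ModelDocumentSelection(
    DocumentType.BUSINESS_LETTER, 1, Layout.TITLE_ABOVE_BODY
)
```

Under this reading, the menu can express 6 x 3 x 3 = 54 distinct model documents, so every desired document has to be approximated by one of those combinations.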
Then, information concerning characteristics of the model document image prepared in Step 202 is obtained (Step 204). Using the obtained information as a key, a database which stores the processed documents is searched for a document similar to the model document image (Step 206). Whether the model document image and a searched image (image of a processed document) are similar to each other is determined from a result obtained by analyzing textures of those images. That is, the similarity between the model document image and the searched image is checked by (i) extracting characteristic vectors of the model document image and the searched image as the information concerning characteristics of those images, and (ii) applying a distance technique (the Euclidean distance technique, for example) to the characteristic vectors. Each characteristic vector includes 80 components. The first 20 components are prepared according to a histogram concerning the sizes of connected components in each image. The second 20 components are prepared by measuring substantial portions in each image. The third 20 components are prepared according to a vertical projection histogram concerning the connected components. The last 20 components are prepared by dividing each image into 20 cells and then obtaining the total number of connected components in each cell. Note that, when determining whether the model document image and the searched image are similar to each other, it is possible to combine the result (texture data) obtained by analyzing the textures of the images with a result (character data) of optical character recognition (OCR).
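The distance comparison of Step 206 can be sketched as follows. This is a minimal illustration in Python, assuming that the 80-component characteristic vectors have already been extracted and are held as plain lists of numbers; the function names euclidean_distance and rank_by_similarity, the document identifiers, and the sample values are hypothetical and are not taken from the cited publication.

```python
import math
from typing import Dict, List, Sequence, Tuple

VECTOR_LENGTH = 80  # four groups of 20 components each


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the Euclidean distance between two characteristic vectors."""
    if len(a) != VECTOR_LENGTH or len(b) != VECTOR_LENGTH:
        raise ValueError("characteristic vectors must have 80 components")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_similarity(
    model: Sequence[float], database: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Order stored documents by distance to the model, smallest first."""
    scored = [
        (doc_id, euclidean_distance(model, vector))
        for doc_id, vector in database.items()
    ]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical sample data: a model vector and three stored documents.
model = [0.0] * 80
database = {
    "doc_a": [0.0] * 79 + [3.0],
    "doc_b": [1.0] * 80,
    "doc_c": [0.0] * 80,
}
ranking = rank_by_similarity(model, database)
```

In this sketch a smaller distance means a greater similarity, so the first entry of the ranking corresponds to the single most similar document displayed in Step 208. Any weighting of the four groups of components, or any combination with OCR results, is outside the scope of the sketch.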
When the search is completed, a document image similar to the model document image is displayed as a search result (Step 208). The document image displayed here is either the one document which is most similar to the model document image, or cluster icons of a plurality of documents similar to the model document image. In the case in which the document displayed in Step 208 is not the desired processed document, the displayed document image or an arbitrary document in the cluster icons may be specified as a new model document image, and then the process returns to Step 206. In this way, it becomes possible to search for the desired processed document again.
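The repetition of Steps 206 and 208 can be sketched as a simple loop. The Python fragment below is an illustration under several assumptions that are not stated in the cited publication: only the single most similar document is displayed, the user always specifies that displayed document as the new model, documents already rejected are excluded from later rounds (otherwise the new model would simply find itself again), and the user's judgment is represented by a hypothetical callback is_desired. The vectors are shortened to two components for readability.

```python
import math
from typing import Callable, Dict, Optional, Sequence


def search_with_feedback(
    model: Sequence[float],
    database: Dict[str, Sequence[float]],
    is_desired: Callable[[str], bool],
    max_rounds: int = 5,
) -> Optional[str]:
    """Repeat search (Step 206) and display (Step 208) until the user accepts."""
    rejected = set()  # assumption: rejected documents are not shown again
    for _ in range(max_rounds):
        candidates = {k: v for k, v in database.items() if k not in rejected}
        if not candidates:
            return None
        # Step 206: find the stored document closest to the current model.
        best = min(candidates, key=lambda k: math.dist(model, candidates[k]))
        # Step 208: display the result and let the user judge it.
        if is_desired(best):
            return best
        rejected.add(best)
        model = database[best]  # the displayed document becomes the new model
    return None


# Hypothetical sample data with two-component vectors.
database = {"a": [1.0, 0.0], "b": [2.0, 0.0], "c": [5.0, 0.0]}
found = search_with_feedback([0.0, 0.0], database, lambda doc_id: doc_id == "c")
gave_up = search_with_feedback(
    [0.0, 0.0], database, lambda doc_id: doc_id == "c", max_rounds=2
)
```

With this sample data the desired document "c" is reached only in the third round, and a user who stops after two rounds ends up without a result.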
However, according to the above-described conventional method, the user has to prepare a model document, which is burdensome. In particular, in the case in which it is difficult to convert the desired processed document into a model document (that is, it is difficult to make appropriate choices from the menu on the user interface screen), the user has no choice but to give up the search or to repeat the search over and over again.